Author Elizabeth Tse and Josef Bigun
Year 2007
PublicationType Conference Paper
HostPublication IEEE International Conference on Systems Man and Cybernetics Conference Proceedings
Conference IEEE International Conference on Systems, Man and Cybernetics, 7-10 Oct. 2007, Montreal, Que.
Abstract Serto is the cursive alphabet of Syriac-Aramaic, used in the largest corpus of Aramaic documents held in libraries. A lingua franca, and often a source language, Aramaic has influenced major Judaic, Christian and Islamic thought as well as the development of science. The script is cursive, like Arabic, and consequently has a handwritten appearance compared to Latin. Serto, and Aramaic in practice, has no automatic character recognition (OCR) system. Most library documents are reproductions using printed characters. Readers would strongly benefit from an OCR system, as these reproductions are predominantly books printed in the pre-computer era. We propose a segmentation-free OCR using linear symmetry features with an individual threshold for the tensors of the characters, and an ordered search sequence. It yields approximately 90% correctly identified characters on average. As a first recognition scheme for Serto, it represents a baseline OCR for Syriac-Aramaic.