OCR Alternatives for Electronic Publishing of Digitised Documents
ELPUB2005. From Author to Reader: Challenges for the Digital Content Chain: Proceedings of the 9th ICCC International Conference on Electronic Publishing held at Katholieke Universiteit Leuven in Leuven-Heverlee (Belgium), 8-10 June 2005 / Edited by: Milena Dobreva & Jan Engelen, Peeters Publishing Leuven, ISBN 90-429-1645-1, 2005
This paper describes a general approach to automatically preparing digitised documents for storage and processing on various digital platforms. The focus is on documents that are not suitable for optical character recognition (OCR) but do exhibit regular structures in the form of text-like blocks. By extracting a document-immanent alphabet, preserving the graphical representations of its symbols through vectorisation, and then encoding the original document in terms of that alphabet, it is possible to obtain the benefits of encoded text without the effort and potential errors that recognition methods introduce. Using the Extensible Markup Language (XML) for structural descriptions and Scalable Vector Graphics (SVG) for graphical representations enables seamless integration into stylesheet-based output workflows that produce system-specific layouts.
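The pipeline in the abstract (extract an alphabet, keep its symbols as graphics, encode the page against it) can be illustrated with a minimal Python sketch. It assumes glyph segmentation has already been done, so each glyph arrives as a small binary bitmap with a position. The greedy prototype matching with a pixel-mismatch `threshold`, and all names (`Glyph`, `build_alphabet`, `to_svg`), are hypothetical stand-ins rather than the paper's method. Emitting one unit `rect` per black pixel is also only a crude substitute for real vectorisation, which would trace glyph outlines into paths. The output SVG stores each alphabet symbol once in `<defs>` and places every occurrence with `<use>` (using the SVG 2 plain `href` attribute), so the page becomes a sequence of symbol codes plus coordinates.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# A glyph bitmap: rows of 0/1 pixels (assumed already segmented from the scan).
Bitmap = tuple[tuple[int, ...], ...]
SVG_NS = "http://www.w3.org/2000/svg"


@dataclass
class Glyph:
    bitmap: Bitmap
    x: int  # position of the glyph on the page
    y: int


def mismatch(a: Bitmap, b: Bitmap) -> float:
    """Fraction of differing pixels; 1.0 if the bitmaps differ in shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 1.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 1.0
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def build_alphabet(glyphs: list[Glyph], threshold: float = 0.1):
    """Greedy clustering (hypothetical): each glyph joins the first prototype
    within `threshold`, otherwise it founds a new alphabet symbol."""
    prototypes: list[Bitmap] = []
    codes: list[int] = []
    for g in glyphs:
        for idx, proto in enumerate(prototypes):
            if mismatch(g.bitmap, proto) <= threshold:
                codes.append(idx)
                break
        else:
            prototypes.append(g.bitmap)
            codes.append(len(prototypes) - 1)
    return prototypes, codes


def to_svg(prototypes, glyphs, codes, width: int, height: int) -> str:
    """Encode the page as SVG: symbols defined once, occurrences via <use>."""
    ET.register_namespace("", SVG_NS)
    root = ET.Element(f"{{{SVG_NS}}}svg", width=str(width), height=str(height))
    defs = ET.SubElement(root, f"{{{SVG_NS}}}defs")
    for idx, proto in enumerate(prototypes):
        h, w = len(proto), len(proto[0])
        sym = ET.SubElement(defs, f"{{{SVG_NS}}}symbol",
                            id=f"g{idx}", viewBox=f"0 0 {w} {h}")
        # Crude "vectorisation": one unit square per black pixel.
        for r, row in enumerate(proto):
            for c, px in enumerate(row):
                if px:
                    ET.SubElement(sym, f"{{{SVG_NS}}}rect",
                                  x=str(c), y=str(r), width="1", height="1")
    for g, code in zip(glyphs, codes):
        h, w = len(g.bitmap), len(g.bitmap[0])
        ET.SubElement(root, f"{{{SVG_NS}}}use",
                      {"href": f"#g{code}", "x": str(g.x), "y": str(g.y),
                       "width": str(w), "height": str(h)})
    return ET.tostring(root, encoding="unicode")
```

For example, with `A = ((1, 1), (1, 0))` and `B = ((0, 1), (1, 1))`, the glyph sequence `A, B, A` yields two alphabet symbols and the code sequence `[0, 1, 0]`, because `A` and `B` differ in half their pixels. Storing each symbol once and referencing it is what lets a stylesheet-driven workflow reflow or restyle the page without any character recognition.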