
Paper 215elpub2005:
OCR Alternatives for Electronic Publishing of Digitised Documents

id 215elpub2005
authors Stefan Pletschacher
year 2005
title OCR Alternatives for Electronic Publishing of Digitised Documents
source ELPUB2005. From Author to Reader: Challenges for the Digital Content Chain: Proceedings of the 9th ICCC International Conference on Electronic Publishing held at Katholieke Universiteit Leuven in Leuven-Heverlee (Belgium), 8-10 June 2005 / Edited by: Milena Dobreva & Jan Engelen, Peeters Publishing, Leuven, ISBN 90-429-1645-1, 2005
summary This paper describes a general approach to automatically preparing digitised documents for storage and processing on various digital platforms. The focus is on documents that are not suitable for optical character recognition (OCR) methods but that exhibit regular structures in the form of text-like blocks. By extracting a document-immanent alphabet, preserving its graphical representations by means of vectorisation, and then encoding the original document in terms of this alphabet, it is possible to obtain the benefits of encoded text without the effort and the potential errors that arise from recognition methods. Using the Extensible Markup Language (XML) for structural descriptions and Scalable Vector Graphics (SVG) for graphical representations enables seamless integration into style-sheet-based output workflows for producing system-specific layouts.
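The summary describes the pipeline only at a high level (extract an alphabet, vectorise it, encode the page, emit XML/SVG). The following is a minimal sketch of that idea, not the author's implementation. It assumes glyphs have already been segmented into small binary bitmaps; greedy clustering by Hamming distance stands in for whatever shape matching the paper actually uses; the "vectorisation" is a crude conversion of pixel runs into SVG rectangles rather than real contour tracing; and the XML element names (`document`, `text`, `line`) and symbol ids (`s0`, `s1`, ...) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: a glyph is a tuple of equal-length rows, '#' = ink, '.' = background.
Glyph = tuple[str, ...]


def hamming(a: Glyph, b: Glyph) -> int:
    """Count differing pixels between two equally sized glyph bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_alphabet(glyphs: list[Glyph], max_diff: int = 1):
    """Greedily cluster glyphs into a document-specific alphabet.

    Returns (prototypes, codes): prototypes[i] is the representative bitmap
    of symbol i, codes[k] is the symbol index assigned to glyphs[k]."""
    prototypes: list[Glyph] = []
    codes: list[int] = []
    for g in glyphs:
        for idx, p in enumerate(prototypes):
            same_size = len(p) == len(g) and all(len(r1) == len(r2) for r1, r2 in zip(p, g))
            if same_size and hamming(p, g) <= max_diff:
                codes.append(idx)  # close enough: reuse the existing symbol
                break
        else:  # no prototype matched: this glyph starts a new symbol
            prototypes.append(g)
            codes.append(len(prototypes) - 1)
    return prototypes, codes


def vectorise(glyph: Glyph) -> list[tuple[int, int, int]]:
    """Crude vectorisation: one (x, y, width) rectangle per horizontal ink run."""
    rects = []
    for y, row in enumerate(glyph):
        x = 0
        while x < len(row):
            if row[x] == "#":
                start = x
                while x < len(row) and row[x] == "#":
                    x += 1
                rects.append((start, y, x - start))
            else:
                x += 1
    return rects


def encode_document(lines: list[list[Glyph]], max_diff: int = 1):
    """Encode lines of glyphs as XML text referencing inline SVG symbols."""
    flat = [g for line in lines for g in line]
    prototypes, codes = build_alphabet(flat, max_diff)
    root = ET.Element("document")
    svg = ET.SubElement(root, "svg", xmlns="http://www.w3.org/2000/svg")
    defs = ET.SubElement(svg, "defs")
    for idx, proto in enumerate(prototypes):
        sym = ET.SubElement(defs, "symbol", id=f"s{idx}")
        for x, y, w in vectorise(proto):
            ET.SubElement(sym, "rect", x=str(x), y=str(y), width=str(w), height="1")
    text = ET.SubElement(root, "text")
    pos = 0
    for line in lines:
        line_el = ET.SubElement(text, "line")
        line_el.text = " ".join(f"s{c}" for c in codes[pos:pos + len(line)])
        pos += len(line)
    return root, prototypes, codes


# Usage: two distinct shapes, one of them repeated with a single noisy pixel.
A = ("#..", "#..", "###")
A_noisy = ("#..", "#..", "##.")  # differs from A in one pixel
B = ("###", "..#", "..#")
page = [[A, B, A_noisy], [B, A]]
root, prototypes, codes = encode_document(page)
print(codes)  # [0, 1, 0, 1, 0]
print(ET.tostring(root, encoding="unicode"))
```

The noisy glyph is absorbed into the same symbol as `A`, so the page is stored as a short symbol sequence plus one vector shape per distinct symbol. That is the benefit the summary attributes to encoding without recognition: no character identity is ever guessed. The `max_diff` threshold is an illustrative parameter; a real system would need a size-tolerant, more robust similarity measure.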
series ELPUB:2005
type full paper
email stefan.pletschacher@mb.tu-chemnitz.de
content file.pdf (292,338 bytes)
urn:nbn urn:nbn:se:elpub-215elpub2005
last changed 2005/06/02 18:31