242_elpub2006:
Text Parsing of a Complex Genre

id 242_elpub2006
authors Lüngen, Harald; Bärenfänger, Maja; Hilbert, Mirco; Lobin, Henning; Puskás, Csilla
year 2006
source ELPUB2006. Digital Spectrum: Integrating Technology and Culture - Proceedings of the 10th International Conference on Electronic Publishing held in Bansko, Bulgaria 14-16 June 2006 / Edited by: Bob Martens, Milena Dobreva. ISBN 978-954-16-0040-5, 2006, pp. 247-256
summary A text parsing component designed to be part of a system that assists students in academic reading and writing is presented. The parser can automatically add a relational discourse structure annotation to a scientific article that a user wants to explore. The discourse structure employed is defined in an XML format and is based on Rhetorical Structure Theory. The architecture of the parser comprises preprocessing components which provide an input text with XML annotations on different linguistic and structural layers. In the first version these are syntactic tagging, lexical discourse marker tagging, logical document structure, and segmentation into elementary discourse segments. The algorithm is based on the shift-reduce parser by Marcu (2000) and is controlled by reduce operations that are constrained by linguistic conditions derived from an XML-encoded discourse marker lexicon. The constraints are formulated over multiple annotation layers of the same text.
keywords text parsing; discourse parsing; XML applications; rhetorical structure
series ELPUB:2006
type normal paper
email luengen@uni-giessen.de
more http://info.tuwien.ac.at/elpub2006/presentations/242.pdf
content file.pdf (615,019 bytes)
urn:nbn urn:nbn:se:elpub-242_elpub2006
last changed 2006/06/24 11:13
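
The summary describes a shift-reduce parser whose reduce operations are constrained by a discourse marker lexicon. The following is a minimal Python sketch of that general idea only, not the authors' implementation. It assumes a toy in-memory lexicon instead of the XML-encoded one, plain strings instead of multi-layer XML annotations, and a fallback relation once the input is exhausted. The names `Node`, `LEXICON`, `DEFAULT_RELATION`, `marker_relation`, and `parse` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy lexicon (assumption): discourse marker -> relation name.
LEXICON = {"because": "CAUSE", "but": "CONTRAST"}
# Assumed fallback relation, used only when the input is exhausted.
DEFAULT_RELATION = "ELABORATION"


@dataclass
class Node:
    """A discourse segment (leaf) or a relation spanning two subtrees."""
    text: str
    relation: Optional[str] = None
    children: tuple = ()


def marker_relation(text: str) -> Optional[str]:
    """Return the relation licensed by a segment-initial marker, if any."""
    lowered = text.lower()
    for marker, relation in LEXICON.items():
        if lowered.startswith(marker):
            return relation
    return None


def parse(segments: List[str]) -> Optional[Node]:
    """Greedy shift-reduce over elementary discourse segments."""
    stack: List[Node] = []
    queue = list(segments)
    while queue or len(stack) > 1:
        if len(stack) >= 2:
            # Reduce is allowed only if the lexicon licenses a relation,
            # or if nothing is left to shift (fallback relation).
            relation = marker_relation(stack[-1].text)
            if relation is None and not queue:
                relation = DEFAULT_RELATION
            if relation is not None:
                right = stack.pop()
                left = stack.pop()
                merged = Node(left.text + " " + right.text, relation, (left, right))
                stack.append(merged)
                continue
        # Shift: move the next segment onto the stack.
        stack.append(Node(queue.pop(0)))
    return stack[0] if stack else None


root = parse([
    "The parser was extended",
    "because the genre is complex",
    "but evaluation is pending",
])
```

In this sketch the marker check stands in for the linguistic conditions mentioned in the summary; a fuller version would evaluate such conditions over several annotation layers rather than over the raw segment string.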
