Paper 322_elpub2008:
Web topic summarization

id 322_elpub2008
authors Steinberger, Josef; Jezek, Karel; Sloup, Martin
year 2008
title Web topic summarization
source ELPUB2008. Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 - Proceedings of the 12th International Conference on Electronic Publishing held in Toronto, Canada 25-27 June 2008 / Edited by: Leslie Chan and Susanna Mornati. ISBN 978-0-7727-6315-0, 2008, pp. 322-334
summary In this paper, we present our online system for summarizing web topics. The user defines a topic by a set of keywords. The system then searches the Web for relevant documents. The top-ranked documents are returned and passed on to the summarization component. The summarizer produces a summary, which is finally shown to the user. The proposed architecture is fully modular. This enables us to quickly substitute a new version of any module, so the quality of the system's output improves as the modules improve. The crucial module, which extracts the most important sentences from the documents, is based on latent semantic analysis. Its main property is that it is independent of the language of the source documents. In the system interface, one can choose to search a news site in English or Czech. The results show very good search quality: most of the retrieved documents are fully relevant, and only a few are marginally relevant. The summarizer is comparable to state-of-the-art systems.
keywords Information retrieval; searching; summarization; latent semantic analysis
series ELPUB:2008
type normal paper
email jstein@kiv.zcu.cz
content file.pdf (508,484 bytes)
urn:nbn urn:nbn:se:elpub-322_elpub2008
last changed 2008/08/03 05:39
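
The summary describes a module that extracts the most important sentences using latent semantic analysis. The record does not give the scoring formula, so the following is only a minimal sketch under stated assumptions: sentences are represented as columns of a term-by-sentence count matrix A, the leading eigenpairs of A^T A (equivalently, the right singular vectors and squared singular values of A) are approximated by power iteration, and each sentence is scored by the length of its vector in the reduced latent space. The names `tokenize`, `gram_matrix`, `top_eigenpairs`, and `summarize`, and the parameters `rank` and `n_sentences`, are hypothetical and are not taken from the paper.

```python
import math
import random
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def gram_matrix(sentences):
    """Return G = A^T A, where A is the term-by-sentence count matrix."""
    bags = []
    for sentence in sentences:
        counts = {}
        for token in tokenize(sentence):
            counts[token] = counts.get(token, 0) + 1
        bags.append(counts)
    n = len(bags)
    return [[sum(c * bags[j].get(t, 0) for t, c in bags[i].items())
             for j in range(n)] for i in range(n)]


def top_eigenpairs(matrix, rank, iterations=200, seed=0):
    """Approximate the largest eigenpairs of a symmetric positive
    semidefinite matrix by power iteration with deflation."""
    rng = random.Random(seed)
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(min(rank, n)):
        v = [rng.random() + 0.1 for _ in range(n)]
        eigenvalue = 0.0
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                eigenvalue = 0.0
                break
            v = [x / norm for x in w]
            eigenvalue = norm
        if eigenvalue < 1e-12:
            break  # no significant latent dimension left
        pairs.append((eigenvalue, v))
        # Deflate: remove the component that was just found.
        for i in range(n):
            for j in range(n):
                m[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def summarize(sentences, n_sentences=2, rank=2):
    """Pick the highest-scoring sentences and return them in source order.

    Assumed score: sqrt(sum_i eigenvalue_i * v_i[k]^2) over `rank` dimensions,
    i.e. the length of sentence k in the reduced latent space.
    """
    pairs = top_eigenpairs(gram_matrix(sentences), rank)
    scores = [math.sqrt(sum(lam * vec[k] ** 2 for lam, vec in pairs))
              for k in range(len(sentences))]
    ranked = sorted(range(len(sentences)), key=lambda k: scores[k], reverse=True)
    return [sentences[k] for k in sorted(ranked[:n_sentences])]


if __name__ == "__main__":
    docs = [
        "The system searches the web for relevant documents.",
        "The summarizer extracts important sentences from the documents.",
        "The summary is shown to the user.",
    ]
    for line in summarize(docs, n_sentences=2, rank=2):
        print(line)
```

The sketch relies only on word counts and never consults a dictionary or grammar, which illustrates why an approach of this kind can be applied to English and Czech sources alike. Language-specific choices such as tokenization or stop-word removal are left out here and would be separate decisions.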