Paper 02-28:
USE OF TEXT MINING METHODS IN A DIGITAL LIBRARY

id 02-28
authors Hynek, Jiri; Jezek, Karel
year 2002
title USE OF TEXT MINING METHODS IN A DIGITAL LIBRARY
source elpub2002 - Technology Interactions. Proceedings of the 6th International ICCC/IFIP Conference on Electronic Publishing held in Karlovy Vary, Czech Republic, 6–8 November 2002. Editors: Carvalho, Joao Álvaro; Hübler, Arved; Baptista, Ana Alice. Publisher: VWF Berlin, 2002. ISBN 3-89700-357-0. 395 pages.
summary The article deals with the use of the Itemsets classifier, which is based on inductive machine learning, in a digital library environment. We provide a brief description of a real-world digital library implemented at a power utility. Its implementation and operating experience motivated the research into inductive machine learning methods for text mining described in the paper. Inspired by the mining of association rules, we have developed a new categorization method named "Itemsets classifier". Our experiments show that it can outperform some well-known categorization methods in terms of both precision/recall and efficiency. Because classification is closely related to clustering, we have also integrated the principles of the Itemsets method into a new document-clustering algorithm. We also present other applications of the Itemsets classifier: filtering unsolicited mail and enhancing the Naive Bayes classifier. The paper presents the main ideas and experimental results.
series ELPUB:2002
email jiri.hynek@insite.cz
content file.pdf (2,158,854 bytes)
urn:nbn urn:nbn:se:elpub-02-28
last changed 2003/07/09 15:09