USE OF TEXT MINING METHODS IN A DIGITAL LIBRARY
Hynek, Jiri; Jezek, Karel
elpub2002 - Technology Interactions. Proceedings of the 6th International ICCC/IFIP Conference on Electronic Publishing held in Karlovy Vary, Czech Republic, 6–8 November 2002. Editors: Carvalho, Joao Álvaro; Hübler, Arved; Baptista, Ana Alice. Publisher: VWF Berlin, 2002. ISBN 3-89700-357-0. 395 pages.
The article deals with the use of the Itemsets classifier, a method based on inductive machine learning, in a digital library environment. We provide a brief description of a real-world digital library implemented at a power utility. Its implementation and operating experience motivated our research into inductive machine learning methods for text mining, as described in the paper.
Inspired by the mining of association rules, we have developed a new categorization method named the “Itemsets classifier”. Our experiments show that it can surpass some well-known categorization methods in terms of both precision/recall and efficiency. Because classification is closely related to clustering, we have also integrated the principles of the Itemsets method into a new document-clustering algorithm. We also present other applications of the Itemsets classifier: unsolicited mail filtering and enhancement of the Naive Bayes classifier. The main ideas and experimental results are presented in the paper.