
Paper 02-31:
IMPROVING INFORMATION RETRIEVAL IN DIGITAL THESES USING METADATA

id 02-31
authors Abascal, Rocio; Rumpler, Beatrice; Pinon, Jean-Marie
year 2002
title IMPROVING INFORMATION RETRIEVAL IN DIGITAL THESES USING METADATA
source elpub2002 - Technology Interactions. Proceedings of the 6th International ICCC/IFIP Conference on Electronic Publishing held in Karlovy Vary, Czech Republic, 6–8 November 2002. Editors: Carvalho, Joao Álvaro; Hübler, Arved; Baptista, Ana Alice. Publisher: VWF Berlin, 2002. ISBN 3-89700-357-0. 395 pages.
summary In this paper, we present an approach to improve information retrieval in digital scientific theses. The approach consists of defining and using “metadata” to help users find relevant information during search sessions. This research is part of the CITHER project (Consultation en Texte Integral des THEses En Reseau) developed by the INSA of Lyon (France). CITHER deals with the online publication of INSA’s scientific theses. In its first phase, CITHER made the theses available in PDF (Portable Document Format) through a document server. However, this system does not allow users to select only the relevant parts of a thesis during a search session; they must read the entire document to find them. In the first part of this paper, we present the initial structure of a thesis stored on the CITHER server. We then describe our method for defining a new document structure based on “semantic metadata”. We propose to use an ontology to define the semantic “metadata” and XML (eXtensible Markup Language) to structure the document. To define the semantic “metadata”, we extract the main concepts, such as “model”, “theorem”, “method” and “tool”, that appear in almost all theses, and we formalize these concepts in an ontology. The new document model therefore includes both “classical metadata” (Dublin Core, etc.) and “semantic metadata”, which are embedded in the document as XML tags. This document model is based on a logical and a semantic structure. Information retrieval in the new thesis format is performed using these XML tags. In the second part of this paper, we present our prototype and some examples that illustrate our proposal and its results. We conclude with future proposals to improve the prototype.
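The abstract describes theses that carry both classical (Dublin Core) metadata and semantic concept tags in XML, with retrieval driven by those tags. The sketch below illustrates that idea with Python's standard `xml.etree.ElementTree`. Only the Dublin Core namespace URI is real; the element names `thesis`, `metadata`, `section` and `concept type="..."`, the sample text, and the helper functions are hypothetical and are not taken from the CITHER schema or prototype.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace (real). All other tag names are hypothetical.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical thesis fragment: classical metadata plus semantic concept tags.
THESIS_XML = f"""
<thesis xmlns:dc="{DC}">
  <metadata>
    <dc:title>Example thesis</dc:title>
    <dc:creator>Doe, Jane</dc:creator>
    <dc:date>2001</dc:date>
  </metadata>
  <section title="Background">
    <concept type="model">Definition of the document model.</concept>
    <concept type="theorem">Statement of the main result.</concept>
  </section>
  <section title="Implementation">
    <concept type="tool">Description of the prototype.</concept>
    <concept type="method">Description of the retrieval method.</concept>
  </section>
</thesis>
"""


def classical_metadata(root):
    """Collect the Dublin Core fields of the <metadata> block into a dict."""
    meta = {}
    for elem in root.find("metadata"):
        # Namespaced tags look like '{http://purl.org/dc/elements/1.1/}title'.
        prefix = "{" + DC + "}"
        if elem.tag.startswith(prefix):
            meta[elem.tag[len(prefix):]] = elem.text
    return meta


def find_concepts(root, concept_type):
    """Return (section title, text) for every concept tag of the given type."""
    hits = []
    for section in root.iter("section"):
        # ElementTree's limited XPath supports [@attr='value'] predicates.
        for concept in section.findall(f"concept[@type='{concept_type}']"):
            hits.append((section.get("title"), concept.text))
    return hits


root = ET.fromstring(THESIS_XML.strip())
print(classical_metadata(root))
print(find_concepts(root, "theorem"))
```

Because retrieval selects tagged fragments rather than whole files, a query for “theorem” returns only the matching passage and the section it belongs to, which is the behavior the abstract contrasts with reading an entire PDF.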
series ELPUB:2002
email rabascal@lisi.insa-lyon.fr
content file.pdf (2,236,567 bytes)
urn:nbn urn:nbn:se:elpub-02-31
last changed 2003/07/09 15:18