IMPROVING INFORMATION RETRIEVAL IN DIGITAL THESES USING METADATA
||Abascal, Rocio; Rumpler, Beatrice; Pinon, Jean-Marie
||elpub2002 - Technology Interactions. Proceedings of the 6th International ICCC/IFIP Conference on Electronic Publishing held in Karlovy Vary, Czech Republic, 6–8 November 2002. Editors: Carvalho, Joao Álvaro; Hübler, Arved; Baptista, Ana Alice. Publisher: VWF Berlin, 2002. ISBN 3-89700-357-0. 395 pages.
||In this paper, we present an approach to improving information retrieval in digital scientific theses. The approach consists of defining and using “metadata” to help users find relevant information during search sessions. This research is part of the CITHER project (Consultation en Texte Integral des THEses En Reseau), developed by the INSA of Lyon (France). CITHER concerns the online publishing of the INSA’s scientific theses. As a first step, CITHER made the theses available in PDF (Portable Document Format) through a document server. However, this system does not allow users to select only the pertinent parts of a thesis during a search session: the entire document has to be read to find them.
In the first part of this paper, we present the initial structure of a thesis stored on the CITHER server. We then describe our method for defining a new document structure based on “semantic metadata”. We propose to use the concept of ontology to define the semantic “metadata” and XML (eXtensible Markup Language) to structure the document. To define the semantic “metadata”, we extract the main concepts found in almost all theses, such as “model”, “theorem”, “method” and “tool”, and formalize them according to an ontology. The new document model therefore includes both “classical metadata” (Dublin Core, etc.) and “semantic metadata”, which are embedded in the document by means of XML tags. This document model is based on a logical and semantic structure, and information retrieval in the new thesis format relies on the XML tags.
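To make the document model above more concrete, the following is a minimal sketch in Python, not the CITHER prototype itself. It assumes a hypothetical thesis fragment in which Dublin Core elements carry the classical metadata and elements named after the concepts quoted in the abstract (“method”, “theorem”, “tool”) carry the semantic metadata. The element names `thesis`, `metadata`, `chapter` and `paragraph`, the attribute `n`, and the helper `find_parts` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical thesis fragment: only the dc:* elements come from a real
# vocabulary; every other tag name is an assumption for this sketch.
THESIS_XML = f"""
<thesis xmlns:dc="{DC}">
  <metadata>
    <dc:title>Example thesis</dc:title>
    <dc:creator>Example Author</dc:creator>
  </metadata>
  <chapter n="1">
    <paragraph>Introductory text without semantic markup.</paragraph>
    <method>Description of a method.</method>
  </chapter>
  <chapter n="2">
    <theorem>Statement of a theorem.</theorem>
    <tool>Description of a tool.</tool>
    <method>Description of a second method.</method>
  </chapter>
</thesis>
"""


def find_parts(root, concept):
    """Return (chapter number, text) for every element tagged with `concept`."""
    results = []
    for chapter in root.iter("chapter"):
        for element in chapter.iter(concept):
            results.append((chapter.get("n"), element.text.strip()))
    return results


root = ET.fromstring(THESIS_XML)

# Classical metadata: read the Dublin Core title.
title = root.findtext("metadata/dc:title", namespaces={"dc": DC})

# Semantic metadata: retrieve only the parts tagged as methods.
methods = find_parts(root, "method")
print(title)
print(methods)
```

Searching by tag in this way returns only the passages marked with the requested concept, rather than the whole document, which is the behaviour the abstract contrasts with the PDF-only server.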
In the second part of this paper, we present our prototype and some examples that illustrate our proposal and its results. We conclude with future proposals for improving the prototype.
||file.pdf (2,236,567 bytes)