Paper 0020:
FROM OPEN SOURCE TO OPEN INFORMATION: COLLABORATIVE METHODS IN CREATING XML-BASED MARKUP LANGUAGES

id 0020
authors Rehm, Georg; Henning Lobin
year 2000
title FROM OPEN SOURCE TO OPEN INFORMATION: COLLABORATIVE METHODS IN CREATING XML-BASED MARKUP LANGUAGES
source Electronic Publishing 2000. Electronic Publishing in the Third Millennium: Proceedings of an ICCC/IFIP conference held at Kaliningrad/Svetlogorsk, Russia, August 17-19, 2000 / Edited by Peter Linde, John W. T. Smith, Elena Emilianova. Washington D.C.: ICCC Press, 2000. 239 p. ISBN: 1-891365-07-X
summary Until the beginning of the last decade, the Internet was primarily used by scientific, educational, and military organisations for the exchange of information such as data files and electronic mail. The introduction of the easy-to-use hypertext system World Wide Web (WWW) has, however, begun a new era for this world-spanning computer network.

In this paper we examine a part of the Information Marketplace (Dertouzos, 1997) that will give users of the WWW a wide range of new possibilities for gathering information, a task that is predominantly carried out using index-based search engines (e.g., www.google.com, www.metacrawler.com) or catalogue-based ones (e.g., www.yahoo.com). One of the major shortcomings of search engines is a lack of semantic certainty, which results both from the absence of structure in the indexed documents and from insufficient methods of information extraction and information retrieval at a generalized conceptual level (as opposed to the statistics-based word level that is still the most common method in search engine technology). For this reason, the user of a search engine is very often confronted with many documents that are beyond the scope of his or her search query.

The aforementioned lack of explicit structure in web documents will be overcome in the next few years by an increased use of XML (Extensible Markup Language, Bray et al., 1998) and a simultaneous turning away from HTML (Hypertext Markup Language, Raggett et al., 1997), which only allows the annotation of rather coarse textual elements. However, the new structural variety and liberty of XML bears certain dangers. Because XML allows the free definition of concrete markup languages like HTML, many proprietary XML-based annotation schemata could emerge. These, in turn, would make automatic information extraction by search engines not easier but even more difficult, since a large part of the success of the Internet, and especially of the World Wide Web, is based on the standardization of concrete markup languages.

In this paper, we outline a possible development that may counteract this XML babel. The main basis for our prognosis is a paradigm in software development that has been successful for almost 20 years now. This paradigm, called Open Source (DiBona et al., 1999; Raymond, 1999), made possible, among other software packages, the free operating system Linux. The impetus behind Open Source will give new and decisive impulses for the use of quasi-standardized XML-based markup languages and concrete schemata for related standards. These impulses will result in what we want to call Open Information.

keywords New publishing models, technical factors, SGML, XML
series ELPUB:2000
type full paper
email georg.rehm@germanistik.uni-giessen.de
content file.pdf (191,332 bytes)
urn:nbn urn:nbn:se:elpub-0020
last changed 2008/07/04 07:19