Paper 106_elpub2013:
Deduplication of metadata harvested from Open Archives Initiative repositories

id 106_elpub2013
authors Wendykier, Piotr
year 2013
title Deduplication of metadata harvested from Open Archives Initiative repositories
source ELPUB2013. Mining the Digital Information Networks, 17th International Conference on Electronic Publishing 13-14 June 2013, Karlskrona, Sweden.
summary Open access (OA) is a way of providing unrestricted access via the Internet to peer-reviewed journal articles as well as theses, monographs and book chapters. Many open access repositories have been created in the last decade. There is also a number of registry websites that index these repositories. This article analyzes the repositories indexed by the Open Archives Initiative (OAI) organization in terms of record duplication. Based on the sample of 958 metadata files containing records modified in 2012 we provide an estimate on the number of duplicates in the entire collection of repositories indexed by OAI. In addition, this work describes several open source tools that form a generic workflow suitable for deduplication of bibliographic records.
keywords deduplication, record linkage, metadata, Dublin Core, open access, OAI-PMH
series ELPUB:2013
type normal paper
content file.pdf (230,564 bytes)
urn:nbn urn:nbn:se:elpub-106_elpub2013
last changed 2013/05/28 17:44