Deduplication of metadata harvested from Open Archives Initiative repositories
||ELPUB2013. Mining the Digital Information Networks, 17th International Conference on Electronic Publishing 13-14 June 2013, Karlskrona, Sweden.
||Open access (OA) is a way of providing unrestricted access via the Internet to peer-reviewed journal articles as well as theses, monographs and book chapters. Many open access repositories have been created in the last decade. There are also a number of registry websites that index these repositories. This article analyzes the repositories indexed by the Open Archives Initiative (OAI) organization in terms of record duplication. Based on a sample of 958 metadata files containing records modified in 2012, we provide an estimate of the number of duplicates in the entire collection of repositories indexed by OAI. In addition, this work describes several open source tools that form a generic workflow suitable for deduplication of bibliographic records.
||deduplication, record linkage, metadata, Dublin Core, open access, OAI-PMH
||file.pdf (230,564 bytes)