Report
Open access
English

Development of a flexible tool for the automatic comparison of bibliographic records. Application to sample collections

Number of pages: 107
Publication date: 2009
Abstract

With the proliferation of digital bibliographic catalogues (open repositories, library and bookseller catalogues), information specialists face the challenge of mass-processing large amounts of metadata for various purposes. Among the many possible applications, determining the similarity between records is an important one. Such similarity is of interest from a bibliographic point of view (do the records describe the same document? The answer is useful for deduplication and for collection overlap studies) as well as from a thematic point of view (suggesting documents to users, managing content within the framework of a library policy, classifying documents automatically, and so on). To meet these varied needs, we propose a flexible, open-source, multiplatform software tool that supports the implementation of multiple strategies for record comparison. In a second step, we study the relevance and performance of several algorithms applied to a selection of collections differing in size, origin, and document types.
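
As an illustration of what "multiple strategies for record comparison" can mean in practice, the following is a minimal Python sketch, not the tool's actual implementation. It assumes records are flat dictionaries of field names to strings, and every name in it (`Record`, `edit_ratio`, `jaccard`, `compare`, the field names and weights) is hypothetical. Each field is given its own similarity function, and the per-field scores are combined as a weighted average.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict

# Hypothetical flat record: field name -> field value.
Record = Dict[str, str]
# A strategy maps two field values to a similarity in [0, 1].
Strategy = Callable[[str, str], float]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def edit_ratio(a: str, b: str) -> float:
    """Character-level similarity based on matching subsequences."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def jaccard(a: str, b: str) -> float:
    """Word-level similarity: shared words divided by all distinct words."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def compare(r1: Record, r2: Record,
            strategies: Dict[str, Strategy],
            weights: Dict[str, float]) -> float:
    """Weighted average of per-field similarities; missing fields count as empty."""
    total = sum(weights[field] for field in strategies)
    score = sum(
        weights[field] * strategy(r1.get(field, ""), r2.get(field, ""))
        for field, strategy in strategies.items()
    )
    return score / total


# Illustrative configuration: titles compared by characters, authors by words.
strategies = {"title": edit_ratio, "author": jaccard}
weights = {"title": 2.0, "author": 1.0}

a = {"title": "Automatic comparison of bibliographic records", "author": "Borel Alain"}
b = {"title": "Automatic  Comparison of Bibliographic Records", "author": "Alain Borel"}
c = {"title": "Collection overlap studies", "author": "Krause Jan"}

print(compare(a, b, strategies, weights))  # near-duplicate pair
print(compare(a, c, strategies, weights))  # unrelated pair
```

Swapping the entries of `strategies` or adjusting `weights` changes the comparison without touching `compare`, which is one simple way to keep such a tool flexible. A threshold on the resulting score could then separate likely duplicates from merely related records, but the right threshold depends on the collection.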

Keywords
  • Marcximil
  • Deduplication
  • Near duplicates
  • Duplicates
  • Doublons
  • Dédoublonner
  • Dédoublonage
  • Similarité
  • Similarity
  • Similitude
  • Collection
  • Information retrieval
Citation (ISO format)
BOREL, Alain, KRAUSE, Jan Brice. Development of a flexible tool for the automatic comparison of bibliographic records. Application to sample collections. 2009
Main files (1)
Report
Access level: Public
Identifiers
  • PID : unige:23174
604 views
361 downloads

Technical information

Creation: 2012/09/27 09:05:00
First validation: 2012/09/27 09:05:00
Update time: 2023/03/14 17:41:58
Status update: 2023/03/14 17:41:57
Last indexation: 2024/01/16 00:18:28
All rights reserved by Archive ouverte UNIGE and the University of Geneva