UNIGE document: Presentation / Talk

MarcXimiL : near duplicates detection

Borel, Alain
Presented at the CERN workshop on Innovations in Scholarly Communication (OAI7), Geneva, 22-24 June 2011.
Abstract: MarcXimiL is an open-source tool that works on MARCXML records and calculates similarity indices between them. After a short theoretical introduction, the tutorial focuses on how to install, parameterize and use the tool. Among other things, the tool can be used to:
* prevent the creation of duplicates (similar records are shown during the validation process)
* identify duplicates in batch files before ingest
* find duplicates within a collection
* suggest to users records similar to the one found by a search
* match related documents, e.g. preprints and articles
http://marcximil.sourceforge.net
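This page does not describe MarcXimiL's own similarity measures or configuration, so the following is only a minimal, hypothetical Python sketch of the general idea: parse MARCXML records, compute a weighted similarity index from a few fields, and flag pairs above a threshold (the "find duplicates within a collection" use case). The field choice (245 $a title, 100 $a main author), the weights, the 0.85 threshold, the use of `difflib.SequenceMatcher` as the string measure, and the helper names are all assumptions for illustration, not MarcXimiL's API or algorithm.

```python
import re
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher
from itertools import combinations

# MARCXML namespace (Library of Congress MARC 21 slim schema).
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical weights: title (245 $a) and main author (100 $a). Sum = 1.
WEIGHTS = {("245", "a"): 0.7, ("100", "a"): 0.3}
THRESHOLD = 0.85  # hypothetical cut-off for "near duplicate"


def field_value(record, tag, code):
    """First value of datafield `tag`, subfield `code`, or '' if absent."""
    node = record.find(
        f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']", NS)
    if node is None or node.text is None:
        return ""
    return node.text


def normalize(text):
    """Lowercase and keep word characters only, so punctuation is ignored."""
    return " ".join(re.findall(r"\w+", text.lower()))


def similarity(rec_a, rec_b):
    """Weighted sum of per-field string similarities, between 0 and 1."""
    score = 0.0
    for (tag, code), weight in WEIGHTS.items():
        a = normalize(field_value(rec_a, tag, code))
        b = normalize(field_value(rec_b, tag, code))
        # Note: difflib rates two empty strings as identical (ratio 1.0).
        score += weight * SequenceMatcher(None, a, b).ratio()
    return score


def near_duplicates(marcxml, threshold=THRESHOLD):
    """Compare every pair of records (O(n^2)) and return likely duplicates."""
    records = ET.fromstring(marcxml).findall("marc:record", NS)
    found = []
    for rec_a, rec_b in combinations(records, 2):
        score = similarity(rec_a, rec_b)
        if score >= threshold:
            id_a = rec_a.findtext("marc:controlfield[@tag='001']", "", NS)
            id_b = rec_b.findtext("marc:controlfield[@tag='001']", "", NS)
            found.append((id_a, id_b, round(score, 2)))
    return found


# Invented sample records, for illustration only.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">1</controlfield>
    <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Smith, John</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Near duplicate detection in library catalogues</subfield></datafield>
  </record>
  <record>
    <controlfield tag="001">2</controlfield>
    <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Smith, J.</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Near-duplicate detection in library catalogues.</subfield></datafield>
  </record>
  <record>
    <controlfield tag="001">3</controlfield>
    <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Jones, Mary</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Open access repositories in practice</subfield></datafield>
  </record>
</collection>"""

if __name__ == "__main__":
    print(near_duplicates(SAMPLE))
```

On the sample, records 1 and 2 should be flagged: their titles are identical after normalization and the author forms differ only slightly, while record 3 scores well below the threshold. Comparing every pair grows quadratically with collection size; the same `similarity` function could instead check a single incoming record against a collection, which is closer to the "show similar records during validation" use case.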
Keywords: OAI7, Marcximil, Deduplication, Near duplicates, Duplicates, Doublons, Dédoublonner, Dédoubloner, Dédoblonnage, Dédoublonage, Similarité, Similarity, Similitude, Collection, Information retrieval
Full text
Presentation (1.9 MB): public document, free access
Citation (ISO format):
KRAUSE, Jan Brice, BOREL, Alain. MarcXimiL : near duplicates detection. In: CERN workshop on Innovations in Scholarly Communication (OAI7). Genève. 2011. https://archive-ouverte.unige.ch/unige:23173

Deposited on: 2012-10-04
