Proceedings chapter
Open access
English

Named entity recognition in chemical patents using ensemble of contextual language models

Published in: Proceedings of CLEF (Conference and Labs of the Evaluation Forum) 2020 Working Notes. Editors: Cappellato L., Eickhoff C., Ferro N., Névéol A.
Presented at Thessaloniki, Greece, September 22-25, 2020
Collection
  • CEUR Workshop Proceedings; 2696
Publication date: 2020
Abstract

Chemical patent documents describe a broad range of applications and hold key reaction and compound information, such as chemical structures, reaction formulas, and molecular properties. These informational entities must first be identified in text passages before they can be used in downstream tasks. Text mining provides the means to extract relevant information from chemical patents through information extraction techniques. As part of the Information Extraction task of the Cheminformatics Elsevier Melbourne University (ChEMU) challenge, in this work we study the effectiveness of contextualized language models for extracting reaction information from chemical patents. We assess transformer architectures trained on generic and specialised corpora to propose a new ensemble model. Our best model, based on a majority ensemble approach, achieves an exact F1-score of 92.30% and a relaxed F1-score of 96.24%. The results show that an ensemble of contextualized language models can provide an effective method to extract information from chemical patents.
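The majority ensemble idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each model emits one label per token over the same tokenization, that a label is accepted only when more than half of the models agree, and that ties fall back to the outside label "O". The function name `majority_vote` and the label names are hypothetical.

```python
from collections import Counter


def majority_vote(predictions, fallback="O"):
    """Combine per-token labels from several models by strict majority.

    predictions: list of label sequences, one per model, all of equal length.
    fallback: label used when no label wins more than half of the votes
              (an assumption of this sketch, not taken from the paper).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same number of tokens")

    n_models = len(predictions)
    voted = []
    # zip(*predictions) yields, for each token, the labels given by every model.
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        voted.append(label if count > n_models / 2 else fallback)
    return voted


# Hypothetical outputs of three models for a three-token passage.
model_a = ["B-REAGENT", "O", "B-SOLVENT"]
model_b = ["B-REAGENT", "O", "O"]
model_c = ["O", "B-TIME", "B-SOLVENT"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Requiring a strict majority keeps the ensemble conservative: a label proposed by a single model is dropped unless other models confirm it.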

Keywords
  • Named-entity recognition
  • Chemical patents
  • Contextual language models
  • Patent text mining
  • Information extraction
Citation (ISO format)
COPARA ZEA, Jenny Linet et al. Named entity recognition in chemical patents using ensemble of contextual language models. In: Proceedings of CLEF (Conference and Labs of the Evaluation Forum) 2020 Working Notes. Thessaloniki, Greece. [s.l.] : [s.n.], 2020. (CEUR Workshop Proceedings)
Identifiers
  • PID : unige:159578
All rights reserved by Archive ouverte UNIGE and the University of Geneva