Scientific article
English

MedFrenchmark, a small set for benchmarking generative LLMs in medical French

Published in: Studies in health technology and informatics, vol. 316, p. 601-605
Publication date: 2024-08-22
Abstract

Generative Large Language Models (LLMs) have become ubiquitous in various fields, including healthcare and medicine. Consequently, there is growing interest in leveraging LLMs for medical applications, and new models emerge daily. However, evaluation and benchmarking frameworks for LLMs are scarce, particularly those tailored for medical French. To address this gap, we introduce a minimal benchmark consisting of 114 open questions designed to assess the medical capabilities of LLMs in French. The proposed benchmark encompasses a wide range of medical domains, reflecting the complexity of real-world clinical scenarios. A preliminary validation involved testing seven widely used LLMs, each with 7 billion parameters. Results revealed significant variability in performance, emphasizing the importance of rigorous evaluation before deploying LLMs in medical settings. In conclusion, we present a novel and valuable resource for rapidly evaluating LLMs in medical French. By promoting greater accountability and standardization, this benchmark has the potential to enhance the trustworthiness and utility of LLMs in medical applications.

Keywords
  • Generative AI
  • LLM
  • NLP
  • Benchmark
  • Benchmarking
  • France
  • Computer Simulation
Citation (ISO format)
QUERCIA, Amandine Alizée et al. MedFrenchmark, a small set for benchmarking generative LLMs in medical French. In: Studies in health technology and informatics, 2024, vol. 316, p. 601–605. doi: 10.3233/SHTI240486
Main files (1)
Article (Published version)
Identifiers
ISSN of the journal: 0926-9630

Technical information

Creation: 02/09/2024 08:48:48
First validation: 23/09/2024 08:48:54
Update time: 23/09/2024 08:48:54
Status update: 23/09/2024 08:48:54
Last indexation: 05/10/2024 20:17:28
All rights reserved by Archive ouverte UNIGE and the University of Geneva