Proceedings chapter
OA Policy
English

Towards High-Quality LLM-Based Data for French Spontaneous Speech Simplification: an Exo-Refinement Approach

Presented at Rotterdam, The Netherlands, 17-21 August 2025
Published in Interspeech, p. 5
First online date: 2025-08-21
Abstract

This study explores the synthetic data generation capabilities of LLMs for French spontaneous speech simplification (S3), a low-resource NLP task. We introduce the exo-refinement approach, which builds on the self-reflect workflow but differs by using separate models for generation and evaluation. To address the limitations of single-model refinement, it integrates external feedback from distinct LLMs acting as judges, refining outputs along three task-specific dimensions. Comparing LLM synthetic outputs against expert simplifications of gold transcriptions, results show that mistral-large outperforms all benchmarked models, including the MUSS baseline, while mistral-small achieves competitive performance with few refinements. SARI results confirm that iterations improve simplicity gain without compromising semantic meaning, as shown by COMET scores. These findings support exo-refinement as a scalable method for synthetic data generation and future S3 model development.
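The workflow described above (a generator LLM whose drafts are scored by separate judge LLMs and revised from their feedback) can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the dimension names, threshold, and all model calls below are hypothetical stand-ins, since the abstract does not specify them.

```python
# Hedged sketch of an exo-refinement loop: a generator LLM drafts a
# simplification, separate judge LLMs score it on three task-specific
# dimensions, and their feedback drives further refinement.
# All model calls are local stand-ins (no real LLM API is used).

from dataclasses import dataclass
from typing import Optional

# Assumed dimension names; the paper only says "3 task-specific dimensions".
DIMENSIONS = ("simplicity", "meaning_preservation", "fluency")

@dataclass
class Judgement:
    scores: dict       # dimension -> score in [0, 1]
    feedback: str      # natural-language critique returned to the generator

def generate(transcript: str, feedback: Optional[str] = None) -> str:
    """Stand-in for the generator LLM (e.g. mistral-large in the paper):
    here it just strips two common French fillers."""
    simplified = transcript.replace("euh, ", "").replace("donc ", "")
    if feedback:  # a real system would condition the prompt on the critique
        simplified = simplified.rstrip(".") + "."
    return simplified

def judge(source: str, candidate: str) -> Judgement:
    """Stand-in for the external judge LLMs (distinct from the generator)."""
    shorter = len(candidate) < len(source)
    scores = {
        "simplicity": 0.9 if shorter else 0.4,
        "meaning_preservation": 0.8,
        "fluency": 0.8,
    }
    return Judgement(scores, "Remove fillers and keep the sentence short.")

def exo_refine(transcript: str, threshold: float = 0.7, max_iters: int = 3) -> str:
    """Iterate generation -> external judgement until every dimension
    clears the threshold or the iteration budget is spent."""
    candidate = generate(transcript)
    for _ in range(max_iters):
        verdict = judge(transcript, candidate)
        if all(verdict.scores[d] >= threshold for d in DIMENSIONS):
            break  # accepted: all three dimensions pass
        candidate = generate(transcript, feedback=verdict.feedback)
    return candidate

print(exo_refine("euh, donc je pense que c'est vraiment très compliqué"))
```

The key design point mirrored here is the separation of roles: `generate` and `judge` are distinct functions standing in for distinct models, which is what distinguishes exo-refinement from single-model self-reflection.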

Keywords
  • Speech simplification
  • Synthetic data generation
  • Large language models
Research groups
Citation (ISO format)
ORMAECHEA GRIJALBA, Lucía et al. Towards High-Quality LLM-Based Data for French Spontaneous Speech Simplification: an Exo-Refinement Approach. In: Interspeech. Rotterdam, The Netherlands. [s.l.] : [s.n.], 2025. p. 5. doi: 10.21437/Interspeech.2025-452
Main files (1)
Proceedings chapter (Accepted version)
Access level: Public
Identifiers
22 views
39 downloads

Technical information

Creation: 29/08/2025 14:31:32
First validation: 01/09/2025 07:57:28
Update time: 01/09/2025 07:57:28
Status update: 01/09/2025 07:57:28
Last indexation: 24/09/2025 22:06:39
All rights reserved by Archive ouverte UNIGE and the University of Geneva