Using the LARA Little Prince to compare human and TTS audio quality

Akhlaghi, Elham; Auðunardóttir, Ingibjörg; Bączkowska, Anna; Bédi, Branislav; Beedar, Hakeem; Berthelsen, Harald; Chua, Cathy; Cucchiarini, Catia; Habibi, Hanieh; Horváthová, Ivana; Ikeda, Junta; Maizonniaux, Christèle; Chiaráin, Neasa Ní; Raheb, Chadi; Rayner, Emmanuel; Sloan, John; Tsourakis, Nikolaos; Yao, Chunlin

Proceedings chapter

English

Presented at: Marseille, 20-25 June 2022

Published in: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis (Eds.), Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), p. 2967-2975

Publisher: Marseille, France: European Language Resources Association (ELRA)

Publication date: 2022

Abstract

A popular idea in Computer Assisted Language Learning (CALL) is to use multimodal annotated texts, with annotations typically including embedded audio and translations, to support L2 learning through reading. An important question is how to create good-quality audio, which can be done either through human recording or by a Text-To-Speech (TTS) engine. We may reasonably expect TTS to be quicker and easier, but human recordings to be of higher quality. Here, we report a study using the open source LARA platform and ten languages. Samples of audio totalling about five minutes, representing the same four passages taken from LARA versions of Saint-Exupéry’s “Le petit prince”, were provided for each language in both human and TTS form; the passages were chosen to instantiate the 2x2 cross product of the conditions dialogue/not-dialogue and humour/not-humour. 251 subjects used a web form to compare human and TTS versions of each item and rate the voices as a whole. For the three languages where TTS did best, English, French and Irish, the evidence from this study and the previous one it extended suggests that TTS audio is now pedagogically adequate and roughly comparable with a non-professional human voice in terms of exemplifying correct pronunciation and prosody. It was, however, still judged substantially less natural and less pleasant to listen to. No clear evidence was found to support the hypothesis that dialogue and humour pose special problems for TTS. All data and software will be made freely available.

Keywords

TTS
Evaluation
Multimodality
Reading
Emotion

Affiliation entities

Faculté de traduction et d'interprétation / Département de traitement informatique multilingue

Research groups

TIM/ISSCO

Citation (ISO format)

AKHLAGHI, Elham et al. Using the LARA Little Prince to compare human and TTS audio quality. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis (Ed.). Marseille. Marseille, France : European Language Resources Association (ELRA), 2022. p. 2967–2975.

Proceedings chapter (Published version)

CC BY-NC-4.0

Identifiers

PID : unige:164899

Additional URL for this publication: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.318.pdf

ISBN: 9791095546726

Creation: 06/11/2022 17:08:00

First validation: 06/11/2022 17:08:00

Update: 16/03/2023 08:46:26

Status update: 16/03/2023 08:46:24

Last indexation: 01/11/2024 03:21:36
