Proceedings chapter
Open access
English

Improving Standard German Captioning of Spoken Swiss German: Evaluating Multilingual Pre-trained Models

Presented at Macau SAR, September 4–8, 2023
Publication date: 2023-09-04
Abstract

Multilingual pre-trained language models are often the best alternative in low-resource settings. In the context of a cascade architecture for automatic Standard German captioning of spoken Swiss German, we evaluate different models on the task of transforming Swiss German ASR output into Standard German. Instead of training a large model from scratch, we fine-tuned publicly available pre-trained models, which reduces the cost of training high-quality neural machine translation models. Results show that pre-trained multilingual models achieve the highest scores, and that a higher number of languages included in pre-training improves the performance. We also observed that the type of source and target included in fine-tuning data impacts the results.

Keywords
  • Low-resource machine translation
  • Swiss German
  • Standard German
  • Automatic subtitling
  • Automatic captioning
  • End-user evaluation
  • Television
  • Language models
Citation (ISO format)
MUTAL, Jonathan David et al. Improving Standard German Captioning of Spoken Swiss German: Evaluating Multilingual Pre-trained Models. In: Proceedings of Machine Translation Summit XIX Vol. 2: Users Track. Macau SAR. [s.l.] : [s.n.], 2023. p. 65–76.
Main files (1)
  • Proceedings chapter (Published version)
Secondary files (1)
Identifiers
  • PID : unige:171243

Technical information

Creation: 09/08/2023 3:04:25 PM
First validation: 09/11/2023 8:03:37 AM
Update time: 09/11/2023 8:03:37 AM
Status update: 09/11/2023 8:03:37 AM
Last indexation: 05/06/2024 4:57:55 PM
All rights reserved by Archive ouverte UNIGE and the University of Geneva