Proceedings chapter
OA Policy
English

Improving Standard German Captioning of Spoken Swiss German: Evaluating Multilingual Pre-trained Models

Presented at Macau SAR, September 4–8, 2023
Publication date: 2023-09-04
Abstract

Multilingual pre-trained language models are often the best alternative in low-resource settings. In the context of a cascade architecture for automatic Standard German captioning of spoken Swiss German, we evaluate different models on the task of transforming Swiss German ASR output into Standard German. Instead of training a large model from scratch, we fine-tuned publicly available pre-trained models, which reduces the cost of training high-quality neural machine translation models. Results show that pre-trained multilingual models achieve the highest scores, and that a higher number of languages included in pre-training improves the performance. We also observed that the type of source and target included in fine-tuning data impacts the results.
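To make the cascade architecture described above concrete, the following is an illustrative sketch (not the authors' code) of the two-stage pipeline: an ASR component produces a Swiss German transcript, and a separate MT component maps it to Standard German. Both components here are hypothetical stubs with invented example sentences; in the paper, the second stage is a fine-tuned multilingual pre-trained model.

```python
# Illustrative cascade for Standard German captioning of spoken Swiss German:
# audio -> ASR (Swiss German transcript) -> MT -> Standard German caption.
# Both stages are stand-in stubs for the purpose of showing the data flow.

def asr_stub(audio_id: str) -> str:
    """Stand-in for a Swiss German ASR system. Returns a lowercase,
    unpunctuated hypothesis, as raw ASR output typically is."""
    hypotheses = {
        "clip1": "ich gang hei",  # hypothetical Swiss German transcript
    }
    return hypotheses[audio_id]

def mt_stub(swiss_german: str) -> str:
    """Stand-in for the second cascade stage: in the paper, a multilingual
    pre-trained model fine-tuned to map Swiss German ASR output to
    Standard German. Here, a toy lookup table."""
    lexicon = {
        "ich gang hei": "Ich gehe nach Hause.",
    }
    return lexicon[swiss_german]

def caption(audio_id: str) -> str:
    """Full cascade: ASR followed by translation into Standard German."""
    return mt_stub(asr_stub(audio_id))
```

In practice the lookup table would be replaced by inference with a fine-tuned sequence-to-sequence model; the point of the cascade is that the two stages can be developed and evaluated independently.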

Keywords
  • Low-resource machine translation
  • Swiss German
  • Standard German
  • Automatic subtitling
  • Automatic captioning
  • End-user evaluation
  • Television
  • Language models
Citation (ISO format)
MUTAL, Jonathan David et al. Improving Standard German Captioning of Spoken Swiss German: Evaluating Multilingual Pre-trained Models. In: Proceedings of Machine Translation Summit XIX Vol. 2: Users Track. Macau SAR. [s.l.] : [s.n.], 2023. p. 65–76.
Main files (1)
Proceedings chapter (Published version)
Secondary files (1)
Supplemental data - Presentation Slides
Identifiers
  • PID : unige:171243

Technical information

Creation: 08/09/2023 15:04:25
First validation: 11/09/2023 08:03:37
Update time: 11/09/2023 08:03:37
Status update: 11/09/2023 08:03:37
Last indexation: 25/09/2025 22:07:33
All rights reserved by Archive ouverte UNIGE and the University of Geneva