Proceedings chapter
Open access
English

Improving Standard German Captioning of Spoken Swiss German: Evaluating Multilingual Pre-trained Models

Presented at Macau SAR, September 4–8, 2023
Publication date: 2023-09-04
Abstract

Multilingual pre-trained language models are often the best alternative in low-resource settings. In the context of a cascade architecture for automatic Standard German captioning of spoken Swiss German, we evaluate different models on the task of transforming Swiss German ASR output into Standard German. Instead of training a large model from scratch, we fine-tuned publicly available pre-trained models, which reduces the cost of training high-quality neural machine translation models. Results show that pre-trained multilingual models achieve the highest scores, and that a higher number of languages included in pre-training improves the performance. We also observed that the type of source and target included in fine-tuning data impacts the results.

Keywords
  • Low-resource machine translation
  • Swiss German
  • Standard German
  • Automatic subtitling
  • Automatic captioning
  • End-user evaluation
  • Television
  • Language models
Citation (ISO format)
MUTAL, Jonathan David et al. Improving Standard German Captioning of Spoken Swiss German: Evaluating Multilingual Pre-trained Models. In: Proceedings of Machine Translation Summit XIX Vol. 2: Users Track. Macau SAR. [s.l.] : [s.n.], 2023. p. 65–76.
Main files (1)
  • Proceedings chapter (Published version)
Secondary files (1)
Identifiers
  • PID : unige:171243

Technical information

Creation: 09/08/2023 3:04:25 PM
First validation: 09/11/2023 8:03:37 AM
Update time: 09/11/2023 8:03:37 AM
Status update: 09/11/2023 8:03:37 AM
Last indexation: 05/06/2024 4:57:55 PM
All rights reserved by Archive ouverte UNIGE and the University of Geneva