Transcribing Diverse Voices: Using Whisper for ICE corpora
Odette Scharenborg (ed.). Proceedings of Interspeech 2025. https://www.isca-archive.org/index.html: ISCA Archive 2025, pp. 3359-3363
Year of publication: 2025
Publication type: Miscellaneous (conference paper)
Language: English
DOI/URN: 10.21437/Interspeech.2025-1980
Abstract
The precise transcription of speech data is crucial yet work-intensive in the field of sociolinguistics. Although recent advancements in end-to-end ASR (e.g. Whisper) offer great potential across various disciplines, these models have rarely been tested for sociolinguistic corpus transcription. This study addresses this gap by harnessing all Whisper models for the re-transcription of classic sociolinguistic reference corpora of non-standard varieties: ICE Nigeria and ICE Scotland. Employing WER metrics, the study utilizes linear mixed-effects modelling to determine significant factors affecting transcription accuracy. The results show that Whisper can manage both varieties, though it is slightly less accurate for Nigerian English. An increased model size reduces WER and boosts robustness, though accuracy varies by sound file. While Whisper proves useful for corpus transcription work overall, challenges such as speaker diarization, hallucinations and idealized transcriptions persist.
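
For readers unfamiliar with the WER metric named in the abstract, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the normalization used here (lowercasing and whitespace splitting) are assumptions for illustration only; the record does not describe the study's own scoring or text-normalization procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    Assumption (not from the paper): text is lowercased and split on
    whitespace before comparison.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the ICE corpora.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, which is one way hallucinated output shows up in this metric.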