Earlier this week, speech synthesis researchers from around the world gathered at Campus Fryslân in Leeuwarden, Netherlands, for something remarkable: the Blizzard Challenge 2025, focused entirely on creating synthetic voices for Bildts, a regional language spoken by roughly 6,000 people in the northern Netherlands.
Why Bildts matters
Bildts emerged from centuries of language contact between Frisian and Dutch varieties from South and North Holland. It's officially recognized as a regional language by Fryslân province, but like many minority languages, it faces pressure from dominant languages. According to recent census data, while 65.6% of residents can speak Bildts, only about 20% can write in it.
The Speech Synthesis Workshop (SSW) chose Bildts deliberately, aligning with this year's theme: "Scaling down: sustainable synthesis for language diversity." Rather than focusing on major world languages that already have extensive AI support, the challenge asked: Can we build high-quality synthetic voices for languages with limited resources?
Read more about the regional impact in this article in the Leeuwarder Courant (in Dutch).
Technical challenges
Seven international teams tackled two tasks:
Main Task: Build a voice using approximately 7 hours of speech data from Jan de Groot, a speaker from Omrop Fryslân who provided audio columns averaging 3 minutes each.
Zero-shot Task: Create voices from minimal reference audio—just 4-11 seconds per speaker. This tested whether systems could capture both language-specific characteristics and individual speaker traits with extremely limited data.
The teams represented diverse technical approaches:
Four used encoder-decoder architectures
Three employed flow-based or diffusion-based models
Some relied on mel-spectrograms while others used self-supervised learning representations
Approaches to pronunciation varied from phonetic modeling to learning directly from text
Innovative evaluation
Perhaps most importantly, we designed an evaluation framework that respected the language community itself. Rather than relying solely on standard metrics, the evaluation drew on three listener groups:
Bildts speakers (the primary evaluators): Speakers recruited through newspapers, mailing lists, and community outreach
Dutch speakers: To explore how linguistic distance affects perception
International speakers: For baseline quality assessments
The evaluation measured multiple dimensions: how "Bildts-like" the synthesis sounded, naturalness, appropriateness in context, and various quality metrics. Crucially, Bildts speakers could identify specific issues with pronunciation, intonation, and stress—insights impossible to capture without community involvement.
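To make the idea of group-specific evaluation concrete, here is a minimal sketch of how ratings might be averaged separately per listener group, system, and dimension. It is an illustration only, not the challenge's actual analysis: the 1-to-5 scale, the system labels "A" and "B", the dimension names, and every score below are assumptions invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings, invented for illustration:
# (listener_group, system, dimension, score on an assumed 1-5 scale)
ratings = [
    ("bildts", "A", "naturalness", 4),
    ("bildts", "A", "naturalness", 5),
    ("bildts", "A", "bildts_likeness", 3),
    ("dutch", "A", "naturalness", 4),
    ("international", "A", "naturalness", 5),
    ("bildts", "B", "naturalness", 3),
    ("bildts", "B", "bildts_likeness", 5),
    ("bildts", "B", "bildts_likeness", 4),
]


def mean_scores(rows):
    """Average the scores for each (listener_group, system, dimension)."""
    buckets = defaultdict(list)
    for group, system, dimension, score in rows:
        buckets[(group, system, dimension)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


scores = mean_scores(ratings)
for key in sorted(scores):
    print(key, round(scores[key], 2))
```

Keeping the listener group in the key means a system that sounds natural to international listeners but not very "Bildts-like" to Bildts speakers shows up as two separate numbers instead of being averaged away.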
Results and insights
Different teams excelled in different areas, and performance varied significantly between the main task and zero-shot challenge. Some systems produced highly natural-sounding speech but struggled with language-specific characteristics. Others captured Bildts phonetic features well but with less natural prosody.
The analysis showed that building voices for under-resourced languages requires more than just applying existing techniques to new data. It demands understanding the linguistic structure, community needs, and cultural context of the target language.
Looking forward
The Blizzard Challenge began 20 years ago, founded by Keiichi Tokuda and Alan Black, focusing primarily on well-resourced languages. This year's shift toward endangered and minority languages signals a maturation of the field—and a recognition of AI's responsibility to serve linguistic diversity.
The synthetic voices created for Bildts won't just remain academic exercises. Community representatives, including Gerard de Jong (coordinator of multilingualism for the municipality of Waadhoeke and president of the European Bureau of Minority Languages), participated in evaluations and expressed enthusiasm about potential applications for language education and preservation.
The Speech Synthesis Workshop and Blizzard Challenge 2025 were organized by researchers from the University of Groningen, University of Helsinki, and KTH Stockholm, with support from the Fryske Akademy and local language communities.