AI’s Threat to Linguistic Diversity: Dr. Linda Heimisdóttir Warns of Language Extinction Risks in the Era of AI

“Now, some of you might be thinking, well, so what? You just told us that we’ve been living with English-centric language technology for practically as long as computers have been around. So what exactly is different now that we have arrived at the next generation of this technology?”

Those were the questions Dr. Linda Heimisdóttir, CEO of Miðeind, an Icelandic AI company that specializes in language technology, asked during her recent TEDx talk in Reykjavik, Iceland’s capital. They capture the core dilemma regarding the impact of AI on linguistic diversity: the dominance of English in language technology is certainly not new, but the arrival of behemoths like ChatGPT has raised the stakes for smaller languages.

She suggests approaches to ensure that linguistic diversity lives on in the era of AI. Heimisdóttir’s concern stems from the inherently data-hungry nature of large language models (LLMs): they require vast volumes of training data to perform well. In her own words: “When we talk about the amount of data that went into training an LLM like GPT-4, we have to talk terabytes. One terabyte is equivalent to more than a million novels. That is way more text than we have available for a tiny language like Icelandic.”

This disparity puts smaller languages at a significant disadvantage, making it challenging to justify the immense costs associated with training LLMs for languages with limited digital footprints.
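The terabyte comparison in the quote can be sanity-checked with a rough calculation. The sketch below is only illustrative: the words-per-novel and bytes-per-word figures are assumptions chosen for the estimate, not numbers from the talk, and the function name is hypothetical.

```python
# Back-of-envelope check: how many plain-text novels fit in one terabyte?
# Assumed figures (not from the talk): 90,000 words per novel and
# about 6 bytes per word, counting the space that follows each word.
WORDS_PER_NOVEL = 90_000
BYTES_PER_WORD = 6
BYTES_PER_TERABYTE = 10**12  # decimal terabyte


def novels_per_terabyte(words_per_novel=WORDS_PER_NOVEL,
                        bytes_per_word=BYTES_PER_WORD):
    """Return how many whole novels of the assumed size fit in one terabyte."""
    bytes_per_novel = words_per_novel * bytes_per_word
    return BYTES_PER_TERABYTE // bytes_per_novel


if __name__ == "__main__":
    print(f"Novels per terabyte: {novels_per_terabyte():,}")
```

Under these assumptions the result lands comfortably above one million, which is consistent with the figure Heimisdóttir cites. Longer novels or multi-byte characters would lower the count.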

However, Heimisdóttir remains cautiously optimistic, pointing to the promise of “cross-lingual transfer learning,” where LLMs can leverage their knowledge of one language to improve their performance in others.

“What we’ve discovered about these models, actually, is that they seem quite capable of something we refer to as cross-lingual transfer learning,” she said. “What that means is that a model can essentially take what it knows about one language and apply it to a different language, often with remarkable results.”

Recognizing the urgency of the situation, Heimisdóttir’s company, Miðeind, has collaborated with OpenAI to adapt GPT-4 for Icelandic. She acknowledges that “GPT-4 actually performs much better in Icelandic than in probably most other languages with just a few hundred thousand speakers. But it still has quite a way to go before it can measure up to English.”

“If we fail at this important task of creating an AI that works for all, then I am worried we are headed towards some sort of a disaster and even language death,” said Heimisdóttir eloquently, before adding: “If we succeed, well then I think the future of linguistic diversity is bright.”

Her call to action resonates profoundly, underscoring the need for concerted efforts to integrate smaller languages into LLMs, lest we risk irreversible cultural erosion in the face of technological progress.

AI Insider