Insider Brief
- AI hallucinations arise from knowledge overshadowing, where frequently encountered information suppresses less common knowledge, leading to factual distortions.
- Researchers have developed a “log-linear law” that predicts hallucination rates based on knowledge popularity, length, and model size, offering a framework for early detection.
- A new decoding method, CoDA, significantly reduces hallucinations without retraining, suggesting a path toward more reliable AI-generated content.
Large language models generate false statements even when trained on factual data. A new study suggests these errors stem from “knowledge overshadowing,” a phenomenon in which dominant knowledge suppresses less popular facts, distorting AI-generated responses.
AI hallucinations — instances where models produce incorrect or fabricated information — pose significant risks in applications that demand accuracy, such as medical diagnosis, legal research, and scientific discovery. While previous research has attributed these hallucinations to poor training data or model biases, a new study by researchers from the University of Illinois Urbana-Champaign, Columbia University, Northwestern University and Stanford University identifies a deeper, structural cause.
The researchers, who published their findings on arXiv, introduce the concept of “knowledge overshadowing,” which occurs when more frequently encountered information in a dataset overpowers less common knowledge. This suppression leads to factual errors in model-generated outputs. The study outlines a mathematical framework, the “log-linear law of knowledge overshadowing,” which predicts that hallucination rates grow in proportion to the logarithm of three factors: knowledge popularity, knowledge length, and model size.
What Are AI Hallucinations?
AI hallucinations arise when language models generate statements that sound plausible but are factually incorrect. Unlike simple misinformation, these errors are often subtle, mixing real knowledge with distortions. The study provides an example: when asked to name a famous singer in North Korea, an AI model incorrectly suggests “Kim Jong Un,” conflating widely known information (the North Korean leader’s name) with an unrelated category (famous singers).
Previous studies have linked hallucinations to data quality issues, inadequate fine-tuning, or inherent biases in how AI models weigh different pieces of information. However, the new research demonstrates that hallucinations persist even when the training data is strictly factual. This suggests that the problem lies not in what the models learn but in how they prioritize and retrieve information.
Understanding Knowledge Overshadowing
The study finds that knowledge overshadowing is a major driver of hallucinations. When a model is trained on a dataset where one piece of information appears more frequently than another, the more common knowledge suppresses the less common knowledge, leading the model to make incorrect assumptions.
The researchers discovered that the likelihood of a hallucination follows a predictable pattern. Their “log-linear law” shows that as the frequency of dominant knowledge increases, the probability of overshadowing—and thus hallucination—rises proportionally to the logarithm of that frequency. A similar effect occurs when knowledge length (the number of words in a fact) increases or when the model size grows.
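To make the shape of that relationship concrete, here is a minimal Python sketch of a log-linear trend of this kind. The function name, the ratio-based inputs, the coefficients `a`, `b`, `c`, `d`, and the clamping to a 0–1 range are illustrative assumptions, not values or definitions taken from the paper; any real coefficients would have to be fit to a specific model and dataset.

```python
import math

def overshadowing_rate(popularity_ratio, length_ratio, model_size,
                       a=0.05, b=0.05, c=0.01, d=0.0):
    """Schematic log-linear trend: the estimated chance that dominant
    ("popular") knowledge overshadows a rarer fact grows with the log of
    relative popularity, relative length, and model size.
    Coefficients here are placeholders, not figures from the study."""
    rate = (a * math.log(popularity_ratio)
            + b * math.log(length_ratio)
            + c * math.log(model_size)
            + d)
    # Clamp so the output can be read as a rate between 0 and 1.
    return min(max(rate, 0.0), 1.0)

# Illustration: a competing fact 100x more popular, 3x longer,
# in a model with roughly one billion parameters.
print(overshadowing_rate(popularity_ratio=100, length_ratio=3, model_size=1e9))
```

The point of the sketch is only the direction of the effect: doubling any of the three factors nudges the predicted rate up by a fixed increment, which is what makes the law useful for anticipating hallucinations before they appear.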
This insight has important implications for large AI models. As models scale up, their ability to generalize improves, but their tendency to hallucinate also increases because they compress and simplify knowledge representations. This compression causes less frequent facts to be absorbed into dominant knowledge structures, increasing the risk of factual distortions.
Can We Predict and Prevent Hallucinations?
A key contribution of the study is its ability to predict hallucinations before they occur. By applying the log-linear law, researchers can estimate when a model is likely to hallucinate based on the characteristics of its training data. This predictive capability provides AI developers with a tool to diagnose and address hallucination risks before deploying models in real-world settings.
To mitigate hallucinations, the researchers propose a new method called “Contrastive Decoding to Amplify Overshadowed Knowledge” (CoDA). This technique works by identifying overshadowed knowledge and boosting its influence during text generation. Rather than retraining the model with new data, CoDA adjusts the model’s decoding process to balance dominant and less dominant knowledge sources.
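The paper’s exact CoDA procedure is not reproduced here, but the general idea of contrastive decoding can be sketched in a few lines. In the hedged example below, `overshadowing_query` is a hypothetical variant of the prompt that keeps only the dominant association; the function name, the `alpha` weight, and the Hugging Face-style model calls are illustrative assumptions, not the authors’ implementation.

```python
import torch

def contrastive_next_token_logits(model, tokenizer, query,
                                  overshadowing_query, alpha=1.0):
    """Sketch of contrastive decoding in the spirit of CoDA (not the paper's
    exact method): boost tokens favored by the full query relative to a
    prompt variant dominated by the more popular knowledge."""
    with torch.no_grad():
        full_ids = tokenizer(query, return_tensors="pt").input_ids
        over_ids = tokenizer(overshadowing_query, return_tensors="pt").input_ids
        full_logits = model(full_ids).logits[0, -1]   # next-token logits, full query
        over_logits = model(over_ids).logits[0, -1]   # logits, overshadowing-only variant
    # Amplify what the full query adds beyond the dominant knowledge alone.
    return full_logits + alpha * (full_logits - over_logits)

# Usage sketch (model name and prompts are illustrative):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model = AutoModelForCausalLM.from_pretrained("gpt2")
# tokenizer = AutoTokenizer.from_pretrained("gpt2")
# logits = contrastive_next_token_logits(
#     model, tokenizer,
#     "Name a famous singer in North Korea:",
#     "Name a famous person in North Korea:",
# )
```

A full decoder would still need a principled way to identify the overshadowed knowledge and to sample from the adjusted distribution; the sketch shows only the contrastive adjustment itself, applied at decoding time rather than through retraining.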
Experiments with CoDA show significant improvements in factual accuracy. When tested on datasets designed to assess AI factuality, CoDA reduced hallucination rates by 27.9% on the Overshadow dataset, 13.1% on MemoTrap, and 18.3% on NQ-Swap—three benchmarks used to measure AI-generated misinformation.
Implications for AI Development
The findings suggest a fundamental shift in how AI developers should approach hallucinations. Instead of treating them as mere data-quality issues, developers should recognize that hallucinations stem from the structure of knowledge within AI models. Understanding knowledge overshadowing allows for more precise interventions, such as adjusting training data distributions or using methods like CoDA to counteract biases.
The study also challenges the assumption that bigger AI models are always better. While increasing model size generally improves performance, it also exacerbates hallucinations due to greater compression of information. This means that future AI development must balance model size with strategies to manage knowledge overshadowing.
Limitations and Future Work
While the study offers new insights, it also acknowledges limitations. The researchers were unable to analyze the training data of proprietary models like OpenAI’s GPT-4, making it difficult to directly validate their findings on state-of-the-art commercial AI systems. Additionally, quantifying real-world knowledge distributions remains a challenge, as natural language data is inherently noisy and imprecise.
It’s important to note that the researchers published their findings on arXiv, a pre-print server that lets researchers gather rapid feedback from colleagues, which is especially valuable in fast-moving fields such as artificial intelligence. The study has not yet been peer-reviewed.
The Road Ahead
Future work could explore how knowledge overshadowing interacts with other AI mechanisms, such as reinforcement learning with human feedback (RLHF), which is commonly used to fine-tune models. Researchers also plan to refine methods like CoDA to work more effectively with larger models and real-world datasets.
As AI systems become more deeply integrated into industries that rely on accurate information, addressing hallucinations will be critical. The study’s identification of knowledge overshadowing as a primary cause — and its development of predictive and corrective measures — represents a step toward making AI-generated content more reliable.
Researchers and Affiliations
The study was conducted by Yuji Zhang, Sha Li, Cheng Qian, Jiateng Liu, Pengfei Yu, Chi Han, Yi R. Fung, Chengxiang Zhai, and Heng Ji of the University of Illinois Urbana-Champaign; Kathleen McKeown of Columbia University; and Manling Li of both Northwestern University and Stanford University.