MIT Researchers Tackle Problem of LLM Bias

Insider Brief

  • Backed by the U.S. Office of Naval Research, National Science Foundation, and an Alexander von Humboldt Professorship, MIT researchers have developed a theoretical framework explaining why large language models (LLMs) prioritize the beginning and end of input sequences over the middle.
  • The team found that architectural choices—like causal attention masking and positional encodings—amplify this “position bias,” with additional attention layers compounding the issue, potentially impairing LLM performance in tasks like legal or medical data retrieval.
  • Using graph-based analysis and controlled experiments, the researchers demonstrated how design and training adjustments could reduce bias, improve accuracy, and guide future LLM development in high-stakes applications.

MIT researchers say they have uncovered why large language models often overlook important information in the middle of long documents—a flaw that could weaken performance in legal, medical, and other high-stakes applications.

Supported by the U.S. Office of Naval Research, the National Science Foundation, and an Alexander von Humboldt Professorship, a team at MIT developed a mathematical framework to explain how “position bias” arises in the design of transformer-based language models. According to MIT, the study found that models such as GPT-4, Claude, and Llama are more likely to emphasize words at the start and end of a document or conversation, while underweighting content in the middle. That tendency stems from design decisions made in the models’ architecture and training.

“These models are black boxes, so as an LLM user, you probably don’t know that position bias can cause your model to be inconsistent,” noted Xinyi Wu, a graduate student in the MIT Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information and Decision Systems (LIDS), and first author of a paper on this research. “You just feed it your documents in whatever order you want and expect it to work. But by understanding the underlying mechanism of these black-box models better, we can improve them by addressing these limitations.”

The researchers examined how two standard components—causal attention masking and positional encodings—influence how models learn relationships between words. Causal masking, often used in language generation, limits each word’s attention to only those that come before it. This creates a directional dependency that favors early tokens, even when the data itself does not.
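
To make the mechanism concrete, here is a minimal sketch of single-head scaled dot-product attention with a causal mask. The random vectors and dimensions are illustrative only and are not drawn from the paper; the point is simply that masked positions receive zero attention weight, so early tokens are visible to every later token while the reverse is never true.

```python
# Minimal sketch of causal (masked) attention; all vectors are random and illustrative.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(Q, K, V):
    """Each position attends only to itself and earlier positions."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                       # raw attention scores
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    scores[future] = -np.inf                            # block attention to future tokens
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
n_tokens, d_model = 6, 8
Q, K, V = (rng.normal(size=(n_tokens, d_model)) for _ in range(3))

out = causal_attention(Q, K, V)
print(out.shape)  # (6, 8): token 0 sees only itself, token 5 sees all six tokens
```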

The effect is further amplified as models add more attention layers, researchers said. Each layer reuses and reinforces prior attention weights, making early input more dominant in the model’s reasoning. The team also showed that positional encodings, which give models a sense of word order, can mitigate this bias if tuned correctly, though their effectiveness weakens as model complexity increases.
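
The compounding effect can be illustrated with a toy calculation of our own (not the paper’s graph-based analysis): compose several layers of uniform causal attention and inspect how much effective weight the final token places on each position. Even though each individual layer treats its visible tokens evenly, the composition increasingly concentrates weight on the earliest positions.

```python
# Toy illustration: stack layers of uniform causal attention and track the
# effective weight the last token places on each position.
import numpy as np

n_tokens = 8
A = np.tril(np.ones((n_tokens, n_tokens)))   # one layer: token i averages positions 0..i
A = A / A.sum(axis=1, keepdims=True)         # make each row a uniform distribution

for n_layers in (1, 2, 4, 8):
    effective = np.linalg.matrix_power(A, n_layers)
    print(f"{n_layers} layer(s):", np.round(effective[-1], 3))
# Each layer is uniform over its visible tokens, yet the composed attention
# reaching the final token concentrates more and more on the first positions.
```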

To validate the theory, the MIT team built a graph-based representation of information flow across model layers, then tested retrieval tasks with key phrases inserted at various points in a document. Their results showed a U-shaped accuracy curve: models performed best when the target phrase appeared at the beginning or end, and worst when it appeared in the middle.
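
A harness for this kind of retrieval probe might look like the sketch below. The `query_model` callable, the filler text, and the “access code” needle are hypothetical placeholders rather than the paper’s actual setup; the structure simply shows how a key fact can be planted at different relative positions and retrieval success recorded.

```python
# Skeleton of a positional retrieval probe; query_model and the test text are
# hypothetical placeholders, not the paper's experimental setup.

def build_prompt(filler_sentences, key_fact, position):
    """Insert key_fact at a relative position (0.0 = start, 1.0 = end)."""
    idx = int(position * len(filler_sentences))
    doc = filler_sentences[:idx] + [key_fact] + filler_sentences[idx:]
    return " ".join(doc) + "\nQuestion: What is the access code? Answer:"

def run_probe(query_model, filler_sentences, positions):
    key_fact = "The access code is 7421."
    results = {}
    for pos in positions:
        answer = query_model(build_prompt(filler_sentences, key_fact, pos))
        results[pos] = "7421" in answer      # did the model retrieve the needle?
    return results

if __name__ == "__main__":
    # Echo "model" for demonstration only: it returns the prompt itself, so the
    # probe trivially succeeds everywhere. Swap in a real LLM call and average
    # over many trials to measure the U-shaped accuracy curve.
    print(run_probe(lambda p: p, ["Filler sentence."] * 40, [0.0, 0.25, 0.5, 0.75, 1.0]))
```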

This “lost-in-the-middle” behavior has wide-reaching implications for how AI systems handle long documents or conversations. In law or medicine, where critical details may appear anywhere in a record, position bias can lead to incomplete or misleading results, according to the researchers.

The research, co-authored by Wu, postdoc Yifei Wang, and professors Stefanie Jegelka and Ali Jadbabaie, suggests several remedies. These include alternative masking strategies, simplified attention networks, or training methods that rebalance model sensitivity to all positions. The team emphasized that training data can also introduce or worsen bias, and should be carefully curated.
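
As a rough illustration of the first remedy direction, alternative masking, the snippet below contrasts a standard causal mask with a local sliding-window mask of the kind used in some transformer variants. This is not the authors’ proposal, only a sketch of how a different mask changes which positions each token can attend to and how far back the directional dependency can reach.

```python
# Sketch of an alternative masking strategy (illustrative, not the paper's method).
import numpy as np

def causal_mask(n):
    """Token i may attend to positions 0..i (standard decoder-style mask)."""
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n, window=3):
    """Token i may attend only to the `window` most recent positions, so
    attention no longer funnels all the way back to the first token."""
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :]
    return (offsets >= 0) & (offsets < window)

n = 6
print("causal mask:\n", causal_mask(n).astype(int))
print("sliding-window mask:\n", sliding_window_mask(n).astype(int))
```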

While the study does not eliminate position bias, it offers rare clarity into how complex AI models function internally. The team plans to explore how different positional encoding strategies influence model behavior and whether some tasks might benefit from deliberately introduced position bias.

Jadbabaie, professor and head of the Department of Civil and Environmental Engineering, a core faculty member of IDSS, and a principal investigator in LIDS, pointed out, “By doing a combination of theory and experiments, we were able to look at the consequences of model design choices that weren’t clear at the time. If you want to use a model in high-stakes applications, you must know when it will work, when it won’t, and why.”

Amin Saberi, professor and director of the Stanford University Center for Computational Market Design, who was not involved with this work, added, “These researchers offer a rare theoretical lens into the attention mechanism at the heart of the transformer model. They provide a compelling analysis that clarifies longstanding quirks in transformer behavior, showing that attention mechanisms, especially with causal masks, inherently bias models toward the beginning of sequences.”

The findings will be presented at the upcoming International Conference on Machine Learning.
