Language Model Shows Signs of Learning Biology, Study Finds

Insider Brief

  • Anthropic researchers used attribution graphs to reveal that a small language model trained on internet text developed internal representations of biological structures like base pairs, codons, and amino acids.
  • The model exhibited modular, interpretable circuits corresponding to genetic functions, despite having no explicit training in biology.
  • The study introduces attribution graphs as a method to reverse engineer internal model behavior and highlights both the promise and limitations of applying interpretability techniques to scientific domains.

Language models trained on internet text appear to develop internal strategies for identifying genetic structures such as amino acids, base pairs, and codons, according to a new study by Anthropic researchers.

In a paper published this week, researchers from Anthropic’s Transformer Circuits team introduced a method they call attribution graphs, which maps how large language models internally represent complex biological sequences, despite never being explicitly trained in biology.

Working with a slimmed-down version of OpenAI’s GPT-2 model, the team probed how the AI handled raw DNA sequences. They found that the model appeared to form modular, interpretable components corresponding to basic genetic structures, much as a trained biologist might mentally break down a DNA strand.

“Our goal is to reverse engineer biology from weights, not prompts,” the authors wrote, referring to the idea that meaning can be extracted directly from the model’s inner mechanics, rather than from analyzing its output in response to a prompt.

The study, Attribution Graphs for Biology, marks a continuation of Anthropic’s work in “mechanistic interpretability,” a growing field that aims to explain how language models process information by breaking down their internal logic, rather than treating them as inscrutable black boxes.

Modular Strategies Inside the Model

The researchers identified several specific circuits within GPT-2 Small that appear to detect and respond to biological features. For instance, one circuit they traced was specialized in recognizing whether adenine (A) was paired with thymine (T), a key feature in DNA’s double-helix structure. Another focused on codons, the three-letter sequences that instruct cells which amino acids to produce.

In one striking example, the model demonstrated behavior suggesting it had identified start and stop codons—the biological equivalent of capital letters and periods in written language. These codons mark where protein-coding regions begin and end.
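
To make those structures concrete, the short Python sketch below illustrates the underlying biology itself, not anything from the paper: it applies the standard A-T and G-C pairing rule, splits a sequence into codons, and picks out the region bracketed by a start codon (ATG) and a stop codon (TAA, TAG, or TGA).

    # Illustrative sketch of the genetic structures described above (not code from the paper).
    # A DNA coding strand is read in non-overlapping triplets (codons); ATG marks where a
    # protein-coding region begins, and TAA, TAG, or TGA marks where it ends.

    COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}  # Watson-Crick base pairing
    START_CODON = "ATG"
    STOP_CODONS = {"TAA", "TAG", "TGA"}

    def complement_strand(seq: str) -> str:
        """Return the base-paired (complementary) strand, e.g. ATGC -> TACG."""
        return "".join(COMPLEMENT[base] for base in seq)

    def codons(seq: str, frame: int = 0) -> list:
        """Split a sequence into codons, reading from the given frame offset."""
        return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

    def coding_region(seq: str) -> list:
        """Return the codons from the first start codon through the first stop codon."""
        cs = codons(seq)
        if START_CODON not in cs:
            return []
        region = []
        for codon in cs[cs.index(START_CODON):]:
            region.append(codon)
            if codon in STOP_CODONS:
                break
        return region

    if __name__ == "__main__":
        dna = "GGCATGGCTTTCGAATAAGGC"
        print(complement_strand(dna))  # the base-paired partner strand
        print(codons(dna))             # ['GGC', 'ATG', 'GCT', ...]
        print(coding_region(dna))      # ['ATG', 'GCT', 'TTC', 'GAA', 'TAA']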

According to the researchers, these strategies generalize to arbitrary DNA sequences in a human-interpretable way and are robust to adversarial attacks.

This suggests the model isn’t merely memorizing sequences or picking up on statistical quirks. Instead, it may be building a form of abstract biological knowledge, despite being trained only on internet text and never seeing labeled genomic data.

How Attribution Graphs Work

The team introduced attribution graphs as a way to map the flow of information through the model. In contrast to older tools like attention heatmaps — which show where a model is “looking” — attribution graphs aim to pinpoint exactly which neurons and layers contribute to a given prediction and how.

The technique involves calculating the contribution of each neuron or circuit to the final output using a variant of path attribution, a method from classical feature attribution in machine learning. The researchers say this lets them trace high-level concepts like “this neuron activates when it sees a stop codon” back to the raw architecture.
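
As a rough, hypothetical illustration of that basic ingredient, and not Anthropic’s actual procedure, the sketch below scores each hidden unit of a toy two-layer network by activation times gradient with respect to a chosen output logit, then ranks the most influential neurons. Attribution graphs build far more structure on top of this kind of per-unit contribution.

    # Minimal sketch of neuron-level attribution using an activation-times-gradient
    # rule (a simple stand-in; the paper's attribution graphs are more elaborate).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy stand-in for a transformer layer: input -> hidden neurons -> output logits.
    hidden = nn.Linear(8, 16)
    readout = nn.Linear(16, 4)

    x = torch.randn(1, 8)              # a single toy "token" representation
    h = torch.relu(hidden(x))          # hidden neuron activations
    h.retain_grad()                    # keep gradients for this non-leaf tensor
    logits = readout(h)

    target = 2                         # the prediction we want to explain
    logits[0, target].backward()       # gradient of that logit w.r.t. the activations

    # Attribution of each hidden neuron to the chosen logit: activation * gradient.
    attributions = (h * h.grad).detach().squeeze()
    top = torch.topk(attributions.abs(), k=3)
    print("most influential neurons:", top.indices.tolist())
    print("their attribution scores:", attributions[top.indices].tolist())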

In a critical step, the researchers were able to decompose the model’s knowledge into distinct functional units. This modularity allowed them to confirm that certain neurons were acting as base-pair matchers, while others encoded larger patterns, such as whether a DNA strand could be translated into a protein.
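
One common way to check a hypothesis of that kind, shown here only as a generic interpretability sketch rather than the paper’s exact protocol, is ablation: zero out the candidate unit and measure how much the model’s prediction shifts.

    # Hypothetical sketch of testing a "this neuron does X" claim by ablation:
    # knock out one hidden unit and measure the change in the output distribution.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    hidden = nn.Linear(8, 16)
    readout = nn.Linear(16, 4)

    x = torch.randn(1, 8)
    with torch.no_grad():
        h = torch.relu(hidden(x))
        baseline = torch.softmax(readout(h), dim=-1)

        neuron = 5                    # candidate unit, e.g. a suspected base-pair matcher
        h_ablated = h.clone()
        h_ablated[0, neuron] = 0.0    # knock the unit out
        ablated = torch.softmax(readout(h_ablated), dim=-1)

    # A large shift supports the hypothesis that the unit matters for this
    # prediction; little or no shift suggests it does not.
    print("change in output distribution:", (baseline - ablated).abs().sum().item())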

Limits of What the Model Knows and What It Doesn’t

Despite the findings, the researchers emphasized that the model’s biological “understanding” is limited. While it can predict base-pair relationships and some protein-coding regions, it does not grasp higher-level functions such as gene regulation or the role of epigenetics.

In other words, the AI doesn’t understand biology in any traditional sense. Instead, it has learned statistical patterns in biological text and sequence data that resemble the rules used in actual genomes.

Moreover, the study was conducted on GPT-2 Small, a much older and smaller model than current industry standards like GPT-4 or Claude. The smaller size makes it easier to interpret, but it also means that any conclusions about what today’s large-scale models might contain are speculative.

“We expect that as models grow increasingly capable, predicting their mechanisms a priori will become more difficult, and the need for effective unsupervised exploration tools will grow,” the team writes. “We are optimistic that our tools can be made more cost- and time-effective and reliable – our current results are a lower bound on how useful such methods can be.”

Wider Implications for AI and Science

The work raises broader questions about the kinds of knowledge embedded in modern AI models. If a relatively small model can encode base-pair logic and protein-coding regions from text alone, then larger models might harbor even more detailed representations—possibly even useful ones for scientific discovery.

It also suggests a possible role for language models as tools in fields like genomics or synthetic biology. Rather than training specialized models from scratch, scientists might inspect general-purpose transformers for useful circuits and fine-tune them as needed.

Beyond biology, the attribution graph method offers a template for studying models across domains — revealing whether they have learned modular representations of concepts in chemistry, math, or physics. The method may also assist AI safety research by identifying hidden or dangerous circuits in large models.

The researchers pointed to future work in understanding tokenization, model internals, and applications to other scientific fields.

The team plans to expand its work to larger models and more complex biological phenomena, but warns that increasing scale may make interpretability harder. They also acknowledge that attribution graphs, while powerful, still depend on hand-selected examples and careful analysis by researchers.
