Artificial Traders, Real Behaviors: LLMs Mirror Human Irrationality in Market Simulations

Insider Brief

  • A new study shows that large language models like GPT-3.5 and GPT-4 can mimic human-like behavior in simulated economic markets, raising the possibility of using AI agents in economic research and policy modeling.
  • When configured with memory and output variability, the models reproduced the price-forecasting behaviors and market dynamics seen in human experiments, notably bounded rationality and trend-following.
  • The findings highlight both the potential and current limitations of using AI for multi-agent simulations, including reduced behavioral diversity and inconsistencies between narrative explanations and numerical forecasts.

Large language models can replicate human-like behavior in simulated markets, raising the possibility that AI agents may one day serve as stand-ins for people in economic research and policy modeling, according to a team of scientists.

A new study by researchers from University College London, CENTAI Institute, the Bank of Canada, and other institutions tested whether generative AI systems, such as GPT-3.5 and GPT-4, could reproduce the dynamics of laboratory economic markets. The results, posted to arXiv recently, suggest that large language models (LLMs) don’t follow textbook rationality, but rather show bounded rationality and trend-following behavior similar to real humans.

The study focused on a core question: Can AI agents not only make economic decisions, but also interact in ways that produce collective patterns resembling those in actual markets? According to the authors, the answer is yes — but only under certain conditions. When configured with a memory of past decisions and a high degree of output variability, LLMs showed behaviors consistent with human participants in classic economic experiments.

The findings suggest a range of new research paths, including ways to simulate complex social systems without human subjects.

Market Behavior Without Humans

The researchers adapted a well-known set of experiments originally conducted by economists studying how people forecast prices over time. In those experiments, small groups of humans tried to guess the future price of a product over 50 rounds, with their predictions influencing the actual price in a simulated market. Markets could be structured in two ways: one with positive feedback, where higher forecasts lead to higher prices (as in speculative markets), and one with negative feedback, where higher forecasts reduce prices (as in agricultural or production markets).
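The article doesn't reproduce the study's price-formation equations, but feedback rules of this general shape are standard in the learning-to-forecast literature the experiments come from. The sketch below is illustrative only: the equilibrium level, the slope, and the noise scale are assumptions, not the paper's actual parameters.

```python
import random
import statistics

EQUILIBRIUM = 60.0  # illustrative equilibrium price, not the study's actual value

def realized_price(forecasts, feedback="positive", noise_sd=0.25):
    """Map the group's forecasts for this round to a realized price.

    Under positive feedback the price moves with the average forecast
    (speculative markets); under negative feedback it moves against it
    (production markets). Slope and noise scale are illustrative.
    """
    avg = statistics.mean(forecasts)
    shock = random.gauss(0.0, noise_sd)
    if feedback == "positive":
        return EQUILIBRIUM + (20 / 21) * (avg - EQUILIBRIUM) + shock
    return EQUILIBRIUM - (20 / 21) * (avg - EQUILIBRIUM) + shock
```

The sign flip is the entire difference between the two treatments: under positive feedback, an optimistic group pushes the price up and rewards further optimism, while under negative feedback optimism is self-defeating, which is why such markets tend to self-correct.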

The AI simulations followed the same rules. Groups of six AI agents — each powered by either GPT-3.5 or GPT-4 — received instructions about the market setup and were asked to predict prices over 50 rounds. Their earnings depended on accuracy, mimicking incentive structures from the original human experiments. Crucially, the researchers tested how memory (access to past decisions) and temperature (a parameter controlling randomness) affected performance.
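In outline, each simulated market is a loop over rounds in which every agent is prompted with a window of recent history. The sketch below, reusing realized_price from above, shows where the two knobs enter; llm_forecast stands in for a hypothetical prompt-and-parse wrapper around a chat-completion API, not the study's actual code.

```python
def run_market(n_agents=6, n_rounds=50, memory=3, temperature=1.0,
               feedback="positive"):
    """Skeleton of one simulated market, reusing realized_price above.

    Each round, every agent sees only the last `memory` rounds of
    (price, forecasts) history and submits a new forecast at the given
    sampling temperature; the realized price then feeds back in.
    `llm_forecast` is a hypothetical wrapper around a chat API,
    not the study's actual code.
    """
    history = []  # one (realized_price, forecasts) tuple per round
    for _ in range(n_rounds):
        window = history[-memory:]  # bounded memory of past rounds
        forecasts = [
            llm_forecast(agent_id=i, window=window, temperature=temperature)
            for i in range(n_agents)
        ]
        price = realized_price(forecasts, feedback=feedback)
        history.append((price, forecasts))
    return history
```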

Fluctuations and Convergence

The results revealed patterns that closely resembled those seen in human trials. In positive feedback markets, both GPT models produced price fluctuations and sometimes failed to stabilize near the “equilibrium” price—the level where supply and demand should balance if everyone predicted rationally. In particular, GPT-3.5 agents produced large swings and bubble-like dynamics before eventually reversing course. GPT-4 agents generally settled to prices slightly above the theoretical equilibrium, echoing trends seen in human groups.

In negative feedback markets, GPT-4 showed the most human-like behavior, typically reaching price stability within 10 to 15 rounds. GPT-3.5 took longer, often requiring 25 rounds or more to settle down. Human subjects, by comparison, usually stabilized in fewer than 10 rounds.

What stood out was that LLM agents, like people, did not act as rational utility maximizers — that is, agents that always make the perfectly logical choice to secure the best outcome. Instead, their decisions reflected what economists call bounded rationality: rules of thumb, simple extrapolation from past trends, and a tendency to follow recent price changes. This was especially visible in positive feedback settings, where trend-following behavior dominated. In contrast, AI agents in negative feedback markets showed more cautious adjustments, although GPT-3.5 sometimes overshot and oscillated before settling.
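Trend-following of this kind is easy to state precisely. A rule of roughly the following shape, common in this literature, captures the behavior described above; the extrapolation weight gamma is illustrative, not a value reported in the study.

```python
def trend_forecast(prices, gamma=1.0):
    """Naive trend extrapolation: project the most recent price change
    forward. gamma > 0 chases the trend; the value is illustrative."""
    if len(prices) < 2:
        return prices[-1]  # not enough history yet: repeat the last price
    return prices[-1] + gamma * (prices[-1] - prices[-2])
```

In a positive-feedback market this rule is self-reinforcing: rising prices produce rising forecasts, which push prices up further, one mechanism behind the bubble-like swings described above.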

Parameters Matter

The study identified memory as a key driver of realistic behavior. Agents that could recall at least three past rounds of interaction performed much more like humans, while those with only one-step memory failed to converge or behaved unrealistically. Temperature — the model’s randomness setting — also played a role, with higher values producing more diverse and exploratory behavior, especially in GPT-3.5.

These effects were quantified by measuring how agents formed expectations. The authors estimated models capturing each agent’s forecasting rule, focusing on how much weight they gave to past prices, past predictions, and recent trends. LLM agents matched human patterns in many cases, but with notable differences: they showed less variability across agents, and lacked some of the more stubborn or overly simplistic strategies humans tend to adopt.
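The exact specification isn't given in this summary, but first-order forecasting heuristics of the following form are standard in the learning-to-forecast literature, and fitting one per agent recovers exactly the weights in question. The sketch below is an assumption about the general approach, not the authors' code.

```python
import numpy as np

def fit_forecast_rule(prices, forecasts):
    """Fit pe[t+1] = c + w1*p[t-1] + w2*pe[t] + w3*(p[t-1] - p[t-2])
    by ordinary least squares, one agent at a time.

    `prices` are realized prices; `forecasts` (pe) are the agent's
    round-by-round predictions. This is a standard first-order
    heuristic assumed here, not necessarily the study's exact model.
    """
    p = np.asarray(prices, dtype=float)
    pe = np.asarray(forecasts, dtype=float)
    t = np.arange(2, len(p) - 1)  # rounds where all regressors exist
    X = np.column_stack([np.ones(len(t)),       # intercept
                         p[t - 1],              # weight on last price
                         pe[t],                 # weight on own forecast
                         p[t - 1] - p[t - 2]])  # weight on recent trend
    y = pe[t + 1]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs  # [c, w1, w2, w3]
```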

This limited heterogeneity — less behavioral diversity across agents — is a significant gap. In real experiments, some people consistently overreact, others underreact, and some refuse to change their minds at all. LLMs tended to cluster around a few strategies, particularly those based on trend extrapolation. The researchers noted that this lack of variation may limit the models’ ability to fully capture real-world dynamics unless further complexity is introduced.

Narratives That Fit the Numbers

To further probe agent behavior, the study examined the text explanations generated by the AI models. These narratives showed consistent reasoning: agents justified their forecasts by pointing to ongoing trends or recent outcomes. When prices were rising, agents cited momentum; when prices stalled or reversed, their expectations adjusted, sometimes lagging behind the actual price change.

Interestingly, some narratives diverged from the numerical predictions. At times, an agent would express optimism in its explanation but submit a lower price forecast, a sign that randomness introduced by the temperature setting may have affected outputs. These discrepancies hint at a limitation in using LLMs for consistent “thinking” across text and numbers, though they also mirror human inconsistencies under uncertainty.

Implications and Future Directions

This study adds to growing evidence that LLMs can simulate more than just isolated choices — they can interact, learn and influence one another in multi-agent systems. That opens the door to using generative models as economic testbeds, replacing costly and time-consuming human trials in many cases.

However, the researchers caution that these simulations are far from perfect. Key gaps remain in replicating the diversity and unpredictability of human behavior. Moreover, LLMs may reflect underlying biases in their training data or prompt designs. Past studies have shown these systems can skew politically or culturally, which could distort simulations if not accounted for.

To bridge these gaps, future work could incorporate simulated demographics, political identities, or even personality traits like risk aversion. The authors also suggest extending the models to more complex market structures and feedback mechanisms, including real-time policy interventions or speculative bubbles.

If further validated, LLM-based economic simulations could become valuable tools for central banks, regulators, and academic researchers. By modeling how people might behave under new conditions—without needing actual participants—they could speed up experimentation and provide early warnings for unintended consequences.

The results suggest a long-term vision where economic models aren’t static equations, but interactive systems populated by intelligent, adaptive agents. While that future isn’t here yet, this study takes a concrete step toward building it.
