USC Study Looks at How ChatGPT “Sees” Colors in Language

Insider Brief

  • A USC-led study, partially funded by Google, found that humans with hands-on color experience—particularly painters—outperform large language models like ChatGPT in interpreting novel color metaphors, revealing the importance of sensory grounding in language comprehension.
  • The research compared how sighted adults, colorblind adults, painters, and ChatGPT associated colors with abstract terms and explained metaphorical phrases; painters demonstrated a deeper understanding due to embodied experience, while ChatGPT relied solely on cultural and emotional associations.
  • The study underscores a core limitation in current AI: without direct sensory input, models like ChatGPT struggle with unfamiliar or inverted metaphors, highlighting challenges in replicating human-like understanding through language data alone.

Can large language models “understand” colorful phrases like “golden opportunity” or “seeing red” without actually seeing the colors red or gold?

A research project partially funded by Google set out to answer that question and found that hands-on experience with color plays a key role in how people interpret colorful language, a finding that may expose critical gaps in how AI models like ChatGPT process metaphors.

“ChatGPT uses an enormous amount of linguistic data to calculate probabilities and generate very human-like responses,” said Lisa Aziz-Zadeh, the study’s senior author. “But what we are interested in exploring is whether or not that’s still a form of secondhand knowledge, in comparison to human knowledge grounded in firsthand experiences.”

The study, led by neuroscientists at the University of Southern California and published in Cognitive Science, compared the responses of four groups—sighted adults, colorblind adults, painters, and ChatGPT—when asked to associate colors with abstract concepts and explain both familiar and unfamiliar color metaphors. According to USC, sighted and colorblind individuals performed similarly, suggesting that visual perception alone may not be essential for metaphor comprehension, while painters outperformed every other group in interpreting novel color-based metaphors.

ChatGPT, trained solely on text, gave consistently logical responses and leaned heavily on cultural and emotional associations. But it struggled with unfamiliar or inverted metaphors, such as interpreting what it means to feel “burgundy” or identifying the opposite of “green.” These failures point to a core limitation in AI: an absence of direct, embodied experience.

The research team included scientists from USC, UC San Diego, Stanford, Université de Montréal, the University of the West of England, and Google DeepMind. The project was partially supported by a Google Faculty Gift to senior author Lisa Aziz-Zadeh, as well as funding from the Barbara and Gerson Bakar Faculty Fellowship and the Haas School of Business at UC Berkeley.

Participants were asked to assign colors to words like “physics” and “honesty,” interpret phrases like “red alert” or “a pink party,” and explain their reasoning. ChatGPT often offered plausible cultural explanations but rarely grounded its answers in the kind of sensory, physical interaction with color that painters drew on, researchers noted.
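Readers curious about the model side of this comparison can run a similar probe themselves. The sketch below poses questions in the style of the article's examples to a chat model through the OpenAI Python client. It is a minimal illustration under stated assumptions, not the study's actual protocol: the model name, prompt wording, and helper function are hypothetical choices for demonstration.

```python
# Minimal sketch of probing an LLM with color-metaphor questions,
# loosely modeled on the tasks described in the article. This is NOT
# the study's materials; model and prompts are illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Probes inspired by the article: color association, a familiar
# metaphor, and the novel/inverted metaphors models reportedly miss.
PROBES = [
    "What color do you associate with the word 'physics'? Explain why.",
    "What does the phrase 'red alert' mean? Explain your reasoning.",
    "What would it mean to feel 'burgundy'? Explain your reasoning.",
    "Conceptually, what is the opposite of the color green? Explain.",
]

def ask(prompt: str) -> str:
    """Send one probe and return the model's text response."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; any chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    for probe in PROBES:
        print(f"Q: {probe}\nA: {ask(probe)}\n")
```

Comparing the model's answers with a painter's would, of course, require the kind of human data collection the study describes; this sketch covers only the text-model half of that comparison.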

The results challenge a widespread assumption in AI development: that language alone is sufficient to replicate human understanding. While large language models excel at mimicking human phrasing and associations, they appear to miss the nuanced understanding that arises from physical interaction with the world. The study found that painters’ routine manipulation of pigments and hues gave them a distinct advantage in conceptualizing and explaining complex metaphors.

The study relied on large-scale online surveys and open-ended questions to measure how each group reasoned through metaphors. While ChatGPT showed strength in overall consistency and fluency, its lack of embodied reasoning limited its success with unfamiliar scenarios.

According to researchers, a key implication of the findings is that adding sensory input—such as visual or tactile data—could help future AI systems develop more human-like cognitive abilities. The USC-led team suggests that integrating direct sensory modalities into AI could be a path forward for models meant to handle nuanced, high-context reasoning.

One limitation of the research is that it focused on metaphor comprehension within a narrow range of color-based language. Broader linguistic or cognitive domains may yield different patterns. In addition, ChatGPT’s responses reflect the training limitations of current models and may not be representative of future AI systems with expanded capabilities.

The team plans to explore whether AI models that combine language with sensory input—such as visual recognition or interactive feedback—can move beyond mimicking semantics and begin to exhibit more human-like reasoning. For now, the study reinforces the gap between statistical prediction and the embodied experience that underpins much of human understanding.
