Stevens Team Creates AI Model to Spot Misleading Scientific Reporting

Insider Brief

  • Researchers at Stevens Institute of Technology have developed an AI system using large language models to detect misleading claims in news coverage of scientific research.
  • The system, trained on 2,400 reports linked to peer-reviewed research, achieved about 75% accuracy by analyzing claims through structured “validity dimensions” such as oversimplification and confusion of correlation with causation.
  • The research is reportedly the first to systematically use LLMs for verifying scientific accuracy in public media and could lead to tools like browser plugins to flag inaccuracies in science reporting.

Artificial intelligence may one day help fight misinformation about science rather than spread it, according to a new study.

Researchers at Stevens Institute of Technology have developed a system that uses large language models, or LLMs, to identify misleading claims in media coverage of scientific discoveries. The system, presented at a workshop of the Association for the Advancement of Artificial Intelligence, showed promise in detecting inaccuracies in news reports with roughly 75% accuracy. Its creators say it could pave the way for tools that help readers better evaluate science-based reporting online.

“Inaccurate information is a big deal, especially when it comes to scientific content — we hear all the time from doctors who worry about their patients reading things online that aren’t accurate, for instance,” noted K.P. Subbalakshmi, the paper’s co-author and a professor in the Department of Electrical and Computer Engineering at Stevens. “We wanted to automate the process of flagging misleading claims and use AI to give people a better understanding of the underlying facts.”

According to Stevens, the team built a dataset of 2,400 news reports summarizing or describing scientific breakthroughs. The articles were drawn from both high-quality and low-quality outlets, and some were generated by AI. Each report was linked to the peer-reviewed abstract of the research it discussed, allowing the claims made to be compared against the original scientific findings.
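The article does not spell out the dataset's format, but a paired record of this kind might look roughly like the sketch below. The field names and example values are illustrative assumptions, not the team's actual schema.

```python
from dataclasses import dataclass

@dataclass
class PairedReport:
    """One news report paired with the peer-reviewed abstract it describes.

    Hypothetical schema for illustration only, not the Stevens team's format.
    """
    report_id: str        # identifier for the news article
    news_text: str        # full text of the news report
    source_type: str      # e.g. "high-quality outlet", "low-quality outlet", "AI-generated"
    paper_abstract: str   # abstract of the linked peer-reviewed paper
    is_accurate: bool     # label: does the report faithfully reflect the research?

# Example record (contents invented purely for illustration)
example = PairedReport(
    report_id="report-0001",
    news_text="Headline: New study proves coffee prevents insomnia...",
    source_type="low-quality outlet",
    paper_abstract="We observed a modest correlation between caffeine intake and...",
    is_accurate=False,
)
```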

According to the researchers, this is the first time a study has systematically used LLMs to verify scientific accuracy in public media.

“Creating this dataset is an important contribution in its own right, since most existing datasets typically do not include information that can be used to test systems developed to detect inaccuracies ‘in the wild,’” Subbalakshmi pointed out. “These are difficult topics to investigate, so we hope this will be a useful resource for other researchers.”

Three different LLM-based methods were tested. The most effective approach followed a three-step process: summarizing the news report, comparing that summary with the source research, and deciding whether the article accurately portrayed the science. To improve results, the models also analyzed each report along five dimensions of validity, such as whether it confused correlation with causation or exaggerated the findings.
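As a rough illustration of that three-step loop, the sketch below summarizes a report, checks the summary against the paper's abstract along the validity dimensions, and then asks for a final verdict. Only three of the five dimensions are named in the article (oversimplification, correlation versus causation, and exaggeration); the remaining questions, the prompt wording, and the ask_llm() helper are assumptions, not the authors' implementation.

```python
# Illustrative sketch of the three-step checking pipeline described above.
# Prompts, the last two dimension questions, and ask_llm() are assumptions.

VALIDITY_DIMENSIONS = [
    "Does the report oversimplify the findings?",        # named in the article
    "Does it confuse correlation with causation?",       # named in the article
    "Does it exaggerate the strength of the results?",   # named in the article
    "Does it omit key limitations or caveats?",          # placeholder
    "Does it misrepresent who or what was studied?",     # placeholder
]

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whatever LLM backend is being used."""
    raise NotImplementedError

def check_report(news_text: str, paper_abstract: str) -> str:
    # Step 1: summarize the news report's central scientific claim.
    summary = ask_llm(f"Summarize the main scientific claim in this article:\n{news_text}")

    # Step 2: compare the summary with the peer-reviewed abstract,
    # reasoning through each validity dimension in turn.
    notes = [
        ask_llm(
            f"Claim: {summary}\nAbstract: {paper_abstract}\n"
            f"Question: {question}\nAnswer briefly, citing the abstract."
        )
        for question in VALIDITY_DIMENSIONS
    ]

    # Step 3: decide whether the article accurately portrays the science.
    return ask_llm(
        "Based on these notes, is the news report an accurate portrayal of the "
        "research? Answer 'accurate' or 'misleading'.\n" + "\n".join(notes)
    )
```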

The researchers found their system was better at spotting inaccuracies in human-written reports than in those written by AI, possibly because both LLMs and non-experts have difficulty identifying subtle technical flaws in AI-generated writing. This suggests that as AI-generated content becomes more prevalent, verifying its accuracy may require even more tailored tools.

While the accuracy rate leaves room for improvement, the system performed better when asked to reason through the five structured “validity dimensions,” which target common, specific mistakes such as oversimplification or confusing correlation with causation.

“We found that asking the LLM to use these dimensions of validity made quite a big difference to the overall accuracy,” Subbalakshmi said, adding that the dimensions can be expanded to better capture domain-specific inaccuracies if needed.

In practical terms, this line of research could lead to tools that automatically assess the credibility of science news, such as browser extensions that flag suspect claims or systems that score publishers on reliability, according to the researchers. It could also guide efforts to improve how LLMs explain research, reducing the risk of misleading simplifications.

The project, which included Stevens graduate students, aims to make its dataset available to the wider research community. The researchers emphasize that better understanding how AI interprets science is key to ensuring its use supports rather than distorts public understanding of research.

“There’s certainly room for improvement in our architecture,” Subbalakshmi said. “The next step might be to create custom AI models for specific research topics, so they can ‘think’ more like human scientists.”
