Insider Brief
- A new study by researchers from the University of Chicago, NYU, and UC Santa Cruz finds that AI weather models, while accurate for standard forecasting, struggle to predict unprecedented extreme weather events that fall outside their training data.
- Neural networks trained on decades of historical weather records failed to accurately predict Category 5 hurricanes when such events were removed from the training set, raising concerns about the models’ ability to handle rare, high-impact phenomena.
- The researchers advocate integrating physical modeling with AI, such as incorporating atmospheric dynamics equations and using active learning to guide the generation of extreme scenario data for more robust future forecasting systems.
Artificial intelligence models can generate accurate short-term weather forecasts, but a new study finds they struggle with unprecedented extreme events that fall outside their training data. Researchers from the University of Chicago, along with collaborators from NYU and UC Santa Cruz, tested neural networks on their ability to predict rare and powerful weather patterns, such as Category 5 hurricanes, and found significant limitations, according to UChicago.
“AI weather models are one of the biggest achievements in AI in science. What we found is that they are remarkable, but not magical,” noted Pedram Hassanzadeh, an associate professor of geophysical sciences at UChicago and a corresponding author on the study. “We’ve only had these models for a few years, so there’s a lot of room for innovation.”
Published in Proceedings of the National Academy of Sciences, the study examined how AI systems trained on historical weather data responded when presented with scenarios that included events deliberately excluded from their training set.
“These models do really, really well for day-to-day weather,” Hassanzadeh said. “But what if next week there’s a freak weather event?”
The researchers’ concern is that neural networks learn only from the historical weather data currently available, which goes back about 40 years.
“The floods caused by Hurricane Harvey in 2017 were considered a once-in-a-2,000-year event, for example,” Hassanzadeh said. “They can happen.”
In one experiment, the researchers trained a neural network on four decades of weather records after removing every hurricane stronger than Category 2. When fed the atmospheric conditions that preceded a Category 5 hurricane, the AI consistently underestimated the storm’s strength and never predicted anything beyond Category 2.
“It always underestimated the event. The model knows something is coming, but it always predicts it’ll only be a Category 2 hurricane,” pointed out Yongqiang Sun, a research scientist at UChicago and the other corresponding author on the study.
This inability to extrapolate beyond known patterns raises concerns as AI becomes more integrated into meteorology and disaster preparedness. Unlike traditional models, which rely on the physics and mathematics of atmospheric dynamics, neural networks derive their predictions purely from patterns in past data. While efficient and increasingly accurate for standard forecasting, their reliance on precedent makes them prone to false negatives when faced with so-called “gray swan” events: rare but catastrophic occurrences such as 200-year floods or record-breaking heatwaves.
The researchers note that neural networks demonstrated better extrapolation when rare events had occurred elsewhere in the world. For example, if data on Atlantic hurricanes were removed but Pacific hurricanes remained, the AI could still predict strong Atlantic storms. This suggests that global training datasets improve performance, but also highlights the risk of regional blind spots.
“This was a surprising and encouraging finding: it means that the models can forecast an event that was unprecedented in one region but occurred once in a while in another region,” Hassanzadeh said.
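To make the two experiments more concrete, here is a deliberately tiny caricature in Python. It is not the study’s setup. A k-nearest-neighbour regressor stands in for the neural network, and a made-up linear link between a hypothetical “precursor index” p and peak wind speed stands in for the atmosphere. Because this kind of regressor can only average wind speeds it has already seen, removing every storm above Category 2 caps its forecasts at Category 2. Pooling in strong storms from another “region” restores the Category 5 forecast. The toy simply assumes that storms in both regions follow the same relationship.

```python
"""Toy illustration (not the study's model): a purely data-driven regressor
cannot predict intensities stronger than anything in its training set."""

def wind_speed(p):
    # Hypothetical "true" link between a precursor index p and peak
    # sustained wind in mph; invented purely for illustration.
    return 40 + 1.5 * p

def category(wind_mph):
    # Saffir-Simpson thresholds in mph (0 = below hurricane strength).
    for cat, threshold in ((5, 157), (4, 130), (3, 111), (2, 96), (1, 74)):
        if wind_mph >= threshold:
            return cat
    return 0

def knn_predict(train, p, k=3):
    # k-nearest-neighbour regression: average the winds of the k closest
    # training inputs, so the output never exceeds the largest training wind.
    nearest = sorted(train, key=lambda sample: abs(sample[0] - p))[:k]
    return sum(w for _, w in nearest) / k

storms = [(p, wind_speed(p)) for p in range(100)]

# Experiment 1: drop every storm stronger than Category 2 from one region.
atlantic_truncated = [(p, w) for p, w in storms if category(w) <= 2]

# Experiment 2: strong storms remain available, but only from another region.
pacific_full = list(storms)

query = 85  # precursor of a storm whose true intensity is Category 5
truth = category(wind_speed(query))
pred_regional = category(knn_predict(atlantic_truncated, query))
pred_global = category(knn_predict(atlantic_truncated + pacific_full, query))

print(f"true category: {truth}")
print(f"trained without strong storms: {pred_regional}")
print(f"trained with strong storms from another region: {pred_global}")
```

Run as written, it prints categories 5, 2 and 5. The Saffir-Simpson thresholds are the standard ones in mph; the data and the wind-speed relationship are invented.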
To improve model robustness, the study advocates merging AI techniques with physical modeling. By incorporating equations governing atmospheric processes, future models could retain the flexibility of AI while gaining the theoretical rigor of traditional weather forecasting systems. The team is exploring methods like active learning, where AI identifies which extreme scenarios are most valuable for training, guiding conventional models to simulate them.
“Longer simulated or observed datasets aren’t going to work. We need to think about smarter ways to generate data,” said Jonathan Weare, professor at the Courant Institute of Mathematical Sciences at New York University and study co-author. “In this case, that means answering the question ‘where should I place my training data to achieve better performance on extremes?’ Fortunately, we think AI weather models themselves, when paired with the right mathematical tools, can help answer this question.”
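The question of where to place training data can also be sketched as a toy active-learning loop. This is again an illustrative assumption rather than the team’s method. The `simulate` function is a hypothetical stand-in for an expensive physics-based model, and the acquisition rule (pick the unsimulated scenario farthest from any already-labelled one) is a simple distance-based heuristic chosen for clarity. Starting from a history that contains nothing above Category 2, a handful of well-placed simulations lets the same k-nearest-neighbour model forecast a Category 5 case.

```python
"""Toy active-learning loop (illustrative only, not the study's method):
choose which scenarios an expensive simulator should run next."""

def simulate(p):
    # Hypothetical stand-in for a costly physics-based simulation that
    # returns peak wind (mph) for a scenario with precursor index p.
    return 40 + 1.5 * p

def category(wind_mph):
    # Saffir-Simpson thresholds in mph (0 = below hurricane strength).
    for cat, threshold in ((5, 157), (4, 130), (3, 111), (2, 96), (1, 74)):
        if wind_mph >= threshold:
            return cat
    return 0

def knn_predict(train, p, k=3):
    # Average the simulated winds of the k nearest labelled scenarios.
    nearest = sorted(train, key=lambda sample: abs(sample[0] - p))[:k]
    return sum(w for _, w in nearest) / k

def acquisition(train, p):
    # Uncertainty proxy: distance from scenario p to the nearest labelled one.
    return min(abs(p - q) for q, _ in train)

train = [(p, simulate(p)) for p in range(48)]  # history: Category 2 at most
candidates = list(range(48, 100))              # scenarios not yet simulated
picked = []

for _ in range(5):  # budget: five extra simulations
    best = max(candidates, key=lambda p: acquisition(train, p))
    candidates.remove(best)
    train.append((best, simulate(best)))
    picked.append(best)

print("scenarios chosen for simulation:", picked)
print("predicted category for p=85:", category(knn_predict(train, 85)))
```

It prints the chosen scenarios [99, 73, 60, 86, 53] and a predicted category of 5, whereas the history alone yields Category 2. A more realistic acquisition rule might score candidates by model uncertainty, such as disagreement across an ensemble, and weight that score toward the extremes of interest.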