SandboxAQ Releases AQCat25 Dataset, Accelerating Next-Generation Catalysis and Materials Discovery with AI

Insider Brief

  • SandboxAQ released AQCat25, a large-scale AI dataset with 11 million data points to accelerate catalyst discovery and industrial chemistry applications.
  • The dataset incorporates spin polarization and covers 40,000 catalyst-intermediate systems, enabling accurate modeling and extending machine learning approaches to new industrially relevant problems.
  • Generated using over 400,000 GPU-hours on NVIDIA DGX Cloud, AQCat25 is publicly available on Hugging Face for academic and industrial use in areas such as sustainable fuels, hydrogen, and fertilizer production.

PRESS RELEASE: SandboxAQ today announced the launch of AQCat25, a breakthrough large-scale AI dataset offering deep insights into the atomic-scale dynamics that drive state-of-the-art industrial chemistry. The AQCat25 dataset enables researchers to make highly accurate predictions of material properties in catalytic reactions from atomic structures.

Today, more than 90% of all commercially produced chemicals and over 80% of all manufactured goods rely on catalysts in their production. Mass-produced goods such as autos, medicines, gasoline and detergents all need catalysts for manufacture. AQCat25 delivers material value to the chemical and catalyst industries by overcoming two critical barriers that have hindered the use of AI for computational heterogeneous catalysis.

First, the dataset includes 11 million data points on 40,000 intermediate-catalyst systems, generated using highly accurate quantum chemistry calculations on GPUs to ensure more reliable modeling predictions. AQCat25 extends the capability of existing fast machine learning models to new, industrially relevant problems and enables the training of frontier models that deliver up to 20,000x faster performance than physics-based methods for catalyst design.

Second, AQCat25 is the only large-scale catalytic AI dataset to include spin polarization, which accounts for magnetic effects, for materials beyond oxides. Since many of Earth’s most abundant metals are spin polarized, AQCat25 is highly relevant for a broad range of applications, such as producing sustainable aviation fuel and fertilizer, creating stable green hydrogen, and converting industrial waste streams into useful materials.

“AQCat25 enables scientists and engineers to design the next generation of chemicals, catalysts, and advanced materials faster and more cost-effectively than traditional manufacturing processes or existing AI-accelerated approaches,” said Dr. Adam Lewis, Head of Innovation at SandboxAQ. “By publicly releasing AQCat25, SandboxAQ enables the world’s leading industrial companies and academic institutions to significantly accelerate their existing R&D and advance their efforts to bring innovative new products and solutions to market faster.”

AQCat25 was generated on NVIDIA DGX Cloud, using more than 400,000 GPU-hours of computation on NVIDIA DGX H100 cards. The unified AI platform provided SandboxAQ with the optimized computing infrastructure needed to develop AQCat25 in record time.
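To put the quoted figures in perspective, the short Python sketch below derives a few rough averages from them. It is a back-of-envelope illustration only: the inputs are the rounded numbers from the release, "more than 400,000" is treated as exactly 400,000, and the final figure assumes, purely for illustration, that the "up to 20,000x" speedup applies to the average per-point cost. The function name `scale_summary` is made up for this example and is not part of any SandboxAQ tooling.

```python
# Back-of-envelope figures derived from the rounded numbers in the release.
DATA_POINTS = 11_000_000   # "11 million data points"
SYSTEMS = 40_000           # "40,000 intermediate-catalyst systems"
GPU_HOURS = 400_000        # "more than 400,000 GPU-hours" (treated as a lower bound)
MAX_SPEEDUP = 20_000       # "up to 20,000x faster" (treated as an upper bound)


def scale_summary():
    """Return rough averages implied by the published figures."""
    points_per_system = DATA_POINTS / SYSTEMS
    gpu_seconds_per_point = GPU_HOURS * 3600 / DATA_POINTS
    # Assumption for illustration only: the best-case speedup applies
    # directly to the average per-point cost computed above.
    best_case_ml_seconds = gpu_seconds_per_point / MAX_SPEEDUP
    return {
        "points_per_system": points_per_system,
        "gpu_seconds_per_point": gpu_seconds_per_point,
        "best_case_ml_seconds": best_case_ml_seconds,
    }


if __name__ == "__main__":
    for name, value in scale_summary().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions, the dataset averages about 275 data points per system and a little over two GPU-minutes of computation per data point. The release does not give a per-calculation breakdown, so these are averages and not measured values.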

“Catalysts are essential for advancing industrial processes and converting raw materials into value for the global economy,” said Jeff Graf, Global Head of Business Development at SandboxAQ. “With the combined power of NVIDIA DGX Cloud platform and SandboxAQ’s Large Quantitative Models, AQCat25 will transform catalytic discovery and optimization processes, decrease R&D time, cost, and risk, and elevate the field of AI-powered materials science to its highest potential.”

Large Quantitative Models (LQMs) trained on datasets like AQCat25 can explore a broader chemical space, design novel compounds not currently found in literature, and identify optimal chemical compounds in days instead of months or years.

The AQCat25 dataset is publicly available today on the Hugging Face platform. To learn more, visit https://sandboxaq.com/aqcat25.

Matt Swayne

With a several-decades-long background in journalism and communications, Matt Swayne has worked as a science communicator for an R1 university for more than 12 years, specializing in translating high tech and deep tech for the general audience. He has served as a writer, editor and analyst at The Space Impulse since its inception. In addition to his service as a science communicator, Matt also develops and teaches courses to improve the media and communications skills of scientists.
