Chinese Researchers’ 70-Billion-Parameter LLM Tailored For Chemical Engineering Outperforms Standard LLMs

Insider Brief

Researchers developed ChemELLM, a large language model specialized in chemical engineering, which outperformed mainstream LLMs on a new benchmark called ChemEBench.
The model was adapted from Spark-70B using 19 billion tokens of domain-specific data for pretraining and 1 billion tokens for fine-tuning on chemical engineering tasks.
ChemEBench evaluates LLMs across basic, advanced, and professional competencies, with ChemELLM demonstrating superior knowledge and problem-solving ability in the field.

PRESS RELEASE — The development of chemical technologies is a multi-stage process that typically begins with laboratory research, progresses through scale-up and basic engineering, and culminates in industrial deployment. This complex process requires synergistic collaboration among experts from diverse disciplines such as chemistry, physics, mathematics, electrical engineering, process design, and architecture to address technical bottlenecks while balancing economic viability.

However, interdisciplinary collaboration is often hindered by disciplinary boundaries, posing significant challenges to maintaining consistency in design intentions during chemical process development.

Emerging strategies such as data-driven artificial intelligence (AI) technologies have gained recognition for their potential to streamline development pipelines and enhance process efficiency. In particular, large language models (LLMs), which are trained on extensive corpora spanning complex, cross-disciplinary information, offer new opportunities to transform scientific workflows.

Recently, a research team led by Prof. Mao Ye (Dalian Institute of Chemical Physics, Chinese Academy of Sciences) and Prof. Xin Li (iFLYTEK Co., Ltd.) developed ChemELLM, a domain-specialized LLM designed for chemical engineering applications.

Built upon the Spark-70B foundation model, ChemELLM underwent domain-adaptive pretraining and instruction fine-tuning using ChemEData, a carefully curated corpus of high-quality chemical engineering data. Additionally, to assess the knowledge and problem-solving capabilities of LLMs in this field, the team introduced ChemEBench, a comprehensive benchmark designed for chemical engineering. The results were published in the Chinese Journal of Catalysis (DOI: 10.1016/S1872-2067(25)64725-5).

The team constructed ChemEData, a specialized dataset containing 19 billion tokens for pretraining and 1 billion tokens for fine-tuning. Domain-adaptive pretraining on the Spark-70B foundation model used the 19-billion-token chemical engineering corpus, an approach that enables ChemELLM to acquire domain-specific knowledge while retaining Spark-70B’s foundational capabilities. During the supervised fine-tuning phase, 2.75 million high-quality samples (1 billion tokens) were used to align the model with the specific language patterns and terminology of chemical engineering.
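For a sense of scale, the figures quoted above can be turned into a few back-of-the-envelope ratios. The short Python sketch below is illustrative only: the function and variable names are hypothetical, and because the token counts reported in the release are rounded, the derived numbers are approximate. It suggests that fine-tuning accounts for about 5% of the combined token budget and that a fine-tuning example averages roughly 364 tokens.

```python
# Figures as quoted in the press release (rounded, so results are approximate).
PRETRAIN_TOKENS = 19_000_000_000   # domain-adaptive pretraining corpus
FINETUNE_TOKENS = 1_000_000_000    # supervised fine-tuning set
FINETUNE_SAMPLES = 2_750_000       # fine-tuning examples
MODEL_PARAMS = 70_000_000_000      # parameter count, taken from the headline


def summarize_budget() -> dict:
    """Derive simple ratios from the reported ChemEData figures.

    This helper is a hypothetical illustration, not part of the
    researchers' published method or tooling.
    """
    total = PRETRAIN_TOKENS + FINETUNE_TOKENS
    return {
        "total_tokens": total,
        # Fraction of all domain tokens used for fine-tuning.
        "finetune_share": FINETUNE_TOKENS / total,
        # Average length of one fine-tuning example, in tokens.
        "avg_tokens_per_sample": FINETUNE_TOKENS / FINETUNE_SAMPLES,
        # Domain pretraining tokens seen per model parameter.
        "pretrain_tokens_per_param": PRETRAIN_TOKENS / MODEL_PARAMS,
    }


if __name__ == "__main__":
    for name, value in summarize_budget().items():
        print(f"{name}: {value:,.4f}")
```

The low ratio of domain tokens to parameters (well under one token per parameter) is consistent with the release's description of adapting an existing foundation model rather than training one from scratch.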

The ChemEBench benchmark integrates three progressive evaluation stages (basic knowledge, advanced knowledge, and professional skills) to comprehensively assess LLMs in this specialized domain. Evaluation results highlight ChemELLM’s superior performance over mainstream LLMs (including O1-Preview, GPT-4o, and DeepSeek-R1) on ChemEBench, demonstrating its excellence in chemical engineering tasks.
