Inception, a newly launched AI company founded by Stanford computer science professor Stefano Ermon, has introduced a groundbreaking AI model based on diffusion technology. The company’s diffusion-based large language model (DLM) aims to revolutionize text generation by significantly improving speed and reducing computational costs.
Ermon stated that his research at Stanford focused on adapting diffusion models — traditionally used in image, video, and audio generation — to text-based AI. He explained that unlike large language models (LLMs), which process words sequentially, diffusion models generate and refine entire blocks of text simultaneously. This parallel processing method, he said, eliminates bottlenecks in AI inference, allowing for faster and more efficient performance.
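The contrast Ermon describes can be sketched with a toy illustration. This is not Inception's actual model or API, just a hedged conceptual sketch: autoregressive decoding needs one model call per token, while a diffusion-style generator refines every position of a block in parallel, so the number of model calls depends on the number of refinement steps rather than the sequence length.

```python
import random

random.seed(0)
# Tiny stand-in vocabulary; a real model would score a full token vocabulary.
VOCAB = ["the", "cat", "sat", "on", "mat"]

def autoregressive_generate(n_tokens):
    """Sequential decoding: each token is produced only after all
    previous tokens are fixed, so generation takes n_tokens passes."""
    out = []
    for _ in range(n_tokens):
        out.append(random.choice(VOCAB))  # stand-in for one model forward pass
    return out

def diffusion_generate(n_tokens, n_steps=3):
    """Parallel refinement: start from noise and update every position
    at each step, so generation takes n_steps passes regardless of length."""
    block = ["<noise>"] * n_tokens
    for _ in range(n_steps):
        # Stand-in for one denoising pass that refines all positions at once.
        block = [random.choice(VOCAB) for _ in block]
    return block

print(autoregressive_generate(8))
print(diffusion_generate(8))
```

In the sequential loop the cost scales with output length; in the refinement loop it scales with the (typically small, fixed) number of denoising steps, which is the parallelism Ermon credits for the speedup.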
The company has already secured customers, including Fortune 100 firms, by addressing the industry’s demand for lower AI latency and increased processing speed. Ermon emphasized that Inception’s models leverage GPUs more efficiently, leading to up to 10 times faster performance and significantly reduced costs compared to conventional LLMs. He asserted that this advancement would fundamentally change how language models are built and deployed.
Inception provides an API, on-premises deployment, and edge device integration, along with a suite of pre-trained DLMs tailored for diverse applications. A company spokesperson highlighted that its smaller coding model rivals OpenAI’s GPT-4o mini while being over 10 times faster. The company’s mini model also reportedly outperforms open-source alternatives like Meta’s Llama 3.1 8B, achieving speeds of over 1,000 tokens per second.
Ermon co-founded Inception with two of his former students: Aditya Grover, now a professor at UCLA, and Volodymyr Kuleshov, a professor at Cornell.