- Last modified: October 2, 2024
Insider Brief
- OpenAI introduced new features at DevDay, including Model Distillation and Prompt Caching, to help developers streamline workflows and reduce costs when using AI models like GPT-4o.
- Vision fine-tuning on GPT-4o allows developers to train models with images, enhancing applications like visual search and object detection, with early adopters like Grab and Automat seeing significant improvements.
- The Realtime API enables low-latency, multimodal experiences with natural speech-to-speech conversations, already being used by apps like Healthify and Speak for more interactive user experiences.
OpenAI’s DevDay brought a series of significant announcements designed to streamline workflows for developers using its platform. From model distillation to real-time API capabilities, the updates are poised to enhance both performance and cost-efficiency across AI-driven applications. Here’s a breakdown of the key announcements.
Model Distillation Simplified
OpenAI introduced its Model Distillation feature to optimize workflows for developers looking to fine-tune models for specific tasks. This new offering allows developers to manage the entire distillation pipeline directly within the OpenAI platform. The process, which once required multiple tools and manual steps, is now fully integrated.
“Until now, distillation has been a multi-step, error-prone process,” OpenAI stated in the announcement post. Developers previously had to manually orchestrate operations, from generating datasets to fine-tuning models and measuring performance improvements. This new feature simplifies the process, reducing complexity and cost.
The Model Distillation suite includes several tools aimed at making the process seamless:
- Stored Completions allow developers to capture input-output pairs generated by models like GPT-4o and o1-preview, enabling the creation of datasets with production data.
- Evals, currently in beta, lets developers run custom evaluations to measure model performance on specific tasks, eliminating the need for separate scripts and logging tools.
- The integration with OpenAI’s existing fine-tuning offering allows developers to use these datasets in fine-tuning tasks and run evaluations on the same platform.
According to OpenAI, this integrated approach enables developers to fine-tune smaller, cost-efficient models like GPT-4o mini, using the outputs of more advanced models such as o1-preview and GPT-4o, significantly reducing costs without sacrificing performance.
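As a concrete illustration, here is a minimal sketch of that flow using OpenAI’s Python SDK, assuming the Stored Completions and fine-tuning endpoints described in the announcement; the model names, metadata tag, and training-file ID are placeholders rather than values from OpenAI’s post:

```python
# Minimal distillation-flow sketch using the openai Python SDK.
# Model names, metadata tag, and file ID below are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Capture production input-output pairs from the larger model.
#    store=True retains the exchange as a Stored Completion; metadata
#    makes it easy to filter the captured pairs later.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
    ],
    store=True,
    metadata={"use_case": "support-distillation"},
)
print(response.choices[0].message.content)

# 2. After exporting the stored completions as a JSONL training file,
#    fine-tune the smaller model on them. "file-abc123" is a
#    placeholder for the uploaded file's ID.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```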
Prompt Caching for Cost and Speed Efficiency
Another standout announcement from DevDay was the launch of Prompt Caching, aimed at developers who frequently use the same context across multiple API calls. This new feature promises to reduce costs and latency for developers working on tasks like chatbot conversations or codebase edits by reusing previously processed input tokens.
“Many developers use the same context repeatedly across multiple API calls when building AI applications,” the OpenAI team writes. Prompt Caching offers a 50% discount on cached input tokens and faster processing times by reusing recently processed prompt prefixes.
Starting immediately, Prompt Caching is automatically applied to the latest versions of GPT-4o, GPT-4o mini, o1-preview, o1-mini, and fine-tuned versions of those models. The discount kicks in for prompts longer than 1,024 tokens, with caching applied in 128-token increments beyond that; a 1,500-token prompt, for example, can match a cached prefix of up to 1,408 tokens (1,024 + 3 × 128).
OpenAI explained the process behind Prompt Caching: “The API caches the longest prefix of a prompt that has been previously computed, starting at 1,024 tokens and increasing in 128-token increments.”
This feature allows developers to benefit from automatic caching discounts without making changes to their API integration.
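No integration changes are required, but prompt structure still matters: because caching matches the longest previously computed prefix, static content (system instructions, reference documents) should come first and per-request content last. A minimal sketch, assuming the current chat completions API and its usage-reporting fields:

```python
# Prompt structured for cache hits: identical static prefix first,
# variable user content last.
from openai import OpenAI

client = OpenAI()

STATIC_CONTEXT = "..."  # e.g. a long system prompt or codebase excerpt,
                        # identical across calls and over 1,024 tokens

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            # Identical prefix across calls -> eligible for caching.
            {"role": "system", "content": STATIC_CONTEXT},
            # Variable content goes last so it doesn't break the prefix.
            {"role": "user", "content": question},
        ],
    )
    # Usage reports how much of the prompt was served from cache
    # (field name per current API docs; verify against your SDK version).
    details = response.usage.prompt_tokens_details
    print("cached tokens:", details.cached_tokens)
    return response.choices[0].message.content

ask("Summarize the module's error handling.")
ask("List the public functions it exports.")  # may hit the cached prefix
```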
Vision Fine-Tuning Expands GPT-4o’s Capabilities
Expanding beyond text-based fine-tuning, OpenAI introduced vision fine-tuning for GPT-4o, which now allows developers to fine-tune models using images as well as text. This enhancement unlocks applications ranging from improved object detection for autonomous vehicles to more accurate medical image analysis.
Since OpenAI launched text-based fine-tuning on GPT-4o, hundreds of thousands of developers have customized models to optimize performance. However, OpenAI acknowledged that text alone doesn’t always provide the necessary performance boost for certain tasks, which is why vision fine-tuning is a key advancement.
Developers can boost the performance of GPT-4o on vision tasks with as few as 100 images, according to the post, and larger volumes of text and image data can drive performance even higher. The process mirrors text fine-tuning: developers upload datasets in a specific format to the platform, as sketched below.
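The training file itself is JSONL, with each line holding a chat-style example whose user turn can mix text and images. A hedged sketch of one such entry, with a placeholder image URL and labels, might look like this:

```python
# One vision fine-tuning example in the JSONL training format: a chat
# exchange whose user turn combines text with an image reference.
# The URL and labels are placeholders for illustration.
import json

example = {
    "messages": [
        {"role": "system", "content": "Identify the traffic sign in the image."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What sign is shown here?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sign_0042.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "Speed limit: 50 km/h"},
    ]
}

# One JSON object per line; repeat for each of the 100+ examples.
with open("vision_train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```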
OpenAI showcased two use cases where vision fine-tuning has already made an impact:
- Grab, a rideshare and food delivery company, used the feature to refine its mapping data by training GPT-4o to localize traffic signs and count lane dividers. With only 100 images, Grab improved lane count accuracy by 20% and traffic sign localization by 13%.
- Automat, an enterprise automation company, improved the success rate of its robotic process automation (RPA) agents from 16.60% to 61.67% by fine-tuning GPT-4o with a dataset of screenshots. Additionally, Automat trained the model on insurance documents, boosting its F1 score on information extraction by 7%.
Realtime API Enables Low-Latency, Multimodal Experiences
One of the most anticipated features revealed at DevDay was the Realtime API, now in public beta for paid developers. The API supports natural speech-to-speech conversations, enabling more immersive, low-latency experiences in applications such as voice assistants and customer service agents.
Previously, developers had to rely on multiple models for these capabilities, leading to noticeable latency and loss of expression. OpenAI’s Realtime API addresses this by handling the entire conversational process—audio input, reasoning, and audio output—within a single API call.
According to OpenAI, this API builds on the foundation of ChatGPT’s Advanced Voice Mode but improves on it by streaming audio inputs and outputs directly. It can even handle interruptions automatically, creating more fluid conversations. Under the hood, developers can establish a persistent WebSocket connection to exchange messages with GPT-4o.
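A minimal sketch of opening such a connection from Python follows; the endpoint, headers, and event names reflect the beta documentation at launch and should be treated as assumptions to verify:

```python
# Opening a Realtime API session over a persistent WebSocket.
# Endpoint, headers, and event names follow the beta docs at launch
# and may change; text-only here for brevity, though the same channel
# streams audio in both directions.
import asyncio
import json
import os

import websockets  # third-party: pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main() -> None:
    # Newer websockets versions name this argument additional_headers.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model to produce a response.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Say hello to the user.",
            },
        }))
        # Server events stream back as JSON messages.
        async for message in ws:
            event = json.loads(message)
            print(event.get("type"))
            if event.get("type") == "response.done":
                break

asyncio.run(main())
```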
A few early adopters have already integrated the Realtime API into their platforms:
- Healthify, a nutrition and fitness coaching app, uses the API to power its AI coach, Ria, allowing for seamless conversations that involve both automated guidance and human dietitian support when necessary.
- Speak, a language-learning app, leverages the API for its role-play feature, helping users practice real-world conversations in new languages.
DevDay is OpenAI’s developer-focused event for showcasing the company’s latest models, tools, and features. It serves as a platform for unveiling new AI technologies, updating developers on OpenAI’s models, and demonstrating how these tools can be integrated into real-world applications. The event typically includes product announcements, technical demos, and practical guidance on building AI-powered solutions across industries, with the broader aim of fostering engagement within the developer community.