Anthropic Launches Claude 3.5 Sonnet, Setting New Benchmarks for AI Performance

Insider Brief

  • Artificial intelligence company Anthropic has announced the release of Claude 3.5 Sonnet.
  • The new model reportedly offers significant improvements in intelligence and capabilities.
  • The company reports that the new model sets new benchmarks for graduate-level reasoning, undergraduate-level knowledge, and coding proficiency.

Artificial intelligence company Anthropic has announced the release of Claude 3.5 Sonnet, the latest addition to its Claude AI model family, according to a company blog post. The new model reportedly offers significant improvements in intelligence and capabilities while maintaining competitive speed and pricing.

Claude 3.5 Sonnet is now available for free on Claude.ai and the Claude iOS app, with higher usage limits for Claude Pro and Team plan subscribers. It can also be accessed via the Anthropic API and cloud platforms including Amazon Bedrock and Google Cloud’s Vertex AI.

Performance Improvements

According to Anthropic, Claude 3.5 Sonnet “raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations.” The company reports that the new model sets new benchmarks for graduate-level reasoning, undergraduate-level knowledge, and coding proficiency.

Specifically, Claude 3.5 Sonnet shows improved performance on the following benchmarks:

  • GPQA (graduate-level reasoning)
  • MMLU (undergraduate-level knowledge)
  • HumanEval (coding proficiency)

Anthropic states that Claude 3.5 Sonnet demonstrates “marked improvement in grasping nuance, humor, and complex instructions” and excels at “writing high-quality content with a natural, relatable tone.”

The model also shows significant gains in coding capabilities. In an internal evaluation of agentic coding, Claude 3.5 Sonnet solved 64% of problems, compared to 38% for the previous Claude 3 Opus model. This test assesses the AI’s ability to fix bugs or add functionality to open source code based on natural language descriptions.

Visual Capabilities

Claude 3.5 Sonnet incorporates enhanced visual processing abilities, which Anthropic describes as “state-of-the-art.” The company reports that the new model surpasses Claude 3 Opus on standard vision benchmarks, with notable improvements in visual reasoning tasks like interpreting charts and graphs.

The model can also accurately transcribe text from imperfect images, a capability that could have applications in retail, logistics, and financial services.

Speed and Pricing

Despite the performance improvements, Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus. Anthropic positions this as a key advantage, stating that the “performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multi-step workflows.”

The model is priced at $3 per million input tokens and $15 per million output tokens, with a 200,000 token context window.
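
To make the quoted rates concrete, here is a minimal sketch of how a per-request cost could be estimated from them. The function name and the example token counts are illustrative assumptions, not part of Anthropic's announcement, and actual billing may differ from this simple arithmetic.

```python
# Rates as quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted rates.

    Hypothetical helper for illustration only; it simply multiplies
    token counts by the per-million-token prices.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_PRICE_PER_MILLION
             + output_tokens * OUTPUT_PRICE_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed example: a long 150,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost_usd(150_000, 2_000):.2f}")
```

Under these assumptions, a 150,000-token prompt with a 2,000-token reply works out to about $0.48, most of it from input tokens, even though output tokens cost five times as much each.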

New Features: Artifacts

Alongside the model release, Anthropic is introducing a new feature called Artifacts on Claude.ai. This feature allows users to view and interact with AI-generated content like code snippets, text documents, or website designs in a dedicated window adjacent to their conversation with Claude.

Anthropic describes Artifacts as “a new way to use Claude” that creates “a dynamic workspace where [users] can see, edit, and build upon Claude’s creations in real-time, seamlessly integrating AI-generated content into their projects and workflows.”

The company frames this as part of a broader vision to evolve Claude.ai from a conversational AI into a collaborative work environment. Future plans include support for team collaboration, allowing organizations to “securely centralize their knowledge, documents, and ongoing work in one shared space, with Claude serving as an on-demand teammate.”

Safety and Privacy Measures

Anthropic writes that it is committed to safety and responsible AI development. Safety has been a concern raised about other LLM services, such as ChatGPT.

The company states that Claude 3.5 Sonnet has undergone “rigorous testing” and has been “trained to reduce misuse.” Despite the increases in intelligence, Anthropic’s internal assessments conclude that Claude 3.5 Sonnet remains at ASL-2 (Anthropic Safety Level 2).

The model was provided to the UK’s Artificial Intelligence Safety Institute (UK AISI) for pre-deployment safety evaluation. Results were shared with the US AI Safety Institute (US AISI) as part of a recent partnership between the two organizations.

Anthropic also reports collaborating with external experts to refine safety mechanisms, including child safety experts from Thorn who provided feedback on classifiers and model fine-tuning.

Regarding privacy, Anthropic states: “We do not train our generative models on user-submitted data unless a user gives us explicit permission to do so. To date we have not used any customer or user-submitted data to train our generative models.”

Future Developments

Anthropic indicates that Claude 3.5 Sonnet is the first in a series of planned releases. The company aims to “substantially improve the tradeoff curve between intelligence, speed, and cost every few months.” Two additional models in the Claude 3.5 family, Claude 3.5 Haiku and Claude 3.5 Opus, are slated for release later this year.

The company is also exploring new modalities and features to support business use cases, including enterprise application integrations and a Memory feature to enable personalized user experiences.

With the release of Claude 3.5 Sonnet, Anthropic continues to push the boundaries of AI capabilities while emphasizing responsible development practices. The coming months will likely see further advancements as the company expands its model offerings and explores new applications for its technology.
