Anthropic Launches Claude 3.5 Sonnet, Setting New Benchmarks for AI Performance

Insider Brief

  • Artificial intelligence company Anthropic has announced the release of Claude 3.5 Sonnet.
  • The new model reportedly offers significant improvements in intelligence and capabilities.
  • The company reports that the new model sets new benchmarks in graduate-level reasoning, undergraduate-level knowledge, and coding proficiency.

Artificial intelligence company Anthropic has announced the release of Claude 3.5 Sonnet, the latest addition to its Claude AI model family, according to a company blog post. The new model reportedly offers significant improvements in intelligence and capabilities while maintaining competitive speed and pricing.

Claude 3.5 Sonnet is now available for free on Claude.ai and the Claude iOS app, with higher usage limits for Claude Pro and Team plan subscribers. It can also be accessed via the Anthropic API and cloud platforms including Amazon Bedrock and Google Cloud’s Vertex AI.

Performance Improvements

According to Anthropic, Claude 3.5 Sonnet “raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations.” The company reports that the new model sets new benchmarks in graduate-level reasoning, undergraduate-level knowledge, and coding proficiency.

Specifically, Claude 3.5 Sonnet shows improved performance on the following benchmarks:

  • GPQA (graduate-level reasoning)
  • MMLU (undergraduate-level knowledge)
  • HumanEval (coding proficiency)

Anthropic states that Claude 3.5 Sonnet demonstrates “marked improvement in grasping nuance, humor, and complex instructions” and excels at “writing high-quality content with a natural, relatable tone.”

The model also shows significant gains in coding capabilities. In an internal evaluation of agentic coding, Claude 3.5 Sonnet solved 64% of problems, compared to 38% for the previous Claude 3 Opus model. This test assesses the AI’s ability to fix bugs or add functionality to open source code based on natural language descriptions.

Visual Capabilities

Claude 3.5 Sonnet incorporates enhanced visual processing abilities, which Anthropic describes as “state-of-the-art.” The company reports that the new model surpasses Claude 3 Opus on standard vision benchmarks, with notable improvements in visual reasoning tasks like interpreting charts and graphs.

The model can also accurately transcribe text from imperfect images, a capability that could have applications in retail, logistics, and financial services.

Speed and Pricing

Despite the performance improvements, Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus. Anthropic positions this as a key advantage, stating that the “performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multi-step workflows.”

The model is priced at $3 per million input tokens and $15 per million output tokens, with a 200,000 token context window.
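To make the per-token rates concrete, here is a minimal sketch of the cost arithmetic in Python. The two rates are the figures quoted above; the function name and the sample token counts are hypothetical and chosen only for illustration, and actual billing may differ from this simple estimate.

```python
# Rates quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper for illustration; it is not part of any Anthropic SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a long 150,000-token prompt with a 2,000-token reply.
    cost = estimate_cost_usd(150_000, 2_000)
    print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $0.48"
```

In this example the input side accounts for $0.45 and the output side for $0.03, which shows that a long prompt can dominate the bill even though output tokens cost five times as much per token.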

New Features: Artifacts

Alongside the model release, Anthropic is introducing a new feature called Artifacts on Claude.ai. This feature allows users to view and interact with AI-generated content like code snippets, text documents, or website designs in a dedicated window adjacent to their conversation with Claude.

Anthropic describes Artifacts as “a new way to use Claude” that creates “a dynamic workspace where [users] can see, edit, and build upon Claude’s creations in real-time, seamlessly integrating AI-generated content into their projects and workflows.”

The company frames this as part of a broader vision to evolve Claude.ai from a conversational AI into a collaborative work environment. Future plans include support for team collaboration, allowing organizations to “securely centralize their knowledge, documents, and ongoing work in one shared space, with Claude serving as an on-demand teammate.”

Safety and Privacy Measures

Anthropic writes that it is committed to safety and responsible AI development, an area that has drawn concern with other LLM services, such as ChatGPT.

The company states that Claude 3.5 Sonnet has undergone “rigorous testing” and has been “trained to reduce misuse.” Despite the increases in intelligence, Anthropic’s internal assessments conclude that Claude 3.5 Sonnet remains at ASL-2 (Anthropic Safety Level 2).

The model was provided to the UK’s Artificial Intelligence Safety Institute (UK AISI) for pre-deployment safety evaluation. Results were shared with the US AI Safety Institute (US AISI) as part of a recent partnership between the two organizations.

Anthropic also reports collaborating with external experts to refine safety mechanisms, including child safety experts from Thorn who provided feedback on classifiers and model fine-tuning.

Regarding privacy, Anthropic states: “We do not train our generative models on user-submitted data unless a user gives us explicit permission to do so. To date we have not used any customer or user-submitted data to train our generative models.”

Future Developments

Anthropic indicates that Claude 3.5 Sonnet is the first in a series of planned releases. The company aims to “substantially improve the tradeoff curve between intelligence, speed, and cost every few months.” Two additional models in the Claude 3.5 family, Claude 3.5 Haiku and Claude 3.5 Opus, are slated for release later this year.

The company is also exploring new modalities and features to support business use cases, including enterprise application integrations and a Memory feature to enable personalized user experiences.

With the release of Claude 3.5 Sonnet, Anthropic continues to push the boundaries of AI capabilities while emphasizing responsible development practices. The coming months will likely see further advancements as the company expands its model offerings and explores new applications for its technology.
