Business / Tue, 30 Jun 2026 MarkTechPost

Anthropic Claude Sonnet 5 vs Sonnet 4.6 vs Opus 4.8: Agentic Coding Benchmarks, API Pricing, and Cost-Performance Tradeoffs Compared


Anthropic just shipped Claude Sonnet 5, which it calls its most agentic Sonnet model yet. The model plans, drives browsers and terminals, and runs autonomously across long tasks.

Sonnet 5 is the default model for Free and Pro plans today. Max, Team, and Enterprise users can select it. It is also live in Claude Code and on the Claude Platform.

TL;DR

Sonnet 5 is Anthropic’s most agentic mid-tier model, closing much of the gap to Opus 4.8.

Beats Sonnet 4.6 on every published benchmark: 63.2% SWE-bench Pro, 81.2% OSWorld-Verified, 57.4% HLE.

Cheaper to run: $2/$10 per MTok intro pricing through Aug 31, then $3/$15; Opus 4.8 is $5/$25.

Best value at low/medium effort: at xhigh it can cost more than Opus 4.8 for similar quality.

Safer than 4.6, with deliberately low cyber capability. Opus stays the pick for accuracy-critical work.

Claude Sonnet 5

Sonnet sits in the middle of Anthropic’s lineup. It is above the cheaper Haiku 4.5 and below the flagship Opus 4.8.

Sonnet 5 is an upgrade to Sonnet 4.6, which launched in February 2026. Anthropic frames this release around agentic reliability, not one headline benchmark.

In practice, that means longer task chains without losing context. It means better self-correction when a tool call fails. It means steadier behavior across extended sessions inside Claude Code or Cowork.

The model exposes effort levels: low, medium, high, and xhigh (extra high). Higher effort spends more tokens on reasoning. That raises both quality and cost.

Sonnet 5 uses an updated tokenizer, the same one introduced with Opus 4.7. The same text can map to roughly 1.0 to 1.35 times as many tokens as before, so per-token prices alone do not tell the whole cost story.

Interactive Explainer: Claude Sonnet 5 Cost & Capability Explorer

The explorer estimates per-task, daily, and monthly cost for Sonnet 5 (intro $2/$10 and standard $3/$15), Opus 4.8 ($5/$25), and Sonnet 4.6 ($3/$15). Its inputs are input tokens per task (default 20,000), output tokens per task (default 6,000), tasks per day (default 500), and a Sonnet 5 tokenizer factor (default 1.15×). Because Sonnet 5 uses the updated tokenizer, the factor is applied to Sonnet 5 only. All figures come from Anthropic’s June 30, 2026 launch.

The explorer also charts the published scores for agentic coding (SWE-bench Pro), Terminal-Bench 2.1, computer use (OSWorld-Verified), and Humanity’s Last Exam (tools) across Sonnet 4.6, Sonnet 5, and Opus 4.8. GDPval-AA v2 uses a different scale, so it appears as a note rather than a bar: Sonnet 5 scores 1,618 and edges Opus 4.8’s 1,615.
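The estimator's arithmetic is simple enough to reproduce offline. The sketch below is a minimal Python version, not the widget's actual code: the dictionary keys, the function name cost_per_task, and the 30-day month are assumptions made for illustration, while the prices and default inputs come from the figures above. Like the widget, it applies the tokenizer factor to Sonnet 5 only.

```python
# Prices in USD per million tokens (input, output), as listed in the article.
# The keys are illustrative labels, not API model IDs.
PRICES = {
    "sonnet-5-intro": (2.0, 10.0),
    "sonnet-5-standard": (3.0, 15.0),
    "opus-4.8": (5.0, 25.0),
    "sonnet-4.6": (3.0, 15.0),
}


def cost_per_task(model, input_tokens, output_tokens, tokenizer_factor=1.0):
    """Estimate the USD cost of one task.

    The tokenizer factor inflates token counts for Sonnet 5 entries only,
    mirroring the explorer's stated behavior.
    """
    input_price, output_price = PRICES[model]
    factor = tokenizer_factor if model.startswith("sonnet-5") else 1.0
    input_cost = input_tokens * factor * input_price / 1_000_000
    output_cost = output_tokens * factor * output_price / 1_000_000
    return input_cost + output_cost


# Explorer defaults: 20,000 input tokens, 6,000 output tokens, 500 tasks/day.
TASKS_PER_DAY = 500
DAYS_PER_MONTH = 30  # assumption; the article does not say how months are counted

for model in PRICES:
    per_task = cost_per_task(model, 20_000, 6_000, tokenizer_factor=1.15)
    per_day = per_task * TASKS_PER_DAY
    per_month = per_day * DAYS_PER_MONTH
    print(f"{model}: ${per_task:.4f}/task, ${per_day:.2f}/day, ${per_month:.2f}/mo")
```

With the explorer's defaults, this works out to about $0.115 per task at intro pricing, $0.1725 at standard pricing, $0.15 for Sonnet 4.6, and $0.25 for Opus 4.8. At a 1.15× factor, standard-priced Sonnet 5 lands above Sonnet 4.6 per task, so it is worth measuring the real factor on your own prompts.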

Benchmark

Anthropic published a benchmark table comparing Sonnet 5, Sonnet 4.6, and Opus 4.8. Sonnet 5 beats its predecessor in every tested category and closes much of the gap to Opus 4.8.

On agentic coding (SWE-bench Pro), Sonnet 5 scores 63.2%. Sonnet 4.6 scored 58.1%. Opus 4.8 still leads at 69.2%.

On computer use (OSWorld-Verified), Sonnet 5 posts 81.2% against Sonnet 4.6’s 78.5%. On Terminal-Bench 2.1, it reaches 80.4% versus 67.0%.

On Humanity’s Last Exam with tools, Sonnet 5 hits 57.4%. That nearly matches Opus 4.8 at 57.9%.

There is one place where Sonnet 5 edges ahead. On the GDPval-AA v2 knowledge-work benchmark, it scores 1,618 against Opus 4.8’s 1,615.
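One way to put a number on "closes much of the gap" is to ask what fraction of the distance from Sonnet 4.6 to Opus 4.8 Sonnet 5 covers. The sketch below does that for the two benchmarks where all three scores are published. The gap_closed helper is an illustrative name, not an Anthropic metric, and the Sonnet 4.6 Humanity's Last Exam score (46.8%) is taken from the comparison table later in this article.

```python
# (Sonnet 4.6, Sonnet 5, Opus 4.8) scores in percent, from the article.
SCORES = {
    "SWE-bench Pro": (58.1, 63.2, 69.2),
    "Humanity's Last Exam (with tools)": (46.8, 57.4, 57.9),
}


def gap_closed(old, new, top):
    """Fraction of the old-to-top gap that the new model covers."""
    return (new - old) / (top - old)


for name, (sonnet_46, sonnet_5, opus_48) in SCORES.items():
    print(f"{name}: {gap_closed(sonnet_46, sonnet_5, opus_48):.1%} of the gap closed")
# SWE-bench Pro: 45.9% of the gap closed
# Humanity's Last Exam (with tools): 95.5% of the gap closed
```

By this measure the picture is uneven: Sonnet 5 covers a little under half the distance on agentic coding but nearly all of it on Humanity's Last Exam.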

https://www.anthropic.com/news/claude-sonnet-5

Effort Levels: Where the Real Tradeoff Lives

The cost-performance story is the most important part for developers. Sonnet 5 is a strict improvement over Sonnet 4.6 across every effort level. The clearest value appears at low and medium effort.

At those levels, Sonnet 5 delivers quality that earlier Sonnet pricing could not buy. Opus 4.8 remains the accuracy leader at the top of the range.

A practical routing policy follows from this. Send most agentic coding, tool use, and knowledge work to Sonnet 5. Reserve Opus 4.8 for accuracy-critical tasks. Keep Haiku 4.5 for high-volume, latency-sensitive calls.
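That policy can be written down as a small routing function. This is a sketch under assumptions: the Task fields, the tier labels, and the rule that accuracy outranks latency when both apply are illustrative choices, not something Anthropic prescribes. The returned labels are not API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor used only for this illustration."""
    accuracy_critical: bool = False
    high_volume_latency_sensitive: bool = False


def pick_model(task: Task) -> str:
    """Return a model tier label following the routing policy above."""
    if task.accuracy_critical:
        return "opus-4.8"   # reserved for accuracy-critical work
    if task.high_volume_latency_sensitive:
        return "haiku-4.5"  # high-volume, latency-sensitive calls
    return "sonnet-5"       # default for agentic coding, tool use, knowledge work


print(pick_model(Task()))
print(pick_model(Task(accuracy_critical=True)))
print(pick_model(Task(high_volume_latency_sensitive=True)))
```

Checking accuracy first means a task flagged as both accuracy-critical and latency-sensitive goes to Opus 4.8. Teams that weigh latency more heavily could swap the order of the two checks.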

Use Cases: Where Sonnet 5 Fits

Early access partners described concrete workflows. Their reports map to common engineering jobs.

Multi-step software engineering: One tester asked Sonnet 5 to investigate a bug. It wrote a reproducing test, implemented the fix, then confirmed the bug returned without the change. It did this in a single pass.

Brownfield debugging: Another partner ran it on hard pull requests. The model traced failures to their root causes. It shipped durable fixes rather than symptom patches.

Business automation: Zapier handed it a two-part job. It updated Salesforce account tiers, then sent a launch email to enterprise contacts. It finished the task end to end.

Computer-use agents: Pace runs insurance workflows like submission intake and loss runs. Its agents act on the operational systems teams already use.

Data exploration: ClickHouse agents query live data and produce insights on the fly. Faster reasoning means faster time-to-insight for analysts.

Comparison Table

Metric / Spec                       Sonnet 4.6     Sonnet 5            Opus 4.8
Agentic coding (SWE-bench Pro)      58.1%          63.2%               69.2%
Terminal-Bench 2.1                  67.0%          80.4%               not reported
Computer use (OSWorld-Verified)     78.5%          81.2%               not reported
Humanity’s Last Exam (with tools)   46.8%          57.4%               57.9%
Knowledge work (GDPval-AA v2)       not reported   1,618               1,615
Input price ($/MTok)                3              2 intro, then 3     5
Output price ($/MTok)               15             10 intro, then 15   25

Sonnet 5’s introductory pricing runs through August 31, 2026. Standard pricing of $3/$15 begins after that date. Standard prompt caching (cache reads at 0.1x input) and the 50% Batch API discount also apply. Per token, Sonnet 5 undercuts GPT-5.5 and Gemini 3.1 Pro, but costs more than Gemini 3.5 Flash. Anthropic lists a 1M-token context window for Sonnet 5 in its launch post. It does not publish context figures for the other models here.
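To see how those two discounts move the bill, the sketch below applies the stated multipliers to the input side of a single request. It is a simplification under assumptions: the function names and the 80% cached share are illustrative, cache-write charges are ignored because the article does not give them, and the two discounts are shown separately rather than stacked.

```python
CACHE_READ_MULTIPLIER = 0.1  # cache reads billed at 0.1x the input price
BATCH_DISCOUNT = 0.5         # 50% Batch API discount


def input_cost(tokens, price_per_mtok, cached_fraction=0.0):
    """USD cost of input tokens when a share of them is read from cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = tokens * cached_fraction
    fresh = tokens - cached
    billable = fresh + cached * CACHE_READ_MULTIPLIER
    return billable * price_per_mtok / 1_000_000


def batch_cost(standard_cost):
    """USD cost of the same work submitted through the Batch API."""
    return standard_cost * (1 - BATCH_DISCOUNT)


# 20,000 input tokens at Sonnet 5's standard $3/MTok input price.
uncached = input_cost(20_000, 3.0)
mostly_cached = input_cost(20_000, 3.0, cached_fraction=0.8)  # 80% is an assumption
print(f"uncached: ${uncached:.4f}")
print(f"80% cached: ${mostly_cached:.4f}")
print(f"batched, uncached: ${batch_cost(uncached):.4f}")
```

For that request, the input side costs $0.06 uncached, about $0.0168 with 80% served from cache, and $0.03 through the Batch API.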

Coding Example: Calling Sonnet 5

The API call mirrors any other Anthropic model. You change the model string to claude-sonnet-5.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

message = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Find the race condition in worker.py and ship a tested fix."}
    ],
)
print(message.content[0].text)

Strengths and Weaknesses

Strengths:

Improves on Sonnet 4.6 in every benchmark category Anthropic tested

Near-Opus 4.8 quality on several evaluations, at lower per-token prices

Edges Opus 4.8 on the GDPval-AA v2 knowledge-work benchmark

Lower hallucination, sycophancy, and undesirable-behavior rates than Sonnet 4.6

Drop-in API change: you only swap the model string

Weaknesses:

Opus 4.8 still wins on the hardest accuracy-critical tasks

At xhigh effort, cost can exceed Opus 4.8 at similar quality

The new tokenizer can raise token counts by up to 1.35 times

Cyber capability is intentionally low; use Opus for sanctioned cyber work

Standard pricing of $3/$15 arrives after August 31, 2026

