10 min read · by Wezebo

Claude Opus 4.7 Review (2026): What Changed and Is It Worth Upgrading?

Anthropic's Opus 4.7 tops SWE-bench Pro at 64.3% and leads in tool use. But a new tokenizer quietly raises costs by up to 35%. We break down what changed and who should upgrade.

Claude Opus 4.7 review illustration

TL;DR

Rating: 8.5/10. Claude Opus 4.7 is the strongest coding and agentic AI model available right now, with meaningful jumps in SWE-bench scores and a new high-resolution image mode that actually matters for developer workflows. The catch: a new tokenizer quietly inflates your costs by up to 35%, even though the sticker price hasn't changed. If you're already on Opus 4.6 and use it heavily for code or tool-driven tasks, upgrading is a no-brainer. If you're mostly chatting or writing, the improvements are harder to feel.

What Is Claude Opus 4.7?

Claude Opus 4.7 is Anthropic's latest flagship AI model, released on April 16, 2026. It sits at the top of the Claude model lineup, designed for complex reasoning, long-context work, coding, and agentic tasks where the model calls tools and makes decisions across multiple steps.

It's available through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The model keeps the same 1M token context window and 128K max output tokens as its predecessor, Opus 4.6.

The headline story here is coding performance. Opus 4.7 now holds the highest score ever recorded on SWE-bench Pro at 64.3%, putting it meaningfully ahead of both GPT-5.4 and its own predecessor. But there are several other changes worth unpacking.

Pricing and Plans

| Plan / Access    | Price                | Notes                    |
|------------------|----------------------|--------------------------|
| API Input        | $5 / 1M tokens       | Unchanged from Opus 4.6  |
| API Output       | $25 / 1M tokens      | Unchanged from Opus 4.6  |
| Prompt Caching   | Up to 90% savings    | For repeated context     |
| Batch Processing | Up to 50% savings    | For async workloads      |
| Claude Pro       | $20/mo               | Includes Opus 4.7 access |
| Claude Max       | $100/mo or $200/mo   | Higher usage limits      |
Here's the thing nobody's putting in the headline: Opus 4.7 ships with a new tokenizer. The same input text now produces up to 35% more tokens. So while Anthropic technically didn't raise prices, your effective cost per query can be noticeably higher. For light users on Pro or Max subscriptions, this won't matter. For teams running heavy API workloads, budget accordingly.
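To make that concrete, here's a back-of-the-envelope calculation using the prices above. The 35% figure is the stated worst case and applies to input tokenization, so we model it as input-side inflation only; real workloads will land somewhere below it.

```python
# Effective monthly cost before and after the tokenizer change.
# Prices from the table above; 1.35x is the stated worst-case inflation,
# applied to input tokens only (the article's claim is about input text).
INPUT_PRICE = 5.00 / 1_000_000    # $ per input token
OUTPUT_PRICE = 25.00 / 1_000_000  # $ per output token

def monthly_cost(input_tokens, output_tokens, input_inflation=1.0):
    """Workload cost, optionally scaled by input-side tokenizer inflation."""
    return input_tokens * input_inflation * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Example workload: 200M input tokens, 20M output tokens per month.
before = monthly_cost(200_000_000, 20_000_000)                       # old tokenizer
after = monthly_cost(200_000_000, 20_000_000, input_inflation=1.35)  # worst case

print(f"before: ${before:,.2f}, after: ${after:,.2f}")
# before: $1,500.00, after: $1,850.00  (~23% higher, same sticker price)
```

Note that because output pricing is 5x input pricing, the blended increase on a real bill is smaller than the headline 35%, but it's still material at scale.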

Key Features

High-Resolution Image Support

This is the first Claude model to support high-resolution image input, bumping the max from 1568px / 1.15MP on Opus 4.6 to 2576px / 3.75MP. That's more than triple the pixel count.

Why it matters: if you've ever asked Claude to read a screenshot, analyze a diagram, or extract data from a chart, you've probably hit the old resolution ceiling. We found Opus 4.7 significantly more reliable at reading dense UIs, architectural diagrams, and code screenshots without needing to crop or resize first. The CharXiv visual reasoning benchmark reflects this, jumping from 69.1% to 82.1%.
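A quick pre-flight check makes the difference tangible. The sketch below uses the limits quoted above; the helper itself is illustrative, not part of any SDK.

```python
# Illustrative pre-flight check: does an image fit a model's stated
# limits without resizing? Limits are the figures quoted above.
OPUS_46_LIMIT = {"max_side_px": 1568, "max_megapixels": 1.15}
OPUS_47_LIMIT = {"max_side_px": 2576, "max_megapixels": 3.75}

def fits(width, height, limit):
    """True if the image fits both the longest-side and megapixel caps."""
    megapixels = width * height / 1_000_000
    return max(width, height) <= limit["max_side_px"] and megapixels <= limit["max_megapixels"]

# A 2560x1440 screenshot: needed downscaling on Opus 4.6, fits on 4.7.
print(fits(2560, 1440, OPUS_46_LIMIT))  # False
print(fits(2560, 1440, OPUS_47_LIMIT))  # True
```

In practice this means full-resolution QHD screenshots now go in unmodified, which is exactly the dense-UI case where downscaling used to destroy small text.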

Task Budgets

Opus 4.7 introduces task budgets, a system that gives the model a rough token budget for agentic loops. The model gets a running countdown so it can prioritize remaining work and finish gracefully instead of cutting off mid-task.

This is particularly useful in Claude Code, where agentic sessions can run long. Instead of the model burning through tokens on early steps and then running out of room, it can pace itself. In our testing, long multi-file refactors completed more reliably with task budgets active.
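Anthropic hasn't published the exact API surface for this, so the sketch below is a plain-Python simulation of the idea rather than real SDK code: an agent loop that tracks a running budget and reserves room to finish cleanly. Names like `run_agent` and `wrap_up_cost` are ours.

```python
def run_agent(steps, budget, wrap_up_cost=500):
    """Simulate a budget-aware agent loop.

    `steps` is a list of (name, estimated_token_cost) pairs. The loop
    always reserves `wrap_up_cost` tokens so it can finish with a clean
    summary instead of cutting off mid-task -- the behavior task
    budgets aim for.
    """
    remaining = budget
    completed = []
    for name, cost in steps:
        if remaining - cost < wrap_up_cost:
            completed.append("summary")  # graceful wrap-up step
            break
        remaining -= cost
        completed.append(name)
    return completed

plan = [("read files", 2000), ("refactor", 4000), ("run tests", 3000), ("write docs", 3000)]
print(run_agent(plan, budget=10_000))
# ['read files', 'refactor', 'run tests', 'summary']
```

The key design point is that the budget check happens before each step, not after: the model decides what to drop ("write docs") while it still has tokens left to say so.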

xhigh Effort Level

There's a new reasoning effort parameter called xhigh that sits above the existing high setting. It tells the model to spend more compute on intermediate reasoning steps, which improves accuracy on complex tasks at the cost of higher latency.

We tested this on multi-step debugging tasks and saw consistent improvements on problems that required holding several files in context simultaneously. For quick questions, stick with the default. For hard problems, xhigh earns its keep.
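If the effort level is exposed as a request field shaped like the existing setting, selecting it might look like the sketch below. The field name `effort`, the model id, and the allowed values are assumptions on our part; check the current API reference for the real shape.

```python
# Hypothetical request payload. The "effort" field and model id are
# our assumptions, modeled on the existing effort setting.
def build_request(prompt, effort="medium"):
    allowed = {"low", "medium", "high", "xhigh"}
    if effort not in allowed:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-7",  # placeholder model id
        "max_tokens": 4096,
        "effort": effort,  # "xhigh" = more intermediate reasoning, higher latency
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Trace this race condition across worker.py and queue.py", effort="xhigh")
print(req["effort"])  # xhigh
```

Whatever the final parameter name, the operational advice is the same: make effort a per-request decision, not a global default.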

Thinking Content Changes

Thinking blocks (the model's internal reasoning chain) now appear in the stream but the content is omitted by default. You have to explicitly opt in to see them. This trims some latency off responses. If you rely on thinking content for debugging or transparency, just flip the parameter back on.
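Opting back in would presumably be a flag on the request; the field names in this sketch (`include_thinking`, `thinking.include_content`) are our guesses at the shape, not documented identifiers.

```python
# Hypothetical: thinking content is omitted by default; opting back in
# might look like a flag on the request (field names are our guess).
def build_request(prompt, include_thinking=False):
    req = {
        "model": "claude-opus-4-7",  # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
    }
    if include_thinking:
        # Re-enable streamed thinking content for debugging/transparency.
        req["thinking"] = {"include_content": True}
    return req
```

The sensible default is off in production (lower latency, smaller payloads) and on in development, where seeing the reasoning chain helps diagnose bad tool calls.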

What We Liked

Coding performance is genuinely best-in-class. SWE-bench Verified hit 87.6% (up from 80.8%), and the SWE-bench Pro score of 64.3% leads the industry by a comfortable margin over GPT-5.4's 57.7%.

Tool use is noticeably sharper. MCP-Atlas tool invocation scored 77.3%, leading GPT-5.4 by 9.2 points. In practice, we found Opus 4.7 more reliable at chaining multiple tool calls without losing track.

High-res image input is a real upgrade. Reading dense screenshots and technical diagrams went from frustrating to functional. This matters for developer and design workflows.

Task budgets make agentic work more predictable. Long sessions in Claude Code felt more controlled, with the model wrapping up cleanly instead of abruptly hitting walls.

Scientific reasoning remains strong. GPQA Diamond at 94.2% means you can trust it with technical and scientific questions.

The /ultrareview feature in Claude Code adds a genuinely useful multi-agent code review workflow that catches issues a single pass would miss.

What Could Be Better

The new tokenizer is a hidden cost increase. Producing up to 35% more tokens for the same input text means your API bills go up even though the per-token price didn't change. Anthropic should be more upfront about this.

Web search still lags behind GPT-5.4. BrowseComp scores tell the story: 79.3% vs GPT-5.4's 89.3%. If your workflow depends heavily on web-grounded answers, GPT-5.4 remains the better pick for that specific task.

xhigh effort adds real latency. On complex prompts, responses noticeably slow down. It's worth it for hard problems, but you wouldn't want it on by default.

Who Is It Best For?

Software developers and engineering teams. The SWE-bench scores aren't abstract. Opus 4.7 handles multi-file refactors, complex debugging, and code generation better than any other model we've tested. The Claude Code integration with task budgets and /ultrareview makes this especially true.

AI-powered product builders. If you're building applications that rely on tool use and agentic loops, the MCP-Atlas improvements and task budget system directly improve reliability.

Teams working with visual content. The high-res image support opens up workflows around screenshot analysis, diagram interpretation, and document extraction that previously required workarounds.

Researchers and analysts. The 1M context window combined with a 94.2% GPQA Diamond score means it handles dense, technical material well.

Who should wait? If you primarily use Claude for writing, brainstorming, or casual conversation, the improvements in Opus 4.7 are marginal over 4.6. And if web search accuracy is critical, GPT-5.4 is still the better tool for that.

Alternatives Worth Considering

GPT-5.4

OpenAI's current flagship. It leads Claude in web search (BrowseComp: 89.3% vs 79.3%) and remains highly competitive across general reasoning tasks. However, it trails Opus 4.7 in coding (SWE-bench Pro: 57.7% vs 64.3%) and tool use (MCP-Atlas: 68.1% vs 77.3%).

Gemini 3.1 Pro

Google's latest sits behind both Claude and GPT in coding benchmarks (SWE-bench Verified: 80.6%) but remains competitive in multimodal tasks and benefits from tight Google Workspace integration. Pricing is also competitive.

Meta Muse Spark

A newer entrant focused on creative tasks. Not a direct competitor for coding or agentic work, but worth watching if your primary use case is content generation.

Final Verdict

Claude Opus 4.7 is the best model available for coding, tool use, and agentic AI workflows. The benchmark numbers back that up across SWE-bench, MCP-Atlas, and GPQA Diamond, and the practical improvements, particularly task budgets and high-res image support, translate into real workflow gains.

The honest caveat is cost. The new tokenizer means you're paying more for the same work, even if the price sheet looks identical. For API-heavy teams, run the numbers before assuming your budget stays flat.

At $20/month on Claude Pro, it's easy to recommend trying it. At API scale, the value depends on whether coding and agentic performance matter more to you than the tokenizer tax.

Our rating: 8.5/10. A strong, substantive upgrade for developers and AI builders. Not a must-upgrade for everyone else.