
TL;DR
Rating: 8.5/10. Claude Opus 4.7 is the strongest coding and agentic AI model available right now, with meaningful jumps in SWE-bench scores and a new high-resolution image mode that actually matters for developer workflows. The catch: a new tokenizer quietly inflates your costs by up to 35%, even though the sticker price hasn't changed. If you're already on Opus 4.6 and use it heavily for code or tool-driven tasks, upgrading is a no-brainer. If you're mostly chatting or writing, the improvements are harder to feel.
What Is Claude Opus 4.7?
Claude Opus 4.7 is Anthropic's latest flagship AI model, released on April 16, 2026. It sits at the top of the Claude model lineup, designed for complex reasoning, long-context work, coding, and agentic tasks where the model calls tools and makes decisions across multiple steps.
It's available through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The model keeps the same 1M token context window and 128K max output tokens as its predecessor, Opus 4.6.
The headline story here is coding performance. Opus 4.7 now holds the highest score ever recorded on SWE-bench Pro at 64.3%, putting it meaningfully ahead of both GPT-5.4 and its own predecessor. But there are several other changes worth unpacking.
Pricing and Plans
| Plan / Access | Price | Notes |
|---|---|---|
| API Input | $5 / 1M tokens | Unchanged from Opus 4.6 |
| API Output | $25 / 1M tokens | Unchanged from Opus 4.6 |
| Prompt Caching | Up to 90% savings | For repeated context |
| Batch Processing | Up to 50% savings | For async workloads |
| Claude Pro | $20/mo | Includes Opus 4.7 access |
| Claude Max | $100/mo or $200/mo | Higher usage limits |
Here's the thing nobody's putting in the headline: Opus 4.7 ships with a new tokenizer. The same input text now produces up to 35% more tokens. So while Anthropic technically didn't raise prices, your effective cost per query can be noticeably higher. For light users on Pro or Max subscriptions, this won't matter. For teams running heavy API workloads, budget accordingly.
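To see what that means in dollars, here's a quick back-of-envelope calculator. This is a sketch: it assumes the ~35% inflation applies uniformly to input and output tokens, which will vary by content.

```python
IN_PRICE = 5.0    # $ per 1M input tokens (Opus 4.7 list price)
OUT_PRICE = 25.0  # $ per 1M output tokens

def effective_cost(input_tokens, output_tokens, inflation=1.35):
    """Dollar cost of a workload measured in old (Opus 4.6) token counts,
    after applying the new tokenizer's worst-case inflation factor."""
    return (input_tokens * inflation / 1e6) * IN_PRICE \
         + (output_tokens * inflation / 1e6) * OUT_PRICE
```

A workload that cost $10.00 under the old tokenizer (1M input plus 200K output tokens) lands at $13.50 under worst-case inflation, despite an identical price sheet.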
Key Features
High-Resolution Image Support
This is the first Claude model to support high-resolution image input, bumping the max from 1568px / 1.15MP on Opus 4.6 to 2576px / 3.75MP. That's more than triple the pixel count.
Why it matters: if you've ever asked Claude to read a screenshot, analyze a diagram, or extract data from a chart, you've probably hit the old resolution ceiling. We found Opus 4.7 significantly more reliable at reading dense UIs, architectural diagrams, and code screenshots without needing to crop or resize first. The CharXiv visual reasoning benchmark reflects this, jumping from 69.1% to 82.1%.
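If you preprocess screenshots before sending them, the new ceiling changes when you need to resize at all. A minimal helper, using the pixel limits as stated above (the exact enforcement behavior belongs to the API, not this sketch):

```python
OPUS_46_MAX_EDGE = 1568  # previous ceiling, per the specs above
OPUS_47_MAX_EDGE = 2576  # new high-res ceiling

def needs_downscale(width, height, max_edge=OPUS_47_MAX_EDGE):
    """True if the image's longest edge exceeds the model's input ceiling."""
    return max(width, height) > max_edge

def downscale_dims(width, height, max_edge=OPUS_47_MAX_EDGE):
    """Return (width, height) scaled to fit under the ceiling, preserving aspect ratio."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return round(width * scale), round(height * scale)
```

A 1920x1080 screenshot needed downscaling under the old 1568px limit; it now fits natively, which is exactly why dense UIs read more reliably.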
Task Budgets
Opus 4.7 introduces task budgets, a system that gives the model a rough token budget for agentic loops. The model gets a running countdown so it can prioritize remaining work and finish gracefully instead of cutting off mid-task.
This is particularly useful in Claude Code, where agentic sessions can run long. Instead of the model burning through tokens on early steps and then running out of room, it can pace itself. In our testing, long multi-file refactors completed more reliably with task budgets active.
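In API terms, wiring up a budget might look like the sketch below. The field names (`task_budget`, `max_total_tokens`) are placeholders, since the exact request schema isn't documented here; treat this as illustrative, not the real shape.

```python
def build_agent_request(prompt: str, token_budget: int) -> dict:
    """Assemble an agentic request with a rough token budget.
    All budget-related field names are hypothetical."""
    return {
        "model": "claude-opus-4-7",  # placeholder model ID
        "max_tokens": 8192,
        "task_budget": {"max_total_tokens": token_budget},  # assumed shape
        "messages": [{"role": "user", "content": prompt}],
    }
```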
xhigh Effort Level
There's a new reasoning effort parameter called xhigh that sits above the existing high setting. It tells the model to spend more compute on intermediate reasoning steps, which improves accuracy on complex tasks at the cost of higher latency.
We tested this on multi-step debugging tasks and saw consistent improvements on problems that required holding several files in context simultaneously. For quick questions, stick with the default. For hard problems, xhigh earns its keep.
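If the effort setting is exposed as a plain request parameter (an assumption; the article names the levels but not the exact field), selecting it could look like:

```python
# Only the levels named above; the full set may differ in the real API.
EFFORT_LEVELS = ("high", "xhigh")

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a request with a reasoning-effort level.
    "xhigh" trades latency for more compute on intermediate reasoning."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-7",  # placeholder model ID
        "effort": effort,            # hypothetical field name
        "messages": [{"role": "user", "content": prompt}],
    }
```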
Thinking Content Changes
Thinking blocks (the model's internal reasoning chain) still appear in the stream, but their content is now omitted by default; you have to explicitly opt in to see it. This trims some latency off responses. If you rely on thinking content for debugging or transparency, just flip the parameter back on.
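Re-enabling the content might look like this sketch; `include_content` is a guessed field name standing in for whatever the real opt-in flag is called:

```python
def with_thinking_content(request: dict, include: bool = True) -> dict:
    """Return a copy of a request that opts back in to thinking-block content.
    Field names here are assumptions, not the documented schema."""
    req = dict(request)
    req["thinking"] = {"type": "enabled", "include_content": include}
    return req
```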
What We Liked
Coding performance is genuinely best-in-class. SWE-bench Verified hit 87.6% (up from 80.8%), and the SWE-bench Pro score of 64.3% leads the industry by a comfortable margin over GPT-5.4's 57.7%.
Tool use is noticeably sharper. MCP-Atlas tool invocation scored 77.3%, leading GPT-5.4 by 9.2 points. In practice, we found Opus 4.7 more reliable at chaining multiple tool calls without losing track.
High-res image input is a real upgrade. Reading dense screenshots and technical diagrams went from frustrating to functional. This matters for developer and design workflows.
Task budgets make agentic work more predictable. Long sessions in Claude Code felt more controlled, with the model wrapping up cleanly instead of abruptly hitting walls.
Scientific reasoning remains strong. GPQA Diamond at 94.2% means you can trust it with technical and scientific questions.
The /ultrareview feature in Claude Code adds a genuinely useful multi-agent code review workflow that catches issues a single pass would miss.
What Could Be Better
The new tokenizer is a hidden cost increase. Producing up to 35% more tokens for the same input text means your API bills go up even though the per-token price didn't change. Anthropic should be more upfront about this.
Web search still lags behind GPT-5.4. BrowseComp scores tell the story: 79.3% vs GPT-5.4's 89.3%. If your workflow depends heavily on web-grounded answers, GPT-5.4 remains the better pick for that specific task.
xhigh effort adds real latency. On complex prompts, responses noticeably slow down. It's worth it for hard problems, but you wouldn't want it on by default.
Who Is It Best For?
Software developers and engineering teams. The SWE-bench scores aren't abstract. Opus 4.7 handles multi-file refactors, complex debugging, and code generation better than any other model we've tested. The Claude Code integration with task budgets and /ultrareview makes this especially true.
AI-powered product builders. If you're building applications that rely on tool use and agentic loops, the MCP-Atlas improvements and task budget system directly improve reliability.
Teams working with visual content. The high-res image support opens up workflows around screenshot analysis, diagram interpretation, and document extraction that previously required workarounds.
Researchers and analysts. The 1M context window combined with a 94.2% GPQA Diamond score means it handles dense, technical material well.
Who should wait? If you primarily use Claude for writing, brainstorming, or casual conversation, the improvements in Opus 4.7 are marginal over 4.6. And if web search accuracy is critical, GPT-5.4 is still the better tool for that.
Alternatives Worth Considering
GPT-5.4
OpenAI's current flagship. It leads Claude in web search (BrowseComp: 89.3% vs 79.3%) and remains highly competitive across general reasoning tasks. However, it trails Opus 4.7 in coding (SWE-bench Pro: 57.7% vs 64.3%) and tool use (MCP-Atlas: 68.1% vs 77.3%).
Gemini 3.1 Pro
Google's latest sits behind both Claude and GPT in coding benchmarks (SWE-bench Verified: 80.6%) but remains competitive in multimodal tasks and benefits from tight Google Workspace integration. Pricing is also competitive.
Meta Muse Spark
A newer entrant focused on creative tasks. Not a direct competitor for coding or agentic work, but worth watching if your primary use case is content generation.
Final Verdict
Claude Opus 4.7 is the best model available for coding, tool use, and agentic AI workflows. The benchmark numbers back that up across SWE-bench, MCP-Atlas, and GPQA Diamond, and the practical improvements, particularly task budgets and high-res image support, translate into real workflow gains.
The honest caveat is cost. The new tokenizer means you're paying more for the same work, even if the price sheet looks identical. For API-heavy teams, run the numbers before assuming your budget stays flat.
At $20/month on Claude Pro, it's easy to recommend trying it. At API scale, the value depends on whether coding and agentic performance matter more to you than the tokenizer tax.
Our rating: 8.5/10. A strong, substantive upgrade for developers and AI builders. Not a must-upgrade for everyone else.