The AI Enthusiast · Weekly Briefing · Issue No. 11 · Apr 17 – Apr 24, 2026

A week of model drops, a decision not to release one, and the agent stack gets serious.

Anthropic shipped Opus 4.7 and withheld Mythos. OpenAI pushed GPT‑5.5. Alibaba and DeepSeek turned up the open‑weight pressure. The through‑line this week: long‑horizon agents, coding, and the operational scaffolding around them.

// The short read
§ 01 · Anthropic

Opus 4.7 ships, Mythos doesn't, and the agent platform gets a name

Model release

Claude Opus 4.7

Anthropic released claude-opus-4-7 across all Claude products and the API. The update emphasizes stronger coding, better performance on long‑running software tasks, and higher‑resolution vision. Pricing holds steady at $5 / $25 per million input/output tokens — same as Opus 4.6.

Tier: all products + API | Pricing: unchanged | Focus: agentic coding · vision
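At $5/$25 per million input/output tokens, per-request cost stays easy to sanity-check. A quick back-of-envelope calculator (prices from the card above; the request sizes are illustrative, not measured):

```python
# Back-of-envelope cost check for Opus 4.7 at $5 / $25 per million
# input / output tokens (pricing from the card above; the request
# sizes below are made-up examples).

PRICE_IN = 5.00 / 1_000_000    # dollars per input token
PRICE_OUT = 25.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call."""
    return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

# Example: an agentic coding turn with a large context and a long diff.
cost = request_cost(input_tokens=80_000, output_tokens=12_000)
print(f"${cost:.2f}")  # $0.70
```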

Safety decision

Mythos Preview stays behind the glass

Anthropic built — and chose not to broadly release — Claude Mythos Preview. Citing cyber risk (notably the model's ability to identify vulnerabilities in software), Anthropic is restricting access to a limited group of cybersecurity partners via Project Glasswing.

This is the clearest signal to date that frontier labs will publicly withhold capabilities on security grounds — and name the program.

Platform

Managed Agents

A new hosted service on the Claude Platform designed for long‑horizon agent work. It provides stable interfaces for sessions, harnesses, and sandboxes, with durable state and safer tool access — aimed at teams running agents beyond a single turn.

Use case: long‑horizon loops | Primitives: sessions · harnesses · sandboxes
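Anthropic hasn't published the Managed Agents API in this issue, so here is only a generic sketch of the problem the service targets: durable state means a long-horizon loop checkpoints after every step and can resume after a crash. Every name below is a stand-in, not the real interface.

```python
# Generic sketch of why "durable state" matters for long-horizon agents.
# This is NOT the Managed Agents API (not shown in this issue); every
# name here is an invented stand-in for the session/checkpoint idea.
import json
from pathlib import Path

STATE_FILE = Path("agent_session.json")

def load_state() -> dict:
    """Resume a session if a checkpoint exists, else start fresh."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"step": 0, "results": []}

def checkpoint(state: dict) -> None:
    """Persist state after every step so a crash loses at most one step."""
    STATE_FILE.write_text(json.dumps(state))

def run(total_steps: int) -> dict:
    state = load_state()
    while state["step"] < total_steps:
        # A real loop would call the model or a tool here; we fake
        # deterministic work to keep the sketch self-contained.
        state["results"].append(f"step-{state['step']}-done")
        state["step"] += 1
        checkpoint(state)
    return state

final = run(total_steps=3)
print(final["step"])  # 3
```

Restarting the process and calling `run(5)` picks up at step 3 instead of step 0, which is the reliability property a hosted runtime charges for.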

Product

Claude Design & in‑line visuals

Claude Design is a new experimental surface for generating prototypes, slides, and one‑pagers. Alongside it: Claude can now render custom charts and diagrams in‑line in responses, and the mobile app can display fully interactive, shareable assets in chat.

Developer

Claude Code: native binary, tighter sandboxing

Claude Code shipped a broad CLI update with a native binary launcher, stronger sandbox and permission safeguards, and smoother workflows. Separately, there was some noise over Claude Code pricing on the $20 Pro tier — Anthropic clarified it is not moving to a flat $100/month; see Simon Willison's write‑up for the plain‑English version.

CLI: native binary | Sandbox: stronger defaults | Pricing: unchanged for Pro

▲ Worth noting

Anthropic also reaffirmed that Claude will remain ad‑free, arguing advertising incentives are incompatible with a genuinely helpful assistant. Read that as a positioning shot across the bow.

§ 02 · OpenAI

GPT‑5.5 and the “Codex for everything” push

Model release

GPT‑5.5 — OpenAI's “smartest, most intuitive” model

OpenAI released GPT‑5.5 to paid tiers (Plus, Pro, Business, Enterprise) in ChatGPT and Codex. It's positioned as an agentic model designed to work through complex tasks by switching between multiple tools — better at coding, computer use, and deep research. It arrives less than two months after GPT‑5.4, underscoring the current pace of frontier releases.

Tier: Plus · Pro · Business · Enterprise | Surface: ChatGPT + Codex | Pitch: agentic tool‑switching

Agents

Codex for (almost) everything

OpenAI expanded Codex from “code assistant” toward a general agentic coder across surfaces. Paired with GPT‑5.5 two days later, this sets up a narrative around ChatGPT as a multi‑tool super app — a framing TechCrunch and others picked up immediately.

§ 03 · Open weights

Qwen and DeepSeek keep the pressure on closed labs

Qwen

Qwen 3.6‑Max‑Preview

Alibaba released its most powerful model to date. Qwen 3.6‑Max‑Preview tops six major coding benchmarks and posts gains in world knowledge and instruction following over its predecessor. Max‑Preview is proprietary; the wider Qwen 3.6 family includes open‑weight variants (Apache 2.0 for the 35B‑A3B model).

Coding: SOTA on 6 benchmarks | License: proprietary (Max‑Preview)

Qwen · open weights

Qwen 3.6‑27B — a dense model that beats a 397B MoE

Alibaba's Qwen team dropped a 27B‑parameter dense open‑weight model that outperforms a 397B MoE on agentic coding benchmarks. It's a pointed argument that architecture and training quality still beat parameter count — and an easier deploy target for teams that can't serve a trillion‑parameter router.

DeepSeek

DeepSeek V4 Preview — 1.6T MoE, 1M context, Apache 2.0

DeepSeek launched the V4 Preview as two open‑weight MoE models: V4‑Pro (1.6T total / 49B active) and V4‑Flash (284B total / 13B active). Both ship with 1M‑token context. V4‑Pro leads all open rivals on math and coding, and trails only Gemini 3.1‑Pro on world knowledge. Available on Hugging Face, the DeepSeek API, and chat.deepseek.com (Expert = Pro, Instant = Flash).

License: Apache 2.0 | Context: 1,000,000 tokens | Thesis: context is an efficiency problem, not a capability one

Model | Released | Params | License | Notes
Claude Opus 4.7 | Apr 22 | — | closed | Coding + long‑horizon + vision; $5/$25 per M
GPT‑5.5 | Apr 23 | — | closed | Agentic tool switching; Plus / Pro / Biz / Ent
Qwen 3.6‑Max‑Preview | Apr 20 | — | proprietary (preview) | SOTA on 6 coding benchmarks
Qwen 3.6‑27B | Apr 22 | 27B dense | open | Beats 397B MoE on agentic coding
DeepSeek V4‑Pro | Apr 24 | 1.6T / 49B active | Apache 2.0 | 1M context; math/coding leader among open models
DeepSeek V4‑Flash | Apr 24 | 284B / 13B active | Apache 2.0 | 1M context; the cheap‑and‑fast tier
§ 04 · Field note · one‑pager

“The harness is the product” — Nate B. Jones on OpenAI vs. Anthropic

Deep Dive

Two labs agree on the shape of the agent era. They disagree on who pays for the runtime.

Jones's thesis, from his breakdown of this week's Codex push: the real product isn't the model anymore — it's the harness around it (sessions, sandboxes, tool access, computer use). OpenAI and Anthropic are converging on the same architecture and splitting sharply on the business model. One sells the runtime. The other gives it away and taxes the tokens.

// Anthropic

Runtime‑as‑a‑product
  • Offering: Managed Agents — hosted harness on the Claude Platform
  • Pricing: ~$0.08 per session‑hour, billed separately from model tokens
  • Primitives: Sessions · sandboxes · stable tool access · durable state
  • Bet: Operating long‑horizon agents is hard; customers will pay for reliability

// OpenAI

Runtime‑as‑a‑loss‑leader
  • Offering: Model‑native harness shipped as an update to the open‑source Agents SDK
  • Pricing: No first‑party runtime fee — pay only for model + tool calls
  • Wedge: Codex expanding into an agentic computer‑use layer across surfaces
  • Bet: Runtime is commodity; lock‑in lives in the model and the tool meter
▲ What it means for builders
  1. Pick your tax. A per‑session runtime bill (Anthropic) vs. higher effective token/tool cost under an open SDK (OpenAI). Model it against your actual agent duty cycle, not demo traffic.
  2. Portability is real now. If OpenAI's runtime is open source and Anthropic's is hosted, your agent's harness and your model choice are two separate procurement decisions. Design for that.
  3. Computer‑use is the battlefield. Both sides are optimizing the harness for long, messy, tool‑switching tasks — not chat. Evaluate on agent traces, not leaderboard scores.
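The "pick your tax" point is just arithmetic. A rough comparator, using the ~$0.08 per session-hour figure from the Anthropic card above; the duty cycle, token volume, and blended token price are assumed workload numbers, not vendor figures:

```python
# Rough comparison of the two runtime taxes in Jones's framing.
# $0.08/session-hour comes from the Anthropic card above; the duty
# cycle, token volume, and blended token price are assumptions.

SESSION_HOUR_RATE = 0.08  # dollars per session-hour (hosted runtime)

def hosted_runtime_cost(session_hours: float) -> float:
    """First-party runtime fee, billed separately from model tokens."""
    return session_hours * SESSION_HOUR_RATE

def token_cost(million_tokens: float, price_per_million: float) -> float:
    """Model spend, identical in both scenarios if the harness is free."""
    return million_tokens * price_per_million

# Assumed month: 400 session-hours of agents, 120M blended tokens at $8/M.
runtime = hosted_runtime_cost(400)  # 32.0
tokens = token_cost(120, 8.0)       # 960.0
print(f"runtime tax: ${runtime:.2f} on ${tokens:.2f} of token spend")
```

On these assumed numbers the runtime fee is a rounding error next to token spend, which is why the advice above says to model your actual duty cycle: the ratio flips for agents that idle in long sessions while emitting few tokens.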
◆ The deeper split · design philosophy, not just pricing

Jones's second — and sharper — point: the two labs disagree on how an agent should reach into the world. It's not a pricing fight; it's a trust model.

// OpenAI · “Act like a human”

Pixels, cursor, keyboard
  • Mechanism: Codex drives the actual desktop — clicks, types, reads the screen like a user would. No special integration needed.
  • Surface: Any app on your Mac. If a human can do it with a mouse, Codex can too.
  • Upside: Breadth. Works with legacy apps that will never ship an API — CAD, finance terminals, niche internal tools.
  • Downside: Non‑deterministic, hard to audit, slow. A hallucinated click is a real click.

// Anthropic · “Call a typed tool”

MCP, schemas, structured I/O
  • Mechanism: Every capability is a named function with a schema. MCP is the open protocol; the model never touches pixels.
  • Surface: Anything that exposes an MCP server — databases, CRMs, filesystems, your own services.
  • Upside: Deterministic, auditable, typed. Every agent action is a logged, structured call.
  • Downside: Coverage gap. No MCP server = no access. And the protocol itself is earning scars — an OX Security RCE report this month hit the reference SDKs across Python, TypeScript, Java, and Rust.

The rough translation of Jones's take: OpenAI bets the model is smart enough to operate human interfaces safely. Anthropic bets the interfaces themselves should be redesigned for models — structured, permissioned, intermediated — because the blast radius of a wrong click in a general agent loop is too high.
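The "typed tool" side is concrete: in MCP, a tool is a name plus a JSON Schema for its inputs, so every call the model makes is a validated, loggable structure. A minimal sketch; the `lookup_invoice` tool itself is invented for illustration, and the toy validator stands in for a full JSON Schema validator:

```python
# A minimal MCP-style tool definition: the model sees a typed contract,
# never pixels. The "lookup_invoice" tool is invented for illustration;
# the name/description/inputSchema shape follows MCP's tool convention.

lookup_invoice_tool = {
    "name": "lookup_invoice",
    "description": "Fetch one invoice by ID from the billing database.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string", "description": "e.g. INV-1042"},
            "include_line_items": {"type": "boolean", "default": False},
        },
        "required": ["invoice_id"],
    },
}

def validate_call(tool: dict, args: dict) -> bool:
    """Toy validator: checks only that required fields are present.
    Real servers run a full JSON Schema validator; the point is that
    every agent action arrives as a checkable, auditable structure."""
    return all(k in args for k in tool["inputSchema"]["required"])

print(validate_call(lookup_invoice_tool, {"invoice_id": "INV-1042"}))  # True
print(validate_call(lookup_invoice_tool, {}))                          # False
```

A hallucinated argument here fails validation before anything happens; a hallucinated click on the pixel side has already happened.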

▲ How to read it
  1. Pick by workflow, not vendor loyalty. Greenfield internal stack with APIs you control → MCP (lower variance, cleaner traces). Legacy desktop apps, vendor software, browser workflows you don't own → computer‑use (only option that actually reaches them).
  2. The two are converging, not competing. Anthropic already ships a computer‑use mode; OpenAI will absorb MCP where it's easier than UI automation. Expect production agents to use both surfaces in the same loop within 12 months.
  3. Security follows the philosophy. MCP's ongoing vulnerability story (and Anthropic's “expected behavior” response) is not a bug — it's the cost of shipping a protocol. Computer‑use's risk surface is bigger but lives inside the OS sandbox you already audit.
§ 05 · Developer deep dive

Migrating to Opus 4.7: your old prompts are quietly underperforming

Prompt Migration

4.7 takes your instructions literally. 4.6 generalized them generously.

The biggest behavior change in Opus 4.7 isn't a benchmark number — it's that the model no longer silently fills in what you meant. If your Opus 4.6 prompt worked partly because the model inferred follow‑on intent, that intent now needs to be written down. Hedges like “try to,” “if possible,” and “you might want to” used to read as firm guidance. In 4.7 they read as hedges — and the model obliges.

// Before (Opus 4.6)

Permissive · inferential
  • Prompt style: “Try to add tests if possible.”
  • Result: Model generally adds tests — reads intent loosely.
  • Sampling: temperature, top_p, top_k accepted.
  • Thinking: Summarized reasoning returned by default.
  • Tokenizer: Legacy — budget sized against old counts.

// After (Opus 4.7)

Literal · explicit
  • Prompt style: “Add tests. Cover the happy path and two edge cases.”
  • Result: Model does exactly that — and only that.
  • Sampling: Non‑default temperature/top_p/top_k values are rejected with a 400. Remove them; shape behavior in the prompt.
  • Thinking: Empty by default. Set thinking.display: "summarized" to restore visibility.
  • Tokenizer: New — expect 1.0×–1.35× more tokens for the same text. Re‑budget.
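The sampling and thinking rows translate into a small request diff. A sketch assuming a generic chat-style payload — the thinking.display key comes from the card above, but the overall body shape is illustrative, not Anthropic's documented API schema:

```python
# Illustrative request-body migration for the sampling/thinking rows
# above. The payload shape is a generic sketch, not Anthropic's
# documented schema; "thinking"/"display" come from the migration notes.

request_4_6 = {
    "model": "claude-opus-4-6",
    "messages": [{"role": "user",
                  "content": "Try to add tests if possible."}],
    "temperature": 0.7,  # accepted on 4.6
    "top_p": 0.9,
}

def migrate(req: dict) -> dict:
    """Drop sampling knobs (4.7 rejects non-defaults with a 400),
    re-enable visible thinking, and make the instruction imperative."""
    out = {k: v for k, v in req.items()
           if k not in ("temperature", "top_p", "top_k")}
    out["model"] = "claude-opus-4-7"
    out["thinking"] = {"display": "summarized"}  # empty by default on 4.7
    out["messages"] = [{"role": "user",
                        "content": "Add tests. Cover the happy path "
                                   "and two edge cases."}]
    return out

migrated = migrate(request_4_6)
print("temperature" in migrated)  # False
```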
▲ Migration checklist
  1. Strip the hedges. Global find for “try to,” “if possible,” “you might want to,” “consider,” “where appropriate.” Replace with imperatives and acceptance criteria.
  2. Promote implicit context. Anything you relied on 4.6 to infer — tests, error handling, style conventions, follow‑up actions — write it down as a bullet in the prompt.
  3. Remove sampling knobs. Delete temperature, top_p, top_k from API calls. 4.7 rejects non‑default values with a 400. Use the prompt to steer creativity vs. precision instead.
  4. Re‑enable thinking if you used it. Streams still emit thinking blocks, but the thinking field is empty unless you opt in. Set thinking.display: "summarized".
  5. Re‑size your token budget. New tokenizer uses up to 1.35× more tokens per string. Revisit context windows, rate limits, and cost models — not benchmarks.
  6. Turn up effort. Anthropic's own guidance: start xhigh for coding and agentic work; high minimum for intelligence‑sensitive tasks. Defaults underfit 4.7's reasoning.
  7. Watch the calendar. claude-opus-4-6 is deprecated June 15, 2026. After that, API calls either fail or redirect to the nearest model — don't let your migration be involuntary.
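Step 1 of the checklist is mechanical enough to script. A small linter for the hedge phrases listed above; the phrase list comes straight from the checklist and is a starting point, not exhaustive:

```python
# Flag the hedge phrases from step 1 of the migration checklist so
# they can be replaced with imperatives and acceptance criteria.
# Phrase list taken from the checklist above; extend as needed.
import re

HEDGES = ["try to", "if possible", "you might want to",
          "consider", "where appropriate"]
PATTERN = re.compile("|".join(re.escape(h) for h in HEDGES),
                     re.IGNORECASE)

def find_hedges(prompt: str) -> list[str]:
    """Return every hedge-phrase occurrence, lowercased, in order."""
    return [m.group(0).lower() for m in PATTERN.finditer(prompt)]

prompt = "Try to add tests if possible, and consider adding docstrings."
print(find_hedges(prompt))  # ['try to', 'if possible', 'consider']
```

Run it over your prompt corpus before the June 15 deprecation; any non-empty result is a prompt that was leaning on 4.6's generous inference.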
§ 06 · Product deep dive

Claude Design: when to reach for it, and when not to

Product Guide

Idea → shareable visual, without opening Figma.

Claude Design is an Anthropic Labs surface that turns a prompt into a polished prototype, slide deck, wireframe, or one‑pager — on‑brand, interactive, and shareable. It's powered by Opus 4.7 and available in research preview to Claude Pro, Max, Team, and Enterprise. It is not trying to replace Figma for dedicated designers. It's aimed at everyone who currently blocks a designer for twenty minutes to get a mockup, or fakes it in Keynote.

// Reach for it when

Speed beats fidelity
  • Designers: Turn static mockups into interactive, testable prototypes — no PR, no code review.
  • PMs: Sketch a feature flow, hand it to Claude Code for implementation, or to a designer to polish.
  • Founders / AEs: Rough outline → complete, on‑brand deck in minutes. Export as .pptx or send to Canva.
  • Marketing: One‑pagers, campaign visuals, pitch materials that match your design system automatically.

// Skip it when

Fidelity beats speed
  • Pixel‑perfect work: Shipping production UI with subpixel precision — stay in Figma.
  • Complex component libraries: Multi‑state, multi‑variant component authoring still belongs in a real design tool.
  • Heavy illustration: Custom illustration, motion, and brand system authoring — not the target use case.
  • Regulated output: Anything requiring sign‑off, version control, and accessibility auditing before it ships.
▲ How the workflow actually runs
  1. Onboard against your system. During setup, Claude reads your codebase and design files and builds a design system — colors, typography, components. Every project after that inherits it automatically.
  2. Describe, don't draw. “A three‑column pricing page, tier badges, annual toggle, our brand voice.” Claude returns a first pass.
  3. Refine in place. Comment inline on specific elements, edit text directly, or use adjustment knobs for spacing, color, and layout. Then ask Claude to apply a change across the full design.
  4. Hand off. Export to PPTX, push to Canva, or pass the prototype spec to Claude Code for a real implementation.
▲ Strategic read

Claude Design is the first Anthropic product that treats design as an outcome, not a tool category. It's a direct bet that non‑designers generate 80% of the visual artifacts inside a company — and that the current Figma‑shaped hole in their day is worth a product. Figma's response will tell you whether “design tool” survives as a distinct SaaS category or collapses into the chat window.

§ 07 · Quick hits

Also on the radar

Google

Gemini in Chrome expands to APAC

Available on desktop and iOS in Australia, Indonesia, Japan, the Philippines, Singapore, South Korea, and Vietnam (iOS excluded in Japan). The agentic browser‑control feature remains US‑only on AI Pro / Ultra.

Enterprise

Deloitte + Gemini Enterprise

At Cloud Next '26 (Apr 22), Deloitte announced a dedicated agentic transformation practice built on Gemini Enterprise — a sign of the big consultancies aligning around Google's agent stack.

Security

Mythos drives the cyber conversation

Anthropic's decision to hold Mythos has become a reference point for government and industry discussions on dual‑use frontier models — expect policy language to cite it within weeks.

Infra

Data Center World, Apr 20–23

The week's ambient backdrop: infrastructure leaders gathering to talk power, cooling, and capacity — the physical tax on every model above.

▲ Pattern of the week

The agent stack is becoming a product category. Managed Agents (Anthropic), Codex‑for‑everything (OpenAI), Qwen's dense agentic coder, Gemini's browser agent, Deloitte's Gemini practice — five vendors, same framing. Expect the 2026 developer conversation to shift from “which model” to “which agent runtime.”