The AI Enthusiast — Claude Leaks: Lessons & What's Next · Issue No. 10

01 — Timeline

Two Weeks That Changed the AI Landscape

From an accidental source map to a $100M security initiative, the two weeks spanning late March and early April 2026 revealed more about Claude's architecture than any official announcement.

March 11, 2026

Bun Bug Reported

A Bun issue is filed: source maps are served in production builds even though the docs say they shouldn't be. The bug sits open for 20 days.

March 24, 2026

Claude Cowork Ships

Anthropic launches Claude Cowork with computer use on macOS: a desktop agent that can control apps, navigate browsers, and handle multi-step workflows.

March 31, 2026

The Leak

Claude Code npm package v2.1.88 ships with a 59.8 MB source map. Within hours, 1,906 files and 512,000+ lines of TypeScript are public, and GitHub mirrors are forked tens of thousands of times.

April 1–3, 2026

Containment Fails

Anthropic issues mass GitHub takedown notices, then admits they hit more repositories than intended. Rewrites and ports in Python and Rust keep the architecture public.

April 7, 2026

Project Glasswing Announced

Anthropic reveals Claude Mythos Preview — a restricted frontier model given to 50+ organizations for defensive cybersecurity. $100M in credits committed.

April 8, 2026

Managed Agents Launch

Claude Managed Agents enters public beta: cloud-hosted agent infrastructure with sandboxed execution, checkpointing, and scoped permissions.

02 — The Leak

What Was Actually Exposed

The source map wasn't just code — it was the most detailed blueprint of a production AI agent system ever made public.

Files exposed: 1,906
Lines of TypeScript: 512K+
Source map file: 59.8 MB
Lines in QueryEngine alone: 46K

⚙

Architecture Exposed

QueryEngine (46K lines) — streaming, tool-call loops, thinking mode, token counting, permission wrapping

Tool System (29K lines) — 50+ tools across file ops, shell, agents, web, MCP, scheduling

Command System (25K lines) — CLI parsing, slash commands, hooks, feature flags

🔒

Security Details Exposed

Permission modes — six levels from default to "full bypass" that auto-approves all operations

System prompt assembly — hardcoded guardrails + CLAUDE.md + git context

Hook implementations — exact pre/post execution logic

Known CVEs — CVE-2025-59536 and CVE-2026-21852 now easier to weaponize

Key insight: The leak proved that the competitive moat is not the model — it's the orchestration, permissions, and tool routing wrapped around it. The harness is the product.

03 — Under the Hood

The Agent Architecture Blueprint

What the leak revealed is a five-layer agent execution system that goes far beyond a chatbot interface.

Entrypoint → Query Engine → Tool Registry → Execution → Verification
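To make the five layers concrete, here is a minimal Python sketch of one pass through them. Every name in it (`Tool`, `ToolRegistry`, `run_turn`, the `word_count` tool) is hypothetical; Claude Code itself is TypeScript and validates tool inputs with Zod, which a plain `isinstance` check stands in for here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class Tool:
    name: str
    schema: Dict[str, type]  # expected argument types (stand-in for a Zod schema)
    run: Callable[..., Any]


@dataclass
class ToolRegistry:
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def resolve(self, name: str, args: Dict[str, Any]) -> Tool:
        """Look up a tool and validate its arguments before anything executes."""
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        tool = self.tools[name]
        for key, expected in tool.schema.items():
            if not isinstance(args.get(key), expected):
                raise TypeError(f"{name}: argument {key!r} must be {expected.__name__}")
        return tool


def query_engine(prompt: str) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical stand-in for the model call that picks a tool and its arguments."""
    return "word_count", {"text": prompt}


def verify(result: Any) -> bool:
    """Verification layer: sanity-check the tool output before reporting success."""
    return isinstance(result, int) and result >= 0


def run_turn(prompt: str, registry: ToolRegistry) -> Any:
    # Entrypoint -> Query Engine -> Tool Registry -> Execution -> Verification
    tool_name, args = query_engine(prompt)
    tool = registry.resolve(tool_name, args)
    result = tool.run(**args)
    if not verify(result):
        raise RuntimeError(f"verification failed for {tool_name}")
    return result


registry = ToolRegistry()
registry.register(Tool("word_count", {"text": str}, lambda text: len(text.split())))
print(run_turn("summarize the leaked architecture", registry))  # 4
```

The real Query Engine loops (the model can request several tool calls per turn) and wraps execution in permission checks; this sketch runs a single straight-line pass.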

Technology Stack

Layer	Technology	Purpose
Runtime	`Bun`	Fast JS runtime for agent execution
Language	`TypeScript`	Full type safety across 512K lines
UI	`React + Ink`	Terminal UI rendering
Validation	`Zod`	Schema validation for tool inputs/outputs
Auth	`OAuth 2.0 + JWT`	Authentication + macOS Keychain
Telemetry	`OpenTelemetry`	Tracing, metrics, user frustration tracking

Notable finding: The codebase includes frustration detection — Claude tracks signals of user frustration to adjust its behavior. This was flagged by Scientific American as a privacy concern.

04 — Impact

What the Leak Means in Practice

The impact extends across security, competition, and the broader AI ecosystem.

⚠

Security

Pre-existing CVEs now far easier to exploit. Permission bypass logic is public. Attackers can craft targeted malicious repos that abuse previously unknown entry points.

⚖

Competition

Every AI lab now has a detailed reference implementation. The orchestration patterns, tool routing, and memory architecture are no longer trade secrets. Open-source ports appeared within days.

🌱

Ecosystem

Accelerated adoption and understanding. Developers now build on known architecture instead of guessing. Public scrutiny likely accelerates security patching. Community engagement surged.

Views on initial X post: 19M
GitHub stars on mirrors: 46K+
Language ports (Python, Rust): 3+
Known CVEs now exposed: 2

Brand reality check: Two accidental code exposures within five days undermine the "safety-first" narrative central to Anthropic's market positioning. Operational competence is now part of the trust equation.

05 — Project Glasswing

Claude Mythos & Defensive Security

Days after the leak, Anthropic announced its most ambitious security initiative. Coincidence or crisis response — the move is significant either way.

🛡

What Is Mythos Preview

An unreleased frontier model restricted to defensive security work. Not publicly available — given to selected organizations to find and fix vulnerabilities before broader release.

Mythos autonomously found a 17-year-old RCE vulnerability in FreeBSD (CVE-2026-4747) with no human involvement after the initial request.

Anthropic says Mythos identified thousands of zero-day vulnerabilities across every major OS and browser.

🌐

Project Glasswing Partners

50+ organizations with access, including:

AWS, Apple, Microsoft, Google, NVIDIA, CrowdStrike, Palo Alto Networks, Cisco, JPMorganChase, Linux Foundation, Broadcom

$100M+ in usage credits committed, plus $4M in donations to open-source security organizations.

The signal: Anthropic is positioning AI models as security infrastructure, not just productivity tools. The restricted release model — defend first, release later — is new for the industry and sets a precedent.

06 — Managed Agents

From Prototype to Production in Days

Announced April 8, 2026, Claude Managed Agents is Anthropic's answer to the hardest part of building AI agents: the infrastructure.

What Managed Agents Handles for You

Sandboxed execution — isolated container per agent
Checkpointing — resume after failures
Credential management — secure secrets handling
Scoped permissions — control tool access
End-to-end tracing — full observability
Error recovery — auto-resume after outages

💰

Pricing

Model usage cost + $0.08 per agent runtime hour. That's the infrastructure premium for not building your own sandboxing, state management, and credential storage.
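A quick back-of-the-envelope calculation helps with the build-vs-buy math. This sketch assumes Sonnet 4.6 list prices from the token cost table later in this issue ($3 input / $15 output per million tokens) plus the $0.08 runtime-hour fee; real bills may include line items (caching, tool usage) that are not modeled here.

```python
def managed_agent_cost(input_tokens: int, output_tokens: int, runtime_hours: float,
                       input_per_m: float = 3.0, output_per_m: float = 15.0,
                       runtime_rate: float = 0.08) -> float:
    """Estimated USD cost: model token usage plus the per-runtime-hour infrastructure fee.

    Defaults assume Sonnet 4.6 list prices ($3 / $15 per 1M tokens) and $0.08 per runtime hour.
    """
    model_cost = input_tokens / 1e6 * input_per_m + output_tokens / 1e6 * output_per_m
    runtime_cost = runtime_hours * runtime_rate
    return round(model_cost + runtime_cost, 2)


# An agent running 8 hours that uses 2M input and 0.5M output tokens:
# $6.00 input + $7.50 output + $0.64 runtime
print(managed_agent_cost(2_000_000, 500_000, 8))  # 14.14
```

In this example the runtime fee is under 5% of the total. Token spend dominates, which is why model routing matters more than the hourly rate.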

🏢

Early Adopters

Notion, Rakuten, Asana

Teams using Managed Agents across coding, task automation, and document processing workflows.

Research Preview Features

Multi-agent spawning — Complex tasks split across specialized sub-agents
Auto prompt refinement — Improved task success by up to 10 points in internal testing

Developer takeaway: If the leak showed the harness is the moat, Managed Agents is Anthropic selling access to that harness. You get the orchestration without building it.

07 — The Platform Wave

Claude Is Becoming an Operating Layer

Zoom out and the picture is clear: Anthropic shipped five major capabilities in roughly nine weeks (February 5 to April 8). This is a platform play, not a series of incremental updates.

Date	Release	What It Means
Feb 5	Opus 4.6	1M context window, agent teams, 300K max output tokens on Batches API
Feb 17	Sonnet 4.6	Same-price upgrade, matching 1M context
Mar 24	Claude Cowork	Desktop agent: controls Mac apps, navigates browsers, handles multi-step tasks autonomously
Mar 31	The Leak	Full agent architecture becomes public knowledge
Apr 7	Project Glasswing	Mythos Preview for defensive security with 50+ partner organizations
Apr 8	Managed Agents	Cloud-hosted agent infrastructure as a service

Model (Opus / Sonnet 4.6) → Desktop (Cowork + Dispatch) → Cloud (Managed Agents) → Security (Mythos / Glasswing)

The pattern: Claude isn't improving as a chatbot. It's expanding into a runtime — a layer between you and your computer, your cloud, your infrastructure, and your security posture.

08 — Internal Signals

Codenames and What's Likely Coming

The leak and subsequent analysis surfaced internal labels and feature flags. Some are confirmed, some remain speculative.

✅

Confirmed or Shipped

Mythos Preview — Active in Project Glasswing, named partners, live security work

Managed Agents — Public beta with API, pricing, and early adopters

Cowork / Computer Use — GA for macOS, Windows coming

🔬

Unconfirmed / Speculative

Capybara, Fennec, Numbat, Tengu
Internal model family codenames — likely tiered development tracks

Kairos — Background/always-on execution
AutoDream — Memory consolidation / overnight synthesis
UltraPlan — Deeper multi-step planning mode

Reality check: Feature flags are not launch promises. Internal experiments may never ship. Treat codenames as directional signals, not product roadmap commitments.

09 — The Real Moat

Conway: The Layer Above MCP

The leak revealed something most analysis missed: Anthropic is building a proprietary orchestration layer on top of the open MCP standard. This is the classic platform play — and it's called Conway.

The Two-Layer Strategy

Open Layer

MCP (Model Context Protocol)

Open standard donated to Linux Foundation
Adopted by OpenAI, Google, and others
Portable tool connectors across platforms
Standardized AI-to-tool communication

Purpose: Create adoption. Build the ecosystem.

Proprietary Layer

Conway (CNW)

Always-on agent runtime with persistent memory
Custom extensions in .cnw.zip format
Webhook triggers, event streams, scheduling
UI panels: Search, Chat, System controls

Purpose: Create lock-in. Capture value.

MCP (Open • Portable) → Conway (Proprietary • Persistent) → CNW Extensions (App Store • Locked In)

🔒

The Behavioral Lock-In Problem

The deepest moat isn't data portability — it's behavioral lock-in. Once Conway runs continuously for months, it accumulates decision patterns, workflow preferences, inferred business rules, and edge cases it's learned to handle.

There is no export format for an agent's learned operational intelligence. No regulatory framework for migrating it. Switching means retraining — potentially months of ramp-up cost.

This is the moat. Not the model weights. Not MCP. The layer between the open standard and your daily operations.

Think of it like this: MCP is like USB — a universal connector anyone can use. Conway is like iOS — the operating system that makes USB useful, but locks you into Apple's ecosystem. The connector is open. The experience layer is not.

10a — What This Means to You

Perspectives: The New User & the Executive

The same events read very differently depending on where you sit.

🌱

If You're New to AI

The barrier to entry just dropped. The leak means the architecture of how AI agents work is no longer a mystery. Learning materials exploded — the blueprint is public.
Start with Claude Code + CLAUDE.md. Write a project description file, install Claude Code, and start asking it to help with real work.
Expect cost changes. The OpenClaw ban shows flat-rate AI access is unsustainable. Budget for API costs or stay within official tool limits.
Pick a lane. Open-source tools (Aider, Cline, OpenCode) vs proprietary platforms (Claude Code, Conway). Your choice now shapes switching costs later.

🏢

If You're a CIO or CEO

Vendor trust just got more complex. Evaluate AI vendors on operational competence, not just model capability.
The Conway lock-in is real. Agents that run 24/7 accumulate institutional knowledge that doesn't port. Insist on platform-independent documentation.
Managed Agents changes the build-vs-buy math. At $0.08/hour, many internal agent platforms no longer justify their engineering cost.
Token economics hit the P&L directly. Model routing can cut AI costs substantially (organizations report 30–70% reductions). This needs to be a line-item strategy.

10b — What This Means to You

Perspectives: The Developer & the Founder

For builders and founders, the last two weeks reshaped both the opportunity and the risk landscape.

💻

If You're a Developer

The leaked architecture is your study guide. Five-layer agent systems, tool routing, permission models, memory management — the reference implementation is public.
Build on MCP, be cautious with CNW. MCP integrations are portable. Conway extensions are not. Invest in the open layer first.
Skills and hooks are the productivity multiplier. Not the model. Not the prompt. The system you build around the model compounds over time.
Understand the cost model. A single OpenClaw instance can burn $1K–$5K/day. Use the Batch API (50% off) for non-urgent work. Route intelligently.

🚀

If You're a Startup Founder

The "wrapper" startup is dead. The "harness" startup is alive. The value is in orchestration, permissions, domain-specific workflows, and the memory layer.
Managed Agents is both opportunity and threat. Easier to launch agent-powered products, but your infra moat just evaporated. Differentiate on domain knowledge.
Watch the CNW extension ecosystem. Early extensions could be high-value real estate — like early iOS apps. But Anthropic controls your distribution.
Multi-model routing is a survival skill. Abstract your model layer, route by task complexity, and keep your options open.

11 — Token Economics

The OpenClaw Lesson & the Cost of Intelligence

The OpenClaw ban isn't just policy drama. It's the first clear signal that flat-rate AI access cannot survive agent-scale usage — and every organization needs a token strategy.

🚫

What Happened with OpenClaw

April 4, 2026: Anthropic bans all third-party agent frameworks (OpenClaw, OpenCode, etc.) from using Claude subscription OAuth tokens.

Why: ~60% of active OpenClaw instances ran on Claude subscription credits. A single instance can consume $1,000–$5,000/day — on a $20–$200/month plan.

Boris Cherny: "Our subscriptions weren't built for the usage patterns of these third-party tools."

Result: Users must now pay API rates or stay within Claude Code's managed limits. OpenClaw creator (now at OpenAI) called it “a betrayal of open-source developers.”

⚖

The Bigger Pattern

This mirrors what happened across the industry:

OpenAI's model router — automatically sends simple requests to GPT-5.4 nano (cheapest) and complex ones to GPT-5.4 (most capable). Users don't choose; the system optimizes for cost.

Anthropic's adaptive thinking — Opus/Sonnet 4.6 skip expensive reasoning for simple requests automatically. Token spend self-adjusts.

The lesson: Every AI provider is moving from "unlimited" to "optimized." Revenue and profitability now directly shape what model answers your question.

Token Cost Comparison (per 1M tokens, 2026)

Model	Input	Output	Best For
Haiku 4.5	$1	$5	Classification, extraction, routing
Sonnet 4.6	$3	$15	General coding, analysis, writing
Opus 4.6	$5	$25	Complex reasoning, architecture, planning
Batch API	50% off	50% off	Non-urgent processing within a 24-hour window

Your token strategy: Route simple tasks to Haiku ($1/M input), daily work to Sonnet ($3/M input), complex architecture to Opus ($5/M input), and batch non-urgent work for 50% off. Organizations using model routing report 30–70% cost reductions while maintaining quality. This isn't optimization; it's a requirement.
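A minimal routing sketch under stated assumptions: the task tiers (`simple`, `daily`, `complex`) and the model labels are illustrative, not API model IDs; prices come from the table above; and the batch flag applies the 50% discount to both input and output.

```python
# USD per 1M tokens (input, output), from the table above
PRICES = {
    "haiku-4.5": (1.0, 5.0),
    "sonnet-4.6": (3.0, 15.0),
    "opus-4.6": (5.0, 25.0),
}

# Hypothetical task tiers; labels are illustrative, not API model IDs
ROUTES = {"simple": "haiku-4.5", "daily": "sonnet-4.6", "complex": "opus-4.6"}

BATCH_DISCOUNT = 0.5  # Batch API: 50% off input and output


def route(tier: str) -> str:
    """Pick the cheapest model judged adequate for the task tier."""
    return ROUTES[tier]


def cost(model: str, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    inp, out = PRICES[model]
    total = input_tokens / 1e6 * inp + output_tokens / 1e6 * out
    return total * BATCH_DISCOUNT if batch else total


# 1M input + 200K output tokens of classification work, three ways:
print(cost("opus-4.6", 1_000_000, 200_000))                   # everything on Opus: 10.0
print(cost(route("simple"), 1_000_000, 200_000))              # routed to Haiku: 2.0
print(cost(route("simple"), 1_000_000, 200_000, batch=True))  # routed and batched: 1.0
```

This single-task example is the best case; real workloads mix tiers, so expect smaller savings than the drop from $10 to $1 shown here.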

12 — Practical Application

Making Claude Code Work Harder for You

Theory is nice. Here's what actually moves the needle in daily development work.

📝

CLAUDE.md — Your Agent's Operating System

The most impactful single file you can create. It loads every session and tells Claude how your world works.

Boris Cherny (Claude Code's creator) keeps his at ~100 lines, ~2,500 tokens. His golden rule: "Anytime we see Claude do something incorrectly, we add it to CLAUDE.md so it doesn't repeat next time."

# Example CLAUDE.md structure

## Project Context
- Stack: Next.js 16, React 19, MongoDB Atlas
- Deploy: Vercel, production branch is main

## Behavioral Rules
- Run tests after every change
- Never mock the database in integration tests
- Keep files under 150 lines
- Commit early and often with descriptive messages

## Aliases
- "the dashboard" = /src/app/dashboard/
- "deploy" = git push origin main

Pro tip: CLAUDE.md is advisory (~80% compliance). If something must happen every time, make it a hook instead. Hooks are deterministic — 100% execution rate.

13 — Hooks & Skills

The Two Multipliers

Skills extend what Claude can do. Hooks constrain how it does it. Together they turn a powerful but unpredictable assistant into something you can trust with your codebase.

🔌

Hooks

Deterministic scripts that run at specific points in Claude's workflow. Configure in settings.json.

PreToolUse — runs before any tool call (allow / deny / defer)
PostToolUse — runs after tool execution
Other events (UserPromptSubmit, Stop, SessionStart) cover other lifecycle points; commit gates are PreToolUse hooks matched to the Bash tool

New in v2.1.89: defer option lets hooks pause execution and wait for an external signal.

📚

Skills

Folders in ~/.claude/skills/, each containing a SKILL.md markdown file, that give Claude domain knowledge and reusable workflows. No SDK, no build step.

Invoked automatically when relevant, or manually with /skill-name

Examples: research workflows, presentation generators, deployment scripts, analysis tools, code review personas

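# Example: pre-commit gate that blocks sensitive files (requires jq)
# In settings.json: a PreToolUse hook on Bash; exit code 2 blocks the call and shows stderr to Claude
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "bash -c 'cmd=$(jq -r .tool_input.command); if echo \"$cmd\" | grep -q \"git commit\" && git diff --cached --name-only | grep -qE \"\\.(env|key|pem)\\$\"; then echo \"BLOCKED: sensitive files staged\" >&2; exit 2; fi'"
      }]
    }]
  }
}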
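The same gate is easier to test as a standalone script. This sketch assumes the hook contract as described in Claude Code's hooks documentation: a JSON payload on stdin carrying `tool_name` and `tool_input.command`, and exit code 2 to block the call while feeding stderr back to Claude. The file name `block_secrets.py` and its `--hook` flag are hypothetical; check the contract against the current docs before relying on it.

```python
import json
import re
import subprocess
import sys
from typing import Any, Dict, List

SENSITIVE = re.compile(r"\.(env|key|pem)$")


def staged_files() -> List[str]:
    """Files currently staged for commit in the working repository."""
    result = subprocess.run(["git", "diff", "--cached", "--name-only"],
                            capture_output=True, text=True, check=False)
    return result.stdout.splitlines()


def sensitive_files_for(payload: Dict[str, Any], files: List[str]) -> List[str]:
    """If the pending Bash call is a git commit, return staged sensitive files; else []."""
    command = payload.get("tool_input", {}).get("command", "")
    if payload.get("tool_name") != "Bash" or "git commit" not in command:
        return []
    return [f for f in files if SENSITIVE.search(f)]


def main() -> int:
    payload = json.load(sys.stdin)  # hook input arrives as JSON on stdin
    blocked = sensitive_files_for(payload, staged_files())
    if blocked:
        print("BLOCKED: sensitive files staged: " + ", ".join(blocked), file=sys.stderr)
        return 2  # exit code 2 blocks the tool call; stderr is shown to Claude
    return 0


# Hypothetical --hook flag keeps the module importable for tests;
# the settings.json command would be: python3 block_secrets.py --hook
if __name__ == "__main__" and "--hook" in sys.argv:
    sys.exit(main())
```

Keeping the decision in a pure function (`sensitive_files_for`) means the blocking logic can be unit-tested without a git repository or a live hook.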

Real impact: One practitioner reported a 30-page blockchain analysis (15 charts, 40+ SQL queries) completed in one evening that would have taken a full work week manually. Skills were the primary enabler.

14 — Commands & Techniques

The Commands That Actually Matter

A curated list of the techniques and workflows that experienced Claude Code users rely on daily.

Session Management

Command / Technique	What It Does
`/clear`	Reset context. Start fresh with ~20K tokens instead of degrading at 60%+ usage.
`/compact`	Compress context without losing everything. Good for mid-task cleanup.
`claude -p "prompt"`	Non-interactive mode. Use in CI pipelines, pre-commit hooks, or automated scripts.
`--output-format stream-json`	Streaming JSON output for programmatic consumption.

Workflow Patterns

Pattern	How It Works
Plan-then-execute	Ask Claude to draft a plan with no implementation. Annotate in editor. Send back. Repeat until solid. Then: "implement."
One task per session	Fresh context costs ~20K tokens. Quality loss from a degraded session costs much more. Dump plan to a file, `/clear`, reload.
Subagent delegation	Define specialist personas in `.claude/agents/`. Claude spawns them in isolated context windows and gets compressed summaries back.
MCP integration	Use `claude mcp add` to connect Notion, Figma, databases, monitoring. Claude queries them directly instead of copy-pasting.

Keyboard Shortcuts

Esc — Cancel current generation

Tab — Accept autocomplete suggestion

Ctrl+C — Interrupt and get partial result

! command — Run shell command in session

/ — Browse available slash commands

@ file — Add file to context

The golden rule: CLAUDE.md for project context, skills for specialized workflows, hooks for safety guarantees, and /clear liberally. Context quality beats context quantity every time.

15 — Context Economics

Managing the 200K Token Window

The leak confirmed what practitioners already knew: context management is the single biggest factor in output quality.

Tokens per fresh session: ~20K
Quality starts degrading: 20–40% of the window
Noticeable quality loss: 60%+ of the window
Opus 4.6 context window: 1M
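These thresholds translate into a simple check. A sketch that takes this section's rules of thumb at face value (degradation beginning around 20–40% of the window, noticeable loss past 60%); the token count has to come from your own tooling, and the 200K default window is an assumption to adjust for 1M-context models.

```python
def context_status(used_tokens: int, window: int = 200_000) -> str:
    """Classify context usage with the rule-of-thumb thresholds from this section."""
    ratio = used_tokens / window
    if ratio < 0.20:
        return "fresh"
    if ratio < 0.60:
        return "watch: quality may be degrading, consider /compact"
    return "reset: dump progress to a file, then /clear"


for used in (20_000, 70_000, 130_000):
    print(f"{used:>7,} tokens -> {context_status(used)}")
```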

✅

Do This

Start fresh sessions for each distinct task
Use CLAUDE.md for persistent context (loads automatically)
Use skills for specialized knowledge (loads on demand)
Dump progress to a file before /clear
Use @file to pull in specific files, not entire directories

❌

Avoid This

Packing the entire codebase into context
Running multi-hour sessions without clearing
Putting volatile info in CLAUDE.md (it loads every time)
Relying on context alone instead of external persistence
Over-specifying CLAUDE.md — Claude ignores rules lost in noise

16a — Memory Is the Moat

Karpathy's LLM Wiki: RAG Without RAG

On April 3, 2026, three days after the Claude Code leak, Andrej Karpathy published something quietly more important: a knowledge architecture that replaces RAG with a living markdown wiki maintained by the AI itself.

🧠

The Core Idea

Karpathy stopped using AI primarily for code. He's using it to build a second brain — a system where the LLM acts as a full-time librarian.

No vector databases. No embedding pipelines. Just markdown files and an LLM that reads, writes, and maintains them. ~100 articles, ~400K words — with minimal direct human intervention.

The Three-Folder Architecture

raw/ (dump everything) → wiki/ (LLM compiles & links) → index (navigate & query)

1. Ingest

Research papers, repos, web articles go into raw/. Obsidian Web Clipper converts pages to .md with local images for vision models.

2. Compile

The LLM writes a structured wiki: summaries, concepts, encyclopedia-style articles, and backlinks between ideas. This is the step RAG skips.

3. Maintain

The LLM runs "health checks" — linting for inconsistencies, missing data, or new connections. The wiki evolves autonomously.
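The maintain step is the most mechanical part of the loop. Below is a minimal sketch of two health checks that need no model at all: regenerating `index.md` and finding links that point at missing pages. It assumes Obsidian-style `[[Page]]` links and one markdown file per article in `wiki/`; the compile step (the LLM writing articles) is deliberately left out, and none of this is Karpathy's own code.

```python
import re
from pathlib import Path
from typing import Dict, List

# Obsidian-style links: [[Page]], [[Page|alias]], [[Page#Heading]]
LINK = re.compile(r"\[\[([^\]|#]+)")


def build_index(wiki: Path) -> str:
    """Regenerate index.md: one line per article, summarized by its first body line."""
    lines = ["# Index", ""]
    for page in sorted(wiki.glob("*.md")):
        if page.name == "index.md":
            continue
        body = [ln for ln in page.read_text(encoding="utf-8").splitlines()
                if ln.strip() and not ln.startswith("#")]
        summary = body[0] if body else "(no summary yet)"
        lines.append(f"- [[{page.stem}]]: {summary}")
    text = "\n".join(lines) + "\n"
    (wiki / "index.md").write_text(text, encoding="utf-8")
    return text


def broken_links(wiki: Path) -> Dict[str, List[str]]:
    """Health check: map each page to the [[links]] whose target page does not exist."""
    existing = {p.stem for p in wiki.glob("*.md")}
    report: Dict[str, List[str]] = {}
    for page in sorted(wiki.glob("*.md")):
        targets = {m.strip() for m in LINK.findall(page.read_text(encoding="utf-8"))}
        missing = sorted(targets - existing)
        if missing:
            report[page.stem] = missing
    return report
```

A broken link is exactly the kind of "missing data" a health check should surface: either the LLM writes the missing article from raw/, or the link is removed.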

❌

Traditional RAG

Vector embeddings are a black box
Retrieval noise increases with scale
Requires embedding model + vector DB + pipeline
Knowledge is implicit in vectors

✅

LLM Wiki (Karpathy)

Markdown is human-readable and traceable
Navigation via summaries and index pages
Zero infrastructure: just files and an LLM
Knowledge is explicit, editable, deletable

16b — Memory Is the Moat

Why Memory Systems Are the Real Differentiator

The Claude Code leak, Conway's behavioral lock-in, and Karpathy's wiki all point to the same conclusion.

💡

Three Memory Architectures, One Pattern

Claude Code's CLAUDE.md + skills + auto-memory = a primitive brain system that compounds per-project knowledge across sessions
Conway's persistent agent memory = institutional knowledge that creates behavioral lock-in (the platform keeps it)
Karpathy's wiki = personal knowledge that stays with you, not the platform (you keep it)

The pattern is clear: The people and organizations getting the best results from AI aren't writing better prompts. They're building better memory systems — structured, persistent, human-readable knowledge bases that make every AI interaction smarter than the last.

🛠

Start Today

Create a brain/ or knowledge/ directory
Dump research, notes, and articles into raw/
Let your AI compile it into structured, interlinked markdown
Review and edit — the human stays in the loop
Over weeks, you'll have a second brain that you own, not your AI vendor

Ownership matters: Conway's memory creates lock-in because the platform keeps it. Karpathy's wiki creates leverage because you keep it. The decision about where your knowledge lives is one of the most consequential choices in AI adoption.

17 — What's Next

Where This Is All Heading

Reading the confirmed announcements and the leaked signals together, the trajectory is clear.

Now

Interactive Agent + Cloud Infrastructure

Claude Code, Cowork, and Managed Agents form a three-layer agent platform: CLI, desktop, and cloud. You choose the surface.

Near-term

Background Execution

Kairos-like patterns suggest always-on agents that work while you sleep — delegated jobs, overnight synthesis, continuous monitoring.

Near-term

Multi-Model Routing

Internal codenames suggest the "Claude" brand will route across specialized models optimized for planning, coding, latency, or depth.

Medium-term

Security as Product Surface

Glasswing signals that prompt-injection resilience, permissions, and adversarial evaluation become first-class product features, not afterthoughts.

Medium-term

Agent Operating System

The competitive layer shifts from models to orchestration. The companies that win won't just have better AI — they'll have better systems around the AI.

18 — In the Works

What I've Been Building

These concepts aren't theoretical. Here's the ecosystem of tools I'm building that put these ideas into practice — from agent orchestration to knowledge management to native apps.

Active projects: 6
Native macOS apps: Swift
Electron + Web: Next.js
AI-assisted builds: 100%

🦗

Hive

AI agent orchestrator with live Kanban, multi-model routing, SSH MCP, and autonomous roadmap.

🌳

Canopy

AI-powered SSH client + Git client for macOS. Terminal, SFTP, server dashboard, BYOK AI chat.

🧠

Brainpower

Personal knowledge vault app with AI-powered hybrid search, vector embeddings, and 3-tier cloud sync.

🌿

Glade

Lightweight macOS app for fast application switching via global hotkey with a customizable launcher panel.

📡

Beam

Mac-native presentation viewer. Opens .html, .js, .pdf, .md files with dark/light mode and clean rendering.

📚

Brain

Local markdown vault (~/brain) — the knowledge layer that all projects and AI agents read and write.

19 — Hive

AI Agent Orchestration Platform

Hive is an Electron + Next.js desktop app that autonomously dispatches Claude Code agents, streams progress in real-time, and manages approvals for risky operations. Think of it as a team of AI developers you manage via a Kanban board.

[Figure: Hive Kanban, AI agents managed like a development team, with tasks across Backlog, In Progress, Review, and Done columns]

⚙

Core Architecture

Electron shell wrapping Next.js at localhost:4000

WebSocket live updates for real-time Kanban dashboard

Scheduler dispatching up to 4 concurrent agents across different projects

SQLite database — no external DB needed

PID registry + recovery for agent process management
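Capping concurrent agents takes only a few lines with `asyncio.Semaphore`. This is a sketch only: `run_agent` is a hypothetical stand-in for spawning a Claude Code process, and this is not Hive's scheduler, which per the text also dispatches across projects and tracks PIDs.

```python
import asyncio
from typing import Dict, List, Tuple

MAX_CONCURRENT_AGENTS = 4


async def run_agent(task_id: int, tracker: Dict[str, int]) -> int:
    """Hypothetical stand-in for spawning an agent process and awaiting its result."""
    tracker["running"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["running"])
    await asyncio.sleep(0.01)  # simulated work
    tracker["running"] -= 1
    return task_id


async def dispatch(task_ids: List[int]) -> Tuple[List[int], int]:
    gate = asyncio.Semaphore(MAX_CONCURRENT_AGENTS)
    tracker = {"running": 0, "peak": 0}

    async def guarded(task_id: int) -> int:
        async with gate:  # waits here while four agents are already running
            return await run_agent(task_id, tracker)

    results = await asyncio.gather(*(guarded(t) for t in task_ids))
    return list(results), tracker["peak"]


results, peak = asyncio.run(dispatch(list(range(10))))
print(results, "peak concurrency:", peak)
```

The semaphore only enforces the cap; queue ordering, per-project fairness, and crash recovery would sit on top of it.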

👥

11 Agent Profiles

Developer, Researcher, CEO / Visionary, CTO, COO, CFO, Data Analyst, Marketing, Founder, Trader, General

Each profile has custom system prompts, allowed tools, and can delegate subtasks to other profiles (CEO → COO → Developer chain).

Task Lifecycle

Backlog → Assigned → In Progress → Review → Done

Approval gates for risky operations (git push, rm, docker). Review flow with Approve / Request Changes / Reject. Retry failed tasks with feedback loop. 50-turn max per agent. 30-min approval timeout (auto-deny).
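One way to model this lifecycle is an explicit transition table, so illegal moves fail loudly. This is a hypothetical sketch, not Hive's code: the states, the 50-turn cap, and the 30-minute auto-deny come from the description above, while the class and function names, the regex for risky commands, and routing retries and rejections back to Backlog are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Allowed moves; Backlog doubles as the target for retries and rejections (assumption)
TRANSITIONS = {
    "Backlog": {"Assigned"},
    "Assigned": {"In Progress"},
    "In Progress": {"Review", "Backlog"},
    "Review": {"Done", "In Progress", "Backlog"},  # approve / request changes / reject
    "Done": set(),
}
RISKY = re.compile(r"\b(git push|rm|docker)\b")
MAX_TURNS = 50
APPROVAL_TIMEOUT_S = 30 * 60


@dataclass
class Task:
    title: str
    state: str = "Backlog"
    turns: int = 0

    def move(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state

    def take_turn(self) -> None:
        if self.turns >= MAX_TURNS:
            raise RuntimeError(f"{self.title}: {MAX_TURNS}-turn limit reached")
        self.turns += 1


def needs_approval(command: str) -> bool:
    """True if the command matches one of the risky operations."""
    return RISKY.search(command) is not None


def resolve_approval(answer: Optional[bool], waited_s: float) -> str:
    """A human answer wins; silence past the timeout is treated as a denial."""
    if answer is not None:
        return "approved" if answer else "denied"
    return "denied" if waited_s >= APPROVAL_TIMEOUT_S else "pending"
```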

19b — Hive

Models, Usage & Analytics

📊

Multi-Model Support

Opus, Sonnet, Haiku, Gemini Pro, Gemini Flash, Flash Lite, Local / Ollama

Select model per task. Route heavy architecture to Opus, daily coding to Sonnet, lightweight ops to Haiku. Local models via Ollama / LM Studio for cost-free work.

📈

Token Usage & Reporting

Weekly reports — tasks completed, tokens consumed, cost breakdown by profile and project

Token usage trends — bar charts by model (Haiku, Opus, Sonnet, Gemma), cost over time

Cache efficiency — hit/write/uncached rates with trend analysis. 82.8% cache hit rate achieved.

[Figure: Weekly Report, showing tasks completed, tokens consumed, and cost breakdown]

[Figure: Token Usage Trends & Cache Efficiency dashboard]

20 — Hive SSH MCP & Ops

Agents That Reach Into Your Servers — With Guardrails

Hive includes an SSH MCP server that gives agents the ability to run commands on remote machines. But the real design point isn't access — it's restriction. The MCP server itself defines the security boundary: what an LLM can and cannot do, enforced at the protocol layer, not by hoping the model behaves.

🔌

SSH MCP Server

An MCP server built into Hive that lets agents run SSH commands on remote hosts. Agents discover available servers with mcp__ssh__list_hosts, then execute commands via mcp__ssh__exec.

Example: An ops task says "tell me the uptime on digibot" — the agent SSHs into the server and returns the result. No human interaction needed.

💻

Real Output — Agent Running an SSH Task

$ ssh digibot uptime
up 178 days, 3:29 — 3 users
load average: 1.28, 1.16, 1.16

Running strong at 178 days.
Load averages are moderate and
stable across 1/5/15 min windows.

Agent used Bash → SSH → parsed output → summarized. Total time: 1m 56s including agent reasoning.

🛠

Hive MCP Tool API

The Hive MCP server exposes 13 tools for external AI agents to manage the entire task system programmatically:

hive_list_tasks, hive_create_task, hive_get_task, hive_update_task, hive_approve_task, hive_request_changes, hive_move_task, hive_delete_task, hive_retry_task, hive_add_comment, hive_kill_agent, hive_list_projects, hive_get_agent_status

This means Claude Code running locally can create, monitor, and manage Hive tasks — an agent orchestrating agents.

20b — Hive SSH MCP

Skills, Security & the Agent-Native Principle

💻

Hive Running Skills

Hive agents can invoke Claude Code skills directly. In the screenshot, an ops task runs /import-vodafone — a custom skill that triggers a Python data pipeline to pull SIM Inventory reports from the Vodafone M2M Portal and sync them to MongoDB. The agent handles the entire workflow: skill invocation, script execution, error handling, and reporting back.

Pattern: Define your ops workflows as skills. Point Hive at them. Walk away.

[Figure: Agent SSH task, checking uptime on a remote server via MCP]

[Figure: Running the /import-vodafone skill from Hive to trigger a data pipeline]

🛡

Security by Design: The MCP as Guardrail

The SSH MCP server isn't just a convenience layer — it's a security architecture decision. Instead of giving an LLM raw shell access and hoping prompt instructions prevent misuse, the MCP server enforces restrictions at the protocol level:

Allowlisted commands — the server decides which operations agents can invoke, not the model
Scoped host access — agents only see servers explicitly registered; no lateral movement
Audit trail — every command execution is logged with agent identity, timestamp, and full output
No credential exposure — SSH keys and passwords live in the server process, never passed to the LLM context
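The four guarantees above fit in a small policy layer between the MCP tool handler (the kind of code that could sit behind `mcp__ssh__exec`) and the SSH client. This is a hypothetical sketch, not Hive's implementation: the host registry, the allowlist contents, and the `run_remote` callback are assumptions, and the callback is where SSH credentials would live, out of the model's reach.

```python
import json
import shlex
import time
from typing import Callable, List

REGISTERED_HOSTS = {"digibot"}                          # scoped host access (hypothetical)
ALLOWED_COMMANDS = {"uptime", "df", "free", "whoami"}   # read-only allowlist (hypothetical)
SHELL_META = set(";|&`$<>\n")                           # refuse chaining, substitution, redirects


def check(host: str, command: str) -> str:
    """Return "ok" or a denial reason; the server decides, not the model."""
    if host not in REGISTERED_HOSTS:
        return "denied: unknown host"
    if any(ch in SHELL_META for ch in command):
        return "denied: shell metacharacters"
    try:
        argv = shlex.split(command)
    except ValueError:
        return "denied: unparseable command"
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return "denied: command not allowlisted"
    return "ok"


def ssh_exec(agent: str, host: str, command: str,
             run_remote: Callable[[str, str], str], audit: List[str]) -> str:
    """Policy layer behind an SSH exec tool: scope, allowlist, then audit every call."""
    verdict = check(host, command)
    # run_remote owns the SSH credentials; the model only ever sees the output string
    output = run_remote(host, command) if verdict == "ok" else ""
    audit.append(json.dumps({"ts": time.time(), "agent": agent, "host": host,
                             "command": command, "verdict": verdict, "output": output}))
    return output if verdict == "ok" else verdict
```

Denials come back to the agent as plain strings, so the model can adjust its plan, while every attempt (allowed or not) lands in the audit log.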

The broader principle: This design pattern extends to everything we build. We originally designed systems for humans — dashboards, search engines, data pipelines. Now agents are the primary operators: searching the internet, collecting data, making recommendations. Every interface needs a parallel agent-safe path with explicit permissions, rate limits, and structured outputs. Design for agents, not just humans. The MCP server is the template.

21 — Hive Autonomous Mode

The Autonomous Roadmap

Hive's next evolution: Autonomous Mode — a goal-driven execution layer where the CEO agent decomposes a high-level objective, the COO plans operationally, and specialists execute in parallel. Informed by Paperclip, Hermes Agent, and OpenMOSS.

Mission Lifecycle

Strategy (CEO agent) → Planning (COO agent) → Execution (Specialists) → Review (COO validates) → Retro (CEO learns)

🎯

Goal Ancestry (from Paperclip)

Every task carries a chain of reasoning from mission → strategy → plan → task. Specialists understand why they're doing what they're doing, not just what.

{
  "goalAncestry": [
    "Make the digital dashboard a more commercially competitive product",
    "Increase ad revenue by optimizing load time",
    "Audit bundle size, find largest deps"
  ]
}

🔨

DAG Scheduler

Tasks form a directed acyclic graph with dependency edges. Independent tasks execute in parallel waves; dependent tasks chain sequentially.

Wave 1: Audit, Research, Pull bugs (parallel)
Wave 2: Optimize, Build, Fix (each depends on Wave 1)
Wave 3: Launch email (depends on Wave 2)

Replaces the current FIFO scheduler for mission tasks.
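Splitting a DAG into parallel waves is a level-by-level topological sort (a variant of Kahn's algorithm). A minimal sketch using the task names from the wave example above; the specific dependency edges are assumed for illustration.

```python
from typing import Dict, List, Set


def waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Level-by-level topological sort: each wave holds tasks whose dependencies
    all finished in earlier waves, so every task in a wave can run in parallel."""
    unknown = set().union(*deps.values()) - set(deps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    remaining = {task: set(d) for task, d in deps.items()}
    done: Set[str] = set()
    result: List[List[str]] = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError(f"cycle among: {sorted(remaining)}")
        result.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return result


# Edges assumed for illustration: each Wave 2 task depends on one Wave 1 task
deps = {
    "Audit": set(), "Research": set(), "Pull bugs": set(),
    "Optimize": {"Audit"}, "Build": {"Research"}, "Fix": {"Pull bugs"},
    "Launch email": {"Optimize", "Build", "Fix"},
}
for i, wave in enumerate(waves(deps), start=1):
    print(f"Wave {i}: {', '.join(wave)}")
```

Sorting within a wave only makes the output deterministic; any order inside a wave is valid because its tasks are independent.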

21b — Hive Autonomous Mode

Budget, Failure Recovery & the Vision

💰

Budget Cascade

Mission-level budget ($25 default) cascades to phases and individual tasks. Scheduler checks budgetSpentUsd < budgetUsd before spawning. If budget exhausted → pause mission, notify human.
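The spawn check is one comparison per level; what matters is that every level of the cascade has to pass. A sketch under assumptions: the $25 mission default and the spent-below-budget test come from the text (`budget_usd` and `spent_usd` mirror `budgetUsd` and `budgetSpentUsd`), while the phase and task allocations are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Budget:
    name: str
    budget_usd: float       # mirrors budgetUsd
    spent_usd: float = 0.0  # mirrors budgetSpentUsd


def can_spawn(chain: List[Budget]) -> bool:
    """Spawn only if every level (mission -> phase -> task) still has budget left."""
    return all(b.spent_usd < b.budget_usd for b in chain)


def charge(chain: List[Budget], cost_usd: float) -> None:
    """Record an agent run's cost at every level it rolls up to."""
    for b in chain:
        b.spent_usd += cost_usd


mission = Budget("mission", 25.0)        # $25 default from the text
phase = Budget("execution", 10.0)        # hypothetical phase allocation
task = Budget("audit bundle size", 2.0)  # hypothetical task allocation

chain = [mission, phase, task]
if can_spawn(chain):
    charge(chain, 2.5)  # the run cost $2.50, overshooting the task cap
if not can_spawn(chain):
    print(f"paused, notify human: {task.name} spent ${task.spent_usd:.2f} of ${task.budget_usd:.2f}")
```

Because the check runs before spawning, a single run can overshoot its cap, as it does here; the overshoot is caught at the next spawn, which is when the mission pauses.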

🚨

Failure Escalation (from Hermes)

Three-tier recovery: Retry (same agent, fresh session, max 2) → Replan (COO decomposes differently) → Escalate (pause, notify human with full context).
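The three tiers reduce to a small decision function. In this sketch `failed_attempts` counts consecutive failed runs of a task; restarting that counter after a replan (so the new plan gets its own two retries before escalating) is an assumption, not something stated above.

```python
MAX_RETRIES = 2


def next_action(failed_attempts: int, replanned: bool) -> str:
    """Retry (same agent, fresh session, max 2) -> Replan (COO) -> Escalate (human)."""
    if failed_attempts <= MAX_RETRIES:
        return "retry"
    if not replanned:
        return "replan"
    return "escalate"


# Walk one task through repeated failures; the counter restarts after a replan (assumption)
attempts, replanned, log = 0, False, []
while True:
    attempts += 1  # the latest run failed
    action = next_action(attempts, replanned)
    log.append(action)
    if action == "replan":
        replanned, attempts = True, 0
    elif action == "escalate":
        break
print(" -> ".join(log))
```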

[Figure: Autonomous Mode design doc, research in Brainpower showing competitive landscape analysis]

[Figure: Task creation, with profile, model, and project selection]

The vision: "Make the digital dashboard a more commercially competitive product" → CEO produces strategy with 3 goals → COO builds a task DAG with 7 tasks across 3 waves (tasks within a wave run in parallel) → specialists execute, auto-review, and rework if needed → COO validates against success criteria → CEO logs lessons learned. Human checkpoints sit at the strategy and planning phases.

22 — Canopy

AI-Powered SSH & Git Client for macOS

Canopy is a native macOS app that combines an SSH terminal, SFTP file management, a Git client, and AI chat in a single workspace. It is built with SwiftUI, uses no AI vendor SDKs, and is BYOK (Bring Your Own Key) for any AI provider.

🖥

Terminal & SSH

Multi-tab SSH terminal powered by Citadel (PTY) + SwiftTerm

Local terminal via SwiftTerm LocalProcess

Server dashboard — uptime, disk, memory, CPU at a glance

SFTP file comparison & push-to-remote

🌳

Git & AI

Full Git client — repo scanning, branches, diff, history, local state

BYOK AI chat panel with terminal context injection

Supports: Claude, OpenAI, OpenRouter, Ollama, LM Studio — all via URLSession REST + SSE, no SDKs

Xcode-style toolbar toggles: Dashboard, Terminal, Git, Files, AI
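Talking to every provider "via URLSession REST + SSE, no SDKs" means parsing Server-Sent Events by hand. Canopy does this in Swift. The sketch below applies the same parsing rules in Python for illustration: `data:` lines accumulate, a blank line dispatches the event, and lines starting with `:` are comments (often keep-alives). `parse_sse` and `collect_text` are hypothetical names. The `{"text": ...}` payloads are simplified placeholders because each provider nests streamed text differently, and the `[DONE]` sentinel is the OpenAI-style convention.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SSEEvent:
    event: str  # "message" unless the server sent an `event:` field
    data: str   # all `data:` lines of the event, joined with "\n"


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield one SSEEvent per blank-line-terminated block of the stream."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the pending event, if it carried data.
            if data_lines:
                yield SSEEvent(event_type, "\n".join(data_lines))
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive ping
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips exactly one leading space
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        # `id` and `retry` fields are ignored in this sketch.


def collect_text(events: Iterable[SSEEvent]) -> str:
    """Concatenate streamed text chunks until the end-of-stream sentinel."""
    parts = []
    for ev in events:
        if ev.data == "[DONE]":  # OpenAI-style end-of-stream marker
            break
        parts.append(json.loads(ev.data).get("text", ""))  # placeholder payload shape
    return "".join(parts)
```

Provider differences then shrink to a small per-provider function that extracts text from `SSEEvent.data`. The transport stays plain HTTP plus this parser, which is what makes a no-SDK client practical.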

Tech Stack

Layer	Technology	Purpose
UI	SwiftUI	Native macOS 15+, NavigationSplitView
SSH	Citadel	SSH, PTY, SFTP — pure Swift
Terminal	SwiftTerm	Terminal emulation (NSView + UIView)
Syntax	HighlightSwift	Code syntax highlighting
Auth	macOS Keychain	Server passwords + API keys
Build	SPM (Swift 6.0)	No Xcode project, pure Package.swift

22b — Canopy

Human Interface, Agent Interface — Same Servers

Canopy — SSH terminal, file browser, and AI panel in a single workspace

Why this matters: Canopy does from the human side what Hive's SSH MCP does from the agent side. Same servers, same credentials, two interfaces: one for humans, one for AI agents. This is the agent-native design principle in action.

23 — Brainpower & the Brain

Your Knowledge, Your Search, Your AI

Brainpower puts Karpathy's "LLM Wiki" concept into practice. It is a native macOS app that opens a window onto a local markdown vault, with AI-powered hybrid search, vector embeddings, and a three-tier path from local files to shared cloud knowledge. It also draws on Nate B Jones' "One Brain" philosophy.

The 3-Tier Brain Architecture

L1: Local (~/brain, offline) → L2: BrainCloud (personal Atlas) → L3: BrainMerge (team knowledge)

L1 (Local): Plain markdown files in ~/brain. Brainpower's built-in embeddings + ripgrep. Always works offline.
L2 (BrainCloud): Personal MongoDB Atlas cluster with vector embeddings. Atlas Vector Search provides semantic search; Atlas Search adds full-text keyword search.
L3 (BrainMerge): Shared Atlas cluster holding multi-tenant team knowledge, with everyone's notes combined, so a shared AI can search and draw on all of it.

🔍

Hybrid Search

Keyword search via ripgrep for exact matches

Semantic search via Ollama embeddings (768d nomic / up to 4096d qwen3) + vDSP dot product

Reciprocal Rank Fusion merges both result sets

AI synthesis via Claude API — search results compiled into coherent answers with citations
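Here is a minimal sketch of the search-and-fuse step, written in Python instead of Brainpower's Swift/vDSP. Vectors are L2-normalized so the dot product equals cosine similarity. Reciprocal Rank Fusion then scores each document as the sum of 1 / (k + rank) across the rankings it appears in. The default k = 60 is the value commonly used since the original RRF paper; whether Brainpower uses the same value is an assumption. The function names are hypothetical.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale to unit length so a dot product becomes cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def semantic_rank(query: list[float], docs: dict[str, list[float]]) -> list[str]:
    """Rank documents by dot product with the query embedding (best first)."""
    q = normalize(query)
    scores = {doc: sum(a * b for a, b in zip(q, normalize(vec))) for doc, vec in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists; documents ranked well in several lists rise to the top."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))
```

RRF uses only rank positions, never raw scores. That lets it combine a ripgrep hit list, which has no meaningful score, with similarity scores without any calibration between the two.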

🛠

App Features

Vault browser — sections, tags, filter bar, real-time search

Markdown editor with Mermaid diagram support

PDF export and AirDrop to iPhone

File watcher — auto-reloads when files change on disk

Tag system — color-coded, filterable

23b — Brainpower

Atlas Vector Search & the Brain in Action

Atlas Vector Search (BrainCloud)

MongoDB Atlas stores vector embeddings alongside the source documents — no separate vector database. Supports $vectorSearch (ANN/ENN), $rankFusion (MongoDB 8.0+), and $scoreFusion (MongoDB 8.2+) for native hybrid search. Pre-filtering on metadata (tags, section, date) narrows the search space before vector comparison.

Free tier (M0): 512MB storage, 1 vector index — sufficient for a personal Brain.
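Here is a minimal sketch of a BrainCloud query as a MongoDB aggregation pipeline, built from plain Python dicts. The stage fields (`index`, `path`, `queryVector`, `numCandidates`, `limit`, `filter`, `exact`) follow Atlas `$vectorSearch`. The following are assumptions:
- the index name `brain_vector_index`
- the field names `embedding`, `tags`, `section`, and `title`
- the 20x `numCandidates` heuristic

Pre-filter fields must also be declared as `filter` fields in the vector index definition.

```python
from typing import Any, Optional


def build_vector_search_pipeline(
    query_vector: list[float],
    limit: int = 5,
    tags: Optional[list[str]] = None,
    section: Optional[str] = None,
    exact: bool = False,
    index_name: str = "brain_vector_index",  # hypothetical index name
    path: str = "embedding",                 # hypothetical vector field
) -> list[dict[str, Any]]:
    """Build a $vectorSearch pipeline with optional metadata pre-filters."""
    stage: dict[str, Any] = {
        "index": index_name,
        "path": path,
        "queryVector": query_vector,
        "limit": limit,
    }
    if exact:
        stage["exact"] = True  # ENN: compare against every vector; numCandidates is omitted
    else:
        # ANN: size of the candidate pool; 20x the limit is a heuristic, capped at 10,000.
        stage["numCandidates"] = min(max(limit * 20, 100), 10_000)

    # Pre-filter on metadata so only matching documents are compared.
    conditions: list[dict[str, Any]] = []
    if tags:
        conditions.append({"tags": {"$in": tags}})
    if section:
        conditions.append({"section": {"$eq": section}})
    if len(conditions) == 1:
        stage["filter"] = conditions[0]
    elif conditions:
        stage["filter"] = {"$and": conditions}

    return [
        {"$vectorSearch": stage},
        {"$project": {"title": 1, "tags": 1, "section": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]
```

With PyMongo, the resulting list would be passed directly to `collection.aggregate(pipeline)`. The embeddings live in the same documents as the notes, so the projection can return titles and tags without a second lookup.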

Brainpower vault — documents, tags, and search

Brain Vector Search architecture in Brainpower

Brain Vector Search — MongoDB Atlas Architecture

The connection: This puts ideas from this talk into practice: Karpathy's "RAG without RAG" wiki approach (L1), extended with proper vector search (L2) and federated team knowledge (L3). The AI reads and writes the Brain. You own it.

24 — Glade & Beam

The Supporting Cast

Not every tool is an AI orchestrator. Some are small, sharp utilities that solve one problem well.

🌿

Glade — App Launcher

A lightweight macOS menubar utility for fast application switching. Trigger with a global hotkey → customizable panel of your most-used apps appears → click to switch.

Built with: SwiftUI, macOS 15+, SPM, zero dependencies

Architecture: CGEventTap for global keyboard monitoring, PopupWindowController for UI, PersistenceService for saved app list

Features: Hotkey-triggered popup, optional labels, Mission Control integration, onboarding wizard, settings panel

📡

Beam — Presentation Viewer

A Mac-native app that opens presentation files with a clean, focused interface. Drop in an .html, .js, .pdf, or .md file and Beam renders it beautifully.

Built with: SwiftUI, macOS 15+, SPM, zero dependencies

Features: Dark / Light / System theme toggle, drag-and-drop file open, clean minimal UI

Why it exists: Every HTML presentation in this series is designed to be viewed in Beam — the app I built to present them.

Beam — present any document beautifully

The full suite in the macOS dock: Canopy, Hive, Brainpower, and more, all built with Claude Code

The common thread: Every one of these apps is a native SwiftUI app built with Swift Package Manager (no Xcode project files), targeting macOS 15+ and compiled with the Swift 6.0 toolchain in Swift 5 language mode. Zero external SDK dependencies where possible. All built with Claude Code assistance.

25 — Key Takeaways

What to Remember: Leadership & Developers

For Leadership

The moat is the harness, not the model. Orchestration, permissions, and tool routing are the real competitive surface.
Agent infrastructure is now a managed service. Build-vs-buy calculus shifted with Managed Agents at $0.08/hour.
Security posture is visible. After the leak, operational competence is part of vendor trust evaluation.
Memory ownership is a strategic decision. Conway keeps it. Karpathy's approach lets you keep it. Choose deliberately.

For Developers

Invest in CLAUDE.md now. It's the highest-ROI file in your repo. Add to it every time Claude makes a mistake.
Learn hooks and skills. Skills extend capability, hooks enforce safety. Together they compound over time.
Manage context aggressively. One task per session, /clear often, dump progress to files.
Use MCP integrations. Connect your tools directly instead of copy-pasting between interfaces.

26 — Key Takeaways

Five Things to Do This Week

Start Today

Create a CLAUDE.md in your project root (start with 50–100 lines of project context and behavioral rules)
Add one pre-commit hook that blocks commits of sensitive files; guardrails like this are what make unattended operation possible
Try the plan-then-execute workflow on your next feature — draft plan, annotate, iterate, then implement
Start a brain/ directory — dump research into raw/, let AI compile it into structured markdown you own
Connect one external tool via MCP (claude mcp add) — Notion, Figma, database, whatever you copy-paste from most
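Here is one way to implement the sensitive-file hook, assuming a plain Git `pre-commit` hook written in Python. The pattern list is a hypothetical starting point, and `find_sensitive`, `staged_files`, and `main` are illustrative names. `git diff --cached --name-only --diff-filter=ACMR` lists the files staged as added, copied, modified, or renamed.

```python
import fnmatch
import subprocess
import sys
from pathlib import PurePosixPath

# Hypothetical starting blocklist; tune it for your repo.
SENSITIVE_PATTERNS = [".env", ".env.*", "*.pem", "*.key", "*.p12",
                      "id_rsa", "id_ed25519", "credentials.json"]


def find_sensitive(paths: list[str]) -> list[str]:
    """Return the staged paths whose file name matches a sensitive pattern."""
    hits = []
    for path in paths:
        name = PurePosixPath(path).name  # Git reports paths with forward slashes
        if any(fnmatch.fnmatch(name, pattern) for pattern in SENSITIVE_PATTERNS):
            hits.append(path)
    return hits


def staged_files() -> list[str]:
    """List files staged as added, copied, modified, or renamed."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]


def main() -> int:
    hits = find_sensitive(staged_files())
    if hits:
        print("Commit blocked, sensitive files staged:", *hits, sep="\n  ", file=sys.stderr)
        return 1  # non-zero exit makes Git abort the commit
    return 0
```

To install it:
- save it as `.git/hooks/pre-commit` and make it executable;
- start it with a `#!/usr/bin/env python3` line;
- end it with `sys.exit(main())`.

The `.env.*` pattern also catches templates such as `.env.example`, which you may want to allow.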

“The field moves fast. The leak accelerated that. What matters is building systems around AI that compound your team's capability over time.”
