Developer Briefing — April 2026

Claude Leaks, Lessons & What's Next

The code leak, Project Glasswing, Managed Agents, and practical techniques for getting more out of Claude today.
Irv Cassio • AI Enthusiasts Group • April 2026
01 — Timeline

Two Weeks That Changed the AI Landscape

From an accidental source map to a $100M security initiative, two weeks spanning late March and early April 2026 revealed more about Claude's architecture than any official announcement.

March 11, 2026
Bun Bug Reported
A known Bun issue surfaces: source maps are served in production builds even when docs say they shouldn't be. The bug sits open for 20 days.
March 24, 2026
Claude Cowork Ships
Anthropic launches Claude Computer Use for macOS — a desktop agent that can control apps, navigate browsers, and handle multi-step workflows.
March 31, 2026
The Leak
Claude Code npm package v2.1.88 ships with a 59.8 MB source map. Within hours, 1,906 files and 512,000+ lines of TypeScript are public. GitHub mirrors are forked tens of thousands of times.
April 1–3, 2026
Containment Fails
Anthropic issues mass GitHub takedowns, later admitting they affected more repositories than intended. Rewrites and ports in Python and Rust keep the architecture public.
April 7, 2026
Project Glasswing Announced
Anthropic reveals Claude Mythos Preview — a restricted frontier model given to 50+ organizations for defensive cybersecurity. $100M in credits committed.
April 8, 2026
Managed Agents Launch
Claude Managed Agents enters public beta: cloud-hosted agent infrastructure with sandboxed execution, checkpointing, and scoped permissions.
02 — The Leak

What Was Actually Exposed

The source map wasn't just code — it was the most detailed blueprint of a production AI agent system ever made public.

1,906
Files exposed
512K+
Lines of TypeScript
59.8 MB
Source map file
46K
Lines in QueryEngine alone
Architecture Exposed
QueryEngine (46K lines) — streaming, tool-call loops, thinking mode, token counting, permission wrapping

Tool System (29K lines) — 50+ tools across file ops, shell, agents, web, MCP, scheduling

Command System (25K lines) — CLI parsing, slash commands, hooks, feature flags
🔒
Security Details Exposed
Permission modes — six levels from default to "full bypass" that auto-approves all operations

System prompt assembly — hardcoded guardrails + CLAUDE.md + git context

Hook implementations — exact pre/post execution logic

Known CVEs — CVE-2025-59536 and CVE-2026-21852 now easier to weaponize
Key insight: The leak proved that the competitive moat is not the model — it's the orchestration, permissions, and tool routing wrapped around it. The harness is the product.
03 — Under the Hood

The Agent Architecture Blueprint

What the leak revealed is a five-layer agent execution system that goes far beyond a chatbot interface.

Entrypoint
Query Engine
Tool Registry
Execution
Verification
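The five layers above can be sketched as a simple pipeline. This is an illustrative stub, not the leaked implementation: each function stands in for a layer whose real behavior (streaming, permission wrapping, 50+ tools) is far richer.

```python
# Minimal sketch of the five-layer flow; every function is a stub that
# only marks which layer handled the request.
def entrypoint(prompt: str) -> dict:
    """Parse the incoming request into a normalized query."""
    return {"prompt": prompt}

def query_engine(query: dict) -> dict:
    """Decide which tool to call (stands in for the tool-call loop)."""
    return {"tool": "shell", "args": query["prompt"]}

def tool_registry(call: dict):
    """Resolve a tool name to an implementation."""
    tools = {"shell": lambda args: f"ran: {args}"}
    return tools[call["tool"]], call["args"]

def execution(tool, args) -> str:
    """Invoke the tool (in the real system: sandboxed, permission-wrapped)."""
    return tool(args)

def verification(result: str) -> str:
    """Final checks before output returns to the user."""
    assert result  # placeholder check
    return result

fn, args = tool_registry(query_engine(entrypoint("ls -la")))
print(verification(execution(fn, args)))  # ran: ls -la
```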
Technology Stack
Layer | Technology | Purpose
Runtime | Bun | Fast JS runtime for agent execution
Language | TypeScript | Full type safety across 512K lines
UI | React + Ink | Terminal UI rendering
Validation | Zod | Schema validation for tool inputs/outputs
Auth | OAuth 2.0 + JWT | Authentication + macOS Keychain
Telemetry | OpenTelemetry | Tracing, metrics, user frustration tracking
Notable finding: The codebase includes frustration detection — Claude tracks signals of user frustration to adjust its behavior. This was flagged by Scientific American as a privacy concern.
04 — Impact

What the Leak Means in Practice

The impact extends across security, competition, and the broader AI ecosystem.

Security
Pre-existing CVEs now far easier to exploit. Permission bypass logic is public. Attackers can craft targeted malicious repos that abuse previously unknown entry points.
Competition
Every AI lab now has a detailed reference implementation. The orchestration patterns, tool routing, and memory architecture are no longer trade secrets. Open-source ports appeared within days.
🌱
Ecosystem
Accelerated adoption and understanding. Developers now build on known architecture instead of guessing. Public scrutiny likely accelerates security patching. Community engagement surged.
19M
Views on initial X post
46K+
GitHub stars on mirrors
3+
Language ports (Python, Rust)
2
Known CVEs now exposed
Brand reality check: Two accidental code exposures within five days undermine the "safety-first" narrative central to Anthropic's market positioning. Operational competence is now part of the trust equation.
05 — Project Glasswing

Claude Mythos & Defensive Security

Days after the leak, Anthropic announced its most ambitious security initiative. Coincidence or crisis response — the move is significant either way.

🛡
What Is Mythos Preview
An unreleased frontier model restricted to defensive security work. Not publicly available — given to selected organizations to find and fix vulnerabilities before broader release.

Mythos autonomously found a 17-year-old RCE vulnerability in FreeBSD (CVE-2026-4747) with no human involvement after the initial request.

Anthropic says Mythos identified thousands of zero-day vulnerabilities across every major OS and browser.
🌐
Project Glasswing Partners
50+ organizations with access, including:

AWS Apple Microsoft Google NVIDIA CrowdStrike Palo Alto Networks Cisco JPMorganChase Linux Foundation Broadcom

$100M+ in usage credits committed, plus $4M in donations to open-source security organizations.
The signal: Anthropic is positioning AI models as security infrastructure, not just productivity tools. The restricted release model — defend first, release later — is new for the industry and sets a precedent.
06 — Managed Agents

From Prototype to Production in Days

Announced April 8, 2026, Claude Managed Agents is Anthropic's answer to the hardest part of building AI agents: the infrastructure.

What Managed Agents Handles for You
  • Sandboxed execution — isolated container per agent
  • Checkpointing — resume after failures
  • Credential management — secure secrets handling
  • Scoped permissions — control tool access
  • End-to-end tracing — full observability
  • Error recovery — auto-resume after outages
💰
Pricing
Model usage cost + $0.08 per agent runtime hour. That's the infrastructure premium for not building your own sandboxing, state management, and credential storage.
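The quoted $0.08/hour makes back-of-envelope math easy. A minimal sketch, assuming only the runtime rate stated above (model usage is billed separately):

```python
# Sketch: estimating the Managed Agents infrastructure premium at the
# quoted $0.08 per agent runtime hour. Model token costs are separate.
RUNTIME_RATE_USD_PER_HOUR = 0.08

def runtime_cost(agent_hours: float) -> float:
    """Infrastructure cost for a given number of agent runtime hours."""
    return round(agent_hours * RUNTIME_RATE_USD_PER_HOUR, 2)

# One agent running 8 hours/day for a 22-day work month:
print(runtime_cost(8 * 22))  # 14.08
```

At roughly $14/month per always-working agent, the build-vs-buy question shifts from infrastructure cost to control and lock-in.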
🏢
Early Adopters
Notion Rakuten Asana

Teams using Managed Agents across coding, task automation, and document processing workflows.
Research Preview Features
Multi-agent spawning — Complex tasks split across specialized sub-agents
Auto prompt refinement — Improved task success by up to 10 points in internal testing
Developer takeaway: If the leak showed the harness is the moat, Managed Agents is Anthropic selling access to that harness. You get the orchestration without building it.
07 — The Platform Wave

Claude Is Becoming an Operating Layer

Zoom out and the picture is clear: Anthropic shipped five major capabilities in six weeks. This is a platform play, not incremental updates.

Date | Release | What It Means
Feb 5 | Opus 4.6 | 1M context window, agent teams, 300K max output tokens on Batches API
Feb 17 | Sonnet 4.6 | Same-price upgrade, matching 1M context
Mar 24 | Claude Cowork | Desktop agent: controls Mac apps, navigates browsers, handles multi-step tasks autonomously
Mar 31 | The Leak | Full agent architecture becomes public knowledge
Apr 7 | Project Glasswing | Mythos Preview for defensive security with 50+ partner organizations
Apr 8 | Managed Agents | Cloud-hosted agent infrastructure as a service
Model
Opus / Sonnet 4.6
Desktop
Cowork + Dispatch
Cloud
Managed Agents
Security
Mythos / Glasswing
The pattern: Claude isn't improving as a chatbot. It's expanding into a runtime — a layer between you and your computer, your cloud, your infrastructure, and your security posture.
08 — Internal Signals

Codenames and What's Likely Coming

The leak and subsequent analysis surfaced internal labels and feature flags. Some are confirmed, some remain speculative.

Confirmed or Shipped
Mythos Preview — Active in Project Glasswing, named partners, live security work

Managed Agents — Public beta with API, pricing, and early adopters

Cowork / Computer Use — GA for macOS, Windows coming
🔬
Unconfirmed / Speculative
Capybara Fennec Numbat Tengu
Internal model family codenames — likely tiered development tracks

Kairos — Background/always-on execution
AutoDream — Memory consolidation / overnight synthesis
UltraPlan — Deeper multi-step planning mode
Reality check: Feature flags are not launch promises. Internal experiments may never ship. Treat codenames as directional signals, not product roadmap commitments.
09 — The Real Moat

Conway: The Layer Above MCP

The leak revealed something most analysis missed: Anthropic is building a proprietary orchestration layer on top of the open MCP standard. This is the classic platform play — and it's called Conway.

The Two-Layer Strategy

Open Layer

MCP (Model Context Protocol)

  • Open standard donated to Linux Foundation
  • Adopted by OpenAI, Google, and others
  • Portable tool connectors across platforms
  • Standardized AI-to-tool communication

Purpose: Create adoption. Build the ecosystem.

Proprietary Layer

Conway (CNW)

  • Always-on agent runtime with persistent memory
  • Custom extensions in .cnw.zip format
  • Webhook triggers, event streams, scheduling
  • UI panels: Search, Chat, System controls

Purpose: Create lock-in. Capture value.

MCP
Open • Portable
Conway
Proprietary • Persistent
CNW Extensions
App Store • Locked In
🔒
The Behavioral Lock-In Problem
The deepest moat isn't data portability — it's behavioral lock-in. Once Conway runs continuously for months, it accumulates decision patterns, workflow preferences, inferred business rules, and edge cases it's learned to handle.

There is no export format for an agent's learned operational intelligence. No regulatory framework for migrating it. Switching means retraining — potentially months of ramp-up cost.

This is the moat. Not the model weights. Not MCP. The layer between the open standard and your daily operations.
Think of it like this: MCP is like USB — a universal connector anyone can use. Conway is like iOS — the operating system that makes USB useful, but locks you into Apple's ecosystem. The connector is open. The experience layer is not.
10a — What This Means to You

Perspectives: The New User & the Executive

The same events read very differently depending on where you sit.

🌱
If You're New to AI
  • The barrier to entry just dropped. The leak means the architecture of how AI agents work is no longer a mystery. Learning materials exploded — the blueprint is public.
  • Start with Claude Code + CLAUDE.md. Write a project description file, install Claude Code, and start asking it to help with real work.
  • Expect cost changes. The OpenClaw ban shows flat-rate AI access is unsustainable. Budget for API costs or stay within official tool limits.
  • Pick a lane. Open-source tools (Aider, Cline, OpenCode) vs proprietary platforms (Claude Code, Conway). Your choice now shapes switching costs later.
🏢
If You're a CIO or CEO
  • Vendor trust just got more complex. Evaluate AI vendors on operational competence, not just model capability.
  • The Conway lock-in is real. Agents that run 24/7 accumulate institutional knowledge that doesn't port. Insist on platform-independent documentation.
  • Managed Agents changes the build-vs-buy math. At $0.08/hour, many internal agent platforms no longer justify their engineering cost.
  • Token economics hit the P&L directly. Model routing can cut AI costs 60–80%. This needs to be a line-item strategy.
10b — What This Means to You

Perspectives: The Developer & the Founder

For builders and founders, the last two weeks reshaped both the opportunity and the risk landscape.

💻
If You're a Developer
  • The leaked architecture is your study guide. Five-layer agent systems, tool routing, permission models, memory management — the reference implementation is public.
  • Build on MCP, be cautious with CNW. MCP integrations are portable. Conway extensions are not. Invest in the open layer first.
  • Skills and hooks are the productivity multiplier. Not the model. Not the prompt. The system you build around the model compounds over time.
  • Understand the cost model. A single OpenClaw instance can burn $1K–$5K/day. Use the Batch API (50% off) for non-urgent work. Route intelligently.
🚀
If You're a Startup Founder
  • The "wrapper" startup is dead. The "harness" startup is alive. The value is in orchestration, permissions, domain-specific workflows, and the memory layer.
  • Managed Agents is both opportunity and threat. Easier to launch agent-powered products, but your infra moat just evaporated. Differentiate on domain knowledge.
  • Watch the CNW extension ecosystem. Early extensions could be high-value real estate — like early iOS apps. But Anthropic controls your distribution.
  • Multi-model routing is a survival skill. Abstract your model layer, route by task complexity, and keep your options open.
11 — Token Economics

The OpenClaw Lesson & the Cost of Intelligence

The OpenClaw ban isn't just policy drama. It's the first clear signal that flat-rate AI access cannot survive agent-scale usage — and every organization needs a token strategy.

🚫
What Happened with OpenClaw
April 4, 2026: Anthropic bans all third-party agent frameworks (OpenClaw, OpenCode, etc.) from using Claude subscription OAuth tokens.

Why: ~60% of active OpenClaw instances ran on Claude subscription credits. A single instance can consume $1,000–$5,000/day — on a $20–$200/month plan.

Boris Cherny: "Our subscriptions weren't built for the usage patterns of these third-party tools."

Result: Users must now pay API rates or stay within Claude Code's managed limits. OpenClaw's creator (now at OpenAI) called it "a betrayal of open-source developers."
The Bigger Pattern
This mirrors what happened across the industry:

OpenAI's model router — automatically sends simple requests to GPT-5.4 nano (cheapest) and complex ones to GPT-5.4 (most capable). Users don't choose; the system optimizes for cost.

Anthropic's adaptive thinking — Opus/Sonnet 4.6 skip expensive reasoning for simple requests automatically. Token spend self-adjusts.

The lesson: Every AI provider is moving from "unlimited" to "optimized." Revenue and profitability now directly shape what model answers your question.
Token Cost Comparison (per 1M tokens, 2026)
Model | Input | Output | Best For
Haiku 4.5 | $1 | $5 | Classification, extraction, routing
Sonnet 4.6 | $3 | $15 | General coding, analysis, writing
Opus 4.6 | $5 | $25 | Complex reasoning, architecture, planning
Batch API | 50% discount | Non-urgent processing within 24hr window
Your token strategy: Route simple tasks to Haiku ($1/M), daily work to Sonnet ($3/M), complex architecture to Opus ($5/M), and batch non-urgent work for 50% off. Organizations using model routing report 30–70% cost reductions while maintaining quality. This isn't optimization — it's a requirement.
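The routing strategy above can be sketched in a few lines. The complexity tiers and routing thresholds here are illustrative assumptions; the per-1M-token prices come from the table.

```python
# Sketch: route tasks to the cheapest adequate model using the prices
# from the table above. Tier names are illustrative.
PRICES = {  # (input, output) in USD per 1M tokens
    "haiku-4.5":  (1, 5),
    "sonnet-4.6": (3, 15),
    "opus-4.6":   (5, 25),
}

def route(complexity: str) -> str:
    """Map a coarse complexity tier to a model."""
    return {
        "simple":  "haiku-4.5",   # classification, extraction, routing
        "general": "sonnet-4.6",  # daily coding, analysis, writing
        "complex": "opus-4.6",    # architecture, planning
    }[complexity]

def cost_usd(model: str, in_tokens: int, out_tokens: int, batch: bool = False) -> float:
    """Estimated cost; the Batch API is quoted at a 50% discount."""
    p_in, p_out = PRICES[model]
    raw = (in_tokens * p_in + out_tokens * p_out) / 1_000_000
    return round(raw * (0.5 if batch else 1.0), 4)

# The same workload on Opus vs routed to Haiku:
print(cost_usd("opus-4.6", 100_000, 20_000))   # 1.0
print(cost_usd("haiku-4.5", 100_000, 20_000))  # 0.2
```

A 5x spread on identical token counts is why routing, not prompt tweaks, dominates the cost line.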
12 — Practical Application

Making Claude Code Work Harder for You

Theory is nice. Here's what actually moves the needle in daily development work.

📝
CLAUDE.md — Your Agent's Operating System
The most impactful single file you can create. It loads every session and tells Claude how your world works.

Boris Cherny (Claude Code's creator) keeps his at ~100 lines, ~2,500 tokens. His golden rule: "Anytime we see Claude do something incorrectly, we add it to CLAUDE.md so it doesn't repeat next time."
# Example CLAUDE.md structure

## Project Context
- Stack: Next.js 16, React 19, MongoDB Atlas
- Deploy: Vercel, production branch is main

## Behavioral Rules
- Run tests after every change
- Never mock the database in integration tests
- Keep files under 150 lines
- Commit early and often with descriptive messages

## Aliases
- "the dashboard" = /src/app/dashboard/
- "deploy" = git push origin main
Pro tip: CLAUDE.md is advisory (~80% compliance). If something must happen every time, make it a hook instead. Hooks are deterministic — 100% execution rate.
13 — Hooks & Skills

The Two Multipliers

Skills extend what Claude can do. Hooks constrain how it does it. Together they turn a powerful but unpredictable assistant into something you can trust with your codebase.

🔌
Hooks
Deterministic scripts that run at specific points in Claude's workflow. Configure in settings.json.

PreToolUse — runs before any tool call (allow / deny / defer)
PostToolUse — runs after tool execution
PreCommit — gate commits with custom checks

New in v2.1.89: defer option lets hooks pause execution and wait for an external signal.
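A PreToolUse hook is just an external program that inspects a proposed tool call. A minimal sketch of the decision logic — the JSON payload shape and the deny convention shown here are illustrative assumptions, not the documented hook contract:

```python
# Hypothetical PreToolUse decision: deny Bash commands that touch
# production. The payload fields ("tool", "command") are assumptions.
import json

def decide(tool_call: dict) -> tuple[str, str]:
    """Return (verdict, reason) for a proposed tool call."""
    if tool_call.get("tool") == "Bash" and "prod" in tool_call.get("command", ""):
        return "deny", "BLOCKED: command targets production"
    return "allow", ""

# Example payload a hook script might receive on stdin:
payload = json.loads('{"tool": "Bash", "command": "ssh prod-db systemctl restart"}')
print(decide(payload))  # ('deny', 'BLOCKED: command targets production')
```

Because the hook is deterministic code, the guarantee holds on every tool call — unlike advisory CLAUDE.md rules.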
📚
Skills
Markdown files in ~/.claude/skills/ that give Claude domain knowledge and reusable workflows. No SDK, no build step.

Invoked automatically when relevant, or manually with /skill-name

Examples: research workflows, presentation generators, deployment scripts, analysis tools, code review personas
# Example: Pre-commit hook to block sensitive files
# In settings.json hooks section:
{
  "hooks": {
    "PreCommit": [{
      "command": "bash -c 'if git diff --cached --name-only | grep -qE \"\\.(env|key|pem)$\"; then echo \"BLOCKED: sensitive files\"; exit 1; fi'"
    }]
  }
}
Real impact: One practitioner reported a 30-page blockchain analysis (15 charts, 40+ SQL queries) completed in one evening that would have taken a full work week manually. Skills were the primary enabler.
14 — Commands & Techniques

The Commands That Actually Matter

A curated list of the techniques and workflows that experienced Claude Code users rely on daily.

Session Management
Command / Technique | What It Does
/clear | Reset context. Start fresh with ~20K tokens instead of degrading at 60%+ usage.
/compact | Compress context without losing everything. Good for mid-task cleanup.
claude -p "prompt" | Non-interactive mode. Use in CI pipelines, pre-commit hooks, or automated scripts.
--output-format stream-json | Streaming JSON output for programmatic consumption.
Workflow Patterns
Pattern | How It Works
Plan-then-execute | Ask Claude to draft a plan with no implementation. Annotate in editor. Send back. Repeat until solid. Then: "implement."
One task per session | Fresh context costs ~20K tokens. Quality loss from a degraded session costs much more. Dump plan to a file, /clear, reload.
Subagent delegation | Define specialist personas in .claude/agents/. Claude spawns them in isolated context windows and gets compressed summaries back.
MCP integration | Use claude mcp add to connect Notion, Figma, databases, monitoring. Claude queries them directly instead of copy-pasting.
Keyboard Shortcuts

Esc — Cancel current generation

Tab — Accept autocomplete suggestion

Ctrl+C — Interrupt and get partial result

! command — Run shell command in session

/ — Browse available slash commands

@ file — Add file to context

The golden rule: CLAUDE.md for project context, skills for specialized workflows, hooks for safety guarantees, and /clear liberally. Context quality beats context quantity every time.
15 — Context Economics

Managing the 200K Token Window

The leak confirmed what practitioners already knew: context management is the single biggest factor in output quality.

~20K
Tokens per fresh session
20–40%
Quality starts degrading
60%+
Noticeable quality loss
1M
Opus 4.6 context window
Do This
  • Start fresh sessions for each distinct task
  • Use CLAUDE.md for persistent context (loads automatically)
  • Use skills for specialized knowledge (loads on demand)
  • Dump progress to a file before /clear
  • Use @file to pull in specific files, not entire directories
Avoid This
  • Packing the entire codebase into context
  • Running multi-hour sessions without clearing
  • Putting volatile info in CLAUDE.md (it loads every time)
  • Relying on context alone instead of external persistence
  • Over-specifying CLAUDE.md — Claude ignores rules lost in noise
16a — Memory Is the Moat

Karpathy's LLM Wiki: RAG Without RAG

On April 3, 2026 — two days after the Claude Code leak — Andrej Karpathy quietly published something more important: a knowledge architecture that replaces RAG with a living markdown wiki maintained by the AI itself.

🧠
The Core Idea
Karpathy stopped using AI primarily for code. He's using it to build a second brain — a system where the LLM acts as a full-time librarian.

No vector databases. No embedding pipelines. Just markdown files and an LLM that reads, writes, and maintains them. ~100 articles, ~400K words — with minimal direct human intervention.
The Three-Folder Architecture
raw/
Dump everything
wiki/
LLM compiles & links
index
Navigate & query

1. Ingest

Research papers, repos, web articles go into raw/. Obsidian Web Clipper converts pages to .md with local images for vision models.

2. Compile

The LLM writes a structured wiki: summaries, concepts, encyclopedia-style articles, and backlinks between ideas. This is the step RAG skips.

3. Maintain

The LLM runs "health checks" — linting for inconsistencies, missing data, or new connections. The wiki evolves autonomously.

Traditional RAG
  • Vector embeddings are a black box
  • Retrieval noise increases with scale
  • Requires embedding model + vector DB + pipeline
  • Knowledge is implicit in vectors
LLM Wiki (Karpathy)
  • Markdown is human-readable and traceable
  • Navigation via summaries and index pages
  • Zero infrastructure: just files and an LLM
  • Knowledge is explicit, editable, deletable
16b — Memory Is the Moat

Why Memory Systems Are the Real Differentiator

The Claude Code leak, Conway's behavioral lock-in, and Karpathy's wiki all point to the same conclusion.

💡
Three Memory Architectures, One Pattern
  • Claude Code's CLAUDE.md + skills + auto-memory = a primitive brain system that compounds per-project knowledge across sessions
  • Conway's persistent agent memory = institutional knowledge that creates behavioral lock-in (the platform keeps it)
  • Karpathy's wiki = personal knowledge that stays with you, not the platform (you keep it)
The pattern is clear: The people and organizations getting the best results from AI aren't writing better prompts. They're building better memory systems — structured, persistent, human-readable knowledge bases that make every AI interaction smarter than the last.
🛠
Start Today
  1. Create a brain/ or knowledge/ directory
  2. Dump research, notes, and articles into raw/
  3. Let your AI compile it into structured, interlinked markdown
  4. Review and edit — the human stays in the loop
  5. Over weeks, you'll have a second brain that you own, not your AI vendor
Ownership matters: Conway's memory creates lock-in because the platform keeps it. Karpathy's wiki creates leverage because you keep it. The decision about where your knowledge lives is one of the most consequential choices in AI adoption.
17 — What's Next

Where This Is All Heading

Reading the confirmed announcements and the leaked signals together, the trajectory is clear.

Now
Interactive Agent + Cloud Infrastructure
Claude Code, Cowork, and Managed Agents form a three-layer agent platform: CLI, desktop, and cloud. You choose the surface.
Near-term
Background Execution
Kairos-like patterns suggest always-on agents that work while you sleep — delegated jobs, overnight synthesis, continuous monitoring.
Near-term
Multi-Model Routing
Internal codenames suggest the "Claude" brand will route across specialized models optimized for planning, coding, latency, or depth.
Medium-term
Security as Product Surface
Glasswing signals that prompt-injection resilience, permissions, and adversarial evaluation become first-class product features, not afterthoughts.
Medium-term
Agent Operating System
The competitive layer shifts from models to orchestration. The companies that win won't just have better AI — they'll have better systems around the AI.
18 — In the Works

What I've Been Building

These concepts aren't theoretical. Here's the ecosystem of tools I'm building that put these ideas into practice — from agent orchestration to knowledge management to native apps.

6
Active projects
Swift
Native macOS apps
Next.js
Electron + Web
100%
AI-assisted builds
🦗
Hive
AI agent orchestrator with live Kanban, multi-model routing, SSH MCP, and autonomous roadmap.
🌳
Canopy
AI-powered SSH client + Git client for macOS. Terminal, SFTP, server dashboard, BYOK AI chat.
🧠
Brainpower
Personal knowledge vault app with AI-powered hybrid search, vector embeddings, and 3-tier cloud sync.
🌿
Glade
Lightweight macOS app for fast application switching via global hotkey with a customizable launcher panel.
📡
Beam
Mac-native presentation viewer. Opens .html, .js, .pdf, .md files with dark/light mode and clean rendering.
📚
Brain
Local markdown vault (~/brain) — the knowledge layer that all projects and AI agents read and write.
19 — Hive

AI Agent Orchestration Platform

Hive is an Electron + Next.js desktop app that autonomously dispatches Claude Code agents, streams progress in real-time, and manages approvals for risky operations. Think of it as a team of AI developers you manage via a Kanban board.

Hive Kanban board showing AI agent tasks across Backlog, In Progress, Review, and Done columns
Hive Kanban — AI agents managed like a development team
Core Architecture
Electron shell wrapping Next.js at localhost:4000

WebSocket live updates for real-time Kanban dashboard

Scheduler dispatching up to 4 concurrent agents across different projects

SQLite database — no external DB needed

PID registry + recovery for agent process management
👥
11 Agent Profiles
Developer Researcher CEO / Visionary CTO COO CFO Data Analyst Marketing Founder Trader General

Each profile has custom system prompts, allowed tools, and can delegate subtasks to other profiles (CEO → COO → Developer chain).
Task Lifecycle
Backlog
Assigned
In Progress
Review
Done
Approval gates for risky operations (git push, rm, docker). Review flow with Approve / Request Changes / Reject. Retry failed tasks with feedback loop. 50-turn max per agent. 30-min approval timeout (auto-deny).
19b — Hive

Models, Usage & Analytics

📊
Multi-Model Support
Opus Sonnet Haiku Gemini Pro Gemini Flash Flash Lite Local / Ollama

Select model per task. Route heavy architecture to Opus, daily coding to Sonnet, lightweight ops to Haiku. Local models via Ollama / LM Studio for cost-free work.
📈
Token Usage & Reporting
Weekly reports — tasks completed, tokens consumed, cost breakdown by profile and project

Token usage trends — bar charts by model (Haiku, Opus, Sonnet, Gemma), cost over time

Cache efficiency — hit/write/uncached rates with trend analysis. 82.8% cache hit rate achieved.
Hive weekly report showing tasks completed, tokens consumed, and cost breakdown
Weekly Report — tasks, tokens, cost breakdown
Hive token usage trends dashboard with cache efficiency analysis
Token Usage Trends & Cache Efficiency
20 — Hive SSH MCP & Ops

Agents That Reach Into Your Servers — With Guardrails

Hive includes an SSH MCP server that gives agents the ability to run commands on remote machines. But the real design point isn't access — it's restriction. The MCP server itself defines the security boundary: what an LLM can and cannot do, enforced at the protocol layer, not by hoping the model behaves.

🔌
SSH MCP Server
An MCP server built into Hive that lets agents run SSH commands on remote hosts. Agents discover available servers with mcp__ssh__list_hosts, then execute commands via mcp__ssh__exec.

Example: An ops task says "tell me the uptime on digibot" — the agent SSHs into the server and returns the result. No human interaction needed.
💻
Real Output — Agent Running an SSH Task
$ ssh digibot uptime
up 178 days, 3:29 — 3 users
load average: 1.28, 1.16, 1.16

Running strong at 178 days.
Load averages are moderate and
stable across 1/5/15 min windows.

Agent used Bash → SSH → parsed output → summarized. Total time: 1m 56s including agent reasoning.

🛠
Hive MCP Tool API
The Hive MCP server exposes 13 tools for external AI agents to manage the entire task system programmatically:

hive_list_tasks hive_create_task hive_get_task hive_update_task hive_approve_task hive_request_changes hive_move_task hive_delete_task hive_retry_task hive_add_comment hive_kill_agent hive_list_projects hive_get_agent_status

This means Claude Code running locally can create, monitor, and manage Hive tasks — an agent orchestrating agents.
20b — Hive SSH MCP

Skills, Security & the Agent-Native Principle

💻
Hive Running Skills
Hive agents can invoke Claude Code skills directly. In the screenshot, an ops task runs /import-vodafone — a custom skill that triggers a Python data pipeline to pull SIM Inventory reports from the Vodafone M2M Portal and sync them to MongoDB. The agent handles the entire workflow: skill invocation, script execution, error handling, and reporting back.

Pattern: Define your ops workflows as skills. Point Hive at them. Walk away.
Hive agent executing an SSH uptime check on a remote server via MCP
Agent SSH task — checking uptime via MCP
Hive agent invoking the import-vodafone skill to run a data pipeline
Running /import-vodafone skill from Hive
🛡
Security by Design: The MCP as Guardrail
The SSH MCP server isn't just a convenience layer — it's a security architecture decision. Instead of giving an LLM raw shell access and hoping prompt instructions prevent misuse, the MCP server enforces restrictions at the protocol level:

  • Allowlisted commands — the server decides which operations agents can invoke, not the model
  • Scoped host access — agents only see servers explicitly registered; no lateral movement
  • Audit trail — every command execution is logged with agent identity, timestamp, and full output
  • No credential exposure — SSH keys and passwords live in the server process, never passed to the LLM context
The broader principle: This design pattern extends to everything we build. We originally designed systems for humans — dashboards, search engines, data pipelines. Now agents are the primary operators: searching the internet, collecting data, making recommendations. Every interface needs a parallel agent-safe path with explicit permissions, rate limits, and structured outputs. Design for agents, not just humans. The MCP server is the template.
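The guardrail pattern is easy to make concrete. A sketch of protocol-level enforcement — the allowlist contents and function name are illustrative; only the pattern (server decides, not the model) comes from the text above:

```python
# Sketch: an SSH MCP server validating an agent's exec request before
# anything reaches a shell. The model never sees credentials.
ALLOWED_COMMANDS = {"uptime", "df", "free"}   # allowlisted operations
REGISTERED_HOSTS = {"digibot"}                # scoped host access

def exec_request(host: str, command: str) -> str:
    """Enforce restrictions at the protocol layer, then (would) run SSH."""
    if host not in REGISTERED_HOSTS:
        raise PermissionError(f"host not registered: {host}")
    binary = command.split()[0]
    if binary not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowlisted: {binary}")
    # Real server: execute over SSH with server-held credentials and log
    # agent identity, timestamp, and full output for the audit trail.
    return f"OK: would run '{command}' on {host}"
```

The model can ask for anything; the server only ever does what the allowlist permits.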
21 — Hive Autonomous Mode

The Autonomous Roadmap

Hive's next evolution: Autonomous Mode — a goal-driven execution layer where the CEO agent decomposes a high-level objective, the COO plans operationally, and specialists execute in parallel. Informed by Paperclip, Hermes Agent, and OpenMOSS.

Mission Lifecycle
Strategy
CEO agent
Planning
COO agent
Execution
Specialists
Review
COO validates
Retro
CEO learns
🎯
Goal Ancestry (from Paperclip)
Every task carries a chain of reasoning from mission → strategy → plan → task. Specialists understand why they're doing what they're doing, not just what.

{
  "goalAncestry": [
    "Make the digital dashboard a more commercially competitive product",
    "Increase ad revenue by optimizing load time",
    "Audit bundle size, find largest deps"
  ]
}
🔨
DAG Scheduler
Tasks form a directed acyclic graph with dependency edges. Independent tasks execute in parallel waves; dependent tasks chain sequentially.

Wave 1: Audit, Research, Pull bugs (parallel)
Wave 2: Optimize, Build, Fix (each depends on Wave 1)
Wave 3: Launch email (depends on Wave 2)

Replaces the current FIFO scheduler for mission tasks.
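Wave scheduling is Kahn-style topological layering: every task whose dependencies are already done joins the next parallel wave. A minimal sketch using the example waves above:

```python
# Sketch: group DAG tasks into parallel execution waves.
def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """deps maps task -> set of prerequisite tasks."""
    remaining = {t: set(d) for t, d in deps.items()}
    done: set[str] = set()
    result = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cycle detected")  # not a DAG
        result.append(ready)       # this wave can run in parallel
        done.update(ready)
        for t in ready:
            del remaining[t]
    return result

deps = {
    "audit": set(), "research": set(), "pull_bugs": set(),
    "optimize": {"audit"}, "build": {"research"}, "fix": {"pull_bugs"},
    "launch_email": {"optimize", "build", "fix"},
}
print(waves(deps))
# [['audit', 'pull_bugs', 'research'], ['build', 'fix', 'optimize'], ['launch_email']]
```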
21b — Hive Autonomous Mode

Budget, Failure Recovery & the Vision

💰
Budget Cascade
Mission-level budget ($25 default) cascades to phases and individual tasks. Scheduler checks budgetSpentUsd < budgetUsd before spawning. If budget exhausted → pause mission, notify human.
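The cascade check is a one-liner per scope. A sketch assuming the field names shown (budgetUsd / budgetSpentUsd) and adding an illustrative per-task estimate on top of the stated spent-vs-budget comparison:

```python
# Sketch: budget cascade gate checked before spawning an agent.
def can_spawn(mission: dict, phase: dict, task_estimate_usd: float) -> bool:
    """Spawn only if neither mission nor phase budget would be exceeded."""
    for scope in (mission, phase):
        if scope["budgetSpentUsd"] + task_estimate_usd > scope["budgetUsd"]:
            return False  # budget exhausted -> pause mission, notify human
    return True

mission = {"budgetUsd": 25.0, "budgetSpentUsd": 21.0}  # $25 default
phase = {"budgetUsd": 10.0, "budgetSpentUsd": 6.0}
print(can_spawn(mission, phase, 3.0))  # True
print(can_spawn(mission, phase, 5.0))  # False (mission would exceed $25)
```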
🚨
Failure Escalation (from Hermes)
Three-tier recovery: Retry (same agent, fresh session, max 2) → Replan (COO decomposes differently) → Escalate (pause, notify human with full context).
Hive Autonomous Mode design document in Brainpower showing competitive landscape analysis
Autonomous Mode design doc — research in Brainpower
Hive new task creation form with agent profile selection and model routing
Task creation — profile, model, and project selection
The vision: "Make the digital dashboard a more commercially competitive product" → CEO produces strategy with 3 goals → COO builds task DAG with 7 tasks across 3 parallel waves → specialists execute, auto-review, rework if needed → COO validates against success criteria → CEO logs lessons learned. Human checkpoints at strategy and planning phases.
22 — Canopy

AI-Powered SSH & Git Client for macOS

Canopy is a native macOS app that combines SSH terminal, SFTP file management, Git client, and AI chat into a single workspace. Built with SwiftUI, zero external SDK dependencies, BYOK (Bring Your Own Key) for any AI provider.

🖥
Terminal & SSH
Multi-tab SSH terminal powered by Citadel (PTY) + SwiftTerm

Local terminal via SwiftTerm LocalProcess

Server dashboard — uptime, disk, memory, CPU at a glance

SFTP file comparison & push-to-remote
🌳
Git & AI
Full Git client — repo scanning, branches, diff, history, local state

BYOK AI chat panel with terminal context injection

Supports: Claude, OpenAI, OpenRouter, Ollama, LM Studio — all via URLSession REST + SSE, no SDKs

Xcode-style toolbar toggles: Dashboard, Terminal, Git, Files, AI
Tech Stack
Layer    | Technology       | Purpose
UI       | SwiftUI          | Native macOS 15+, NavigationSplitView
SSH      | Citadel          | SSH, PTY, SFTP — pure Swift
Terminal | SwiftTerm        | Terminal emulation (NSView + UIView)
Syntax   | HighlightSwift   | Code syntax highlighting
Auth     | macOS Keychain   | Server passwords + API keys
Build    | SPM (Swift 6.0)  | No Xcode project, pure Package.swift
22b — Canopy

Human Interface, Agent Interface — Same Servers

Canopy macOS app showing SSH terminal, file browser, and AI assistant panel
Canopy — SSH terminal, file browser, and AI panel in a single workspace
Why this matters: Canopy demonstrates what Hive's SSH MCP does from the agent side — Canopy does it from the human side. Same servers, same credentials, two interfaces: one for humans, one for AI agents. This is the agent-native design principle in action.
23 — Brainpower & the Brain

Your Knowledge, Your Search, Your AI

Brainpower is Karpathy's "LLM Wiki" concept made real — a native macOS app that gives you a window into a local markdown vault with AI-powered hybrid search, vector embeddings, and a 3-tier cloud evolution. Inspired by Karpathy's knowledge architecture and Nate B Jones' "One Brain" philosophy.

The 3-Tier Brain Architecture
L1: Local (~/brain — offline) → L2: BrainCloud (Personal Atlas) → L3: BrainMerge (Team knowledge)
L1 (Local): Plain markdown files in ~/brain. Brainpower's built-in embeddings + ripgrep. Always works offline.
L2 (BrainCloud): Personal MongoDB Atlas cluster with vector embeddings. Atlas Vector Search + Atlas Search for cloud-powered semantic search.
L3 (BrainMerge): Shared Atlas cluster. Multi-tenant team knowledge — everyone's notes combined. Shared AI learns from all of it.
🔍
Hybrid Search
Keyword search via ripgrep for exact matches

Semantic search via Ollama embeddings (768d nomic / up to 4096d qwen3) + vDSP dot product

Reciprocal Rank Fusion merges both result sets

AI synthesis via Claude API — search results compiled into coherent answers with citations
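Reciprocal Rank Fusion is simple enough to show in full: each list contributes 1/(k + rank) per document, so results that rank well in either the keyword or the semantic list float to the top. A minimal sketch with hypothetical document paths (k = 60 is the conventional constant):

```typescript
// Merge ranked result lists: score(doc) = sum over lists of 1/(k + rank).
function rrf(lists: string[][], k = 60): string[] {
  const score = new Map<string, number>();
  for (const list of lists) {
    list.forEach((doc, rank) => {
      score.set(doc, (score.get(doc) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...score.entries()].sort((a, b) => b[1] - a[1]).map(([doc]) => doc);
}

const keyword  = ["notes/mcp.md", "notes/hooks.md", "notes/skills.md"]; // ripgrep
const semantic = ["notes/mcp.md", "notes/brain.md", "notes/hooks.md"]; // embeddings
console.log(rrf([keyword, semantic])[0]); // "notes/mcp.md" — tops both lists
```

Because RRF only consumes ranks, it needs no score normalization between ripgrep match counts and embedding dot products, which is exactly why it suits this keyword-plus-semantic split.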
🛠
App Features
Vault browser — sections, tags, filter bar, real-time search

Markdown editor with Mermaid diagram support

PDF export and AirDrop to iPhone

File watcher — auto-reloads when files change on disk

Tag system — color-coded, filterable
23b — Brainpower

Atlas Vector Search & the Brain in Action

Atlas Vector Search (BrainCloud)
MongoDB Atlas stores vector embeddings alongside the source documents — no separate vector database. Supports $vectorSearch (ANN/ENN), $rankFusion (MongoDB 8.1+), and $scoreFusion (MongoDB 8.2+) for native hybrid search. Pre-filtering on metadata (tags, section, date) narrows the search space before vector comparison.

Free tier (M0): 512MB storage, 1 vector index — sufficient for a personal Brain.
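The pre-filtered query described above takes the shape of a two-stage aggregation pipeline. A sketch of the stage layout — the index name, field names, and filter values are assumptions for illustration, not Brainpower's actual schema:

```typescript
// Hypothetical $vectorSearch pipeline: metadata pre-filter narrows the
// candidate set before the ANN comparison, then $project surfaces the score.
const pipeline = [
  {
    $vectorSearch: {
      index: "brain_vectors",          // assumed Atlas Vector Search index name
      path: "embedding",               // assumed embedding field
      queryVector: [] as number[],     // 768-dim query embedding goes here
      numCandidates: 200,              // ANN candidate pool
      limit: 10,
      filter: { tags: "swift" },       // metadata pre-filter (tags/section/date)
    },
  },
  { $project: { title: 1, score: { $meta: "vectorSearchScore" } } },
];
console.log(pipeline.length); // 2 stages
```

Because the filter runs before the vector comparison, a tag or section constraint cuts the ANN search space rather than post-filtering already-retrieved results.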
Brainpower vault browser showing document list with tags, sections, and filter bar
Brainpower vault — documents, tags, and search
Brainpower showing Brain Vector Search architecture document with MongoDB Atlas flow diagram
Brain Vector Search architecture in Brainpower
Brain Vector Search architecture diagram showing MongoDB Atlas integration with local Brain vault
Brain Vector Search — MongoDB Atlas Architecture
The connection: This is the practical implementation of two ideas from this talk — Karpathy's "RAG without RAG" wiki approach (L1), extended with proper vector search (L2), and federated team knowledge (L3). The AI reads and writes the Brain. You own it.
24 — Glade & Beam

The Supporting Cast

Not every tool is an AI orchestrator. Some are small, sharp utilities that solve one problem well.

🌿
Glade — App Launcher
A lightweight macOS menubar utility for fast application switching. Trigger with a global hotkey → customizable panel of your most-used apps appears → click to switch.

Built with: SwiftUI, macOS 15+, SPM, zero dependencies

Architecture: CGEventTap for global keyboard monitoring, PopupWindowController for UI, PersistenceService for saved app list

Features: Hotkey-triggered popup, optional labels, Mission Control integration, onboarding wizard, settings panel
📡
Beam — Presentation Viewer
A Mac-native app that opens presentation files with a clean, focused interface. Drop in an .html, .js, .pdf, or .md file and Beam renders it beautifully.

Built with: SwiftUI, macOS 15+, SPM, zero dependencies

Features: Dark / Light / System theme toggle, drag-and-drop file open, clean minimal UI

Why it exists: Every HTML presentation in this series is designed to be viewed in Beam — the app I built to present them.
Beam macOS app for presenting HTML, JS, PDF, and Markdown files
Beam — present any document beautifully
macOS dock showing suite of custom-built apps: Canopy, Hive, Brainpower, and more
The full suite — all built with Claude Code
The common thread: Every one of these apps is a native SwiftUI app built with Swift Package Manager (no Xcode project files), targeting macOS 15+, using Swift 6.0 toolchain with v5 language mode. Zero external SDK dependencies where possible. All built with Claude Code assistance.
25 — Key Takeaways

What to Remember: Leadership & Developers

For Leadership
  • The moat is the harness, not the model. Orchestration, permissions, and tool routing are the real competitive surface.
  • Agent infrastructure is now a managed service. Build-vs-buy calculus shifted with Managed Agents at $0.08/hour.
  • Security posture is visible. After the leak, operational competence is part of vendor trust evaluation.
  • Memory ownership is a strategic decision. Conway keeps it. Karpathy's approach lets you keep it. Choose deliberately.
For Developers
  • Invest in CLAUDE.md now. It's the highest-ROI file in your repo. Add to it every time Claude makes a mistake.
  • Learn hooks and skills. Skills extend capability, hooks enforce safety. Together they compound over time.
  • Manage context aggressively. One task per session, /clear often, dump progress to files.
  • Use MCP integrations. Connect your tools directly instead of copy-pasting between interfaces.
26 — Key Takeaways

Five Things to Do This Week

Start Today
  1. Create a CLAUDE.md in your project root (start with 50–100 lines of project context and behavioral rules)
  2. Add one pre-commit hook to block sensitive file commits — this enables unattended operation
  3. Try the plan-then-execute workflow on your next feature — draft plan, annotate, iterate, then implement
  4. Start a brain/ directory — dump research into raw/, let AI compile it into structured markdown you own
  5. Connect one external tool via MCP (claude mcp add) — Notion, Figma, database, whatever you copy-paste from most
“The field moves fast. The leak accelerated that. What matters is building systems around AI that compound your team's capability over time.”