The code leak, Project Glasswing, Managed Agents, and practical techniques for getting more out of Claude today.
Irv Cassio • AI Enthusiasts Group • April 2026
01 — Timeline
Two Weeks That Changed the AI Landscape
From an accidental source map to a $100M security initiative, two weeks spanning late March and early April 2026 revealed more about Claude's architecture than any official announcement.
March 11, 2026
Bun Bug Reported
A known Bun issue surfaces: source maps are served in production builds even when docs say they shouldn't be. The bug sits open for 20 days.
March 24, 2026
Claude Cowork Ships
Anthropic launches Claude Computer Use for macOS — a desktop agent that can control apps, navigate browsers, and handle multi-step workflows.
March 31, 2026
The Leak
Claude Code npm package v2.1.88 ships with a 59.8 MB source map. Within hours: 1,906 files, 512,000+ lines of TypeScript are public. GitHub mirrors fork tens of thousands of times.
April 1–3, 2026
Containment Fails
Anthropic issues mass GitHub takedowns, later admits they impacted more repos than intended. Rewrites and ports in Python and Rust keep the architecture public.
April 7, 2026
Project Glasswing Announced
Anthropic reveals Claude Mythos Preview — a restricted frontier model given to 50+ organizations for defensive cybersecurity. $100M in credits committed.
April 8, 2026
Managed Agents Launch
Claude Managed Agents enters public beta: cloud-hosted agent infrastructure with sandboxed execution, checkpointing, and scoped permissions.
02 — The Leak
What Was Actually Exposed
The source map wasn't just code — it was the most detailed blueprint of a production AI agent system ever made public.
Known CVEs — CVE-2025-59536 and CVE-2026-21852 now easier to weaponize
Key insight: The leak proved that the competitive moat is not the model — it's the orchestration, permissions, and tool routing wrapped around it. The harness is the product.
03 — Under the Hood
The Agent Architecture Blueprint
What the leak revealed is a five-layer agent execution system that goes far beyond a chatbot interface.
Entrypoint → Query Engine → Tool Registry → Execution → Verification
Technology Stack
Layer — Technology — Purpose
Runtime — Bun — Fast JS runtime for agent execution
Language — TypeScript — Full type safety across 512K lines
UI — React + Ink — Terminal UI rendering
Validation — Zod — Schema validation for tool inputs/outputs
Auth — OAuth 2.0 + JWT — Authentication + macOS Keychain
Telemetry — OpenTelemetry — Tracing, metrics, user frustration tracking
Notable finding: The codebase includes frustration detection — Claude tracks signals of user frustration to adjust its behavior. This was flagged by Scientific American as a privacy concern.
04 — Impact
What the Leak Means in Practice
The impact extends across security, competition, and the broader AI ecosystem.
⚠
Security
Pre-existing CVEs now far easier to exploit. Permission bypass logic is public. Attackers can craft targeted malicious repos that abuse previously unknown entry points.
⚖
Competition
Every AI lab now has a detailed reference implementation. The orchestration patterns, tool routing, and memory architecture are no longer trade secrets. Open-source ports appeared within days.
🌱
Ecosystem
Accelerated adoption and understanding. Developers now build on known architecture instead of guessing. Public scrutiny likely accelerates security patching. Community engagement surged.
19M — Views on initial X post
46K+ — GitHub stars on mirrors
3+ — Language ports (Python, Rust)
2 — Known CVEs now exposed
Brand reality check: Two accidental code exposures within five days undermine the "safety-first" narrative central to Anthropic's market positioning. Operational competence is now part of the trust equation.
05 — Project Glasswing
Claude Mythos & Defensive Security
Days after the leak, Anthropic announced its most ambitious security initiative. Coincidence or crisis response — the move is significant either way.
🛡
What Is Mythos Preview
An unreleased frontier model restricted to defensive security work. Not publicly available — given to selected organizations to find and fix vulnerabilities before broader release.
Mythos autonomously found a 17-year-old RCE vulnerability in FreeBSD (CVE-2026-4747) with no human involvement after the initial request.
Anthropic says Mythos identified thousands of zero-day vulnerabilities across every major OS and browser.
$100M+ in usage credits committed, plus $4M in donations to open-source security organizations.
The signal: Anthropic is positioning AI models as security infrastructure, not just productivity tools. The restricted release model — defend first, release later — is new for the industry and sets a precedent.
06 — Managed Agents
From Prototype to Production in Days
Announced April 8, 2026, Claude Managed Agents is Anthropic's answer to the hardest part of building AI agents: the infrastructure.
What Managed Agents Handles for You
Sandboxed execution — isolated container per agent
Checkpointing — resume after failures
Credential management — secure secrets handling
Scoped permissions — control tool access
End-to-end tracing — full observability
Error recovery — auto-resume after outages
💰
Pricing
Model usage cost + $0.08 per agent runtime hour. That's the infrastructure premium for not building your own sandboxing, state management, and credential storage.
🏢
Early Adopters
Notion • Rakuten • Asana
Teams using Managed Agents across coding, task automation, and document processing workflows.
Research Preview Features
Multi-agent spawning — Complex tasks split across specialized sub-agents
Auto prompt refinement — Improved task success by up to 10 points in internal testing
Developer takeaway: If the leak showed the harness is the moat, Managed Agents is Anthropic selling access to that harness. You get the orchestration without building it.
07 — The Platform Wave
Claude Is Becoming an Operating Layer
Zoom out and the picture is clear: Anthropic shipped five major capabilities in six weeks. This is a platform play, not incremental updates.
Date — Release — What It Means
Feb 5 — Opus 4.6 — 1M context window, agent teams, 300K max output tokens on Batches API
Apr 7 — Project Glasswing — Mythos Preview for defensive security with 50+ partner organizations
Apr 8 — Managed Agents — Cloud-hosted agent infrastructure as a service
Model (Opus / Sonnet 4.6) → Desktop (Cowork + Dispatch) → Cloud (Managed Agents) → Security (Mythos / Glasswing)
The pattern: Claude isn't improving as a chatbot. It's expanding into a runtime — a layer between you and your computer, your cloud, your infrastructure, and your security posture.
08 — Internal Signals
Codenames and What's Likely Coming
The leak and subsequent analysis surfaced internal labels and feature flags. Some are confirmed, some remain speculative.
✅
Confirmed or Shipped
Mythos Preview — Active in Project Glasswing, named partners, live security work
Managed Agents — Public beta with API, pricing, and early adopters
Cowork / Computer Use — GA for macOS, Windows coming
🔬
Unconfirmed / Speculative
Capybara • Fennec • Numbat • Tengu
Internal model family codenames — likely tiered development tracks
Reality check: Feature flags are not launch promises. Internal experiments may never ship. Treat codenames as directional signals, not product roadmap commitments.
09 — The Real Moat
Conway: The Layer Above MCP
The leak revealed something most analysis missed: Anthropic is building a proprietary orchestration layer on top of the open MCP standard. This is the classic platform play — and it's called Conway.
The Two-Layer Strategy
Open Layer
MCP (Model Context Protocol)
Open standard donated to Linux Foundation
Adopted by OpenAI, Google, and others
Portable tool connectors across platforms
Standardized AI-to-tool communication
Purpose: Create adoption. Build the ecosystem.
Proprietary Layer
Conway (CNW)
Always-on agent runtime with persistent memory
Custom extensions in .cnw.zip format
Webhook triggers, event streams, scheduling
UI panels: Search, Chat, System controls
Purpose: Create lock-in. Capture value.
MCP (Open • Portable) → Conway (Proprietary • Persistent) → CNW Extensions (App Store • Locked In)
🔒
The Behavioral Lock-In Problem
The deepest moat isn't data portability — it's behavioral lock-in. Once Conway runs continuously for months, it accumulates decision patterns, workflow preferences, inferred business rules, and edge cases it's learned to handle.
There is no export format for an agent's learned operational intelligence. No regulatory framework for migrating it. Switching means retraining — potentially months of ramp-up cost.
This is the moat. Not the model weights. Not MCP. The layer between the open standard and your daily operations.
Think of it like this: MCP is like USB — a universal connector anyone can use. Conway is like iOS — the operating system that makes USB useful, but locks you into Apple's ecosystem. The connector is open. The experience layer is not.
10a — What This Means to You
Perspectives: The New User & the Executive
The same events read very differently depending on where you sit.
🌱
If You're New to AI
The barrier to entry just dropped. The leak means the architecture of how AI agents work is no longer a mystery. Learning materials exploded — the blueprint is public.
Start with Claude Code + CLAUDE.md. Write a project description file, install Claude Code, and start asking it to help with real work.
Expect cost changes. The OpenClaw ban shows flat-rate AI access is unsustainable. Budget for API costs or stay within official tool limits.
Pick a lane. Open-source tools (Aider, Cline, OpenCode) vs proprietary platforms (Claude Code, Conway). Your choice now shapes switching costs later.
🏢
If You're a CIO or CEO
Vendor trust just got more complex. Evaluate AI vendors on operational competence, not just model capability.
The Conway lock-in is real. Agents that run 24/7 accumulate institutional knowledge that doesn't port. Insist on platform-independent documentation.
Managed Agents changes the build-vs-buy math. At $0.08/hour, many internal agent platforms no longer justify their engineering cost.
Token economics hit the P&L directly. Model routing can cut AI costs 60–80%. This needs to be a line-item strategy.
10b — What This Means to You
Perspectives: The Developer & the Founder
For builders and founders, the last two weeks reshaped both the opportunity and the risk landscape.
💻
If You're a Developer
The leaked architecture is your study guide. Five-layer agent systems, tool routing, permission models, memory management — the reference implementation is public.
Build on MCP, be cautious with CNW. MCP integrations are portable. Conway extensions are not. Invest in the open layer first.
Skills and hooks are the productivity multiplier. Not the model. Not the prompt. The system you build around the model compounds over time.
Understand the cost model. A single OpenClaw instance can burn $1K–$5K/day. Use the Batch API (50% off) for non-urgent work. Route intelligently.
🚀
If You're a Startup Founder
The "wrapper" startup is dead. The "harness" startup is alive. The value is in orchestration, permissions, domain-specific workflows, and the memory layer.
Managed Agents is both opportunity and threat. Easier to launch agent-powered products, but your infra moat just evaporated. Differentiate on domain knowledge.
Watch the CNW extension ecosystem. Early extensions could be high-value real estate — like early iOS apps. But Anthropic controls your distribution.
Multi-model routing is a survival skill. Abstract your model layer, route by task complexity, and keep your options open.
11 — Token Economics
The OpenClaw Lesson & the Cost of Intelligence
The OpenClaw ban isn't just policy drama. It's the first clear signal that flat-rate AI access cannot survive agent-scale usage — and every organization needs a token strategy.
🚫
What Happened with OpenClaw
April 4, 2026: Anthropic bans all third-party agent frameworks (OpenClaw, OpenCode, etc.) from using Claude subscription OAuth tokens.
Why: ~60% of active OpenClaw instances ran on Claude subscription credits. A single instance can consume $1,000–$5,000/day — on a $20–$200/month plan.
Boris Cherny: "Our subscriptions weren't built for the usage patterns of these third-party tools."
Result: Users must now pay API rates or stay within Claude Code's managed limits. OpenClaw's creator (now at OpenAI) called it “a betrayal of open-source developers.”
⚖
The Bigger Pattern
This mirrors what happened across the industry:
OpenAI's model router — automatically sends simple requests to GPT-5.4 nano (cheapest) and complex ones to GPT-5.4 (most capable). Users don't choose; the system optimizes for cost.
The lesson: Every AI provider is moving from "unlimited" to "optimized." Revenue and profitability now directly shape what model answers your question.
Token Cost Comparison (per 1M tokens, 2026)
Model — Input — Output — Best For
Haiku 4.5 — $1 — $5 — Classification, extraction, routing
Sonnet 4.6 — $3 — $15 — General coding, analysis, writing
Opus 4.6 — $5 — $25 — Complex reasoning, architecture, planning
Batch API — 50% discount on input and output — Non-urgent processing within a 24hr window
Your token strategy: Route simple tasks to Haiku ($1/M), daily work to Sonnet ($3/M), complex architecture to Opus ($5/M), and batch non-urgent work for 50% off. Organizations using model routing report 30–70% cost reductions while maintaining quality. This isn't optimization — it's a requirement.
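The routing strategy above can be sketched as a small dispatch function. The prices come from the table; the tier names and the task-kind heuristic are illustrative assumptions, not an official routing API.

```typescript
// Minimal model-routing sketch. Prices mirror the table above; the
// complexity heuristic is an illustrative assumption.
type Tier = "haiku" | "sonnet" | "opus";

const PRICE_PER_M_INPUT: Record<Tier, number> = { haiku: 1, sonnet: 3, opus: 5 };

function routeByComplexity(task: { kind: string }): Tier {
  // Cheap, high-volume work goes to the smallest model.
  if (["classification", "extraction", "routing"].includes(task.kind)) return "haiku";
  // Deep reasoning goes to the largest.
  if (["architecture", "planning"].includes(task.kind)) return "opus";
  // Everything else defaults to the mid tier.
  return "sonnet";
}

function inputCostUsd(tier: Tier, inputTokens: number, batch = false): number {
  const base = (inputTokens / 1_000_000) * PRICE_PER_M_INPUT[tier];
  return batch ? base * 0.5 : base; // Batch API: 50% discount
}
```

Routing 2M input tokens of planning work to Opus costs $10 at standard rates, $5 batched — the kind of arithmetic a router does on every request.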
12 — Practical Application
Making Claude Code Work Harder for You
Theory is nice. Here's what actually moves the needle in daily development work.
📝
CLAUDE.md — Your Agent's Operating System
The most impactful single file you can create. It loads every session and tells Claude how your world works.
Boris Cherny (Claude Code's creator) keeps his at ~100 lines, ~2,500 tokens. His golden rule: "Anytime we see Claude do something incorrectly, we add it to CLAUDE.md so it doesn't repeat next time."
# Example CLAUDE.md structure
## Project Context
- Stack: Next.js 16, React 19, MongoDB Atlas
- Deploy: Vercel, production branch is main
## Behavioral Rules
- Run tests after every change
- Never mock the database in integration tests
- Keep files under 150 lines
- Commit early and often with descriptive messages
## Aliases
- "the dashboard" = /src/app/dashboard/
- "deploy" = git push origin main
Pro tip: CLAUDE.md is advisory (~80% compliance). If something must happen every time, make it a hook instead. Hooks are deterministic — 100% execution rate.
13 — Hooks & Skills
The Two Multipliers
Skills extend what Claude can do. Hooks constrain how it does it. Together they turn a powerful but unpredictable assistant into something you can trust with your codebase.
🔌
Hooks
Deterministic scripts that run at specific points in Claude's workflow. Configure in settings.json.
PreToolUse — runs before any tool call (allow / deny / defer)
PostToolUse — runs after tool execution
PreCommit — gate commits with custom checks
New in v2.1.89: defer option lets hooks pause execution and wait for an external signal.
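A PreToolUse-style gate can be sketched as a pure decision function. The real hook is a script wired up in settings.json; the function shape, patterns, and tool names here are illustrative assumptions, not the actual hook contract.

```typescript
// Illustrative PreToolUse-style gate: deterministic, so it fires 100% of
// the time — unlike advisory CLAUDE.md rules.
type Decision = "allow" | "deny" | "defer";

const BLOCKED_PATTERNS = [/rm\s+-rf\s+\//, /\.env\b/, /id_rsa/];

function preToolUse(toolName: string, input: string): Decision {
  // Hard-deny destructive or secret-touching shell commands.
  if (toolName === "Bash" && BLOCKED_PATTERNS.some((p) => p.test(input))) {
    return "deny";
  }
  // Defer production pushes until an external approval signal arrives.
  if (toolName === "Bash" && /git push .*main/.test(input)) return "defer";
  return "allow";
}
```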
📚
Skills
Markdown files in ~/.claude/skills/ that give Claude domain knowledge and reusable workflows. No SDK, no build step.
Invoked automatically when relevant, or manually with /skill-name
Examples: research workflows, presentation generators, deployment scripts, analysis tools, code review personas
Real impact: One practitioner reported a 30-page blockchain analysis (15 charts, 40+ SQL queries) completed in one evening that would have taken a full work week manually. Skills were the primary enabler.
14 — Commands & Techniques
The Commands That Actually Matter
A curated list of the techniques and workflows that experienced Claude Code users rely on daily.
Session Management
Command / Technique — What It Does
/clear — Reset context. Start fresh with ~20K tokens instead of degrading at 60%+ usage.
/compact — Compress context without losing everything. Good for mid-task cleanup.
claude -p "prompt" — Non-interactive mode. Use in CI pipelines, pre-commit hooks, or automated scripts.
--output-format stream-json — Streaming JSON output for programmatic consumption.
Workflow Patterns
Pattern — How It Works
Plan-then-execute — Ask Claude to draft a plan with no implementation. Annotate in editor. Send back. Repeat until solid. Then: "implement."
One task per session — Fresh context costs ~20K tokens. Quality loss from a degraded session costs much more. Dump plan to a file, /clear, reload.
Subagent delegation — Define specialist personas in .claude/agents/. Claude spawns them in isolated context windows and gets compressed summaries back.
MCP integration — Use claude mcp add to connect Notion, Figma, databases, monitoring. Claude queries them directly instead of copy-pasting.
Keyboard Shortcuts
Esc — Cancel current generation
Tab — Accept autocomplete suggestion
Ctrl+C — Interrupt and get partial result
!command — Run shell command in session
/ — Browse available slash commands
@file — Add file to context
The golden rule: CLAUDE.md for project context, skills for specialized workflows, hooks for safety guarantees, and /clear liberally. Context quality beats context quantity every time.
15 — Context Economics
Managing the 200K Token Window
The leak confirmed what practitioners already knew: context management is the single biggest factor in output quality.
~20K — Tokens per fresh session
20–40% — Quality starts degrading
60%+ — Noticeable quality loss
1M — Opus 4.6 context window
✅
Do This
Start fresh sessions for each distinct task
Use CLAUDE.md for persistent context (loads automatically)
Use skills for specialized knowledge (loads on demand)
Dump progress to a file before /clear
Use @file to pull in specific files, not entire directories
❌
Avoid This
Packing the entire codebase into context
Running multi-hour sessions without clearing
Putting volatile info in CLAUDE.md (it loads every time)
Relying on context alone instead of external persistence
Over-specifying CLAUDE.md — Claude ignores rules lost in noise
16a — Memory Is the Moat
Karpathy's LLM Wiki: RAG Without RAG
On April 3, 2026 — two days after the Claude Code leak — Andrej Karpathy quietly published something arguably more important: a knowledge architecture that replaces RAG with a living markdown wiki maintained by the AI itself.
🧠
The Core Idea
Karpathy stopped using AI primarily for code. He's using it to build a second brain — a system where the LLM acts as a full-time librarian.
No vector databases. No embedding pipelines. Just markdown files and an LLM that reads, writes, and maintains them. ~100 articles, ~400K words — with minimal direct human intervention.
The Three-Folder Architecture
raw/ (dump everything) → wiki/ (LLM compiles & links) → index (navigate & query)
1. Ingest
Research papers, repos, web articles go into raw/. Obsidian Web Clipper converts pages to .md with local images for vision models.
2. Compile
The LLM writes a structured wiki: summaries, concepts, encyclopedia-style articles, and backlinks between ideas. This is the step RAG skips.
3. Maintain
The LLM runs "health checks" — linting for inconsistencies, missing data, or new connections. The wiki evolves autonomously.
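One concrete flavor of the "health check" step above is a link-lint pass over the wiki. The [[wiki-link]] syntax and this helper are assumptions for the sketch, not Karpathy's actual tooling.

```typescript
// Illustrative wiki health check: find [[links]] that point at articles
// which don't exist yet, so the LLM (or a human) can fill the gaps.
function findDeadLinks(wiki: Map<string, string>): string[] {
  const dead = new Set<string>();
  for (const body of wiki.values()) {
    // Scan every article body for [[target]] references.
    for (const match of body.matchAll(/\[\[([^\]]+)\]\]/g)) {
      const target = match[1];
      if (!wiki.has(target)) dead.add(target); // linked article is missing
    }
  }
  return [...dead].sort();
}
```

Because the wiki is plain markdown, checks like this are a few lines of code — the traceability advantage over opaque vector stores.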
❌
Traditional RAG
Vector embeddings are a black box
Retrieval noise increases with scale
Requires embedding model + vector DB + pipeline
Knowledge is implicit in vectors
✅
LLM Wiki (Karpathy)
Markdown is human-readable and traceable
Navigation via summaries and index pages
Zero infrastructure: just files and an LLM
Knowledge is explicit, editable, deletable
16b — Memory Is the Moat
Why Memory Systems Are the Real Differentiator
The Claude Code leak, Conway's behavioral lock-in, and Karpathy's wiki all point to the same conclusion.
💡
Three Memory Architectures, One Pattern
Claude Code's CLAUDE.md + skills + auto-memory = a primitive brain system that compounds per-project knowledge across sessions
Conway's persistent agent memory = institutional knowledge that creates behavioral lock-in (the platform keeps it)
Karpathy's wiki = personal knowledge that stays with you, not the platform (you keep it)
The pattern is clear: The people and organizations getting the best results from AI aren't writing better prompts. They're building better memory systems — structured, persistent, human-readable knowledge bases that make every AI interaction smarter than the last.
🛠
Start Today
Create a brain/ or knowledge/ directory
Dump research, notes, and articles into raw/
Let your AI compile it into structured, interlinked markdown
Review and edit — the human stays in the loop
Over weeks, you'll have a second brain that you own, not your AI vendor
Ownership matters: Conway's memory creates lock-in because the platform keeps it. Karpathy's wiki creates leverage because you keep it. The decision about where your knowledge lives is one of the most consequential choices in AI adoption.
17 — What's Next
Where This Is All Heading
Reading the confirmed announcements and the leaked signals together, the trajectory is clear.
Now
Interactive Agent + Cloud Infrastructure
Claude Code, Cowork, and Managed Agents form a three-layer agent platform: CLI, desktop, and cloud. You choose the surface.
Near-term
Background Execution
Kairos-like patterns suggest always-on agents that work while you sleep — delegated jobs, overnight synthesis, continuous monitoring.
Near-term
Multi-Model Routing
Internal codenames suggest the "Claude" brand will route across specialized models optimized for planning, coding, latency, or depth.
Medium-term
Security as Product Surface
Glasswing signals that prompt-injection resilience, permissions, and adversarial evaluation become first-class product features, not afterthoughts.
Medium-term
Agent Operating System
The competitive layer shifts from models to orchestration. The companies that win won't just have better AI — they'll have better systems around the AI.
18 — In the Works
What I've Been Building
These concepts aren't theoretical. Here's the ecosystem of tools I'm building that put these ideas into practice — from agent orchestration to knowledge management to native apps.
6 — Active projects
Swift — Native macOS apps
Next.js — Electron + Web
100% — AI-assisted builds
🦗
Hive
AI agent orchestrator with live Kanban, multi-model routing, SSH MCP, and autonomous roadmap.
🌳
Canopy
AI-powered SSH client + Git client for macOS. Terminal, SFTP, server dashboard, BYOK AI chat.
🧠
Brainpower
Personal knowledge vault app with AI-powered hybrid search, vector embeddings, and 3-tier cloud sync.
🌿
Glade
Lightweight macOS app for fast application switching via global hotkey with a customizable launcher panel.
📡
Beam
Mac-native presentation viewer. Opens .html, .js, .pdf, .md files with dark/light mode and clean rendering.
📚
Brain
Local markdown vault (~/brain) — the knowledge layer that all projects and AI agents read and write.
19 — Hive
AI Agent Orchestration Platform
Hive is an Electron + Next.js desktop app that autonomously dispatches Claude Code agents, streams progress in real-time, and manages approvals for risky operations. Think of it as a team of AI developers you manage via a Kanban board.
Hive Kanban — AI agents managed like a development team
⚙
Core Architecture
Electron shell wrapping Next.js at localhost:4000
WebSocket live updates for real-time Kanban dashboard
Scheduler dispatching up to 4 concurrent agents across different projects
SQLite database — no external DB needed
PID registry + recovery for agent process management
Select model per task. Route heavy architecture to Opus, daily coding to Sonnet, lightweight ops to Haiku. Local models via Ollama / LM Studio for cost-free work.
📈
Token Usage & Reporting
Weekly reports — tasks completed, tokens consumed, cost breakdown by profile and project
Token usage trends — bar charts by model (Haiku, Opus, Sonnet, Gemma), cost over time
Cache efficiency — hit/write/uncached rates with trend analysis. 82.8% cache hit rate achieved.
Weekly Report — tasks, tokens, cost breakdown
Token Usage Trends & Cache Efficiency
20 — Hive SSH MCP & Ops
Agents That Reach Into Your Servers — With Guardrails
Hive includes an SSH MCP server that gives agents the ability to run commands on remote machines. But the real design point isn't access — it's restriction. The MCP server itself defines the security boundary: what an LLM can and cannot do, enforced at the protocol layer, not by hoping the model behaves.
🔌
SSH MCP Server
An MCP server built into Hive that lets agents run SSH commands on remote hosts. Agents discover available servers with mcp__ssh__list_hosts, then execute commands via mcp__ssh__exec.
Example: An ops task says "tell me the uptime on digibot" — the agent SSHs into the server and returns the result. No human interaction needed.
💻
Real Output — Agent Running an SSH Task
$ ssh digibot uptime
up 178 days, 3:29 — 3 users
load average: 1.28, 1.16, 1.16
Running strong at 178 days. Load averages are moderate and stable across 1/5/15 min windows.
Agent used Bash → SSH → parsed output → summarized. Total time: 1m 56s including agent reasoning.
🛠
Hive MCP Tool API
The Hive MCP server exposes 13 tools for external AI agents to manage the entire task system programmatically:
This means Claude Code running locally can create, monitor, and manage Hive tasks — an agent orchestrating agents.
20b — Hive SSH MCP
Skills, Security & the Agent-Native Principle
💻
Hive Running Skills
Hive agents can invoke Claude Code skills directly. In the screenshot, an ops task runs /import-vodafone — a custom skill that triggers a Python data pipeline to pull SIM Inventory reports from the Vodafone M2M Portal and sync them to MongoDB. The agent handles the entire workflow: skill invocation, script execution, error handling, and reporting back.
Pattern: Define your ops workflows as skills. Point Hive at them. Walk away.
Agent SSH task — checking uptime via MCP
Running /import-vodafone skill from Hive
🛡
Security by Design: The MCP as Guardrail
The SSH MCP server isn't just a convenience layer — it's a security architecture decision. Instead of giving an LLM raw shell access and hoping prompt instructions prevent misuse, the MCP server enforces restrictions at the protocol level:
Allowlisted commands — the server decides which operations agents can invoke, not the model
Scoped host access — agents only see servers explicitly registered; no lateral movement
Audit trail — every command execution is logged with agent identity, timestamp, and full output
No credential exposure — SSH keys and passwords live in the server process, never passed to the LLM context
The broader principle: This design pattern extends to everything we build. We originally designed systems for humans — dashboards, search engines, data pipelines. Now agents are the primary operators: searching the internet, collecting data, making recommendations. Every interface needs a parallel agent-safe path with explicit permissions, rate limits, and structured outputs. Design for agents, not just humans. The MCP server is the template.
21 — Hive Autonomous Mode
The Autonomous Roadmap
Hive's next evolution: Autonomous Mode — a goal-driven execution layer where the CEO agent decomposes a high-level objective, the COO plans operationally, and specialists execute in parallel. Informed by Paperclip, Hermes Agent, and OpenMOSS.
Mission Lifecycle
Strategy (CEO agent) → Planning (COO agent) → Execution (Specialists) → Review (COO validates) → Retro (CEO learns)
🎯
Goal Ancestry (from Paperclip)
Every task carries a chain of reasoning from mission → strategy → plan → task. Specialists understand why they're doing what they're doing, not just what.
{
"goalAncestry": [
"Make the digital dashboard a more commercially competitive product",
"Increase ad revenue by optimizing load time",
"Audit bundle size, find largest deps"
]
}
🔨
DAG Scheduler
Tasks form a directed acyclic graph with dependency edges. Independent tasks execute in parallel waves; dependent tasks chain sequentially.
Replaces the current FIFO scheduler for mission tasks.
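The wave idea can be sketched in a few lines: any task whose dependencies have all finished runs in the next parallel wave. The task/edge shapes are illustrative assumptions about Hive's internals.

```typescript
// Wave-based DAG scheduling sketch: independent tasks run together,
// dependent tasks wait for a later wave.
type TaskId = string;

function scheduleWaves(deps: Record<TaskId, TaskId[]>): TaskId[][] {
  const done = new Set<TaskId>();
  const waves: TaskId[][] = [];
  const all = Object.keys(deps);
  while (done.size < all.length) {
    // Ready = not yet run, and every dependency finished in a prior wave.
    const wave = all.filter(
      (t) => !done.has(t) && deps[t].every((d) => done.has(d))
    );
    if (wave.length === 0) throw new Error("cycle detected"); // not a DAG
    wave.forEach((t) => done.add(t));
    waves.push(wave.sort());
  }
  return waves;
}
```

A mission with two independent audits feeding a fix, then a verification step, yields three waves: the audits in parallel, then the fix, then the check.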
21b — Hive Autonomous Mode
Budget, Failure Recovery & the Vision
💰
Budget Cascade
Mission-level budget ($25 default) cascades to phases and individual tasks. Scheduler checks budgetSpentUsd < budgetUsd before spawning. If budget exhausted → pause mission, notify human.
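The cascade check can be sketched as a walk up the task → phase → mission chain. Field names mirror the budgetSpentUsd < budgetUsd check quoted above; the node structure itself is an illustrative assumption.

```typescript
// Budget-cascade sketch: a task may spawn only if no level above it
// (phase, mission) has exhausted its budget.
interface BudgetNode {
  budgetUsd: number;
  budgetSpentUsd: number;
  parent?: BudgetNode;
}

function canSpawn(node: BudgetNode): boolean {
  for (let cur: BudgetNode | undefined = node; cur; cur = cur.parent) {
    if (cur.budgetSpentUsd >= cur.budgetUsd) return false; // exhausted level
  }
  return true; // every level still has headroom
}
```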
🚨
Failure Escalation (from Hermes)
Three-tier recovery: Retry (same agent, fresh session, max 2) → Replan (COO decomposes differently) → Escalate (pause, notify human with full context).
Autonomous Mode design doc — research in Brainpower
Task creation — profile, model, and project selection
The vision: "Make the digital dashboard a more commercially competitive product" → CEO produces strategy with 3 goals → COO builds task DAG with 7 tasks across 3 parallel waves → specialists execute, auto-review, rework if needed → COO validates against success criteria → CEO logs lessons learned. Human checkpoints at strategy and planning phases.
22 — Canopy
AI-Powered SSH & Git Client for macOS
Canopy is a native macOS app that combines SSH terminal, SFTP file management, Git client, and AI chat into a single workspace. Built with SwiftUI, zero external SDK dependencies, BYOK (Bring Your Own Key) for any AI provider.
🖥
Terminal & SSH
Multi-tab SSH terminal powered by Citadel (PTY) + SwiftTerm
Local terminal via SwiftTerm LocalProcess
Server dashboard — uptime, disk, memory, CPU at a glance
SFTP file comparison & push-to-remote
🌳
Git & AI
Full Git client — repo scanning, branches, diff, history, local state
BYOK AI chat panel with terminal context injection
Supports: Claude, OpenAI, OpenRouter, Ollama, LM Studio — all via URLSession REST + SSE, no SDKs
Xcode-style toolbar toggles: Dashboard, Terminal, Git, Files, AI
Tech Stack
Layer — Technology — Purpose
UI — SwiftUI — Native macOS 15+, NavigationSplitView
SSH — Citadel — SSH, PTY, SFTP (pure Swift)
Terminal — SwiftTerm — Terminal emulation (NSView + UIView)
Syntax — HighlightSwift — Code syntax highlighting
Auth — macOS Keychain — Server passwords + API keys
Build — SPM (Swift 6.0) — No Xcode project, pure Package.swift
22b — Canopy
Human Interface, Agent Interface — Same Servers
Canopy — SSH terminal, file browser, and AI panel in a single workspace
Why this matters: Canopy demonstrates what Hive's SSH MCP does from the agent side — Canopy does it from the human side. Same servers, same credentials, two interfaces: one for humans, one for AI agents. This is the agent-native design principle in action.
23 — Brainpower & the Brain
Your Knowledge, Your Search, Your AI
Brainpower is Karpathy's "LLM Wiki" concept made real — a native macOS app that gives you a window into a local markdown vault with AI-powered hybrid search, vector embeddings, and a 3-tier cloud evolution. Inspired by Karpathy's knowledge architecture and Nate B Jones' "One Brain" philosophy.
The 3-Tier Brain Architecture
L1: Local (~/brain, offline) → L2: BrainCloud (Personal Atlas) → L3: BrainMerge (Team knowledge)
L1 (Local): Plain markdown files in ~/brain. Brainpower's built-in embeddings + ripgrep. Always works offline.
L2 (BrainCloud): Personal MongoDB Atlas cluster with vector embeddings. Atlas Vector Search + Atlas Search for cloud-powered semantic search.
L3 (BrainMerge): Shared Atlas cluster. Multi-tenant team knowledge — everyone's notes combined. Shared AI learns from all of it.
🔍
Hybrid Search
Keyword search via ripgrep for exact matches
Semantic search via Ollama embeddings (768d nomic / up to 4096d qwen3) + vDSP dot product
Reciprocal Rank Fusion merges both result sets
AI synthesis via Claude API — search results compiled into coherent answers with citations
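Reciprocal Rank Fusion, the merge step above, has a one-formula core: each document's fused score sums 1/(k + rank) over every result list it appears in. k = 60 is the conventional constant; the function is a sketch, not Brainpower's actual code.

```typescript
// Reciprocal Rank Fusion: merge keyword and semantic result lists by
// summing 1/(k + rank) per document, then sorting by fused score.
function rrfMerge(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      const rank = i + 1; // ranks are 1-based
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}
```

A document that ranks well in both lists beats one that tops only a single list — which is exactly why RRF is robust to either retriever having a bad day.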
🛠
App Features
Vault browser — sections, tags, filter bar, real-time search
Markdown editor with Mermaid diagram support
PDF export and AirDrop to iPhone
File watcher — auto-reloads when files change on disk
Tag system — color-coded, filterable
23b — Brainpower
Atlas Vector Search & the Brain in Action
Atlas Vector Search (BrainCloud)
MongoDB Atlas stores vector embeddings alongside the source documents — no separate vector database. Supports $vectorSearch (ANN/ENN), $rankFusion (MongoDB 8.0+), and $scoreFusion (MongoDB 8.2+) for native hybrid search. Pre-filtering on metadata (tags, section, date) narrows the search space before vector comparison.
Free tier (M0): 512MB storage, 1 vector index — sufficient for a personal Brain.
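A $vectorSearch stage with metadata pre-filtering, as described above, takes roughly this shape. The index name, field paths, and filter values are illustrative assumptions for a personal Brain collection, not Brainpower's actual schema.

```typescript
// Sketch of an Atlas $vectorSearch aggregation stage with a metadata
// pre-filter. Index/field names are assumptions for the example.
function brainVectorSearchStage(queryVector: number[], tag: string) {
  return {
    $vectorSearch: {
      index: "brain_vector_index", // assumed index name
      path: "embedding",           // field holding the document vector
      queryVector,
      numCandidates: 200,          // ANN candidate pool before ranking
      limit: 10,                   // results returned to the next stage
      filter: { tags: tag },       // pre-filter narrows the search space
    },
  };
}
```

The filter runs before the vector comparison, so a tag like "llm" shrinks the candidate set instead of post-filtering already-ranked results.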
Brainpower vault — documents, tags, and search
Brain Vector Search architecture in Brainpower
Brain Vector Search — MongoDB Atlas Architecture
The connection: This is the practical implementation of two ideas from this talk — Karpathy's "RAG without RAG" wiki approach (L1), extended with proper vector search (L2), and federated team knowledge (L3). The AI reads and writes the Brain. You own it.
24 — Glade & Beam
The Supporting Cast
Not every tool is an AI orchestrator. Some are small, sharp utilities that solve one problem well.
🌿
Glade — App Launcher
A lightweight macOS menubar utility for fast application switching. Trigger with a global hotkey → customizable panel of your most-used apps appears → click to switch.
Built with: SwiftUI, macOS 15+, SPM, zero dependencies
Architecture: CGEventTap for global keyboard monitoring, PopupWindowController for UI, PersistenceService for saved app list
📡
Beam — Presentation Viewer
A Mac-native app that opens presentation files with a clean, focused interface. Drop in an .html, .js, .pdf, or .md file and Beam renders it beautifully.
Built with: SwiftUI, macOS 15+, SPM, zero dependencies
Features: Dark / Light / System theme toggle, drag-and-drop file open, clean minimal UI
Why it exists: Every HTML presentation in this series is designed to be viewed in Beam — the app I built to present them.
Beam — present any document beautifully
The full suite — all built with Claude Code
The common thread: Every one of these apps is a native SwiftUI app built with Swift Package Manager (no Xcode project files), targeting macOS 15+, using Swift 6.0 toolchain with v5 language mode. Zero external SDK dependencies where possible. All built with Claude Code assistance.
25 — Key Takeaways
What to Remember: Leadership & Developers
For Leadership
The moat is the harness, not the model. Orchestration, permissions, and tool routing are the real competitive surface.
Agent infrastructure is now a managed service. Build-vs-buy calculus shifted with Managed Agents at $0.08/hour.
Security posture is visible. After the leak, operational competence is part of vendor trust evaluation.
Memory ownership is a strategic decision. Conway keeps it. Karpathy's approach lets you keep it. Choose deliberately.
For Developers
Invest in CLAUDE.md now. It's the highest-ROI file in your repo. Add to it every time Claude makes a mistake.
Learn hooks and skills. Skills extend capability, hooks enforce safety. Together they compound over time.
Manage context aggressively. One task per session, /clear often, dump progress to files.
Use MCP integrations. Connect your tools directly instead of copy-pasting between interfaces.
26 — Key Takeaways
Five Things to Do This Week
Start Today
Create a CLAUDE.md in your project root (start with 50–100 lines of project context and behavioral rules)
Add one pre-commit hook to block sensitive file commits — this enables unattended operation
Try the plan-then-execute workflow on your next feature — draft plan, annotate, iterate, then implement
Start a brain/ directory — dump research into raw/, let AI compile it into structured markdown you own
Connect one external tool via MCP (claude mcp add) — Notion, Figma, database, whatever you copy-paste from most
“The field moves fast. The leak accelerated that. What matters is building systems around AI that compound your team's capability over time.”