The Forge
Why I rebuilt Claude Code — and what I learned about AI tools
I rebuilt Claude Code. Not out of boredom, not out of hubris, but because I hit a wall that everyone who works seriously with AI agents knows: vendor lock-in.
The problem nobody talks about
Claude Code is brilliant. Opus 4 thinks more deeply than any other model I know. But Claude Code only runs with Anthropic models: a 370,000 LOC tool that ties me to a single provider.
At the same time, I use Gemini 2.5 Pro daily when I need a million tokens of context. GPT-5 for reasoning tasks. Local models via Ollama when data can't go to the cloud. Every model has strengths. No model can do everything.
Forge: A smithy for software
So I built Forge: an autonomous development platform that is designed to optimize itself, works with any LLM, and can be controlled from any device. The name says it all: a forge that crafts software the way an experienced team would.
The architecture is modular: a Model Router abstracts all providers (Anthropic, Google, OpenAI, local models). Each task gets the best-fitting model, not the most expensive one. Opus for architecture. Gemini for large context. Sonnet for fast iterations.
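To make the routing idea concrete, here is a minimal sketch in Python. It is not Forge's actual router: the names `TaskKind`, `ModelChoice`, `ROUTES`, and `route`, as well as the short model labels, are assumptions made up for illustration. It only shows the principle of mapping a task type to a provider and model, with an override for data that must stay local.

```python
from dataclasses import dataclass
from enum import Enum


class TaskKind(Enum):
    ARCHITECTURE = "architecture"
    LARGE_CONTEXT = "large_context"
    FAST_ITERATION = "fast_iteration"
    PRIVATE_DATA = "private_data"


@dataclass(frozen=True)
class ModelChoice:
    provider: str
    model: str


# Hypothetical routing table: the model labels are placeholders,
# not real API model identifiers.
ROUTES = {
    TaskKind.ARCHITECTURE: ModelChoice("anthropic", "opus"),
    TaskKind.LARGE_CONTEXT: ModelChoice("google", "gemini"),
    TaskKind.FAST_ITERATION: ModelChoice("anthropic", "sonnet"),
    TaskKind.PRIVATE_DATA: ModelChoice("ollama", "local"),
}


def route(kind: TaskKind, must_stay_local: bool = False) -> ModelChoice:
    """Pick the best-fitting model for a task, not the most expensive one."""
    # Data that may not leave the machine always goes to a local model.
    if must_stay_local:
        return ROUTES[TaskKind.PRIVATE_DATA]
    return ROUTES[kind]
```

Calling `route(TaskKind.ARCHITECTURE)` picks the Opus entry, while `route(TaskKind.LARGE_CONTEXT, must_stay_local=True)` falls back to the local model regardless of task type. A real router would also weigh cost, latency, and availability.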
29 tools, 131 tests, 58% parity
I analyzed the entire Claude Code source: 38 tools, 18+ files just for the agent executor, 1,185 lines just for the Read tool. Then I rebuilt the tools category by category: file tools, shell tools, agent tools, MCP integration, workflow tools, team tools, IDE tools. 29 tools are in production, and 131 tests are green.
And then I built features Claude Code doesn't have: cron scheduling for recurring tasks. LLM-based context summarization instead of simple truncation. Cost tracking with budgets per session, day, and model. Hook-based tool blocking.
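Two of these features, budgets and hook-based tool blocking, fit together naturally. The sketch below is an illustration under assumptions, not Forge's real code: `ToolCall`, `BudgetTracker`, `block_destructive_shell`, `make_budget_hook`, and `check` are hypothetical names. The idea is that every tool call passes through a list of hooks before it runs, and any hook can veto it with a reason.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    tool: str
    args: dict


# A hook returns a reason string to block the call, or None to allow it.
Hook = Callable[[ToolCall], Optional[str]]


@dataclass
class BudgetTracker:
    limit_usd: float
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd

    def exhausted(self) -> bool:
        return self.spent_usd >= self.limit_usd


def block_destructive_shell(call: ToolCall) -> Optional[str]:
    # Deliberately naive pattern check, for illustration only.
    if call.tool == "shell" and "rm -rf" in call.args.get("command", ""):
        return "destructive shell command"
    return None


def make_budget_hook(budget: BudgetTracker) -> Hook:
    def hook(call: ToolCall) -> Optional[str]:
        return "session budget exhausted" if budget.exhausted() else None
    return hook


def check(call: ToolCall, hooks: list) -> Optional[str]:
    """Run all hooks in order; the first one that objects blocks the call."""
    for hook in hooks:
        reason = hook(call)
        if reason is not None:
            return reason
    return None
```

The point of the design is that blocking happens in code before the tool runs, so it does not depend on the model following an instruction in a prompt.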
Cross-device: plan on phone, build on desktop
The daily friction I wanted to fix: I discuss an idea on my phone in Claude.AI. Then I sit down at my desk and have to rebuild the context. The session ends, and the knowledge is lost. Next day: start over.
In Forge, session state lives on the server. Every device is a window into the same work. Phone for planning and reviewing. Desktop for deep work. Tablet for monitoring. Context never gets lost.
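As a rough sketch of what "session state lives on the server" means, assume a store keyed by session ID that every device reads from and appends to. `Session`, `SessionStore`, and their methods are hypothetical names for illustration, and the in-memory dictionary stands in for whatever persistent storage a real server would use.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: str
    # Each entry is a (device, text) pair, in the order it arrived.
    messages: list = field(default_factory=list)


class SessionStore:
    """In-memory stand-in for server-side session storage."""

    def __init__(self) -> None:
        self._sessions = {}

    def get_or_create(self, session_id: str) -> Session:
        if session_id not in self._sessions:
            self._sessions[session_id] = Session(session_id)
        return self._sessions[session_id]

    def append(self, session_id: str, device: str, text: str) -> None:
        self.get_or_create(session_id).messages.append((device, text))

    def history(self, session_id: str) -> list:
        # Return a copy so callers cannot mutate the stored state.
        return list(self.get_or_create(session_id).messages)
```

A note added from the phone with `append("s1", "phone", ...)` shows up when the desktop later calls `history("s1")`, because neither device owns the state.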
Bootstrapping: Forge builds Forge
My favorite thing about Forge is the bootstrapping principle. Phase 1: Forge is built with Claude Code. Phase 2: Forge is built with Forge and Claude Code together. Phase 3: Forge builds itself. Every improvement to Forge makes Forge better at improving itself.
I'm currently in Phase 1. 29 tools work. The query engine streams. Agents spawn. But the road to full self-optimization is long. And that's exactly what makes it exciting.
Why this matters for everyone
Forge is a hobby project. But the problem it solves is universal: anyone seriously developing with AI needs access to multiple models. Needs persistence across sessions. Needs quality gates that are strictly enforced, not prompts that ask nicely. And needs the freedom to choose the best tool for each task.
This isn't theory. This is what I deal with every day, at Mercedes-Benz and in my own projects. The future of AI development isn't one model, one tool, one provider. It's an orchestra.
— Philipp