Agent in the Loop
56 skills, quality gates, and a learning loop — how autonomous agents work without me
Claude Code · Multi-Agent · MCP · Hooks · Skills
It's 11:47 PM. My laptop reports: 3 new DevProcess tickets. I glance: Severity Low, UI bugs. I type: 'Work autonomously on the open tickets.' Then I go to sleep.
The Dream of Autonomous Development
Human in the Loop
- Agent waits for every 'yes'
- 38% frustration overhead
- 31% automatable messages
- Only 6% real decisions
- 4 hours time-to-ship
Agent in the Loop
- Agent decides by risk level
- Zero overhead at levels 1-3
- Skill chains without interruption
- Human only at levels 7-8
- Under 1 hour time-to-ship
The Autonomous Dev Team
Level 1-3
Code style, commits, tests. Agent decides alone.
Level 4-6
Architecture, API design. Consults the Knowledge Backbone.
Level 7-8
Breaking changes, security, production. Asks the human.
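The three bands above can be captured in a simple routing policy. This is a minimal sketch; the level cutoffs and the `Action` names are illustrative, not the actual configuration from the post.

```python
from enum import IntEnum

class Action(IntEnum):
    DECIDE_ALONE = 1  # levels 1-3: code style, commits, tests
    CONSULT_KB = 2    # levels 4-6: architecture, API design
    ASK_HUMAN = 3     # levels 7-8: breaking changes, security, production

def route(risk_level: int) -> Action:
    """Map a task's risk level (1-8) to an escalation action."""
    if not 1 <= risk_level <= 8:
        raise ValueError(f"risk level out of range: {risk_level}")
    if risk_level <= 3:
        return Action.DECIDE_ALONE
    if risk_level <= 6:
        return Action.CONSULT_KB
    return Action.ASK_HUMAN
```

The point of keeping this a pure function: the escalation policy stays auditable and testable, separate from the skills that act on it.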
Skill Chains: Ticket In, PR Out
'Implement PROJ-456 completely.' The orchestrator detects the workflow type. Reads the JIRA ticket via MCP. Queries the KB agent for context. Creates a feature branch. Implements. Self-review. Browser test. PR. 'PR ready for review.'
In between: 5 chained skills, 2 MCP queries, 4 quality gates, and zero questions to me. That's the endless loop: ticket in, PR out.
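A skill chain like "ticket in, PR out" can be sketched as a pipeline of steps over a shared context, each with an optional quality gate. This is a hypothetical shape, not the orchestrator's real API; the step and field names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]                    # skill: transforms the shared context
    gate: Optional[Callable[[dict], bool]] = None  # optional quality gate

def run_chain(steps: list, ctx: dict) -> dict:
    """Execute skills in order; a failed quality gate halts the chain."""
    for step in steps:
        ctx = step.run(ctx)
        if step.gate is not None and not step.gate(ctx):
            raise RuntimeError(f"quality gate failed after '{step.name}'")
    return ctx
```

A real chain would wire steps like `read_ticket`, `query_kb`, `implement`, `self_review`, and `browser_test` (names assumed here) into that list; because every step receives and returns the same context dict, no question needs to go back to the human mid-chain.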
The Knowledge Backbone
The heart: a semantic knowledge store with temporal knowledge management. Every decision, every correction flows back. Confidence decay lets outdated knowledge fade. Conflict detection catches contradictions.
“When I correct the agent, the learning loop recognizes the correction and stores it. Next time, it won't make the same mistake.”
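Confidence decay is the mechanism that lets stale knowledge fade. The post doesn't give the actual formula, so here is one plausible sketch using exponential half-life decay; the 90-day half-life and the staleness threshold are assumptions.

```python
STALE_THRESHOLD = 0.3  # illustrative; below this, knowledge needs re-verification

def decayed_confidence(initial: float, age_days: float,
                       half_life_days: float = 90.0) -> float:
    """Exponential confidence decay: each half-life halves the score."""
    return initial * 0.5 ** (age_days / half_life_days)

def is_stale(initial: float, age_days: float) -> bool:
    """A fact whose decayed confidence fell below threshold should be re-checked."""
    return decayed_confidence(initial, age_days) < STALE_THRESHOLD
```

A correction from the human would then be stored as a fresh fact with high initial confidence, naturally outranking the older, decayed entry it contradicts.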
Reliability: Circuit Breaker for Autonomy
State Machine
Workflow states are tracked persistently. No step is lost.
Circuit Breaker
JIRA down? Cache kicks in. Skill failed? Alternative approach.
Task Persistence
Every task survives crashes and session switches.
Confidence Tracking
Decisions with confidence scores. Below threshold: escalation.
Three attempts, reading the source code before each fix, and only then escalation to me. Autonomy without reliability is dangerous.
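The circuit-breaker pattern described above can be sketched in a few lines. This is a generic implementation under assumed defaults (three failures to open, a fixed cooldown), not the system's actual code; "fallback" here stands in for things like the JIRA cache or an alternative skill.

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures and serve the fallback
    (e.g. a cache) instead of hammering a dead dependency."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 60.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                return fallback()      # circuit open: don't even try primary
            self.opened_at = None      # cooldown over: probe primary again
            self.failures = 0
        try:
            result = primary()
            self.failures = 0          # success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
```

Combined with persisted task state and confidence thresholds, this is what turns "three attempts, then escalate" from a convention into an enforced mechanism.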
The Numbers
Before
- 10 manual triggers per feature
- 5 corrections
- 3 context rebuilds
- 4 hours time-to-ship
After
- 2-3 triggers
- 1-2 corrections
- 0 context rebuilds
- < 1 hour time-to-ship
These aren't marketing numbers. They're real metrics from my daily work across 5 parallel projects. The point isn't perfection — it's direction.
What Comes Next
The endless loop is never done. Retro automation: after every sprint, the system analyzes its own performance. Which skills were slow? Which corrections piled up?
“The goal: I don't optimize the system. The system optimizes itself. And I remain the conductor who writes the score.”
— Philipp
Autonomous dev teams for your organization? Let's talk about agent-in-the-loop.
Book a consultation