April 05, 2026 · Agentic Coding · 7 min

Agent in the Loop

56 skills, quality gates, and a learning loop — how autonomous agents work without me

Claude Code · Multi-Agent · MCP · Hooks · Skills

It's 11:47 PM. My laptop reports: 3 new DevProcess tickets. I glance: Severity Low, UI bugs. I type: 'Work autonomously on the open tickets.' Then I go to sleep.

The Dream of Autonomous Development

Human in the Loop

  • Agent waits for every 'yes'
  • 38% frustration overhead
  • 31% automatable messages
  • Only 6% real decisions
  • 4 hours time-to-ship

Agent in the Loop

  • Agent decides by risk level
  • Zero overhead at level 1-3
  • Skill chains without interruption
  • Human only at level 7-8
  • Under 1 hour time-to-ship

38% of my messages were frustration overhead · 31% fully automatable · Only 6% real strategic decisions

The Autonomous Dev Team

  • 56 Skills: chainable
  • 4 Hook Types: quality gates
  • 2 MCP Servers: JIRA + KB
  • Memory: persistent

Level 1-3

Code style, commits, tests. Agent decides alone.

Level 4-6

Architecture, API design. Consults Knowledge Backbone.

Level 7-8

Breaking changes, security, production. Asks the human.
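
The three bands above amount to a small routing rule. Here is a minimal sketch of how it could look. The names `Route` and `route_by_risk` are hypothetical, and so is the assumption that a level arrives as a plain integer from 1 to 8. How the level is assigned in the first place is not covered here.

```python
from enum import Enum


class Route(Enum):
    """Who makes the call for a given decision (hypothetical names)."""
    DECIDE_ALONE = "agent decides alone"
    CONSULT_KB = "consult Knowledge Backbone"
    ASK_HUMAN = "ask the human"


def route_by_risk(level: int) -> Route:
    """Map a risk level (1-8) to a decision route, following the bands above."""
    if not 1 <= level <= 8:
        raise ValueError(f"risk level must be between 1 and 8, got {level}")
    if level <= 3:
        return Route.DECIDE_ALONE  # code style, commits, tests
    if level <= 6:
        return Route.CONSULT_KB    # architecture, API design
    return Route.ASK_HUMAN         # breaking changes, security, production


if __name__ == "__main__":
    for lvl in (2, 5, 8):
        print(lvl, "->", route_by_risk(lvl).value)
```

Keeping the rule this explicit makes the escalation boundary easy to audit: anything at level 7 or above can never be handled silently.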

Skill Chains: Ticket In, PR Out

'Implement PROJ-456 completely.' The orchestrator detects the workflow type. Reads the JIRA ticket via MCP. Queries the KB agent for context. Creates a feature branch. Implements. Reviews its own work. Runs a browser test. Opens the PR. Then: 'PR ready for review.'

In between: 5 chained skills, 2 MCP queries, 4 quality gates, and zero questions to me. That's the endless loop: ticket in, PR out.
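
A skill chain with quality gates can be pictured as a list of steps, each followed by its checks. The sketch below is illustrative only. `Step`, `run_chain`, `GateFailed`, and the stub skills are hypothetical names, and the real skills, hooks, and MCP calls are replaced by placeholders that just pass a dictionary along.

```python
from dataclasses import dataclass, field
from typing import Callable

Context = dict                      # shared state handed from skill to skill
Skill = Callable[[Context], Context]
Gate = Callable[[Context], bool]


class GateFailed(Exception):
    """Raised when a quality gate rejects the output of a step."""


@dataclass
class Step:
    name: str
    skill: Skill
    gates: list[Gate] = field(default_factory=list)


def run_chain(steps: list[Step], ctx: Context) -> Context:
    """Run skills in order; stop at the first gate that fails."""
    for step in steps:
        ctx = step.skill(ctx)
        for gate in step.gates:
            if not gate(ctx):
                raise GateFailed(f"quality gate failed after '{step.name}'")
        ctx.setdefault("log", []).append(step.name)
    return ctx


# Stub skills: placeholders for the real work (assumptions, not the actual skills).
def read_ticket(ctx): return {**ctx, "summary": "stub ticket text"}
def create_branch(ctx): return {**ctx, "branch": f"feature/{ctx['ticket']}"}
def implement(ctx): return {**ctx, "diff": "stub diff"}
def self_review(ctx): return {**ctx, "review_ok": True}
def open_pr(ctx): return {**ctx, "status": "PR ready for review"}


CHAIN = [
    Step("read_ticket", read_ticket, [lambda c: bool(c.get("summary"))]),
    Step("create_branch", create_branch),
    Step("implement", implement, [lambda c: bool(c.get("diff"))]),
    Step("self_review", self_review, [lambda c: c.get("review_ok") is True]),
    Step("open_pr", open_pr),
]

if __name__ == "__main__":
    result = run_chain(CHAIN, {"ticket": "PROJ-456"})
    print(result["status"])
```

The gates sit between steps, not at the end, so a failed review stops the chain before a PR is ever opened.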

The Knowledge Backbone

The heart: a semantic knowledge store with temporal knowledge management. Every decision, every correction flows back. Confidence decay lets outdated knowledge fade. Conflict detection catches contradictions.

When I correct the agent, the learning loop recognizes the correction and stores it. Next time, it won't make the same mistake.
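
Confidence decay can take many forms. One common choice is an exponential half-life, sketched below. The `Fact` structure, the 90-day half-life, and the 0.5 trust threshold are assumptions made for illustration; the formula the Knowledge Backbone actually uses is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    confidence: float  # confidence when last confirmed, 0.0 to 1.0
    age_days: float    # days since the fact was last confirmed or corrected


def decayed_confidence(fact: Fact, half_life_days: float = 90.0) -> float:
    """Halve the stored confidence for every half-life that has passed."""
    return fact.confidence * 0.5 ** (fact.age_days / half_life_days)


def still_trusted(fact: Fact, threshold: float = 0.5,
                  half_life_days: float = 90.0) -> bool:
    """A fact is used only while its decayed confidence stays above the threshold."""
    return decayed_confidence(fact, half_life_days) >= threshold


if __name__ == "__main__":
    fresh = Fact("API uses snake_case", confidence=0.8, age_days=0)
    stale = Fact("API uses snake_case", confidence=0.8, age_days=90)
    print(still_trusted(fresh), still_trusted(stale))
```

Under this scheme, a correction would reset `age_days` to zero, so the knowledge the agent is corrected on most recently carries the most weight.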

Reliability: Circuit Breaker for Autonomy

State Machine

Workflow states are tracked persistently. No step is lost.

Circuit Breaker

JIRA down? Cache kicks in. Skill failed? Alternative approach.

Task Persistence

Every task survives crashes and session switches.

Confidence Tracking

Decisions with confidence scores. Below threshold: escalation.
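
The 'JIRA down? Cache kicks in.' behavior from the Circuit Breaker card can be sketched as a wrapper around any fetch function. This is a generic circuit breaker, not the actual implementation. The class name, the failure limit, and the cool-down period are all assumptions.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Wrap a fetch function; serve cached values while the service is failing."""

    def __init__(self, fetch: Callable[[str], str], max_failures: int = 3,
                 reset_after: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._max_failures = max_failures
        self._reset_after = reset_after   # seconds to wait before retrying
        self._clock = clock               # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None
        self._cache: dict[str, str] = {}

    def _is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self._reset_after:
            # Cool-down elapsed: allow one trial call; a single failure reopens.
            self._opened_at = None
            self._failures = self._max_failures - 1
            return False
        return True

    def get(self, key: str) -> str:
        if not self._is_open():
            try:
                value = self._fetch(key)
            except Exception:
                self._failures += 1
                if self._failures >= self._max_failures:
                    self._opened_at = self._clock()  # stop calling the service
            else:
                self._failures = 0
                self._cache[key] = value
                return value
        if key in self._cache:
            return self._cache[key]                  # fall back to last good value
        raise RuntimeError(f"service unavailable and no cached value for {key!r}")
```

While the breaker is open, the agent keeps working from the last known ticket data without making further calls to the failing service.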

Three attempts. Read the source code before fixing. Only then does it escalate to me. Autonomy without reliability is dangerous.
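
One reading of the three-attempts rule, sketched under assumptions: the source is re-read before every fix attempt, and escalation happens only after the last attempt fails. `fix_with_retries`, `Escalation`, and the two callbacks are hypothetical names.

```python
from typing import Callable


class Escalation(Exception):
    """Raised when the agent gives up and hands the problem to the human."""


def fix_with_retries(read_source: Callable[[], str],
                     attempt_fix: Callable[[str, int], str],
                     max_attempts: int = 3) -> str:
    """Try a fix up to max_attempts times, reading the source before each try."""
    errors = []
    for attempt in range(1, max_attempts + 1):
        source = read_source()  # look at the code before changing it
        try:
            return attempt_fix(source, attempt)
        except Exception as exc:
            errors.append(f"attempt {attempt}: {exc}")
    # All attempts failed: escalate with the full history attached.
    raise Escalation("; ".join(errors))
```

Carrying the error history into the escalation means the human sees what was already tried and does not have to rebuild the context.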

The Numbers

Before

  • 10 manual triggers per feature
  • 5 corrections
  • 3 context rebuilds
  • 4 hours time-to-ship

After

  • 2-3 triggers
  • 1-2 corrections
  • 0 context rebuilds
  • < 1 hour time-to-ship

These aren't marketing numbers. They're real metrics from my daily work across 5 parallel projects. The point isn't perfection — it's direction.

What Comes Next

The endless loop is never done. Retro automation: after every sprint, the system analyzes its own performance. Which skills were slow? Which corrections piled up?

The goal: I don't optimize the system. The system optimizes itself. And I remain the conductor who writes the score.

— Philipp
