More Than a Chatbot
How I built a Breathing Agent that observes, understands, and acts on its own
AG-UI · A2A · MCP · React · FastAPI · Bedrock
Imagine you're sitting in a recording studio. Next to you sits an experienced audiobook producer. She's always there. She's mostly quiet. When you're focused and everything's going well, she says nothing. But you know she's there. And when she notices something (a repetition, a style break, an error), she speaks. Direct, constructive, never condescending.
The Problem with Classic Agents
Classic Agent
- Waits in the next room
- User must call
- Chat panel as prison
- No context between pages
- Either on or off
Breathing Agent (Aria)
- Sits next to you
- Observes and acts autonomously
- Lives in the application
- Persisted context everywhere
- 4 fluid breathing states
Every AI application I know does it the same way: a chat window, an input field, a button. That's not an assistant — that's a service desk. With AudioLoom, I wanted something different. The question was never: Where do we place the agent? The question was: How does a competent colleague behave?
Aria Breathes
REST
Everything's running. Only her avatar with a gentle pulse. Like a calm breath.
ATTENTIVE
She notices something. A toast at the edge — no layout shift, just a concrete hint.
IN CONVERSATION
The user addresses her. Dialog opens. Full context, persisted history.
ORCHESTRATING
'Do it.' Fields fill, pages switch, progress bars run. Everything interruptible.
The navigation and Aria breathe in counterpoint: when Aria expands, the navigation collapses to icons over a 300 ms transition. The content area loses a net 60 pixels. Nobody notices.
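The four states above can be read as a small state machine. The following is a minimal sketch of that idea, not AudioLoom's actual code: the event names (`finding`, `user_addresses`, `do_it`, and so on) and the rule for when the navigation collapses are my assumptions for illustration.

```python
from enum import Enum


class Breath(Enum):
    REST = "rest"
    ATTENTIVE = "attentive"
    IN_CONVERSATION = "in_conversation"
    ORCHESTRATING = "orchestrating"


# Hypothetical event names; the article describes the states, not the triggers.
TRANSITIONS = {
    (Breath.REST, "finding"): Breath.ATTENTIVE,
    (Breath.REST, "user_addresses"): Breath.IN_CONVERSATION,
    (Breath.ATTENTIVE, "dismissed"): Breath.REST,
    (Breath.ATTENTIVE, "user_addresses"): Breath.IN_CONVERSATION,
    (Breath.IN_CONVERSATION, "do_it"): Breath.ORCHESTRATING,
    (Breath.IN_CONVERSATION, "closed"): Breath.REST,
    (Breath.ORCHESTRATING, "interrupted"): Breath.IN_CONVERSATION,
    (Breath.ORCHESTRATING, "finished"): Breath.REST,
}


def next_state(state: Breath, event: str) -> Breath:
    """Return the follow-up state, or stay put when the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def navigation_collapsed(state: Breath) -> bool:
    """Assumption: the navigation collapses to icons while Aria's dialog is open."""
    return state in (Breath.IN_CONVERSATION, Breath.ORCHESTRATING)
```

Keeping the transitions in one table makes "everything interruptible" easy to check: every state that does work has an explicit way back.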
5 Paradoxes of a Good Assistant
“Rules don't make a good assistant. The tensions between rules make a good assistant.”
Confident ↔ Humble
Strong enough to disagree — gracious enough to be wrong.
Proactive ↔ Restrained
Helps before you ask — never annoys.
Autonomous ↔ Transparent
Works independently — gives back control anytime.
Honest ↔ Diplomatic
Names problems — protects the creative process.
Competent ↔ Learning
When Aria admits she was wrong, her next disagreement carries more weight.
The Conductor Mode
'I have a PDF with my novel. Make a 5-part audiobook series. Noir crime, Hamburg, first person.' What happens: Aria reads, analyzes, creates the series, builds the bible, generates 5 episodes — all visible, all interruptible. The user hasn't touched a single form. They spoke, Aria worked.
This isn't a separate mode. It's Aria responding to a verbal instruction instead of waiting for form input.
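"All visible, all interruptible" can be sketched as a pipeline that reports each step and checks a stop flag before starting the next one. This is an illustration under assumptions, not the production code: `run_interruptible`, the step functions, and the use of `threading.Event` as the stop signal are all placeholders I chose.

```python
import threading
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_interruptible(steps: List[Step], stop: threading.Event,
                      report: Callable[[str], None]) -> List[str]:
    """Run steps in order, checking the stop flag before each one.

    Returns the names of the steps that completed.
    """
    done: List[str] = []
    for name, action in steps:
        if stop.is_set():
            report(f"stopped before: {name}")
            break
        report(f"running: {name}")  # every step is announced before it runs
        action()
        done.append(name)
    return done


stop = threading.Event()
steps = [
    ("read PDF", lambda: None),
    ("analyze", lambda: None),
    ("create series", lambda: None),
    ("build bible", stop.set),  # simulates the user pressing stop during this step
    ("generate 5 episodes", lambda: None),
]
completed = run_interruptible(steps, stop, print)
```

In this run the first four steps complete and episode generation never starts. A real implementation would also need to cancel inside long-running steps, which this sketch does not attempt.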
How Aria Controls the Page: AG-UI
Aria isn't an isolated chat agent. She controls the application directly via the AG-UI protocol. 'Create a new episode' → Aria navigates to the page, opens the dialog, fills the fields, confirms. The user sees every step. Aria doesn't work in the background — she works before your eyes.
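To make the "navigate, open, fill, confirm" sequence concrete, here is a simplified sketch of how one intent could expand into ordered, visible UI steps that are streamed to the frontend. The `UiAction` shape, the action kinds, and the targets are hypothetical placeholders. This is deliberately not the AG-UI event schema, only the idea behind it.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterator, List


@dataclass
class UiAction:
    # Hypothetical shape for illustration; not taken from the AG-UI specification.
    kind: str        # "navigate", "open_dialog", "fill_field" or "confirm"
    target: str      # route, dialog id, or field name
    value: str = ""  # only used by "fill_field"


def plan_create_episode(title: str) -> List[UiAction]:
    """Expand one intent into the ordered steps the user gets to watch."""
    return [
        UiAction("navigate", "/episodes"),
        UiAction("open_dialog", "new-episode"),
        UiAction("fill_field", "title", title),
        UiAction("confirm", "new-episode"),
    ]


def as_event_lines(actions: List[UiAction]) -> Iterator[str]:
    """Serialize each action as one JSON line, e.g. for a streamed response."""
    for index, action in enumerate(actions, start=1):
        yield json.dumps({"step": index, **asdict(action)})
```

Because each step is its own event, the frontend can render it as it happens and the user can stop between any two of them.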

Three Layers of Seeing
Tier 1: Browser
Word repetitions, empty required fields, ACX metric violations. Instant, free.
Tier 2: Server
Cross-episode consistency, bible alignment. Debounced: runs 5 seconds after the last change.
Tier 3: LLM
Style analysis, plot holes, genre feedback. Only at phase completions or on request.
The user experiences Aria not as omniscient, but as attentive. Some things she sees immediately, others take a moment. Like a human.
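The first two tiers can be sketched in a few lines. This is a rough illustration, assuming a word-window heuristic for repetitions and a simple timestamp-based debounce; the function and class names, the window size, and the minimum word length are my assumptions. Tier 1 would run in the browser in practice, so the Python here only shows the logic.

```python
import re
import time
from typing import Callable, List, Optional, Tuple


def find_repetitions(text: str, window: int = 10,
                     min_len: int = 5) -> List[Tuple[str, int]]:
    """Tier 1 style check: report (word, position) when a word of at least
    min_len letters reappears within the previous `window` words."""
    words = [w.lower() for w in re.findall(r"[^\W\d_]+", text)]
    hits = []
    for i, word in enumerate(words):
        if len(word) >= min_len and word in words[max(0, i - window):i]:
            hits.append((word, i))
    return hits


class Debouncer:
    """Tier 2 style gate: due only once `delay` seconds have passed
    since the most recent change."""

    def __init__(self, delay: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.delay = delay
        self.clock = clock  # injectable so the timing can be tested
        self.last_change: Optional[float] = None

    def touch(self) -> None:
        """Record an edit; this restarts the waiting period."""
        self.last_change = self.clock()

    def due(self) -> bool:
        """True when there is an unprocessed change at least `delay` old."""
        if self.last_change is None:
            return False
        return self.clock() - self.last_change >= self.delay

    def mark_done(self) -> None:
        """Call after the server check ran, so it does not run again."""
        self.last_change = None
```

The split keeps costs in line with the tiers: the repetition check is cheap enough to run on every keystroke, while the debouncer ensures the server check fires only after the writer pauses.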
Why This Works
“The best AI interaction doesn't feel like AI. It feels like a good colleague.”

The Breathing Agent isn't an AudioLoom feature. It's a design philosophy for any product with an AI assistant. The core question always remains: not "Where do I place the agent?" but "How would a competent colleague behave in the room?"
— Philipp
Breathing Agent for your product? Let's talk about AI assistants that feel natural.