Every developer knows the classic log levels:
- DEBUG
- INFO
- WARN
- ERROR
- FATAL
They've worked for decades. They don't work for AI agents.
Why Traditional Levels Fail
Traditional log levels answer: "How severe is this?"
For AI agents, severity isn't the right question. An agent might make a high-confidence decision that's completely wrong. It might log a routine INFO message about a catastrophic choice.
The levels don't capture what matters: What kind of thing happened? Should I care? Can I act on it?
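To make the problem concrete, here is a minimal sketch using Python's standard `logging` module. The logger name and the messages are made up for illustration; the point is only that a severity threshold keeps the harmless warning and drops the consequential decision.

```python
import io
import logging

# Send a conventional logger's output to an in-memory buffer so we can inspect it.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("routing-agent-demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)  # a common production threshold
logger.propagate = False
logger.addHandler(handler)

# A consequential choice, logged at INFO because nothing "went wrong".
logger.info("Selected warehouse_west (confidence=0.82)")
# A low-stakes hiccup, logged at WARNING because it sounds severe.
logger.warning("Retrying inventory lookup")

# Only the WARNING line passes the severity filter.
captured = buffer.getvalue().splitlines()
print(captured)
```

The decision never reaches the log, and nothing about its severity would have told you to keep it.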
A New Framework: The DISCO Model
D - Decision
The agent chose between alternatives.
{
  "level": "DECISION",
  "agent": "routing-agent",
  "event": "Selected fulfillment center",
  "options": ["warehouse_east", "warehouse_west", "warehouse_central"],
  "choice": "warehouse_west",
  "confidence": 0.82,
  "reasoning": "Closest to destination with available inventory"
}
When to use: Any time the agent had options and picked one.
Always log: Yes. Decisions are always signal.
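If decisions are always signal, it helps to check that each record really describes one. The sketch below is one possible sanity check, not part of any SDK. The function name `validate_decision` and its rules (at least two options, the choice drawn from them, confidence between 0 and 1) are assumptions layered on the example record above.

```python
def validate_decision(record: dict) -> list:
    """Return a list of problems found in a DECISION record (empty if none)."""
    problems = []
    if record.get("level") != "DECISION":
        problems.append("level must be DECISION")
    options = record.get("options", [])
    if len(options) < 2:
        # Assumption: with fewer than two options there was nothing to choose between.
        problems.append("a decision needs at least two options")
    if record.get("choice") not in options:
        problems.append("choice must be one of the options")
    confidence = record.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        # Assumption: confidence is expressed as a probability-like score.
        problems.append("confidence must be a number between 0 and 1")
    return problems


record = {
    "level": "DECISION",
    "agent": "routing-agent",
    "event": "Selected fulfillment center",
    "options": ["warehouse_east", "warehouse_west", "warehouse_central"],
    "choice": "warehouse_west",
    "confidence": 0.82,
}
print(validate_decision(record))
```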
I - Intervention
A human overrode or corrected the agent.
{
  "level": "INTERVENTION",
  "agent": "approval-agent",
  "event": "Human override",
  "original_decision": "deny",
  "override_decision": "approve",
  "override_by": "user_admin_jane",
  "reason": "VIP customer exception"
}
When to use: Human-in-the-loop corrections, manual approvals, policy overrides.
Always log: Absolutely. This is gold for training and compliance.
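One way to start mining interventions for feedback is to count which overrides keep recurring. This is a rough sketch that assumes events are plain dicts shaped like the record above; `override_patterns` is a hypothetical helper name.

```python
from collections import Counter


def override_patterns(events: list) -> Counter:
    """Count (original_decision, override_decision) pairs across INTERVENTION events."""
    return Counter(
        (e["original_decision"], e["override_decision"])
        for e in events
        if e.get("level") == "INTERVENTION"
    )


events = [
    {"level": "INTERVENTION", "original_decision": "deny", "override_decision": "approve"},
    {"level": "INTERVENTION", "original_decision": "deny", "override_decision": "approve"},
    {"level": "INTERVENTION", "original_decision": "escalate", "override_decision": "resolve"},
    {"level": "DECISION", "choice": "warehouse_west"},
]
patterns = override_patterns(events)
print(patterns.most_common(1))
```

A pair that shows up again and again is a hint that the agent's policy, not the individual case, needs attention.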
S - State Change
Something meaningful changed in the system.
{
  "level": "STATE_CHANGE",
  "agent": "inventory-agent",
  "event": "Stock level updated",
  "entity": "sku_12345",
  "previous": 150,
  "current": 12,
  "trigger": "Large order fulfilled"
}
When to use: Inventory changes, status transitions, threshold crossings.
Log selectively: Only state changes that matter. Not every counter increment.
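What counts as a state change "that matters" is up to you. One simple rule is to log only when a value crosses a threshold. The sketch below assumes a reorder point of 20 units, which is an illustrative number and not something the example record specifies.

```python
def crossed_threshold(previous: float, current: float, threshold: float) -> bool:
    """True when the value moved from one side of the threshold to the other."""
    return (previous >= threshold) != (current >= threshold)


REORDER_POINT = 20  # hypothetical threshold for this SKU

# Stock fell from 150 to 12: it crossed the reorder point, so log it.
big_drop = crossed_threshold(150, 12, REORDER_POINT)
# Stock fell from 150 to 149: an ordinary decrement, so skip it.
small_drop = crossed_threshold(150, 149, REORDER_POINT)
print(big_drop, small_drop)
```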
C - Completion
A task or workflow finished.
{
  "level": "COMPLETION",
  "agent": "onboarding-agent",
  "event": "Customer onboarding complete",
  "customer_id": "cust_xyz",
  "duration_minutes": 23,
  "steps_completed": 7,
  "outcome": "success"
}
When to use: Task finished, workflow ended, goal achieved (or failed).
Always log: Yes. Outcomes are essential for measuring agent performance.
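Because completions carry an outcome and a duration, basic performance numbers fall out almost for free. Here is a minimal sketch that assumes events are dicts with the `outcome` and `duration_minutes` fields shown above; `summarize_completions` is a hypothetical helper.

```python
def summarize_completions(events: list) -> dict:
    """Compute success rate and mean duration over COMPLETION events."""
    completions = [e for e in events if e.get("level") == "COMPLETION"]
    durations = [e["duration_minutes"] for e in completions if "duration_minutes" in e]
    successes = sum(1 for e in completions if e.get("outcome") == "success")
    return {
        "count": len(completions),
        "success_rate": successes / len(completions) if completions else None,
        "avg_duration_minutes": sum(durations) / len(durations) if durations else None,
    }


events = [
    {"level": "COMPLETION", "outcome": "success", "duration_minutes": 23},
    {"level": "COMPLETION", "outcome": "failure", "duration_minutes": 7},
    {"level": "DECISION", "choice": "warehouse_west"},
]
summary = summarize_completions(events)
print(summary)
```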
O - Observation
Something notable that isn't a decision, intervention, state change, or completion.
{
  "level": "OBSERVATION",
  "agent": "monitoring-agent",
  "event": "Anomaly detected",
  "metric": "api_latency",
  "expected": "120ms",
  "actual": "890ms",
  "significance": "high"
}
When to use: Anomalies, threshold alerts, notable patterns.
Log selectively: Only observations that might require attention.
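The five levels and their "always log" versus "log selectively" guidance can be captured in a few lines. This is a sketch of one way to encode it; the `Level` enum, `ALWAYS_LOG` set, and `should_log` function are names invented here, and the `matters` flag stands in for whatever test you use to decide a selective event deserves attention.

```python
from enum import Enum


class Level(str, Enum):
    DECISION = "DECISION"
    INTERVENTION = "INTERVENTION"
    STATE_CHANGE = "STATE_CHANGE"
    COMPLETION = "COMPLETION"
    OBSERVATION = "OBSERVATION"


# Levels the framework says to log every time.
ALWAYS_LOG = {Level.DECISION, Level.INTERVENTION, Level.COMPLETION}


def should_log(level: Level, matters: bool = False) -> bool:
    """Always-log levels pass; selective levels pass only if the caller says they matter."""
    return level in ALWAYS_LOG or matters


print(should_log(Level.DECISION))
print(should_log(Level.STATE_CHANGE))
print(should_log(Level.OBSERVATION, matters=True))
```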
Mapping to Action
Each DISCO level maps to a clear action:
| Level | Default Action | Review Frequency |
|---|---|---|
| DECISION | Monitor for patterns | Weekly analysis |
| INTERVENTION | Learn from feedback | Each occurrence |
| STATE_CHANGE | Verify consistency | As needed |
| COMPLETION | Measure performance | Daily metrics |
| OBSERVATION | Investigate if significant | When alerted |
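The table above translates naturally into a lookup. The sketch below simply restates the table as a dict and formats a routing hint for each event. `DEFAULT_ACTIONS` and `route` are illustrative names; a real system would call a handler rather than return a string.

```python
# Level -> (default action, review frequency), copied from the table above.
DEFAULT_ACTIONS = {
    "DECISION": ("Monitor for patterns", "Weekly analysis"),
    "INTERVENTION": ("Learn from feedback", "Each occurrence"),
    "STATE_CHANGE": ("Verify consistency", "As needed"),
    "COMPLETION": ("Measure performance", "Daily metrics"),
    "OBSERVATION": ("Investigate if significant", "When alerted"),
}


def route(event: dict) -> str:
    """Describe what to do with an event based on its DISCO level."""
    action, cadence = DEFAULT_ACTIONS[event["level"]]
    return f"{event['level']}: {action} ({cadence})"


print(route({"level": "INTERVENTION", "agent": "approval-agent"}))
```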
Filtering by Level
Unlike traditional levels, where you filter by severity (show ERROR and above), DISCO levels are filtered by type:
# Show all decisions (for reasoning analysis)
empress.query(level="DECISION")
# Show all interventions (for training data)
empress.query(level="INTERVENTION")
# Show failures only
empress.query(level="COMPLETION", outcome="failure")
# Show high-significance observations
empress.query(level="OBSERVATION", significance="high")
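To show what type-based filtering means without relying on any particular SDK, here is an in-memory stand-in. It is not how `empress.query` is implemented; the `query` function below is a hypothetical sketch that matches on level plus any extra fields given as keyword arguments.

```python
def query(events: list, level: str, **fields) -> list:
    """Return events of the given level whose fields equal the given values."""
    return [
        e for e in events
        if e.get("level") == level
        and all(e.get(key) == value for key, value in fields.items())
    ]


events = [
    {"level": "DECISION", "choice": "warehouse_west"},
    {"level": "COMPLETION", "outcome": "success"},
    {"level": "COMPLETION", "outcome": "failure"},
    {"level": "OBSERVATION", "significance": "high"},
    {"level": "OBSERVATION", "significance": "low"},
]

failures = query(events, level="COMPLETION", outcome="failure")
urgent = query(events, level="OBSERVATION", significance="high")
print(failures, urgent)
```

Note that there is no ordering between levels here. You never ask for "OBSERVATION and above" because no level outranks another.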
Comparison
| Traditional | Problem | DISCO Equivalent |
|---|---|---|
| DEBUG | Too much noise | Don't log (or OBSERVATION with low significance) |
| INFO | Doesn't distinguish types | DECISION, COMPLETION, or STATE_CHANGE |
| WARN | Severity-focused | OBSERVATION with medium significance |
| ERROR | Missing context | COMPLETION with failure outcome |
| FATAL | Rare for agents | COMPLETION with critical failure |
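If you are migrating existing logs, the comparison table can serve as a rough translation rule. The sketch below maps Python's standard `logging` level numbers to DISCO fields. Everything beyond the table is an assumption: the `translate` name, the `kind` hint for INFO (the table lists three candidates, so the caller has to pick), and the `"critical_failure"` outcome value.

```python
import logging
from typing import Optional

INFO_KINDS = {"DECISION", "COMPLETION", "STATE_CHANGE"}


def translate(levelno: int, kind: Optional[str] = None) -> Optional[dict]:
    """Map a traditional severity to DISCO fields, or None if the event should be dropped."""
    if levelno <= logging.DEBUG:
        return None  # table default: don't log
    if levelno <= logging.INFO:
        if kind not in INFO_KINDS:
            # INFO is ambiguous; severity alone cannot tell us the event type.
            raise ValueError("INFO needs kind: DECISION, COMPLETION, or STATE_CHANGE")
        return {"level": kind}
    if levelno <= logging.WARNING:
        return {"level": "OBSERVATION", "significance": "medium"}
    if levelno <= logging.ERROR:
        return {"level": "COMPLETION", "outcome": "failure"}
    # logging.FATAL is an alias for logging.CRITICAL.
    return {"level": "COMPLETION", "outcome": "critical_failure"}  # assumed value


print(translate(logging.ERROR))
print(translate(logging.INFO, kind="DECISION"))
```

The INFO branch is the interesting one: it refuses to guess, which is exactly the information a severity level never carried.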
Implementation
In TypeScript
import { empress } from '@empress/sdk';
// Instead of console.log or traditional logging
empress.decision({
  agent: 'pricing-agent',
  event: 'Price calculated',
  options: [/* ... */],
  choice: selectedPrice,
  confidence: 0.91
});
empress.completion({
  agent: 'pricing-agent',
  event: 'Quote generated',
  outcome: 'success',
  metrics: { quote_value: 15000 }
});
empress.intervention({
  agent: 'pricing-agent',
  event: 'Manual discount applied',
  original: calculatedPrice,
  override: discountedPrice,
  by: currentUser.id
});
In Python
from empress import log
# Decisions
log.decision(
    agent="support-agent",
    event="Ticket categorized",
    options=["billing", "technical", "general"],
    choice="technical",
    confidence=0.88
)
# Completions
log.completion(
    agent="support-agent",
    event="Ticket resolved",
    outcome="success",
    duration_minutes=12
)
# Interventions
log.intervention(
    agent="support-agent",
    event="Escalation overridden",
    original_decision="escalate",
    override_decision="resolve",
    by="agent_sarah"
)
The Payoff
Teams using DISCO instead of traditional log levels report:
- 3x faster debugging because events are categorized by type, not severity
- 90% reduction in noise because DEBUG-level spam is eliminated
- Better training data because interventions are explicitly captured
- Clearer compliance because decisions are first-class citizens
Traditional log levels were designed for traditional software.
AI agents deserve a framework designed for AI agents.