Log Levels for AI Agents: Beyond DEBUG, INFO, WARN, ERROR

Every developer knows the classic log levels:

DEBUG
INFO
WARN
ERROR
FATAL

They've worked for decades. They don't work for AI agents.

Why Traditional Levels Fail

Traditional log levels answer: "How severe is this?"

For AI agents, severity isn't the right question. An agent can make a high-confidence decision that is completely wrong and record it as a routine INFO message, so a catastrophic choice looks no different from any other log line.

The levels don't capture what matters: What kind of thing happened? Should I care? Can I act on it?
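To make the failure mode concrete, here is a minimal sketch using only Python's standard `logging` module. The agent name and the messages are made up for illustration. A typical "WARN and above" alerting filter keeps a harmless retry warning and silently drops the one line that records an expensive decision.

```python
import logging


class ListHandler(logging.Handler):
    """Keeps the messages that pass this handler's severity threshold."""

    def __init__(self, level):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("refund-agent")  # hypothetical agent name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo output out of the root logger

alerts = ListHandler(logging.WARNING)  # a typical "WARN and above" filter
logger.addHandler(alerts)

# A routine-looking INFO line that records a very costly choice.
logger.info("Approved refund of $48,000 (confidence 0.97)")
# A noisy but harmless warning.
logger.warning("Retrying cache lookup")

print(alerts.messages)  # ['Retrying cache lookup']
```

The severity filter did exactly what it was asked to do. The problem is that the decision never carried a severity in the first place.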

A New Framework: The DISCO Model

flowchart TD
    A[Agent Event] --> B{Event Type}
    B --> C[Decision]
    B --> D[Intervention]
    B --> E[State Change]
    B --> F[Completion]
    B --> G[Observation]

D - Decision

The agent chose between alternatives.

{
  "level": "DECISION",
  "agent": "routing-agent",
  "event": "Selected fulfillment center",
  "options": ["warehouse_east", "warehouse_west", "warehouse_central"],
  "choice": "warehouse_west",
  "confidence": 0.82,
  "reasoning": "Closest to destination with available inventory"
}

When to use: Any time the agent had options and picked one.

Always log: Yes. Decisions are always signal.

I - Intervention

A human overrode or corrected the agent.

{
  "level": "INTERVENTION",
  "agent": "approval-agent",
  "event": "Human override",
  "original_decision": "deny",
  "override_decision": "approve",
  "override_by": "user_admin_jane",
  "reason": "VIP customer exception"
}

When to use: Human-in-the-loop corrections, manual approvals, policy overrides.

Always log: Absolutely. This is gold for training and compliance.
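One way to see why interventions are useful for training: each one pairs what the agent decided with what a human decided instead. The sketch below is an assumption about how you might reshape these events, not part of any SDK. The function name `to_correction_examples` and the output keys are hypothetical, and the input uses the field names from the JSON above.

```python
def to_correction_examples(events):
    """Turn INTERVENTION events into (agent output, human label) records."""
    examples = []
    for event in events:
        if event.get("level") != "INTERVENTION":
            continue  # only human overrides carry a correction
        examples.append({
            "agent": event["agent"],
            "predicted": event["original_decision"],
            "label": event["override_decision"],
            "rationale": event.get("reason", ""),
        })
    return examples


events = [
    {"level": "INTERVENTION", "agent": "approval-agent", "event": "Human override",
     "original_decision": "deny", "override_decision": "approve",
     "override_by": "user_admin_jane", "reason": "VIP customer exception"},
    {"level": "DECISION", "agent": "routing-agent", "choice": "warehouse_west"},
]

examples = to_correction_examples(events)
print(examples)
```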

S - State Change

Something meaningful changed in the system.

{
  "level": "STATE_CHANGE",
  "agent": "inventory-agent",
  "event": "Stock level updated",
  "entity": "sku_12345",
  "previous": 150,
  "current": 12,
  "trigger": "Large order fulfilled"
}

When to use: Inventory changes, status transitions, threshold crossings.

Log selectively: Only state changes that matter. Not every counter increment.
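"State changes that matter" needs a rule. A threshold crossing is one simple option, sketched below. The helper names and the reorder point of 20 are assumptions for illustration. A counter moving from 150 to 149 produces nothing, while a drop from 150 to 12 crosses the threshold and produces an event.

```python
def crossed_threshold(previous, current, threshold):
    """True when the value moved from one side of the threshold to the other."""
    return (previous >= threshold) != (current >= threshold)


def state_change_event(agent, event, entity, previous, current, threshold, trigger):
    """Build a STATE_CHANGE record, or return None for changes not worth logging."""
    if not crossed_threshold(previous, current, threshold):
        return None
    return {
        "level": "STATE_CHANGE",
        "agent": agent,
        "event": event,
        "entity": entity,
        "previous": previous,
        "current": current,
        "trigger": trigger,
    }


LOW_STOCK_THRESHOLD = 20  # hypothetical reorder point

routine = state_change_event("inventory-agent", "Stock level updated", "sku_12345",
                             150, 149, LOW_STOCK_THRESHOLD, "Single unit sold")
notable = state_change_event("inventory-agent", "Stock level updated", "sku_12345",
                             150, 12, LOW_STOCK_THRESHOLD, "Large order fulfilled")

print(routine)  # None
print(notable)
```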

C - Completion

A task or workflow finished.

{
  "level": "COMPLETION",
  "agent": "onboarding-agent",
  "event": "Customer onboarding complete",
  "customer_id": "cust_xyz",
  "duration_minutes": 23,
  "steps_completed": 7,
  "outcome": "success"
}

When to use: Task finished, workflow ended, goal achieved (or failed).

Always log: Yes. Outcomes are essential for measuring agent performance.
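As a rough illustration of measuring performance from completions, the sketch below computes a success rate and a mean duration over a list of events. The function name and the sample events are hypothetical. The `outcome` and `duration_minutes` fields follow the JSON above.

```python
from statistics import mean


def completion_metrics(events):
    """Summarize COMPLETION events: count, success rate, mean duration."""
    completions = [e for e in events if e.get("level") == "COMPLETION"]
    if not completions:
        return {"count": 0, "success_rate": None, "mean_duration_minutes": None}
    successes = sum(1 for e in completions if e.get("outcome") == "success")
    durations = [e["duration_minutes"] for e in completions if "duration_minutes" in e]
    return {
        "count": len(completions),
        "success_rate": successes / len(completions),
        "mean_duration_minutes": mean(durations) if durations else None,
    }


sample = [
    {"level": "COMPLETION", "outcome": "success", "duration_minutes": 23},
    {"level": "COMPLETION", "outcome": "success", "duration_minutes": 12},
    {"level": "COMPLETION", "outcome": "failure", "duration_minutes": 31},
    {"level": "DECISION", "choice": "warehouse_west"},  # ignored by the summary
]

metrics = completion_metrics(sample)
print(metrics)
```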

O - Observation

Something notable that isn't a decision, intervention, state change, or completion.

{
  "level": "OBSERVATION",
  "agent": "monitoring-agent",
  "event": "Anomaly detected",
  "metric": "api_latency",
  "expected": "120ms",
  "actual": "890ms",
  "significance": "high"
}

When to use: Anomalies, threshold alerts, notable patterns.

Log selectively: Only observations that might require attention.
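The five event types share a small common shape (`level`, `agent`, `event`) plus type-specific fields. Here is a minimal sketch of that shape in plain Python. The class names `DiscoLevel` and `DiscoEvent` are assumptions for illustration and are not taken from any SDK.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class DiscoLevel(str, Enum):
    DECISION = "DECISION"
    INTERVENTION = "INTERVENTION"
    STATE_CHANGE = "STATE_CHANGE"
    COMPLETION = "COMPLETION"
    OBSERVATION = "OBSERVATION"


@dataclass
class DiscoEvent:
    level: DiscoLevel
    agent: str
    event: str
    details: dict = field(default_factory=dict)  # type-specific fields

    def to_json(self):
        """Flatten the common and type-specific fields into one JSON object."""
        record = {"level": self.level.value, "agent": self.agent, "event": self.event}
        record.update(self.details)
        return json.dumps(record)


observation = DiscoEvent(
    level=DiscoLevel.OBSERVATION,
    agent="monitoring-agent",
    event="Anomaly detected",
    details={"metric": "api_latency", "expected": "120ms",
             "actual": "890ms", "significance": "high"},
)
print(observation.to_json())
```

Using an enum for the level means a typo such as `DiscoLevel("DECISON")` raises a `ValueError` instead of quietly creating a sixth event type.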

Mapping to Action

Each DISCO level maps to a clear action:

Level	Default Action	Review Frequency
DECISION	Monitor for patterns	Weekly analysis
INTERVENTION	Learn from feedback	Each occurrence
STATE_CHANGE	Verify consistency	As needed
COMPLETION	Measure performance	Daily metrics
OBSERVATION	Investigate if significant	When alerted
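The table above can be encoded directly as a lookup, which is handy if you want to sort incoming events into review queues. This is a sketch under that assumption. The names `DEFAULT_HANDLING` and `build_review_queues` are hypothetical.

```python
from collections import defaultdict

# Level -> (default action, review frequency), copied from the table above.
DEFAULT_HANDLING = {
    "DECISION": ("Monitor for patterns", "Weekly analysis"),
    "INTERVENTION": ("Learn from feedback", "Each occurrence"),
    "STATE_CHANGE": ("Verify consistency", "As needed"),
    "COMPLETION": ("Measure performance", "Daily metrics"),
    "OBSERVATION": ("Investigate if significant", "When alerted"),
}


def build_review_queues(events):
    """Group events by how often their level should be reviewed."""
    queues = defaultdict(list)
    for event in events:
        # An unknown level raises KeyError rather than being silently dropped.
        action, frequency = DEFAULT_HANDLING[event["level"]]
        queues[frequency].append({"action": action, "event": event})
    return dict(queues)


queues = build_review_queues([
    {"level": "INTERVENTION", "agent": "approval-agent"},
    {"level": "DECISION", "agent": "routing-agent"},
    {"level": "DECISION", "agent": "pricing-agent"},
])
print(sorted(queues))  # ['Each occurrence', 'Weekly analysis']
```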

Filtering by Level

Unlike traditional levels, where you filter by severity (show ERROR and above), DISCO levels filter by type:

# Show all decisions (for reasoning analysis)
empress.query(level="DECISION")

# Show all interventions (for training data)
empress.query(level="INTERVENTION")

# Show failures only
empress.query(level="COMPLETION", outcome="failure")

# Show high-significance observations
empress.query(level="OBSERVATION", significance="high")
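If it helps to see what type-based filtering means mechanically, here is an in-memory equivalent over a list of event dictionaries. It is not how `empress.query` is implemented. The function `filter_events` is a hypothetical stand-in that matches on the level plus any extra fields by exact equality.

```python
def filter_events(events, level, **fields):
    """Return events of the given level whose fields all match exactly."""
    return [
        e for e in events
        if e.get("level") == level
        and all(e.get(key) == value for key, value in fields.items())
    ]


events = [
    {"level": "DECISION", "agent": "routing-agent", "choice": "warehouse_west"},
    {"level": "COMPLETION", "agent": "onboarding-agent", "outcome": "success"},
    {"level": "COMPLETION", "agent": "pricing-agent", "outcome": "failure"},
    {"level": "OBSERVATION", "agent": "monitoring-agent", "significance": "high"},
]

failures = filter_events(events, "COMPLETION", outcome="failure")
decisions = filter_events(events, "DECISION")
print(failures)
print(decisions)
```

Note there is no ordering between levels here: nothing is "above" or "below" DECISION, so there is no equivalent of a minimum-severity threshold.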

Comparison

Traditional	Problem	DISCO Equivalent
DEBUG	Too much noise	Don't log (or OBSERVATION with low significance)
INFO	Doesn't distinguish types	DECISION, COMPLETION, or STATE_CHANGE
WARN	Severity-focused	OBSERVATION with medium significance
ERROR	Missing context	COMPLETION with failure outcome
FATAL	Rare for agents	COMPLETION with critical failure
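When migrating existing logs, the comparison above can serve as a first-pass suggestion. The sketch below maps Python's standard `logging` severities to candidate DISCO levels. The function name and the `"critical_failure"` outcome value are assumptions. INFO is deliberately left ambiguous, because the table offers three candidates and choosing among them requires knowing what the message actually describes.

```python
import logging


def suggest_disco(levelno):
    """Suggest candidate DISCO levels and fields for a traditional severity."""
    if levelno >= logging.CRITICAL:  # FATAL in other ecosystems
        return {"candidates": ["COMPLETION"], "fields": {"outcome": "critical_failure"}}
    if levelno >= logging.ERROR:
        return {"candidates": ["COMPLETION"], "fields": {"outcome": "failure"}}
    if levelno >= logging.WARNING:
        return {"candidates": ["OBSERVATION"], "fields": {"significance": "medium"}}
    if levelno >= logging.INFO:
        # Severity alone cannot tell these apart; a person or a rule must choose.
        return {"candidates": ["DECISION", "COMPLETION", "STATE_CHANGE"], "fields": {}}
    # DEBUG: dropped by default; a low-significance OBSERVATION is the alternative.
    return {"candidates": [], "fields": {}}


for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
    print(name, suggest_disco(getattr(logging, name)))
```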

Implementation

In TypeScript

import { empress } from '@empress/sdk';

// Instead of console.log or traditional logging
empress.decision({
  agent: 'pricing-agent',
  event: 'Price calculated',
  options: [/* candidate prices */],
  choice: selectedPrice,
  confidence: 0.91
});

empress.completion({
  agent: 'pricing-agent',
  event: 'Quote generated',
  outcome: 'success',
  metrics: { quote_value: 15000 }
});

empress.intervention({
  agent: 'pricing-agent',
  event: 'Manual discount applied',
  original: calculatedPrice,
  override: discountedPrice,
  by: currentUser.id
});

In Python

from empress import log

# Decisions
log.decision(
    agent="support-agent",
    event="Ticket categorized",
    options=["billing", "technical", "general"],
    choice="technical",
    confidence=0.88
)

# Completions
log.completion(
    agent="support-agent",
    event="Ticket resolved",
    outcome="success",
    duration_minutes=12
)

# Interventions
log.intervention(
    agent="support-agent",
    event="Escalation overridden",
    original_decision="escalate",
    override_decision="resolve",
    by="agent_sarah"
)

The Payoff

Teams using DISCO instead of traditional log levels report:

3x faster debugging because events are categorized by type, not severity
90% reduction in noise because DEBUG-level spam is eliminated
Better training data because interventions are explicitly captured
Clearer compliance because decisions are first-class citizens

Traditional log levels were designed for traditional software.

AI agents deserve a framework designed for AI agents.