Technical · February 27, 2025 · 4 min read

Log Levels for AI Agents: Beyond DEBUG, INFO, WARN, ERROR

Traditional log levels don't work for AI systems. Here's a better framework designed for autonomous agents.

Empress Team
AI Operations & Observability

Every developer knows the classic log levels:

  • DEBUG
  • INFO
  • WARN
  • ERROR
  • FATAL

They've worked for decades. They don't work for AI agents.

Why Traditional Levels Fail

Traditional log levels answer: "How severe is this?"

For AI agents, severity isn't the right question. An agent might make a high-confidence decision that's completely wrong. It might log a routine INFO message about a catastrophic choice.

The levels don't capture what matters: What kind of thing happened? Should I care? Can I act on it?

A New Framework: The DISCO Model

flowchart TD
    A[Agent Event] --> B{Event Type}
    B --> C[Decision]
    B --> D[Intervention]
    B --> E[State Change]
    B --> F[Completion]
    B --> G[Observation]

D - Decision

The agent chose between alternatives.

{
  "level": "DECISION",
  "agent": "routing-agent",
  "event": "Selected fulfillment center",
  "options": ["warehouse_east", "warehouse_west", "warehouse_central"],
  "choice": "warehouse_west",
  "confidence": 0.82,
  "reasoning": "Closest to destination with available inventory"
}

When to use: Any time the agent had options and picked one.

Always log: Yes. Decisions are always signal.
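
To make the shape of a DECISION record concrete, here is a minimal sketch of a container that checks its own fields before serializing. The `DecisionEvent` class and its validation rules are assumptions for illustration, not part of the Empress SDK; only the field names come from the JSON above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DecisionEvent:
    """Hypothetical container mirroring the DECISION fields shown above."""
    agent: str
    event: str
    options: list
    choice: str
    confidence: float
    reasoning: str = ""
    level: str = "DECISION"

    def __post_init__(self) -> None:
        # A decision only makes sense if the choice was one of the options.
        if self.choice not in self.options:
            raise ValueError(f"choice {self.choice!r} is not among options")
        # Assumption: confidence is a probability-like score in [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


record = DecisionEvent(
    agent="routing-agent",
    event="Selected fulfillment center",
    options=["warehouse_east", "warehouse_west", "warehouse_central"],
    choice="warehouse_west",
    confidence=0.82,
    reasoning="Closest to destination with available inventory",
)
print(record.to_json())
```

Rejecting a choice that was never among the options catches a common logging bug early: a record that claims a decision was made but cannot say between what.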

I - Intervention

A human overrode or corrected the agent.

{
  "level": "INTERVENTION",
  "agent": "approval-agent",
  "event": "Human override",
  "original_decision": "deny",
  "override_decision": "approve",
  "override_by": "user_admin_jane",
  "reason": "VIP customer exception"
}

When to use: Human-in-the-loop corrections, manual approvals, policy overrides.

Always log: Absolutely. This is gold for training and compliance.
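
One way to see why interventions are useful as training data: each record already contains what the agent did and what a human preferred instead. The sketch below turns a record into such a pair. The function name `to_training_example` and the output keys are hypothetical; the input fields match the JSON above.

```python
def to_training_example(record: dict) -> dict:
    """Turn an INTERVENTION record into a rejected/preferred pair."""
    if record.get("level") != "INTERVENTION":
        raise ValueError("expected an INTERVENTION record")
    return {
        "agent": record["agent"],
        "rejected": record["original_decision"],   # what the agent chose
        "preferred": record["override_decision"],  # what the human chose
        "rationale": record.get("reason", ""),
    }


intervention = {
    "level": "INTERVENTION",
    "agent": "approval-agent",
    "event": "Human override",
    "original_decision": "deny",
    "override_decision": "approve",
    "override_by": "user_admin_jane",
    "reason": "VIP customer exception",
}
example = to_training_example(intervention)
print(example)
```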

S - State Change

Something meaningful changed in the system.

{
  "level": "STATE_CHANGE",
  "agent": "inventory-agent",
  "event": "Stock level updated",
  "entity": "sku_12345",
  "previous": 150,
  "current": 12,
  "trigger": "Large order fulfilled"
}

When to use: Inventory changes, status transitions, threshold crossings.

Log selectively: Only state changes that matter. Not every counter increment.
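
"Only state changes that matter" needs a rule. One possible rule, sketched below, is to emit a STATE_CHANGE record only when a value crosses a threshold. The helper names and the reorder threshold of 20 units are assumptions for illustration.

```python
from typing import Optional


def crossed_threshold(previous: float, current: float, threshold: float) -> bool:
    """True when the value moved from one side of the threshold to the other."""
    return (previous >= threshold) != (current >= threshold)


def maybe_state_change(agent: str, entity: str, previous: float,
                       current: float, threshold: float,
                       trigger: str) -> Optional[dict]:
    """Build a STATE_CHANGE record, or return None for routine updates."""
    if not crossed_threshold(previous, current, threshold):
        return None
    return {
        "level": "STATE_CHANGE",
        "agent": agent,
        "event": "Stock level updated",
        "entity": entity,
        "previous": previous,
        "current": current,
        "trigger": trigger,
    }


# Dropping from 150 to 12 crosses the (hypothetical) reorder point of 20.
logged = maybe_state_change("inventory-agent", "sku_12345", 150, 12,
                            threshold=20, trigger="Large order fulfilled")
# Dropping from 150 to 149 does not, so nothing is logged.
skipped = maybe_state_change("inventory-agent", "sku_12345", 150, 149,
                             threshold=20, trigger="Single unit sold")
print(logged)
print(skipped)
```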

C - Completion

A task or workflow finished.

{
  "level": "COMPLETION",
  "agent": "onboarding-agent",
  "event": "Customer onboarding complete",
  "customer_id": "cust_xyz",
  "duration_minutes": 23,
  "steps_completed": 7,
  "outcome": "success"
}

When to use: Task finished, workflow ended, goal achieved (or failed).

Always log: Yes. Outcomes are essential for measuring agent performance.
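
As a rough illustration of measuring performance from COMPLETION records, the sketch below computes a success rate and an average duration. The `summarize` function and the sample records are made up for the example; the field names follow the JSON above.

```python
from statistics import mean


def summarize(completions: list) -> dict:
    """Aggregate COMPLETION records into simple performance metrics."""
    total = len(completions)
    if total == 0:
        return {"count": 0, "success_rate": None, "avg_duration_minutes": None}
    successes = sum(1 for c in completions if c["outcome"] == "success")
    return {
        "count": total,
        "success_rate": successes / total,
        "avg_duration_minutes": mean(c["duration_minutes"] for c in completions),
    }


# Illustrative sample data, not real measurements.
sample = [
    {"level": "COMPLETION", "outcome": "success", "duration_minutes": 23},
    {"level": "COMPLETION", "outcome": "failure", "duration_minutes": 31},
    {"level": "COMPLETION", "outcome": "success", "duration_minutes": 18},
]
summary = summarize(sample)
print(summary)
```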

O - Observation

Something notable that isn't a decision, intervention, state change, or completion.

{
  "level": "OBSERVATION",
  "agent": "monitoring-agent",
  "event": "Anomaly detected",
  "metric": "api_latency",
  "expected": "120ms",
  "actual": "890ms",
  "significance": "high"
}

When to use: Anomalies, threshold alerts, notable patterns.

Log selectively: Only observations that might require attention.
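
Deciding which observations "might require attention" can be as simple as comparing the actual value with the expected one. The sketch below classifies significance by relative deviation and drops low-significance readings. The cutoffs (50% and 200%) and the function names are arbitrary assumptions; tune them for your own metrics.

```python
from typing import Optional


def significance(expected: float, actual: float) -> str:
    """Classify a reading by its relative deviation from the expected value."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    ratio = abs(actual - expected) / expected
    if ratio >= 2.0:   # illustrative cutoff
        return "high"
    if ratio >= 0.5:   # illustrative cutoff
        return "medium"
    return "low"


def observe(agent: str, metric: str, expected_ms: int,
            actual_ms: int) -> Optional[dict]:
    """Build an OBSERVATION record, skipping low-significance readings."""
    level = significance(expected_ms, actual_ms)
    if level == "low":
        return None
    return {
        "level": "OBSERVATION",
        "agent": agent,
        "event": "Anomaly detected",
        "metric": metric,
        "expected": f"{expected_ms}ms",
        "actual": f"{actual_ms}ms",
        "significance": level,
    }


anomaly = observe("monitoring-agent", "api_latency", 120, 890)
routine = observe("monitoring-agent", "api_latency", 120, 130)
print(anomaly)
print(routine)
```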

Mapping to Action

Each DISCO level maps to a clear action:

| Level | Default Action | Review Frequency |
|---|---|---|
| DECISION | Monitor for patterns | Weekly analysis |
| INTERVENTION | Learn from feedback | Each occurrence |
| STATE_CHANGE | Verify consistency | As needed |
| COMPLETION | Measure performance | Daily metrics |
| OBSERVATION | Investigate if significant | When alerted |
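
The mapping above is easy to encode as a lookup table, so that every incoming record is routed to its default action. This is a sketch of that idea only; `DEFAULT_ACTIONS` and `route` are hypothetical names, and the values are copied from the table.

```python
# Level -> (default action, review frequency), copied from the table above.
DEFAULT_ACTIONS = {
    "DECISION": ("Monitor for patterns", "Weekly analysis"),
    "INTERVENTION": ("Learn from feedback", "Each occurrence"),
    "STATE_CHANGE": ("Verify consistency", "As needed"),
    "COMPLETION": ("Measure performance", "Daily metrics"),
    "OBSERVATION": ("Investigate if significant", "When alerted"),
}


def route(record: dict) -> dict:
    """Look up the default action and review cadence for a DISCO record."""
    level = record.get("level")
    if level not in DEFAULT_ACTIONS:
        raise ValueError(f"unknown DISCO level: {level!r}")
    action, review = DEFAULT_ACTIONS[level]
    return {"level": level, "action": action, "review": review}


routed = route({"level": "INTERVENTION", "agent": "approval-agent"})
print(routed)
```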

Filtering by Level

With traditional levels you filter by severity (show ERROR and above). With DISCO levels you filter by type:

# Show all decisions (for reasoning analysis)
empress.query(level="DECISION")

# Show all interventions (for training data)
empress.query(level="INTERVENTION")

# Show failures only
empress.query(level="COMPLETION", outcome="failure")

# Show high-significance observations
empress.query(level="OBSERVATION", significance="high")
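
To show what filtering by type means without access to the platform, here is an in-memory stand-in for the queries above. This is not how `empress.query` is implemented; the local `query` function and the sample records are assumptions that mimic the same keyword-filter style with exact matches.

```python
def query(records: list, **filters) -> list:
    """Return records whose fields equal every given filter value."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in filters.items())
    ]


# Illustrative records, one or two per DISCO level.
records = [
    {"level": "DECISION", "agent": "routing-agent", "choice": "warehouse_west"},
    {"level": "COMPLETION", "agent": "onboarding-agent", "outcome": "success"},
    {"level": "COMPLETION", "agent": "pricing-agent", "outcome": "failure"},
    {"level": "OBSERVATION", "agent": "monitoring-agent", "significance": "high"},
    {"level": "OBSERVATION", "agent": "monitoring-agent", "significance": "low"},
]

decisions = query(records, level="DECISION")
failures = query(records, level="COMPLETION", outcome="failure")
alerts = query(records, level="OBSERVATION", significance="high")
print(len(decisions), len(failures), len(alerts))
```

Note that there is no ordering between levels here: asking for COMPLETION records never implies anything about OBSERVATION records, which is the key difference from "ERROR and above".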

Comparison

| Traditional | Problem | DISCO Equivalent |
|---|---|---|
| DEBUG | Too much noise | Don't log (or OBSERVATION with low significance) |
| INFO | Doesn't distinguish types | DECISION, COMPLETION, or STATE_CHANGE |
| WARN | Severity-focused | OBSERVATION with medium significance |
| ERROR | Missing context | COMPLETION with failure outcome |
| FATAL | Rare for agents | COMPLETION with critical failure |
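
If you are migrating existing log calls, the comparison above can be sketched as a translation function. INFO is the awkward case: it maps to three possible DISCO levels, so the caller has to say which one applies. The `translate` function, the `info_kind` parameter, and the `"severity": "critical"` field are assumptions for illustration, not an established schema.

```python
from typing import Optional

INFO_KINDS = {"DECISION", "COMPLETION", "STATE_CHANGE"}


def translate(level: str, info_kind: Optional[str] = None) -> Optional[dict]:
    """Map a traditional level name to DISCO fields, or None to skip logging."""
    level = level.upper()
    if level == "DEBUG":
        return None  # "Don't log" per the table above
    if level == "INFO":
        # INFO alone is ambiguous; the caller must pick the event type.
        if info_kind not in INFO_KINDS:
            raise ValueError("INFO needs info_kind: DECISION, COMPLETION, or STATE_CHANGE")
        return {"level": info_kind}
    if level in ("WARN", "WARNING"):
        return {"level": "OBSERVATION", "significance": "medium"}
    if level == "ERROR":
        return {"level": "COMPLETION", "outcome": "failure"}
    if level in ("FATAL", "CRITICAL"):
        # Hypothetical way to mark a critical failure.
        return {"level": "COMPLETION", "outcome": "failure", "severity": "critical"}
    raise ValueError(f"unknown traditional level: {level!r}")


print(translate("WARN"))
print(translate("INFO", info_kind="DECISION"))
print(translate("DEBUG"))
```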

Implementation

In TypeScript

import { empress } from '@empress/sdk';

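// Instead of console.log or traditional logging.
// selectedPrice, calculatedPrice, discountedPrice, and currentUser come from your application code.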
empress.decision({
  agent: 'pricing-agent',
  event: 'Price calculated',
  options: [...],
  choice: selectedPrice,
  confidence: 0.91
});

empress.completion({
  agent: 'pricing-agent',
  event: 'Quote generated',
  outcome: 'success',
  metrics: { quote_value: 15000 }
});

empress.intervention({
  agent: 'pricing-agent',
  event: 'Manual discount applied',
  original: calculatedPrice,
  override: discountedPrice,
  by: currentUser.id
});

In Python

from empress import log

# Decisions
log.decision(
    agent="support-agent",
    event="Ticket categorized",
    options=["billing", "technical", "general"],
    choice="technical",
    confidence=0.88
)

# Completions
log.completion(
    agent="support-agent",
    event="Ticket resolved",
    outcome="success",
    duration_minutes=12
)

# Interventions
log.intervention(
    agent="support-agent",
    event="Escalation overridden",
    original_decision="escalate",
    override_decision="resolve",
    by="agent_sarah"
)

The Payoff

Teams using DISCO instead of traditional log levels report:

  • 3x faster debugging because events are categorized by type, not severity
  • 90% reduction in noise because DEBUG-level spam is eliminated
  • Better training data because interventions are explicitly captured
  • Clearer compliance because decisions are first-class citizens

Traditional log levels were designed for traditional software.

AI agents deserve a framework designed for AI agents.
