Your AI costs increased 40% last month. Quick: can you explain why?
For most organizations, the answer is no. They know total spend. They might know spend by provider. But they can't answer the questions that actually matter:
- Which agents are most expensive?
- Which actions cost the most?
- Which customers drive the most AI spend?
- Where are we wasting money?
This is the cost attribution problem. And solving it starts with observability.
The Attribution Challenge
AI costs are inherently distributed across agents, actions, and workflows. Provider invoices tell you which APIs you called; they don't tell you why.
To attribute costs meaningfully, you need to connect API calls to:
- The agent that made them
- The action being performed
- The business context (customer, workflow, use case)
- The outcome (success, failure, value delivered)
Capturing Cost at the Source
Every agent action should capture cost data:
```json
{
  "actor": { "name": "Analysis Agent v2.3" },
  "verb": { "id": "analyzed" },
  "object": { "id": "customer-report-892" },
  "result": {
    "success": true,
    "extensions": {
      "cost": {
        "total_usd": 0.45,
        "breakdown": {
          "input_tokens": 12500,
          "output_tokens": 3200,
          "model": "gpt-4-turbo",
          "provider": "openai"
        }
      },
      "duration_ms": 4200
    }
  },
  "context": {
    "extensions": {
      "customer_id": "enterprise-127",
      "workflow": "monthly-analysis",
      "triggered_by": "schedule"
    }
  }
}
```
This single statement captures:
- What happened (analysis completed)
- What it cost ($0.45)
- Why it cost that (12.5k input tokens, 3.2k output tokens)
- Business context (which customer, which workflow)
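Pulling those dimensions out of a statement is a one-pass extraction. A minimal sketch, mirroring the field names in the example above (the `attribute` helper itself is hypothetical):

```python
# Extract the attribution dimensions from one cost-instrumented statement.
# Field names follow the example statement above; this helper is a sketch.
def attribute(statement: dict) -> dict:
    result = statement["result"]
    cost = result["extensions"]["cost"]
    ctx = statement["context"]["extensions"]
    return {
        "agent": statement["actor"]["name"],
        "action": statement["verb"]["id"],
        "customer": ctx.get("customer_id"),
        "workflow": ctx.get("workflow"),
        "success": result["success"],
        "cost_usd": cost["total_usd"],
        "model": cost["breakdown"]["model"],
    }
```

Flat records like this are what every aggregation below rolls up.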
Five Dimensions of Cost Attribution
1. By Agent
Which agents are most expensive?
| Agent | Actions/Month | Cost/Action | Total Cost |
|---|---|---|---|
| Analysis Agent | 45,000 | $0.38 | $17,100 |
| Support Agent | 892,000 | $0.012 | $10,704 |
| Research Agent | 23,000 | $0.42 | $9,660 |
| Routing Agent | 1,200,000 | $0.002 | $2,400 |
The Analysis Agent costs more per action but handles fewer requests. The Support Agent is cheap per action but volume drives total spend.
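Per-agent totals like the table above fall out of a simple fold over attributed records. A sketch, assuming each record carries `agent` and `cost_usd` fields:

```python
from collections import defaultdict

def cost_by_agent(records):
    """Aggregate per-agent action counts, spend, and cost per action."""
    totals = defaultdict(lambda: {"actions": 0, "cost": 0.0})
    for r in records:
        agg = totals[r["agent"]]
        agg["actions"] += 1
        agg["cost"] += r["cost_usd"]
    # Derive cost per action for each agent from the running totals.
    return {
        agent: {**agg, "cost_per_action": agg["cost"] / agg["actions"]}
        for agent, agg in totals.items()
    }
```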
2. By Action Type
What are you paying for?
If analysis is 35% of spend and classification is 7%, optimizing analysis prompts has 5x the impact of optimizing classification.
3. By Customer
Which customers drive the most AI cost?
This matters for pricing, resource allocation, and unit economics. If enterprise customers cost 10x more to serve but only pay 3x more, you have a margin problem.
```json
{
  "aggregation": "by_customer",
  "period": "2025-02",
  "data": [
    { "customer": "enterprise-127", "cost": 4200, "actions": 8500 },
    { "customer": "enterprise-089", "cost": 3800, "actions": 7200 },
    { "customer": "startup-442", "cost": 890, "actions": 12000 }
  ]
}
```
Notice: startup-442 has more actions but lower cost. Their use case is cheaper to serve.
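The cost-per-action comparison falls straight out of the figures above (two of the rows, for illustration):

```python
customers = [
    {"customer": "enterprise-127", "cost": 4200, "actions": 8500},
    {"customer": "startup-442", "cost": 890, "actions": 12000},
]
for c in customers:
    # Unit cost shows which use cases are cheap to serve, regardless of volume.
    c["cost_per_action"] = round(c["cost"] / c["actions"], 3)

print(customers[0]["cost_per_action"])  # 0.494
print(customers[1]["cost_per_action"])  # 0.074
```

Enterprise-127 pays roughly 6-7x more per action, which is exactly the kind of gap that should feed into pricing.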
4. By Outcome
Are you paying for success or failure?
```json
{
  "outcome_costs": {
    "successful_actions": {
      "count": 892000,
      "cost": 38000,
      "cost_per": 0.043
    },
    "failed_actions": {
      "count": 45000,
      "cost": 8500,
      "cost_per": 0.189
    },
    "retried_actions": {
      "count": 23000,
      "cost": 4200,
      "cost_per": 0.183
    }
  }
}
```
Failed and retried actions cost more than 4x as much per action as successful ones. Reducing failures directly reduces cost.
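The per-outcome rates above are just cost divided by count, and the failure premium can be checked directly:

```python
def cost_per_action(total_cost, count):
    # Per-action rate for an outcome bucket.
    return total_cost / count

successful = cost_per_action(38_000, 892_000)  # ≈ 0.043
failed = cost_per_action(8_500, 45_000)        # ≈ 0.189
retried = cost_per_action(4_200, 23_000)       # ≈ 0.183
print(f"failure premium: {failed / successful:.1f}x")  # failure premium: 4.4x
```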
5. By Time
When does spend occur?
If 70% of spend happens during business hours, batch processing at night could reduce peak costs.
Cost Optimization Strategies
With proper attribution, optimization becomes systematic:
Strategy 1: Right-Size Models
Track cost and quality by model:
| Task | GPT-4 Cost | GPT-3.5 Cost | Quality Delta |
|---|---|---|---|
| Classification | $0.08 | $0.003 | -2% accuracy |
| Analysis | $0.45 | $0.02 | -15% quality |
| Simple Response | $0.12 | $0.005 | -1% quality |
For classification and simple responses, the cheaper model is 98-99% as good. For analysis, the premium model is worth it.
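One way to act on numbers like these is to encode the routing decision as data. A sketch, where model names and the default follow the table but the policy itself is illustrative:

```python
# Route each task type to the cheapest model whose quality loss is acceptable.
# Quality deltas in comments come from the table above; the policy is a sketch.
MODEL_POLICY = {
    "classification": "gpt-3.5-turbo",   # -2% accuracy, ~96% cheaper
    "simple_response": "gpt-3.5-turbo",  # -1% quality
    "analysis": "gpt-4-turbo",           # -15% quality on the cheap model: not worth it
}

def pick_model(task: str) -> str:
    # Unknown task types default to the safe, higher-quality choice.
    return MODEL_POLICY.get(task, "gpt-4-turbo")
```

Keeping the policy as data means re-running the cost/quality comparison can update routing without touching code paths.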
Strategy 2: Prompt Optimization
Input tokens often dominate cost. Track prompt length by action:
```json
{
  "prompt_analysis": {
    "action": "customer_response",
    "avg_input_tokens": 2800,
    "avg_output_tokens": 450,
    "token_ratio": 6.2,
    "cost_breakdown": {
      "input": 0.084,
      "output": 0.018,
      "total": 0.102
    }
  }
}
```
A 6:1 input-to-output ratio suggests prompt bloat. Reducing input tokens by 30% cuts this action's cost by 25%.
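The claim that a 30% input cut yields roughly 25% total savings follows directly from the breakdown above:

```python
def action_cost(input_cost, output_cost, input_reduction=0.0):
    """Per-action cost after trimming input tokens by a given fraction."""
    return input_cost * (1 - input_reduction) + output_cost

before = action_cost(0.084, 0.018)        # $0.102 per action, as above
after = action_cost(0.084, 0.018, 0.30)   # input shrunk 30%
savings = 1 - after / before
print(f"total cost falls {savings:.0%}")  # total cost falls 25%
```

Because output cost is untouched, the total savings is always a bit smaller than the input reduction; the heavier the input share, the closer the two get.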
Strategy 3: Caching
Identify repeated queries:
```json
{
  "cache_opportunity": {
    "action": "product_lookup",
    "daily_volume": 45000,
    "unique_queries": 1200,
    "cache_hit_potential": 0.97,
    "current_cost": 4500,
    "potential_cost": 135,
    "savings": 4365
  }
}
```
97% of product lookups are redundant. Caching could save $4,365/day.
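An in-process sketch of the idea using `functools.lru_cache` (a production setup would more likely use a shared cache such as Redis; `product_lookup` and its payload are hypothetical stand-ins for the expensive call):

```python
import functools

CALLS = {"count": 0}  # tracks how often the expensive path actually runs

@functools.lru_cache(maxsize=4096)
def product_lookup(product_id: str) -> str:
    # Stand-in for the expensive model/API call; only cache misses reach it.
    CALLS["count"] += 1
    return f"description of {product_id}"

for _ in range(3):
    product_lookup("sku-42")  # first call misses; the next two hit the cache
```

With a 97% hit rate, the expensive path runs for only the ~3% of queries that are actually unique.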
Strategy 4: Batch Processing
Some actions don't need real-time processing:
```json
{
  "batch_candidates": [
    {
      "action": "daily_report",
      "current_mode": "realtime",
      "latency_requirement": "< 4 hours",
      "batch_savings": 0.35
    },
    {
      "action": "sentiment_analysis",
      "current_mode": "realtime",
      "latency_requirement": "< 1 minute",
      "batch_savings": 0.05
    }
  ]
}
```
Daily reports can batch for 35% savings. Sentiment analysis needs real-time.
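The eligibility rule reduces to comparing each action's latency requirement against a batching threshold. A sketch with an illustrative one-hour cutoff:

```python
from datetime import timedelta

BATCH_THRESHOLD = timedelta(hours=1)  # illustrative cutoff for batch eligibility

def can_batch(latency_requirement: timedelta) -> bool:
    # Anything that tolerates at least the threshold of delay can be batched.
    return latency_requirement >= BATCH_THRESHOLD

print(can_batch(timedelta(hours=4)))    # daily_report: True
print(can_batch(timedelta(minutes=1)))  # sentiment_analysis: False
```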
Building a Cost Dashboard
Essential cost visibility includes:
- Total spend trend - Are we growing, stable, or declining?
- Spend by agent - Which agents cost most?
- Cost per action - Is efficiency improving?
- Cost by customer tier - Are unit economics healthy?
- Anomaly detection - Are there unexpected spikes?
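Anomaly detection can start as a trailing-average spike check over daily spend totals. A minimal sketch (the window and threshold values are illustrative):

```python
def spend_anomalies(daily_spend, window=7, threshold=1.5):
    """Flag days whose spend exceeds threshold x the trailing-window mean."""
    flagged = []
    for i in range(window, len(daily_spend)):
        baseline = sum(daily_spend[i - window:i]) / window
        if daily_spend[i] > threshold * baseline:
            flagged.append(i)
    return flagged

# Ten flat days, then a 3x spike on day 10.
print(spend_anomalies([100] * 10 + [300]))  # [10]
```

Trailing averages tolerate gradual growth while still catching sudden jumps; a tighter threshold trades more alerts for earlier warning.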
The Empress Approach
Empress automatically captures cost data for every action:
- Token counts by model
- Provider-specific pricing
- Custom cost dimensions
- Real-time cost tracking
- Anomaly detection and alerts
You see not just what you spend, but where and why.
AI costs shouldn't be a mystery. With proper attribution, they become another optimization lever.