Your AI costs increased 40% last month. Quick: can you explain why?
For most organizations, the answer is no. They know total spend. They might know spend by provider. But they can't answer the questions that actually matter:
- Which agents are most expensive?
- Which actions cost the most?
- Which customers drive the most AI spend?
- Where are we wasting money?
This is the cost attribution problem. And solving it starts with observability.
The Attribution Challenge
AI costs are inherently distributed across agents, actions, and workflows. Provider invoices tell you which APIs you called; they don't tell you why.
To attribute costs meaningfully, you need to connect API calls to:
- The agent that made them
- The action being performed
- The business context (customer, workflow, use case)
- The outcome (success, failure, value delivered)
Capturing Cost at the Source
Every agent action should capture cost data:
```json
{
  "actor": { "name": "Analysis Agent v2.3" },
  "verb": { "id": "analyzed" },
  "object": { "id": "customer-report-892" },
  "result": {
    "success": true,
    "extensions": {
      "cost": {
        "total_usd": 0.45,
        "breakdown": {
          "input_tokens": 12500,
          "output_tokens": 3200,
          "model": "gpt-4-turbo",
          "provider": "openai"
        }
      },
      "duration_ms": 4200
    }
  },
  "context": {
    "extensions": {
      "customer_id": "enterprise-127",
      "workflow": "monthly-analysis",
      "triggered_by": "schedule"
    }
  }
}
```
This single statement captures:
- What happened (analysis completed)
- What it cost ($0.45)
- Why it cost that (12.5k input tokens, 3.2k output tokens)
- Business context (which customer, which workflow)
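Pulling those dimensions out of a statement is a one-pass extraction. A minimal sketch, mirroring the field names in the example above (the `attribute` helper itself is hypothetical):

```python
# Extract the attribution dimensions from one cost-instrumented statement.
# Field names follow the example statement above; this helper is a sketch.
def attribute(statement: dict) -> dict:
    result = statement["result"]
    cost = result["extensions"]["cost"]
    ctx = statement["context"]["extensions"]
    return {
        "agent": statement["actor"]["name"],
        "action": statement["verb"]["id"],
        "customer": ctx.get("customer_id"),
        "workflow": ctx.get("workflow"),
        "success": result["success"],
        "cost_usd": cost["total_usd"],
        "model": cost["breakdown"]["model"],
    }
```

Flat records like this are what every aggregation below rolls up.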
Five Dimensions of Cost Attribution
1. By Agent
Which agents are most expensive?
| Agent | Actions/Month | Cost/Action | Total Cost |
|---|---|---|---|
| Analysis Agent | 45,000 | $0.38 | $17,100 |
| Support Agent | 892,000 | $0.012 | $10,704 |
| Research Agent | 23,000 | $0.42 | $9,660 |
| Routing Agent | 1,200,000 | $0.002 | $2,400 |
The Analysis Agent costs more per action but handles fewer requests. The Support Agent is cheap per action but volume drives total spend.
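Per-agent totals like the table above fall out of a simple fold over attributed records. A sketch, assuming each record carries `agent` and `cost_usd` fields:

```python
from collections import defaultdict

def cost_by_agent(records):
    """Aggregate per-agent action counts, spend, and cost per action."""
    totals = defaultdict(lambda: {"actions": 0, "cost": 0.0})
    for r in records:
        agg = totals[r["agent"]]
        agg["actions"] += 1
        agg["cost"] += r["cost_usd"]
    # Derive cost per action for each agent from the running totals.
    return {
        agent: {**agg, "cost_per_action": agg["cost"] / agg["actions"]}
        for agent, agg in totals.items()
    }
```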
2. By Action Type
What are you paying for?
If analysis is 35% of spend and classification is 7%, optimizing analysis prompts has 5x the impact of optimizing classification.
3. By Customer
Which customers drive the most AI cost?
This matters for pricing, resource allocation, and unit economics. If enterprise customers cost 10x more to serve but only pay 3x more, you have a margin problem.
```json
{
  "aggregation": "by_customer",
  "period": "2025-02",
  "data": [
    { "customer": "enterprise-127", "cost": 4200, "actions": 8500 },
    { "customer": "enterprise-089", "cost": 3800, "actions": 7200 },
    { "customer": "startup-442", "cost": 890, "actions": 12000 }
  ]
}
```
Notice: startup-442 has more actions but lower cost. Their use case is cheaper to serve.
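The cost-per-action comparison falls straight out of the figures above (two of the rows, for illustration):

```python
customers = [
    {"customer": "enterprise-127", "cost": 4200, "actions": 8500},
    {"customer": "startup-442", "cost": 890, "actions": 12000},
]
for c in customers:
    # Unit cost shows which use cases are cheap to serve, regardless of volume.
    c["cost_per_action"] = round(c["cost"] / c["actions"], 3)

print(customers[0]["cost_per_action"])  # 0.494
print(customers[1]["cost_per_action"])  # 0.074
```

Enterprise-127 pays roughly 6-7x more per action, which is exactly the kind of gap that should feed into pricing.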
4. By Outcome
Are you paying for success or failure?
```json
{
  "outcome_costs": {
    "successful_actions": {
      "count": 892000,
      "cost": 38000,
      "cost_per": 0.043
    },
    "failed_actions": {
      "count": 45000,
      "cost": 8500,
      "cost_per": 0.189
    },
    "retried_actions": {
      "count": 23000,
      "cost": 4200,
      "cost_per": 0.183
    }
  }
}
```
Failed and retried actions cost more than 4x as much per action as successful ones. Reducing failures directly reduces cost.
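The per-outcome rates above are just cost divided by count, and the failure premium can be checked directly:

```python
def cost_per_action(total_cost, count):
    # Per-action rate for an outcome bucket.
    return total_cost / count

successful = cost_per_action(38_000, 892_000)  # ≈ 0.043
failed = cost_per_action(8_500, 45_000)        # ≈ 0.189
retried = cost_per_action(4_200, 23_000)       # ≈ 0.183
print(f"failure premium: {failed / successful:.1f}x")  # failure premium: 4.4x
```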
5. By Time
When does spend occur?
If 70% of spend happens during business hours, batch processing at night could reduce peak costs.
Cost Optimization Strategies
With proper attribution, optimization becomes systematic:
Strategy 1: Right-Size Models
Track cost and quality by model:
| Task | GPT-4 Cost | GPT-3.5 Cost | Quality Delta |
|---|---|---|---|
| Classification | $0.08 | $0.003 | -2% accuracy |
| Analysis | $0.45 | $0.02 | -15% quality |
| Simple Response | $0.12 | $0.005 | -1% quality |
For classification and simple responses, the cheaper model is 98-99% as good. For analysis, the premium model is worth it.
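One way to act on numbers like these is to encode the routing decision as data. A sketch, where model names and the default follow the table but the policy itself is illustrative:

```python
# Route each task type to the cheapest model whose quality loss is acceptable.
# Quality deltas in comments come from the table above; the policy is a sketch.
MODEL_POLICY = {
    "classification": "gpt-3.5-turbo",   # -2% accuracy, ~96% cheaper
    "simple_response": "gpt-3.5-turbo",  # -1% quality
    "analysis": "gpt-4-turbo",           # -15% quality on the cheap model: not worth it
}

def pick_model(task: str) -> str:
    # Unknown task types default to the safe, higher-quality choice.
    return MODEL_POLICY.get(task, "gpt-4-turbo")
```

Keeping the policy as data means re-running the cost/quality comparison can update routing without touching code paths.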
Strategy 2: Prompt Optimization
Input tokens often dominate cost. Track prompt length by action:
```json
{
  "prompt_analysis": {
    "action": "customer_response",
    "avg_input_tokens": 2800,
    "avg_output_tokens": 450,
    "token_ratio": 6.2,
    "cost_breakdown": {
      "input": 0.084,
      "output": 0.018,
      "total": 0.102
    }
  }
}
```
A 6:1 input-to-output ratio suggests prompt bloat. Reducing input tokens by 30% cuts this action's cost by 25%.
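The claim that a 30% input cut yields roughly 25% total savings follows directly from the breakdown above:

```python
def action_cost(input_cost, output_cost, input_reduction=0.0):
    """Per-action cost after trimming input tokens by a given fraction."""
    return input_cost * (1 - input_reduction) + output_cost

before = action_cost(0.084, 0.018)        # $0.102 per action, as above
after = action_cost(0.084, 0.018, 0.30)   # input shrunk 30%
savings = 1 - after / before
print(f"total cost falls {savings:.0%}")  # total cost falls 25%
```

Because output cost is untouched, the total savings is always a bit smaller than the input reduction; the heavier the input share, the closer the two get.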
Strategy 3: Caching
Identify repeated queries:
```json
{
  "cache_opportunity": {
    "action": "product_lookup",
    "daily_volume": 45000,
    "unique_queries": 1200,
    "cache_hit_potential": 0.97,
    "current_cost": 4500,
    "potential_cost": 135,
    "savings": 4365
  }
}
```
97% of product lookups are redundant. Caching could save $4,365/day.
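An in-process sketch of the idea using `functools.lru_cache` (a production setup would more likely use a shared cache such as Redis; `product_lookup` and its payload are hypothetical stand-ins for the expensive call):

```python
import functools

CALLS = {"count": 0}  # tracks how often the expensive path actually runs

@functools.lru_cache(maxsize=4096)
def product_lookup(product_id: str) -> str:
    # Stand-in for the expensive model/API call; only cache misses reach it.
    CALLS["count"] += 1
    return f"description of {product_id}"

for _ in range(3):
    product_lookup("sku-42")  # first call misses; the next two hit the cache
```

With a 97% hit rate, the expensive path runs for only the ~3% of queries that are actually unique.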
Strategy 4: Batch Processing
Some actions don't need real-time processing:
```json
{
  "batch_candidates": [
    {
      "action": "daily_report",
      "current_mode": "realtime",
      "latency_requirement": "< 4 hours",
      "batch_savings": 0.35
    },
    {
      "action": "sentiment_analysis",
      "current_mode": "realtime",
      "latency_requirement": "< 1 minute",
      "batch_savings": 0.05
    }
  ]
}
```
Daily reports can batch for 35% savings. Sentiment analysis needs real-time.
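The eligibility rule reduces to comparing each action's latency requirement against a batching threshold. A sketch with an illustrative one-hour cutoff:

```python
from datetime import timedelta

BATCH_THRESHOLD = timedelta(hours=1)  # illustrative cutoff for batch eligibility

def can_batch(latency_requirement: timedelta) -> bool:
    # Anything that tolerates at least the threshold of delay can be batched.
    return latency_requirement >= BATCH_THRESHOLD

print(can_batch(timedelta(hours=4)))    # daily_report: True
print(can_batch(timedelta(minutes=1)))  # sentiment_analysis: False
```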
Building a Cost Dashboard
Essential cost visibility includes:
- Total spend trend - Are we growing, stable, or declining?
- Spend by agent - Which agents cost most?
- Cost per action - Is efficiency improving?
- Cost by customer tier - Are unit economics healthy?
- Anomaly detection - Are there unexpected spikes?
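Anomaly detection can start as a trailing-average spike check over daily spend totals. A minimal sketch (the window and threshold values are illustrative):

```python
def spend_anomalies(daily_spend, window=7, threshold=1.5):
    """Flag days whose spend exceeds threshold x the trailing-window mean."""
    flagged = []
    for i in range(window, len(daily_spend)):
        baseline = sum(daily_spend[i - window:i]) / window
        if daily_spend[i] > threshold * baseline:
            flagged.append(i)
    return flagged

# Ten flat days, then a 3x spike on day 10.
print(spend_anomalies([100] * 10 + [300]))  # [10]
```

Trailing averages tolerate gradual growth while still catching sudden jumps; a tighter threshold trades more alerts for earlier warning.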
The Empress Approach
Empress automatically captures cost data for every action:
- Token counts by model
- Provider-specific pricing
- Custom cost dimensions
- Real-time cost tracking
- Anomaly detection and alerts
You see not just what you spend, but where and why.
AI costs shouldn't be a mystery. With proper attribution, they become another optimization lever.