Cost Optimization & Model Selection
Stop burning $50/day on GPT-4 when $3/day gets you 90% there
Most people run GPT-4o for everything and wonder why their bill is $200/month. The truth? 90% of your agent's tasks don't need the smartest model. This chapter shows you how to cut costs by 80% without sacrificing quality.
Why Most People Overspend (and How to Stop)
The #1 cost mistake isn't using the wrong model — it's using the right model for the wrong tasks. In a typical unoptimized setup, everything (heartbeat checks, social monitoring, content drafts) runs on the flagship model.
Route each task to the cheapest tier that can handle it, and you get the same output quality for 93% less cost. The heartbeat doesn't need Opus to check "any new emails?" The social monitor doesn't need Opus to count likes. Match the model to the task complexity, and your bill plummets.
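Routing doesn't have to be elaborate. Here's a minimal sketch of the idea in Python; the task categories and model names are illustrative placeholders, not a prescribed scheme — swap in whatever your stack actually uses:

```python
# Minimal model router: map each task type to the cheapest capable tier.
# Task categories and model names below are illustrative assumptions.
ROUTES = {
    "heartbeat":    "claude-3-5-haiku",  # "any new emails?" checks
    "formatting":   "claude-3-5-haiku",
    "writing":      "claude-sonnet",     # posts, newsletters
    "research":     "claude-sonnet",
    "architecture": "claude-opus",       # rare, expensive calls
}

def pick_model(task_type: str) -> str:
    """Route to the mapped tier; unknown tasks default to the mid tier, never the top one."""
    return ROUTES.get(task_type, "claude-sonnet")

print(pick_model("heartbeat"))  # -> claude-3-5-haiku
```

The key design choice is the default: when the router doesn't recognize a task, it falls back to the balanced tier, so an unclassified task can never silently burn Opus-level money.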
The Context Window Tax
There's a hidden cost most people miss: context window bloat. Every message in a long conversation gets re-sent as context. A 50-message chat with a 10K-token system prompt means you're paying for that system prompt 50 times.
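The arithmetic is worth doing once: because the full history is re-sent each turn, per-message input grows linearly and the conversation's total input grows quadratically. A rough back-of-envelope, with average message size and pricing as assumptions:

```python
# Rough cost of a 50-message conversation where the full history
# (system prompt + all prior messages) is re-sent on every turn.
SYSTEM_PROMPT = 10_000    # tokens, re-sent with every message
MSG_TOKENS = 500          # assumed average tokens per message
PRICE_PER_M_INPUT = 3.00  # assumed $/1M input tokens (Sonnet-class)

total_input = 0
for turn in range(1, 51):
    # each turn pays for the system prompt plus all prior messages
    total_input += SYSTEM_PROMPT + turn * MSG_TOKENS

print(f"total input tokens: {total_input:,}")            # 1,137,500
print(f"system prompt share: {50 * SYSTEM_PROMPT:,}")    # 500,000 — 10K paid 50 times
print(f"approx cost: ${total_input / 1e6 * PRICE_PER_M_INPUT:.2f}")  # $3.41
```

Nearly half that spend is the same 10K-token system prompt billed 50 times over, which is exactly the waste the tactics below attack.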
Three ways to fight it:
- Isolate your cron jobs. Each cron job starts fresh — no accumulated history. This alone can cut cron costs by 60-80% compared to running everything in the main session.
- Compact long sessions. When your main session hits 100K tokens, run /compact. This summarizes old messages and frees up context, reducing the per-message cost of future interactions.
- Slim your AGENTS.md. A 15K-token AGENTS.md means 15K tokens charged on every single message. Move detailed procedures to knowledge base files that are read on-demand, not loaded every turn.
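To put a number on the AGENTS.md tax, compare a 15K-token system prompt against a trimmed 2K-token one over a month. The message volume and pricing here are assumptions — plug in your own:

```python
# Monthly system-prompt tax: the AGENTS.md is charged on every message.
MESSAGES_PER_MONTH = 3_000  # assumed agent message volume
PRICE_PER_M_INPUT = 3.00    # assumed $/1M input tokens (Sonnet-class)

def prompt_tax(prompt_tokens: int) -> float:
    """Dollars per month spent just re-sending the system prompt."""
    return MESSAGES_PER_MONTH * prompt_tokens / 1e6 * PRICE_PER_M_INPUT

bloated = prompt_tax(15_000)  # everything inlined in AGENTS.md
lean = prompt_tax(2_000)      # details moved to on-demand knowledge files
print(f"bloated: ${bloated:.2f}/mo, lean: ${lean:.2f}/mo")
# bloated: $135.00/mo, lean: $18.00/mo
```

Same agent, same capabilities — the detailed procedures still exist, they're just read when needed instead of billed on every turn.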
The Model Tier Strategy
Cheap tier
Use for: simple replies, formatting, classification, routine tasks
- GPT-4o-mini — $0.15/1M input, $0.60/1M output
- Claude 3.5 Haiku — $0.25/1M input, $1.25/1M output
- Gemini Flash — $0.075/1M input, $0.30/1M output
Balanced tier
Use for: content writing, analysis, code generation, research synthesis
- Claude Sonnet — $3/1M input, $15/1M output
- GPT-4o — $2.50/1M input, $10/1M output
- Gemini Pro — $1.25/1M input, $5/1M output
Expert tier
Use for: complex reasoning, architecture decisions, strategy, debugging hard problems
- Claude Opus — $15/1M input, $75/1M output
- GPT-4.5 — $75/1M input, $150/1M output
- o1 / o3 — variable, reasoning-heavy pricing
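The list prices above make per-task comparisons easy to script. A sketch using the figures quoted in this chapter — verify them against current provider pricing pages before relying on the numbers:

```python
# $/1M tokens (input, output), taken from the tier lists above.
# List prices change; check your provider's current pricing page.
PRICES = {
    "gemini-flash": (0.075, 0.30),
    "gpt-4o-mini":  (0.15, 0.60),
    "haiku":        (0.25, 1.25),
    "gpt-4o":       (2.50, 10.00),
    "sonnet":       (3.00, 15.00),
    "opus":         (15.00, 75.00),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at list prices."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# One daily report: 8K tokens in, 1K tokens out.
for m in ("haiku", "sonnet", "opus"):
    print(f"{m}: ${task_cost(m, 8_000, 1_000):.4f}")
```

Run it for your own token profile and the tier gaps become concrete: the same report costs roughly an order of magnitude more at each step up.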
Task-to-Model Mapping
# Model Routing Rules
## Use CHEAP model (Haiku/Flash/4o-mini):
- Formatting text (markdown, JSON conversion)
- Simple classification ("is this urgent?")
- Acknowledging messages ("got it, working on it")
- Heartbeat checks (is anything new?)
- Reading and summarizing short documents
## Use BALANCED model (Sonnet/4o/Gemini Pro):
- Writing content (tweets, posts, newsletters)
- Research synthesis (combining multiple sources)
- Code generation (new features, bug fixes)
- Data analysis (trends, patterns)
- Cron job outputs (daily reports, plans)
## Use EXPERT model (Opus/o3) — sparingly:
- Architecture decisions ("how should I structure this?")
- Debugging complex issues
- Strategy and planning
- Code review for critical systems
- When Sonnet gets it wrong twice

🔌 Platform-Specific Cost Optimization
Agent platforms
- Set the default model to Sonnet in config, and override to Opus only for complex tasks
- Use `--model` per cron job to pick the right tier
- Set `contextTokens: 50000` instead of 200K — most tasks don't need huge context
- Use isolated sessions for cron jobs — they start fresh without dragging history
- Run `/compact` when context exceeds 100K to avoid paying for repeated context
Claude API
- Use prompt caching — repeated system prompts cost 90% less after the first call
- Use Haiku for preprocessing, Sonnet for the main work, and Opus only for review
- Set `max_tokens` to limit output length (don't pay for rambling)
- Use the Batch API — 50% discount for non-time-sensitive tasks (reports, analysis)
OpenAI API
- Use GPT-4o-mini for 80% of tasks — it's shockingly good for the price
- Use structured outputs (JSON mode) to reduce output tokens
- Use the Batch API — 50% off for async processing
- Avoid GPT-4.5 unless genuinely needed — it's 30x more expensive than GPT-4o
Multi-agent frameworks
- Assign cheaper models to simple agents (research → Haiku, writing → Sonnet)
- Set `max_iterations` per agent to prevent runaway loops
- Cache tool results — don't re-search the same query twice
- Use LangSmith or Arize to identify which agents burn the most tokens
Automation platforms (Zapier, Make)
- Use AI nodes sparingly — each one is an API call
- Combine multiple prompts into one node where possible
- Cache results in a database instead of re-querying
- Set usage alerts — Zapier/Make costs can spiral if workflows run too often
- • Use "fast" model for autocomplete, "smart" model only for complex edits
- • Be specific in prompts — vague prompts = more back-and-forth = more tokens
- • Use @file references instead of pasting entire files into chat
- • Cursor Pro ($20/mo) vs API credits — calculate which is cheaper for your usage
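Several of these discounts stack. Both Anthropic and OpenAI offer batch processing at roughly 50% off, and cached prompt tokens are billed at a steep discount (Anthropic quotes cache reads at about a tenth of the base input price). A sketch of the combined effect — the discount rates here are assumptions to verify against current pricing:

```python
def monthly_input_cost(tokens_per_month: float,
                       base_price_per_m: float,
                       cached_fraction: float = 0.0,
                       batch: bool = False) -> float:
    """Estimate monthly input spend with prompt-caching and batch discounts.

    cached_fraction: share of input tokens served from the prompt cache,
    billed here at 10% of base price (assumed rate; check your provider).
    batch: apply an assumed 50% batch-processing discount on top.
    """
    cached = tokens_per_month * cached_fraction * base_price_per_m * 0.10
    fresh = tokens_per_month * (1 - cached_fraction) * base_price_per_m
    cost = (cached + fresh) / 1e6
    return cost * 0.5 if batch else cost

full = monthly_input_cost(50e6, 3.00)                               # no discounts
opt = monthly_input_cost(50e6, 3.00, cached_fraction=0.8, batch=True)
print(f"${full:.0f}/mo -> ${opt:.0f}/mo")  # $150/mo -> $21/mo
```

For a workload that is mostly repeated system prompt (a high cached fraction) and not time-sensitive, the two discounts together cut the same input volume by roughly 85%.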
The Monthly Budget Framework
Calculate Your Actual Costs
Stop guessing. Plug your own numbers into an estimate of what your agent setup actually costs per month — then optimize from there.
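A minimal estimator you can run with your own numbers; every default below is a placeholder assumption, not a measurement:

```python
# Monthly agent budget estimator. All defaults are placeholder assumptions;
# replace them with your own usage and current list prices.
def monthly_budget(msgs_per_day: int = 100,
                   input_tokens_per_msg: int = 12_000,  # context incl. system prompt
                   output_tokens_per_msg: int = 600,
                   input_price: float = 3.00,           # $/1M input (Sonnet-class)
                   output_price: float = 15.00) -> float:
    """Project a 30-day spend from daily message volume and token sizes."""
    daily = (msgs_per_day * input_tokens_per_msg / 1e6 * input_price
             + msgs_per_day * output_tokens_per_msg / 1e6 * output_price)
    return daily * 30

print(f"${monthly_budget():.2f}/month")  # -> $135.00/month
```

Note how dominated the total is by input tokens — which is why the context-window tactics earlier in this chapter move the bill more than trimming outputs ever will.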