Debugging & Observability
Your agent did something weird at 3 AM. Here's how to figure out why.
Your agent ran a cron job at 3 AM. Something went wrong. The output was weird. How do you figure out what happened? This chapter gives you the debugging and observability toolkit.
The 3 Pillars of Agent Observability
1. Logs — the raw record of every action. Input → thinking → output → tool calls → results. The foundation of debugging.
2. Traces — the connected chain of events: trigger → model call → tool use → response → delivery. Shows cause and effect.
3. Metrics — token usage over time, error rates, response latency, cost per task. Tells you whether things are getting better or worse.
Common Agent Failures & How to Debug Them
Failure 1: Hallucinated or wrong output

Debug steps:
1. Check the input prompt — was context missing?
2. Check which model was used — cheaper models hallucinate more
3. Check if knowledge base files were accessible
4. Check the context window — was it full or truncated?
5. Fix: Add missing context to the knowledge base, or upgrade the model for that task

Failure 2: Scheduled job didn't run

Debug steps:
1. Check the cron expression — is the timezone correct?
2. Check if the gateway was running at the scheduled time
3. Check API key validity — expired keys fail silently
4. Check rate limits — were you throttled?
5. Fix: Add a heartbeat check that monitors cron execution

Failure 3: Agent went off-script

Debug steps:
1. Check for prompt injection in the input data
2. Check if the system prompt was too vague or contradictory
3. Check the conversation history — did it drift over many messages?
4. Check if it hit a tool error and improvised badly
5. Fix: Tighten the system prompt, add guardrails, use isolated sessions for risky tasks

Failure 4: Unexpected cost spike

Debug steps:
1. Check for infinite loops (agent retrying failed tool calls)
2. Check the context window size — bloated history = expensive
3. Check if a cron job ran more often than expected
4. Check if the model was accidentally set to Opus/o3 for routine tasks
5. Fix: Set max iterations, compact the context, fix model routing
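The "set max iterations" fix above can be sketched as a simple guard around the agent loop. This is a minimal illustration, not a real framework API: the `step` callable and its `(done, result)` return shape are assumptions.

```python
class IterationLimitExceeded(RuntimeError):
    """Raised when the agent loop runs past its budget."""


def run_agent_loop(step, max_iterations=10):
    """Run an agent step function until it signals completion,
    but never more than max_iterations times.

    `step` is any callable returning (done: bool, result) -- an
    assumed interface for illustration. The hard cap is what stops
    a retry loop on a failing tool call from burning tokens all night.
    """
    for _ in range(max_iterations):
        done, result = step()
        if done:
            return result
    raise IterationLimitExceeded(
        f"Agent did not finish within {max_iterations} iterations "
        "(possible retry loop on a failing tool call)"
    )
```

The same idea applies regardless of framework: the loop that calls the model must have an explicit upper bound, and hitting that bound should be a loggable event, not a silent retry.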
🔌 Observability by Platform
```bash
# View session history
openclaw sessions list --active 60

# Check cron job runs
openclaw cron runs --job "Trading Plan" --limit 5

# View session logs
openclaw sessions history --session <key> --include-tools

# Check gateway status
openclaw status

# Monitor in real-time
# Add a daily self-diagnostic cron:
openclaw cron add --name "Daily Health Check" \
  --cron "0 22 * * *" --session isolated \
  --message "Run a self-diagnostic:
1. Check all cron jobs ran today (list runs)
2. Check for any errors in recent sessions
3. Report: tasks completed, tasks failed, total cost
Post summary to Discord." \
  --model "haiku" --announce
```
```python
import anthropic
import json
from datetime import datetime

client = anthropic.Anthropic()

def logged_completion(prompt, model="claude-sonnet-4-20250514"):
    """Wrapper that logs every API call"""
    start = datetime.now()
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    log_entry = {
        "timestamp": start.isoformat(),
        "model": model,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "latency_ms": (datetime.now() - start).total_seconds() * 1000,
        "prompt_preview": prompt[:100],
        "stop_reason": response.stop_reason,
    }
    with open("logs/agent.jsonl", "a") as f:
        f.write(json.dumps(log_entry) + "\n")
    return response
```

Code frameworks:
- LangSmith: Automatic tracing for LangChain. See every chain step, token count, latency
- Arize Phoenix: Open-source observability. Local dashboard for traces + evals
- CrewAI verbose=True: Prints every agent step to console — basic but effective
- OpenTelemetry: Industry standard. Export traces to any observability platform
No-code platforms:
- n8n: Built-in execution history. Click any run to see input/output for every node
- Make: Scenario history with full data flow visualization
- Zapier: Task history with per-step data. Set up error notifications
- Pro tip: Add a "log to spreadsheet" node at the end of every workflow for your own analytics
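The "log to spreadsheet" pro tip can be approximated locally with a CSV append, which spreadsheet tools can import directly. A minimal sketch; the file path and column names are illustrative choices, not a required schema:

```python
import csv
import os
from datetime import datetime, timezone

def log_run(path, workflow, status, items_processed):
    """Append one workflow run as a CSV row, writing a header row
    on first use. Columns here are an illustrative choice."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "workflow", "status", "items"])
        writer.writerow([
            datetime.now(timezone.utc).isoformat(),
            workflow,
            status,
            items_processed,
        ])
```

One append per run gives you a growing dataset you can chart for free: runs per day, error rate, and throughput trends.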
IDE coding agents:
- Cursor: Check Settings → Usage to monitor token consumption
- Git diff: Review what the agent changed with `git diff` before committing
- Undo: Use git to revert bad changes: `git checkout -- .`
- Cline: Shows full conversation log in the sidebar — review reasoning
The "Morning After" Checklist
Quick daily review checklist:
1. Did all scheduled cron jobs run? ✅/❌
2. Any error messages in logs? ✅/❌
3. Token usage within budget? ✅/❌
4. Output quality acceptable? ✅/❌
5. Any unexpected behaviors? ✅/❌

If all ✅ → great, move on. If any ❌ → debug using the failure patterns above.
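Several of these checks can be automated against a JSONL log like the one the Python wrapper earlier in this chapter writes. A sketch, assuming each log entry carries `input_tokens` and `output_tokens` fields; the token budget is an illustrative number:

```python
import json

def daily_summary(log_path="logs/agent.jsonl", budget_tokens=500_000):
    """Summarize one day's JSONL log: request count, token totals,
    and whether usage stayed within an (illustrative) token budget."""
    requests = 0
    input_tokens = output_tokens = 0
    with open(log_path) as f:
        for line in f:
            entry = json.loads(line)
            requests += 1
            input_tokens += entry.get("input_tokens", 0)
            output_tokens += entry.get("output_tokens", 0)
    total = input_tokens + output_tokens
    return {
        "requests": requests,
        "total_tokens": total,
        "within_budget": total <= budget_tokens,
    }
```

Run it each morning (manually or from a cron job) and item 3 of the checklist answers itself.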
The "Time Travel" Debug Technique
The most powerful debugging technique for agents: reproduce the exact conditions. When something goes wrong at 3 AM, you need to see what the agent saw.
```bash
# Step 1: Find the failing run
openclaw cron runs --job "Trading Plan" --limit 1

# Step 2: Check what files existed at that time
git log --oneline --until="2026-02-22T06:00:00" -5

# Step 3: Check what the agent's memory looked like
git show HEAD~2:memory/2026-02-21.md

# Step 4: Re-run with the same context
openclaw cron run --job "Trading Plan" --force
# This re-runs the exact same prompt in a fresh session

# Step 5: Compare outputs
# Old output (from logs) vs new output (from re-run)
# If they match → the input was the problem
# If they differ → the model was non-deterministic (use temperature 0)
```
Building Your Dashboard
After a month of running agents, you'll want a dashboard. Here's the minimal viable monitoring setup:
```bash
openclaw cron add \
  --name "Agent Health Dashboard" \
  --cron "0 22 * * *" \
  --session isolated \
  --message "End-of-day health check:
1. List all cron jobs and their last run status
2. Count total sessions today
3. Estimate today's API cost
4. Check for any errors in logs
5. Compare today's output quality to baseline

Format:
📊 **Agent Health — [Date]**
- Cron jobs: [X/Y ran successfully]
- Sessions: [N total]
- Est. cost: $[amount]
- Errors: [count] [brief description if any]
- Quality: [✅ Good / ⚠️ Check needed / ❌ Issues found]

If all green, keep it to 5 lines max." \
  --model "haiku" --announce \
  --channel discord --to "channel:YOUR_ID"
```
The Debugging Flowchart
When your agent does something weird, follow this exact sequence:
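One way to encode such a sequence is a symptom-to-first-checks lookup. The symptom names and check lists below are illustrative, condensed from the failure patterns earlier in this chapter; they are not a fixed taxonomy:

```python
# Illustrative triage table: each symptom maps to the first checks to run,
# condensed from the four failure patterns described earlier.
FIRST_CHECKS = {
    "wrong_output": [
        "Was context missing from the input prompt?",
        "Which model was used? Cheaper models hallucinate more.",
    ],
    "did_not_run": [
        "Is the cron expression and timezone correct?",
        "Was the gateway running at the scheduled time?",
    ],
    "off_script": [
        "Any prompt injection in the input data?",
        "Is the system prompt vague or contradictory?",
    ],
    "cost_spike": [
        "Is the agent looping on a failed tool call?",
        "Is the context window bloated with history?",
    ],
}

def triage(symptom):
    """Return the first debugging checks for a given symptom,
    falling back to the raw logs for anything unrecognized."""
    return FIRST_CHECKS.get(symptom, ["Start with the raw logs."])
```

The point is not the lookup itself but the discipline: always start from the observed symptom and work backward through the same checks in the same order, rather than guessing.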
Observable Agent Architecture
The best debugging setup is one where you can see everything without adding debug code. Build observability in from day one:
- 📝 Input logging — save every prompt sent to the model (with timestamps)
- 📝 Output logging — save every response received
- 📝 Tool call logging — which tools were called, with what args, and what they returned
- 📝 Decision logging — why the agent chose action A over action B
- 📝 Cost logging — tokens used per request (catches runaway costs early)
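The tool-call logging item above can be sketched as a small decorator. The log path, field names, and the `fetch_price` tool are all illustrative, not part of any real API:

```python
import functools
import json
import os
from datetime import datetime, timezone

def log_tool_call(func):
    """Log each tool call (name, args, result or error) to a JSONL file.
    The path and entry fields are an illustrative schema."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tool": func.__name__,
            "args": repr(args),
            "kwargs": repr(kwargs),
        }
        try:
            result = func(*args, **kwargs)
            entry["result_preview"] = repr(result)[:200]
            return result
        except Exception as e:
            # Failures are logged too -- a tool error followed by odd
            # behavior is one of the failure patterns above
            entry["error"] = repr(e)
            raise
        finally:
            os.makedirs("logs", exist_ok=True)
            with open("logs/tools.jsonl", "a") as f:
                f.write(json.dumps(entry) + "\n")
    return wrapper

# Hypothetical tool, for illustration only
@log_tool_call
def fetch_price(symbol):
    return {"symbol": symbol, "price": 42.0}
```

Decorating every tool function once gives you the "which tools were called, with what args, and what they returned" record without touching the agent loop itself.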
The "Replay" Technique
The most powerful debugging technique: replay the exact same input and see if you get the same output. If you do, the bug is deterministic (probably a prompt issue). If you don't, the bug is stochastic (probably a temperature or context window issue).
This is why input logging matters so much. Without the exact input, you can't replay. And without replay, you're guessing.
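A minimal replay harness might look like the sketch below. It assumes each JSONL entry stores the full prompt under a `prompt` key (an assumption: the earlier wrapper only kept a 100-character preview, which is not enough to replay), and `run` stands in for any prompt-to-output callable:

```python
import json

def replay_inputs(log_path="logs/agent.jsonl"):
    """Yield the exact prompts recorded in a JSONL log.
    Assumes entries store the full prompt under a 'prompt' key."""
    with open(log_path) as f:
        for line in f:
            entry = json.loads(line)
            if "prompt" in entry:
                yield entry["prompt"]

def is_deterministic(run, prompt, trials=2):
    """Re-run the same prompt and compare outputs.
    `run` is any callable mapping a prompt to an output string."""
    outputs = {run(prompt) for _ in range(trials)}
    return len(outputs) == 1
```

If `is_deterministic` returns True and the output is still wrong, fix the prompt or context; if it returns False, look at temperature and context window first.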