ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation — Boqin Yuan, Renchu Song, Yue Su, Sen Yang, Jing Qin · arXiv 2604.23853
Skill-distillation pipelines learn reusable rules from LLM agent trajectories, but they lack a key signal — how much each step costs. ClawTrace records every LLM call, tool use, and sub-agent spawn during a session and compiles it into a TraceCard: a ~1.5 kB YAML summary with per-step USD cost, token counts, and redundancy flags. On top of TraceCards, CostCraft produces three patch types — preserve, prune (with counterfactual evidence), and repair — that improve agent skills without inflating cost.
Capture → Compile → Distill. ClawTrace instruments the agent (Substrate), compiles each session into a TraceCard (IR), and merges TraceCards into evolved skills via a preserve / prune / repair typology (Methodology).
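A minimal sketch of what compiling a session into a TraceCard-style summary might look like. The `Step` and `compile_card` names, the field names, and the "identical earlier call" rule for redundancy are assumptions for illustration, not the schema from the paper. The output is JSON rather than YAML only to keep the example dependency-free.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass
class Step:
    """One recorded step. Field names are illustrative, not ClawTrace's schema."""
    kind: str        # "llm", "tool", or "subagent"
    name: str        # model or tool name
    args: str        # canonicalized input, used only to spot repeats
    tokens_in: int
    tokens_out: int
    cost_usd: float


def compile_card(session_id: str, steps: list[Step]) -> dict:
    """Summarize a session: totals, per-step cost, and a naive redundancy flag."""
    seen: Counter = Counter()
    rows = []
    for i, s in enumerate(steps):
        key = (s.kind, s.name, s.args)
        seen[key] += 1
        rows.append({
            "step": i,
            "kind": s.kind,
            "name": s.name,
            "tokens": s.tokens_in + s.tokens_out,
            "cost_usd": round(s.cost_usd, 6),
            # Assumption: a step is redundant if an identical call came earlier.
            "redundant": seen[key] > 1,
        })
    return {
        "session": session_id,
        "total_cost_usd": round(sum(s.cost_usd for s in steps), 6),
        "total_tokens": sum(s.tokens_in + s.tokens_out for s in steps),
        "steps": rows,
    }


# Hypothetical session: one planning call, then the same file read twice.
steps = [
    Step("llm", "planner-model", "plan task", 1200, 150, 0.0041),
    Step("tool", "read_file", "README.md", 0, 0, 0.0),
    Step("tool", "read_file", "README.md", 0, 0, 0.0),
]
card = compile_card("demo-session", steps)
print(json.dumps(card, indent=2))
```

In this toy session the second `read_file` call is flagged, which is the kind of evidence a prune patch could cite.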
📄 Read the paper: https://arxiv.org/abs/2604.23853 · BibTeX
My OpenClaw agent burned ~40× its normal token budget in under an hour. Root cause: it was appending ~1,500 messages of history to every LLM call. By the time I noticed, it had already spent a few dollars on what should have been a 3-cent task — and I couldn't see it from logs, because OpenClaw flattens everything into a wall of JSON. The loop was invisible.
ClawTrace was built after that incident, and the paper above is what came out of using it at scale.
ClawTrace records every agent run as a tree of spans and lets you inspect it.
openclaw plugins install @epsilla/clawtrace
openclaw clawtrace setup
openclaw gateway restart

Then open clawtrace.ai. Your next run appears automatically.
- Token usage per step — see exactly which LLM call ate your budget
- Tool calls and retries — spot loops before they compound
- Execution timeline — Gantt chart of every span, parallel and sequential
- Full input/output — click any step to see what went in and what came back
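As one example of what per-step token usage makes visible, the sketch below flags a run whose LLM input size only ever grows, the pattern behind unbounded history appends. The function name, the `factor` threshold, and the sample numbers are assumptions; this is a rough heuristic, not ClawTrace's detection logic.

```python
def context_growth(input_tokens: list[int], factor: float = 2.0) -> bool:
    """Return True if input size never shrinks across calls and the last call
    is at least `factor` times larger than the first."""
    if len(input_tokens) < 2 or input_tokens[0] <= 0:
        return False
    never_shrinks = all(b >= a for a, b in zip(input_tokens, input_tokens[1:]))
    return never_shrinks and input_tokens[-1] >= factor * input_tokens[0]


# Hypothetical per-call input token counts for two runs.
growing = [1000, 1800, 2600, 3400]
stable = [1000, 1200, 900, 1100]
print(context_growth(growing), context_growth(stable))
```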
You can also ask questions in plain English. Tracy is an AI analyst wired directly to your trajectory graph. She runs live Cypher queries against your data, generates charts, and tells you specifically what to fix.
"Why did my last run cost so much?" "Which tool is failing most often?" "Is my context window growing across sessions?"
Every trajectory has three views — click any node/span/bar to open step detail with full payloads, token counts, duration, cost, and errors.
- Execution path: collapsible tree, parent-child relationships, per-node cost badges
- Call graph: force-directed diagram of every agent, model, and tool in the run
- Timeline: Gantt chart showing where time actually went
openclaw plugins install @epsilla/clawtrace
openclaw clawtrace setup

Paste your observe key from clawtrace.ai when prompted. 200 free credits, no credit card.

openclaw gateway restart

Done. Every run now streams to ClawTrace automatically.
The plugin also exposes a /v1/evolve/ask endpoint so your agent can query Tracy about its own trajectories. Install the ClawTrace Self-Evolve skill and your agent will periodically check its own cost and failure patterns, apply fixes, and log what it changed.
openclaw skills install clawtrace-self-evolve

graph TB
subgraph Agent Runtime
OC[OpenClaw Agent]
PLG["@epsilla/clawtrace plugin<br/>8 hook types"]
end
subgraph Ingest Layer
ING[Ingest Service<br/>FastAPI + Cloud Storage]
end
subgraph Data Lake
RAW[Raw JSON Events<br/>Azure Blob / GCS / S3]
DBX[Databricks Lakeflow<br/>SQL Pipeline]
ICE[Iceberg Silver Tables<br/>events_all, pg_traces,<br/>pg_spans, pg_agents]
end
subgraph Graph Layer
PG[PuppyGraph<br/>Cypher over Delta Lake]
end
subgraph Backend Services
API[Backend API<br/>FastAPI + asyncpg]
PAY[Payment Service<br/>Credits + Stripe]
MCP[Tracy MCP Server<br/>Cypher queries]
end
subgraph AI Layer
TRACY[Tracy Agent<br/>Anthropic Managed Harness<br/>Claude Sonnet 4.6]
end
subgraph Frontend
UI[ClawTrace UI<br/>Next.js 15 + React 19]
DOCS[Documentation<br/>Server-rendered Markdown]
end
subgraph External
NEON[(Neon PostgreSQL<br/>Users, API Keys,<br/>Credits, Sessions)]
STRIPE[Stripe<br/>Payments]
end
OC --> PLG
PLG -->|"POST /v1/traces/events"| ING
ING --> RAW
RAW --> DBX
DBX --> ICE
ICE --> PG
PG -->|Cypher| API
PG -->|Cypher| MCP
API --> NEON
PAY --> NEON
PAY --> STRIPE
MCP -->|tool results| TRACY
TRACY -->|SSE stream| API
UI -->|REST API| API
UI -->|SSE| API
API -->|deficit check| PAY
- Capture: the plugin intercepts 8 OpenClaw hook types (`session_start`, `session_end`, `llm_input`, `llm_output`, `before_tool_call`, `after_tool_call`, `subagent_spawning`, `subagent_ended`)
- Ingest: events are batched and POSTed to the ingest service, which writes partitioned JSON to cloud storage (`tenant={id}/agent={id}/dt=YYYY-MM-DD/hr=HH/`)
- Transform: a Databricks Lakeflow SQL pipeline materializes raw events into 8 Iceberg silver tables every 3 minutes
- Query: PuppyGraph virtualizes the Delta Lake tables as a Cypher-queryable graph (Tenant → Agent → Trace → Span with CHILD_OF edges)
- Serve: the Backend API runs Cypher queries; Tracy's MCP server gives the AI analyst direct graph access
- Display: the Next.js UI renders trace trees, call graphs, timelines, and Tracy's streamed responses with inline ECharts
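The partition layout from the Ingest step can be sketched as follows. The prefix format comes from the list above; the event field names (`tenant_id`, `agent_id`, `ts`), the use of UTC for the date and hour, and both function names are assumptions rather than the ingest service's actual code.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_prefix(tenant_id: str, agent_id: str, ts: datetime) -> str:
    """Return the storage prefix for an event, partitioned by UTC date and hour."""
    ts = ts.astimezone(timezone.utc)  # assumption: partitions are keyed on UTC
    return f"tenant={tenant_id}/agent={agent_id}/dt={ts:%Y-%m-%d}/hr={ts:%H}/"


def group_by_partition(events: list[dict]) -> dict[str, list[dict]]:
    """Group a batch of events by the prefix they would be written under."""
    out: dict[str, list[dict]] = defaultdict(list)
    for ev in events:
        ts = datetime.fromisoformat(ev["ts"])  # expects an explicit UTC offset
        out[partition_prefix(ev["tenant_id"], ev["agent_id"], ts)].append(ev)
    return dict(out)


# Hypothetical batch that straddles an hour boundary.
events = [
    {"tenant_id": "t1", "agent_id": "a1", "ts": "2026-01-05T09:15:00+00:00", "type": "llm_input"},
    {"tenant_id": "t1", "agent_id": "a1", "ts": "2026-01-05T09:59:59+00:00", "type": "llm_output"},
    {"tenant_id": "t1", "agent_id": "a1", "ts": "2026-01-05T10:00:00+00:00", "type": "session_end"},
]
batches = group_by_partition(events)
for prefix, batch in batches.items():
    print(prefix, len(batch))
```

Keying the prefix on tenant, agent, date, and hour lets a downstream pipeline read only the partitions that changed since its last run.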
4 vertex types (Tenant, Agent, Trace, Span), 4 edge types (HAS_AGENT, OWNS, HAS_SPAN, CHILD_OF). Agent execution data is naturally a graph; ClawTrace models it that way so Tracy can traverse it with Cypher instead of joining flat tables.
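To illustrate why a graph shape suits this data, here is a small in-memory sketch that follows CHILD_OF edges to roll cost up a span tree. The span tuples, names, and costs are made up, and `subtree_costs` is a hypothetical helper; the real system answers this kind of question with Cypher over PuppyGraph.

```python
from collections import defaultdict

# Hypothetical spans: (span_id, parent_id, name, cost_usd). parent_id None = root.
SPANS = [
    ("s1", None, "session", 0.0),
    ("s2", "s1", "llm:plan", 0.004),
    ("s3", "s1", "subagent:research", 0.0),
    ("s4", "s3", "tool:web_search", 0.001),
    ("s5", "s3", "llm:summarize", 0.010),
]


def subtree_costs(spans):
    """Return {span_id: own cost + cost of all descendants} via CHILD_OF edges."""
    children = defaultdict(list)
    own = {}
    for span_id, parent_id, _name, cost in spans:
        own[span_id] = cost
        if parent_id is not None:
            children[parent_id].append(span_id)  # child CHILD_OF parent

    totals = {}

    def visit(span_id):
        totals[span_id] = own[span_id] + sum(visit(c) for c in children[span_id])
        return totals[span_id]

    for span_id, parent_id, _name, _cost in spans:
        if parent_id is None:
            visit(span_id)
    return totals


totals = subtree_costs(SPANS)
for span_id, _parent, name, _cost in SPANS:
    print(f"{span_id} {name}: {totals[span_id]:.3f}")
```

The sub-agent span has no cost of its own, yet its subtree accounts for most of the session. A flat per-row table hides that until the rows are joined back into a tree.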
clawtrace/
├── packages/clawtrace-ui/ Next.js 15 frontend (App Router, React 19, Drizzle ORM)
├── services/clawtrace-backend/ FastAPI backend (PuppyGraph, JWT auth, Tracy chat)
├── services/clawtrace-ingest/ FastAPI ingest (multi-tenant, cloud-agnostic storage)
├── services/clawtrace-payment/ FastAPI billing (consumption credits, Stripe, notifications)
├── plugins/clawtrace/ @epsilla/clawtrace npm plugin for OpenClaw
├── sql/databricks/ Lakeflow SQL pipeline (silver tables + billing tables)
└── puppygraph/ PuppyGraph schema configuration
| Layer | Technology |
|---|---|
| Frontend | Next.js 15, React 19, CSS Modules, ECharts, react-markdown |
| Backend | FastAPI, asyncpg, httpx, Pydantic Settings |
| Database | Neon PostgreSQL (users, credits, sessions), Drizzle ORM |
| Data Lake | Azure Blob Storage, Databricks, Delta Lake, Iceberg |
| Graph | PuppyGraph (Cypher over Delta Lake) |
| AI | Anthropic Managed Agents (Claude Sonnet 4.6), MCP protocol |
| Billing | Stripe, consumption-based credits |
| Deployment | Vercel (UI), Docker + Kubernetes (services) |
Cost estimates cover 80+ models with cache-aware pricing (fresh input, cached input, cache write, output calculated separately):
Western: OpenAI (GPT-5.x, GPT-4.x, o-series), Anthropic (Claude Opus/Sonnet/Haiku), Google (Gemini 3.x/2.x/1.5), Mistral
Chinese: DeepSeek (V3, R1), Alibaba Qwen (3.x Max/Plus/Flash), Zhipu GLM, Moonshot Kimi, Baidu ERNIE, MiniMax
Open source: Llama 4/3.x, Mixtral, Stepfun
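Cache-aware pricing, as described above, amounts to pricing four token buckets separately and summing them. The sketch below shows the arithmetic; the `Price` and `Usage` types and the per-million-token rates are placeholders, not ClawTrace's price table or any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens. The values used below are placeholders."""
    fresh_input: float
    cached_input: float
    cache_write: float
    output: float


@dataclass(frozen=True)
class Usage:
    """Token counts for one LLM call, split into the same four buckets."""
    fresh_input: int
    cached_input: int
    cache_write: int
    output: int


def call_cost(price: Price, usage: Usage) -> float:
    """Price each token bucket separately and sum the results in USD."""
    per_token = 1 / 1_000_000
    return per_token * (
        usage.fresh_input * price.fresh_input
        + usage.cached_input * price.cached_input
        + usage.cache_write * price.cache_write
        + usage.output * price.output
    )


# Hypothetical rates and usage for a call that mostly hits the prompt cache.
price = Price(fresh_input=3.0, cached_input=0.3, cache_write=3.75, output=15.0)
usage = Usage(fresh_input=2_000, cached_input=50_000, cache_write=1_000, output=500)
cost = call_cost(price, usage)
print(f"estimated cost: ${cost:.5f}")
```

With these placeholder rates, billing all 53,000 input tokens at the fresh rate would overstate the call's cost by roughly a factor of five, which is why the buckets are kept apart.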
- Rubric-based evaluation — define quality rubrics, auto-score trajectories, catch regressions before deployment
- A/B testing — run agent variants side by side, compare cost/quality/speed, promote winners
- Version control — track agent config changes, roll back, audit
- Self-evolving agents — agents that learn from their own trajectory data to cut costs and fix failure patterns automatically
cd packages/clawtrace-ui
npm install
npm run dev # localhost:3000
npm run typecheck

cd services/clawtrace-backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
uvicorn app.main:app --reload --port 8082

cd services/clawtrace-ingest
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
uvicorn app.main:app --reload --port 8080

cd plugins/clawtrace
npm install
npm test

If you use ClawTrace, TraceCards, or CostCraft in academic work, please cite:
@article{yuan2026clawtrace,
title = {ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation},
author = {Yuan, Boqin and Song, Renchu and Su, Yue and Yang, Sen and Qin, Jing},
journal = {arXiv preprint arXiv:2604.23853},
year = {2026},
url = {https://arxiv.org/abs/2604.23853}
}

ClawTrace is inspired by and builds on openclaw-tracing, a reference implementation for tracing OpenClaw executions.
Apache 2.0. See LICENSE for details.






