GitHub - epsilla-cloud/clawtrace: Make your OpenClaw agents better, cheaper, and faster.

Cost-aware tracing & skill distillation for LLM agents


Paper

ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation — Boqin Yuan, Renchu Song, Yue Su, Sen Yang, Jing Qin · arXiv 2604.23853

Skill-distillation pipelines learn reusable rules from LLM agent trajectories, but they lack a key signal — how much each step costs. ClawTrace records every LLM call, tool use, and sub-agent spawn during a session and compiles it into a TraceCard: a ~1.5 kB YAML summary with per-step USD cost, token counts, and redundancy flags. On top of TraceCards, CostCraft produces three patch types — preserve, prune (with counterfactual evidence), and repair — that improve agent skills without inflating cost.

Figure: Capture → Compile → Distill. ClawTrace instruments the agent (Substrate), compiles each session into a TraceCard (IR), and merges TraceCards into evolved skills via a preserve / prune / repair typology (Methodology).
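
To make the TraceCard idea more concrete, here is a minimal sketch of compiling recorded steps into a compact per-step summary. The field names (`kind`, `cost_usd`, `redundant`, `signature`), the exact-repeat heuristic behind the redundancy flag, and the dict layout are all assumptions made for illustration. The real TraceCard schema is defined by the paper and the repository, not by this snippet.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str            # "llm", "tool", or "subagent" (labels are assumptions)
    name: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    signature: str = ""  # hypothetical fingerprint of the call's arguments


def compile_trace_card(session_id: str, steps: list[Step]) -> dict:
    """Summarize a session as per-step rows plus totals (illustrative layout only)."""
    seen: set[tuple[str, str, str]] = set()
    rows = []
    for index, step in enumerate(steps):
        key = (step.kind, step.name, step.signature)
        # Assumed heuristic: an exact repeat of an earlier call is flagged redundant.
        redundant = bool(step.signature) and key in seen
        seen.add(key)
        rows.append({
            "step": index,
            "kind": step.kind,
            "name": step.name,
            "tokens": step.input_tokens + step.output_tokens,
            "cost_usd": round(step.cost_usd, 6),
            "redundant": redundant,
        })
    return {
        "session": session_id,
        "total_tokens": sum(row["tokens"] for row in rows),
        "total_cost_usd": round(sum(step.cost_usd for step in steps), 6),
        "steps": rows,
    }
```

A summary in this shape could be serialized to YAML with any YAML library; that step is left out to keep the sketch dependency-free.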

📄 Read the paper: https://arxiv.org/abs/2604.23853 (BibTeX entry under Citation)

Why this exists

My OpenClaw agent burned ~40× its normal token budget in under an hour. Root cause: it was appending ~1,500 messages of history to every LLM call. By the time I noticed, it had already spent a few dollars on what should have been a 3-cent task — and I couldn't see it from logs, because OpenClaw flattens everything into a wall of JSON. The loop was invisible.

ClawTrace was built after that incident, and the paper above is what came out of using it at scale.

ClawTrace records every agent run as a tree of spans and lets you inspect it.

openclaw plugins install @epsilla/clawtrace
openclaw clawtrace setup
openclaw gateway restart

Then open clawtrace.ai. Your next run appears automatically.

What it shows

Token usage per step — see exactly which LLM call ate your budget
Tool calls and retries — spot loops before they compound
Execution timeline — Gantt chart of every span, parallel and sequential
Full input/output — click any step to see what went in and what came back
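
As a rough illustration of what "spotting loops" can mean in practice, the sketch below scans an ordered list of tool calls and reports runs of identical consecutive calls. The record shape (`tool`, `args`) and the `min_run` threshold are hypothetical and are not taken from the ClawTrace codebase.

```python
from itertools import groupby


def repeated_tool_runs(calls, min_run=3):
    """Return (tool, args, run_length) for each run of identical consecutive calls.

    `calls` is an ordered list of dicts with hypothetical keys "tool" and "args"
    (args serialized to a string so that two calls can be compared directly).
    """
    runs = []
    for key, group in groupby(calls, key=lambda c: (c["tool"], c["args"])):
        length = sum(1 for _ in group)
        if length >= min_run:
            runs.append((key[0], key[1], length))
    return runs
```

Only back-to-back repeats are counted here; a loop that alternates between two tools would need a different heuristic.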

Ask Tracy

You can also ask questions in plain English. Tracy is an AI analyst wired directly to your trajectory graph. She runs live Cypher queries against your data, generates charts, and tells you specifically what to fix.

"Why did my last run cost so much?" "Which tool is failing most often?" "Is my context window growing across sessions?"

Three views per trace

Every trajectory has three views — click any node/span/bar to open step detail with full payloads, token counts, duration, cost, and errors.

Execution path — collapsible tree, parent-child relationships, per-node cost badges

Call graph — force-directed diagram of every agent, model, and tool in the run

Timeline — Gantt chart showing where time actually went

Getting started

1. Install the plugin on your OpenClaw agent

openclaw plugins install @epsilla/clawtrace

2. Authenticate

openclaw clawtrace setup

Paste your observe key from clawtrace.ai when prompted. 200 free credits, no credit card.

3. Restart the gateway

openclaw gateway restart

Done. Every run now streams to ClawTrace automatically.

Self-evolving agents

The plugin also exposes a /v1/evolve/ask endpoint so your agent can query Tracy about its own trajectories. Install the ClawTrace Self-Evolve skill and your agent will periodically check its own cost and failure patterns, apply fixes, and log what it changed.

openclaw skills install clawtrace-self-evolve
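
For readers who want to see roughly what a call to this endpoint might look like, here is a minimal sketch that only builds the HTTP request and does not send it. Only the `/v1/evolve/ask` path comes from the text above. The base URL, the `question` field, and the bearer-token header are assumptions, so check the project docs for the actual request contract.

```python
import json
import urllib.request


def build_evolve_ask_request(base_url, observe_key, question):
    """Build (but do not send) a POST to /v1/evolve/ask.

    The JSON body shape and the Authorization scheme are hypothetical.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/evolve/ask",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {observe_key}",  # assumed auth scheme
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call once the real host and payload format are known.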

Architecture

graph TB
    subgraph Agent Runtime
        OC[OpenClaw Agent]
        PLG["@epsilla/clawtrace plugin<br/>8 hook types"]
    end

    subgraph Ingest Layer
        ING[Ingest Service<br/>FastAPI + Cloud Storage]
    end

    subgraph Data Lake
        RAW[Raw JSON Events<br/>Azure Blob / GCS / S3]
        DBX[Databricks Lakeflow<br/>SQL Pipeline]
        ICE[Iceberg Silver Tables<br/>events_all, pg_traces,<br/>pg_spans, pg_agents]
    end

    subgraph Graph Layer
        PG[PuppyGraph<br/>Cypher over Delta Lake]
    end

    subgraph Backend Services
        API[Backend API<br/>FastAPI + asyncpg]
        PAY[Payment Service<br/>Credits + Stripe]
        MCP[Tracy MCP Server<br/>Cypher queries]
    end

    subgraph AI Layer
        TRACY[Tracy Agent<br/>Anthropic Managed Harness<br/>Claude Sonnet 4.6]
    end

    subgraph Frontend
        UI[ClawTrace UI<br/>Next.js 15 + React 19]
        DOCS[Documentation<br/>Server-rendered Markdown]
    end

    subgraph External
        NEON[(Neon PostgreSQL<br/>Users, API Keys,<br/>Credits, Sessions)]
        STRIPE[Stripe<br/>Payments]
    end

    OC --> PLG
    PLG -->|"POST /v1/traces/events"| ING
    ING --> RAW
    RAW --> DBX
    DBX --> ICE
    ICE --> PG

    PG -->|Cypher| API
    PG -->|Cypher| MCP

    API --> NEON
    PAY --> NEON
    PAY --> STRIPE

    MCP -->|tool results| TRACY
    TRACY -->|SSE stream| API

    UI -->|REST API| API
    UI -->|SSE| API
    API -->|deficit check| PAY

Data flow

Capture — The plugin intercepts 8 OpenClaw hook types: session_start, session_end, llm_input, llm_output, before_tool_call, after_tool_call, subagent_spawning, subagent_ended
Ingest — Events are batched and POSTed to the ingest service, which writes partitioned JSON to cloud storage (tenant={id}/agent={id}/dt=YYYY-MM-DD/hr=HH/)
Transform — Databricks Lakeflow SQL pipeline materializes raw events into 8 Iceberg silver tables every 3 minutes
Query — PuppyGraph virtualizes the Delta Lake tables as a Cypher-queryable graph (Tenant → Agent → Trace → Span with CHILD_OF edges)
Serve — Backend API runs Cypher queries; Tracy's MCP server gives the AI analyst direct graph access
Display — Next.js UI renders trace trees, call graphs, timelines, and Tracy's streamed responses with inline ECharts
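
The Capture and Ingest steps above can be sketched as follows: validate each event's hook type against the eight listed hooks, then group events under the partitioned storage prefix. The hook names and the path layout come from the list above. The event keys (`hook`, `tenant_id`, `agent_id`, `ts`) and the choice of UTC for the date and hour partitions are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone

# The eight hook types named in the data flow description.
HOOK_TYPES = frozenset({
    "session_start", "session_end",
    "llm_input", "llm_output",
    "before_tool_call", "after_tool_call",
    "subagent_spawning", "subagent_ended",
})


def partition_prefix(tenant_id: str, agent_id: str, ts: datetime) -> str:
    """Build tenant={id}/agent={id}/dt=YYYY-MM-DD/hr=HH/ (UTC is an assumption)."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    return f"tenant={tenant_id}/agent={agent_id}/dt={ts:%Y-%m-%d}/hr={ts:%H}/"


def batch_by_partition(events):
    """Group events (dicts with hypothetical keys) by their storage prefix."""
    batches = defaultdict(list)
    for event in events:
        if event["hook"] not in HOOK_TYPES:
            raise ValueError(f"unknown hook type: {event['hook']}")
        prefix = partition_prefix(event["tenant_id"], event["agent_id"], event["ts"])
        batches[prefix].append(event)
    return dict(batches)
```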

Graph schema

4 vertex types (Tenant, Agent, Trace, Span), 4 edge types (HAS_AGENT, OWNS, HAS_SPAN, CHILD_OF). Agent execution data is naturally a graph; ClawTrace models it that way so Tracy can traverse it with Cypher instead of joining flat tables.
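
To illustrate why a graph model helps, the sketch below shows a Cypher query of the kind Tracy might run, followed by an in-memory equivalent of a CHILD_OF traversal that rolls cost up the span tree. The vertex and edge labels come from the schema above. The property names (`id`, `cost_usd`, `parent_id`) and the edge direction in the query are assumptions, not the project's actual schema.

```python
# Hypothetical query: property names and edge direction are assumed.
TOP_SPANS_QUERY = """
MATCH (t:Trace {id: $trace_id})-[:HAS_SPAN]->(s:Span)
RETURN s.id AS span_id, s.cost_usd AS cost_usd
ORDER BY cost_usd DESC
LIMIT 5
"""


def subtree_costs(spans):
    """Roll each span's cost up through its CHILD_OF descendants.

    `spans` is a list of dicts with hypothetical keys "id", "parent_id"
    (None for the root span), and "cost_usd".
    """
    children = {}
    for span in spans:
        children.setdefault(span["parent_id"], []).append(span["id"])
    own_cost = {span["id"]: span["cost_usd"] for span in spans}
    totals = {}

    def visit(span_id):
        # A span's total is its own cost plus the totals of its children.
        total = own_cost[span_id] + sum(visit(c) for c in children.get(span_id, []))
        totals[span_id] = total
        return total

    for root_id in children.get(None, []):
        visit(root_id)
    return totals
```

With flat tables, the same rollup would need a recursive join; expressed as a traversal it is a short walk along CHILD_OF edges.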

Monorepo structure

clawtrace/
├── packages/clawtrace-ui/        Next.js 15 frontend (App Router, React 19, Drizzle ORM)
├── services/clawtrace-backend/   FastAPI backend (PuppyGraph, JWT auth, Tracy chat)
├── services/clawtrace-ingest/    FastAPI ingest (multi-tenant, cloud-agnostic storage)
├── services/clawtrace-payment/   FastAPI billing (consumption credits, Stripe, notifications)
├── plugins/clawtrace/            @epsilla/clawtrace npm plugin for OpenClaw
├── sql/databricks/               Lakeflow SQL pipeline (silver tables + billing tables)
└── puppygraph/                   PuppyGraph schema configuration

Tech stack

Layer	Technology
Frontend	Next.js 15, React 19, CSS Modules, ECharts, react-markdown
Backend	FastAPI, asyncpg, httpx, Pydantic Settings
Database	Neon PostgreSQL (users, credits, sessions), Drizzle ORM
Data Lake	Azure Blob Storage, Databricks, Delta Lake, Iceberg
Graph	PuppyGraph (Cypher over Delta Lake)
AI	Anthropic Managed Agents (Claude Sonnet 4.6), MCP protocol
Billing	Stripe, consumption-based credits
Deployment	Vercel (UI), Docker + Kubernetes (services)

Model pricing

Cost estimates cover 80+ models with cache-aware pricing (fresh input, cached input, cache write, output calculated separately):

Western: OpenAI (GPT-5.x, GPT-4.x, o-series), Anthropic (Claude Opus/Sonnet/Haiku), Google (Gemini 3.x/2.x/1.5), DeepSeek (V3, R1), Mistral

Chinese: Alibaba Qwen (3.x Max/Plus/Flash), Zhipu GLM, Moonshot Kimi, Baidu ERNIE, MiniMax

Open source: Llama 4/3.x, Mixtral, Stepfun
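
Cache-aware pricing, as described above, means each of the four token classes is billed at its own rate. A minimal sketch of that arithmetic follows. The `ModelRates` type, the per-million-token unit, and the rate values used in the example are placeholders, not real prices and not ClawTrace's pricing table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    """USD per one million tokens (unit and values are assumptions)."""
    fresh_input: float
    cached_input: float
    cache_write: float
    output: float


def call_cost_usd(rates, fresh_input_tokens, cached_input_tokens,
                  cache_write_tokens, output_tokens):
    """Price each token class separately, then sum."""
    total = (
        fresh_input_tokens * rates.fresh_input
        + cached_input_tokens * rates.cached_input
        + cache_write_tokens * rates.cache_write
        + output_tokens * rates.output
    )
    return total / 1_000_000


# Placeholder rates for a made-up model.
example_rates = ModelRates(fresh_input=2.0, cached_input=0.5, cache_write=2.5, output=8.0)
example_cost = call_cost_usd(example_rates, 1_000, 4_000, 0, 500)
```

Pricing cached input separately matters because a call that rereads a long cached history can look expensive by token count while costing far less than the same tokens sent fresh.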

Roadmap

Rubric-based evaluation — define quality rubrics, auto-score trajectories, catch regressions before deployment
A/B testing — run agent variants side by side, compare cost/quality/speed, promote winners
Version control — track agent config changes, roll back, audit
Self-evolving agents — agents that learn from their own trajectory data to cut costs and fix failure patterns automatically

Development

Frontend

cd packages/clawtrace-ui
npm install
npm run dev          # localhost:3000
npm run typecheck

Backend

cd services/clawtrace-backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
uvicorn app.main:app --reload --port 8082

Ingest

cd services/clawtrace-ingest
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
uvicorn app.main:app --reload --port 8080

Plugin

cd plugins/clawtrace
npm install
npm test

Citation

If you use ClawTrace, TraceCards, or CostCraft in academic work, please cite:

@article{yuan2026clawtrace,
  title   = {ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation},
  author  = {Yuan, Boqin and Song, Renchu and Su, Yue and Yang, Sen and Qin, Jing},
  journal = {arXiv preprint arXiv:2604.23853},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.23853}
}

Inspirations

ClawTrace is inspired by and builds on openclaw-tracing, a reference implementation for tracing OpenClaw executions.

License

Apache 2.0. See LICENSE for details.

