AI-powered browser automation CLI. Describe a browser task in plain English: AI generates a validated agent-browser command plan, and the runner executes it step by step.
Before using agent-surf, please install agent-browser first.
```text
as "open github.com, search for agent-surf, take a screenshot" --yes

◆ agent-surf

◇ Generating plan [claude]
✔ Plan ready (312 tokens)

┌─ Plan (5 steps) ──────────────────────────────────
│  1. open https://github.com
│     Navigate to GitHub
│  2. wait --load networkidle
│     Wait for page to fully load
│  3. snapshot -i
│     Get interactive elements and @refs [read]
│  4. find placeholder "Search" fill "agent-surf"
│     Fill the search input
│  5. screenshot result.png
│     Capture the result
└───────────────────────────────────────────────────
✔ Proceed? › yes
▶ Executing...
✔ Done: all 5 steps completed.
```
```bash
npm install -g @ekaone/agent-surf
# or
pnpm install -g @ekaone/agent-surf
```

Requires agent-browser to be installed:

```bash
npm install -g agent-browser
agent-browser install   # download browser binaries
```
```bash
# Claude (default)
export ANTHROPIC_API_KEY=your_key_here

# OpenAI
export OPENAI_API_KEY=your_key_here

# Browser Use cloud execution (optional)
export BROWSER_USE_API_KEY=your_key_here
```

Windows PowerShell:

```powershell
$env:ANTHROPIC_API_KEY="your_key_here"
```
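A missing key is easiest to catch before any plan is requested. The sketch below is a hypothetical `missingKey` helper, not part of agent-surf: it maps each AI provider to the environment variable named above (Ollama runs locally and needs none) and reports which variable, if any, is unset.

```typescript
// Hypothetical preflight helper, not part of agent-surf's API.
type AiProvider = "claude" | "openai" | "ollama";

// Provider -> environment variable it reads, as listed in this README.
const REQUIRED_KEY: Record<AiProvider, string | null> = {
  claude: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  ollama: null, // local model, no API key needed
};

// Returns the name of the missing variable, or null if nothing is missing.
function missingKey(
  provider: AiProvider,
  env: Record<string, string | undefined>,
): string | null {
  const name = REQUIRED_KEY[provider];
  if (name === null) return null;
  const value = env[name];
  // Treat empty or whitespace-only values as missing.
  return value && value.trim() !== "" ? null : name;
}

console.log(missingKey("claude", { ANTHROPIC_API_KEY: "sk-ant-..." })); // null
console.log(missingKey("openai", {})); // OPENAI_API_KEY
```

In Node.js you would pass `process.env` as the second argument.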
```bash
agent-surf "open example.com and take a screenshot"
# or use the short alias
as "open example.com and take a screenshot"
```

Chain multiple browser actions in one plain-English goal, separating them with commas or words like "then", "and", and "after that":
```bash
as "go to github.com, find the search box, type json-cli, press enter, screenshot the results"
as "open localhost:3000, wait for the page to load, scroll down, take a full page screenshot"
as "open example.com, check if the login button is visible, click it, fill email and password, submit"
as "open my app at localhost:3000, login with admin@test.com and password123, navigate to settings, take a screenshot"
```

Example output:

```text
◆ agent-surf
✔ Plan ready (489 tokens)

┌─ Plan (8 steps) ──────────────────────────────────
│  1. open http://localhost:3000
│     Navigate to local app
│  2. wait --load networkidle
│     Wait for page to load
│  3. snapshot -i
│     Discover interactive elements [read]
│  4. fill @e1 "admin@test.com"
│     Fill email field
│  5. fill @e2 "password123"
│     Fill password field
│  6. click @e3
│     Click login button
│  7. wait --load networkidle
│     Wait for dashboard to load
│  8. screenshot dashboard.png
│     Capture dashboard
└───────────────────────────────────────────────────
✔ Proceed? › yes
```
```bash
# Scrape page content
as "open news.ycombinator.com, get the text of the first post title"

# Form interaction
as "open example.com/form, fill name with 'John', select country 'Indonesia', check the terms checkbox, submit"

# Scroll and capture
as "open example.com, scroll down 1000px, wait 2000, take a full page screenshot"

# Tab management
as "open github.com, click the first repo link in a new tab"

# Debug a page
as "open example.com, get the page title, check if the nav is visible, take an annotated screenshot"

# PDF export
as "open example.com/report, wait for load, save as report.pdf"
```

Usage:

```bash
agent-surf "<goal>" [options]
as "<goal>" [options]
```

Options:
```text
--provider <n>            AI provider: claude | openai | ollama (default: claude)
-p, --browser-provider    agent-browser provider: browseruse | browserbase | browserless
--session <n>             agent-browser session name
--headed                  Show browser window (not headless)
--yes, -y                 Skip confirmation prompt
--dry-run                 Show plan without executing
--debug                   Show system prompt and raw AI response
--help, -h                Show this help message
--version, -v             Show version
```
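The options map naturally onto a typed object. The parser below is a minimal, hypothetical sketch (agent-surf's own argument handling is not shown in this README): it covers only the run-affecting flags listed above (help and version are omitted), rejects provider values outside the documented sets, takes the first positional argument as the goal, and applies the documented default provider `claude`.

```typescript
// Hypothetical sketch of parsing the documented flags; not agent-surf's parser.
const AI_PROVIDERS = ["claude", "openai", "ollama"] as const;
const BROWSER_PROVIDERS = ["browseruse", "browserbase", "browserless"] as const;

type AiProvider = (typeof AI_PROVIDERS)[number];
type BrowserProvider = (typeof BROWSER_PROVIDERS)[number];

interface SurfOptions {
  goal: string | null;
  provider: AiProvider;
  browserProvider?: BrowserProvider;
  session?: string;
  headed: boolean;
  yes: boolean;
  dryRun: boolean;
  debug: boolean;
}

// Return `value` if it is one of `allowed`, otherwise throw.
function oneOf<T extends string>(flag: string, value: string | undefined, allowed: readonly T[]): T {
  if (value !== undefined && (allowed as readonly string[]).indexOf(value) !== -1) {
    return value as T;
  }
  throw new Error(`${flag} expects one of: ${allowed.join(" | ")}`);
}

function parseArgs(argv: string[]): SurfOptions {
  const opts: SurfOptions = {
    goal: null, provider: "claude", headed: false, yes: false, dryRun: false, debug: false,
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case "--provider":
        opts.provider = oneOf(arg, argv[++i], AI_PROVIDERS);
        break;
      case "-p":
      case "--browser-provider":
        opts.browserProvider = oneOf(arg, argv[++i], BROWSER_PROVIDERS);
        break;
      case "--session": {
        const name = argv[++i];
        if (name === undefined) throw new Error("--session expects a name");
        opts.session = name;
        break;
      }
      case "--headed": opts.headed = true; break;
      case "-y":
      case "--yes": opts.yes = true; break;
      case "--dry-run": opts.dryRun = true; break;
      case "--debug": opts.debug = true; break;
      default:
        if (arg.charAt(0) === "-") throw new Error(`unknown option: ${arg}`);
        // First positional argument is the goal; extra positionals are ignored here.
        if (opts.goal === null) opts.goal = arg;
    }
  }
  return opts;
}
```

For example, `parseArgs(["open example.com", "--provider", "ollama", "--dry-run"])` yields the goal `"open example.com"`, provider `"ollama"`, and `dryRun: true`.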
```text
User Goal (plain English)
      │
      ▼
AI Provider       → Claude / OpenAI / Ollama
      │             extracts ALL intents, sequences them
      ▼
JSON Plan         → validated by Zod schema (max 20 steps)
      │
      ▼
Catalog Check     → whitelist prevents hallucinated commands
      │
      ▼
Confirm (y/n)     → review the full plan before execution
      │
      ▼
Runner            → segment-aware execution
      │             chain steps → joined with && (efficient)
      │             read steps  → run solo, output captured
      ▼
agent-browser     → spawned per segment, streams output live
```
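The two validation stages in the diagram (a step limit from the schema, then a whitelist check against the catalog) can be sketched as follows. This is a hand-rolled illustration, not agent-surf's code: the real tool uses a Zod schema and its full typed catalog, while the `PlanStep` shape, the `validatePlan` name, and the small command subset here are hypothetical.

```typescript
// Illustrative sketch only: agent-surf validates plans with a Zod schema and
// its full typed catalog. The PlanStep shape and command subset are assumed.
interface PlanStep {
  command: string; // e.g. "open https://github.com"
  description: string; // e.g. "Navigate to GitHub"
}

const MAX_STEPS = 20;

// A small subset of single-word catalog commands, for illustration.
const KNOWN_COMMANDS: readonly string[] = [
  "open", "wait", "snapshot", "fill", "click", "press", "screenshot",
];

type Validation = { ok: true } | { ok: false; error: string };

function validatePlan(steps: PlanStep[]): Validation {
  // Schema-level check: enforce the step limit.
  if (steps.length > MAX_STEPS) {
    return { ok: false, error: `plan has ${steps.length} steps (max ${MAX_STEPS})` };
  }
  // Catalog check: every step must start with a whitelisted command.
  for (let i = 0; i < steps.length; i++) {
    const name = steps[i].command.trim().split(/\s+/)[0];
    if (KNOWN_COMMANDS.indexOf(name) === -1) {
      return { ok: false, error: `step ${i + 1}: unknown command "${name}"` };
    }
  }
  return { ok: true };
}

const result = validatePlan([
  { command: "open https://github.com", description: "Navigate to GitHub" },
  { command: "teleport @e1", description: "Hallucinated step" },
]);
console.log(result.ok ? "valid" : result.error); // step 2: unknown command "teleport"
```

Matching only the first word is enough for this single-word subset; multi-word catalog names such as `get text` need a whole-word prefix match instead.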
The runner is segment-aware: it groups steps according to whether their output needs to be captured.

- Chain steps (`open`, `click`, `fill`, `screenshot`, ...) are joined with `&&` in a single shell invocation. The `agent-browser` daemon persists across the chain, so the runner spawns one shell per segment rather than one per step, and every action in the chain targets the same browser session.
- Read steps (`snapshot`, `get text`, `is visible`, ...) run solo so their output can be captured. A `snapshot` step discovers `@ref` handles (e.g. `@e1`, `@e2`) that subsequent interaction steps use.
For example, a search flow splits into three segments:

```text
open github.com && wait --load networkidle                     ← single chain invocation
snapshot -i                                                    ← solo (captures @refs)
fill @e1 "json-cli" && press Enter && screenshot result.png    ← chain again
```
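The grouping rule can be sketched in a few lines: consecutive chain steps accumulate into one `&&`-joined command line, and each read step closes the current chain and gets a segment of its own. The `Step`/`Segment` types and the `toSegments` name are hypothetical; the `"chain"` kind value comes from the `executionKind` field in the catalog extension example, and `"read"` from the `[read]` tag in the plan output.

```typescript
// Hypothetical sketch of segment grouping; not agent-surf's runner code.
type ExecutionKind = "chain" | "read";

interface Step {
  command: string;
  kind: ExecutionKind;
}

interface Segment {
  kind: ExecutionKind;
  commandLine: string;
}

// Runs of chain steps are joined with " && "; read steps always get a
// segment of their own so their output can be captured separately.
function toSegments(steps: Step[]): Segment[] {
  const segments: Segment[] = [];
  let chain: string[] = [];

  const flushChain = () => {
    if (chain.length > 0) {
      segments.push({ kind: "chain", commandLine: chain.join(" && ") });
      chain = [];
    }
  };

  for (const step of steps) {
    if (step.kind === "chain") {
      chain.push(step.command);
    } else {
      flushChain(); // a read step ends the current chain
      segments.push({ kind: "read", commandLine: step.command });
    }
  }
  flushChain(); // emit any trailing chain
  return segments;
}

const segments = toSegments([
  { command: "open github.com", kind: "chain" },
  { command: "wait --load networkidle", kind: "chain" },
  { command: "snapshot -i", kind: "read" },
  { command: 'fill @e1 "json-cli"', kind: "chain" },
  { command: "press Enter", kind: "chain" },
  { command: "screenshot result.png", kind: "chain" },
]);
for (const s of segments) console.log(`[${s.kind}] ${s.commandLine}`);
// [chain] open github.com && wait --load networkidle
// [read] snapshot -i
// [chain] fill @e1 "json-cli" && press Enter && screenshot result.png
```

A real runner would also prefix each command with the `agent-browser` binary and quote arguments for the shell; this sketch leaves that out.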
```bash
# Claude (default)
as "open example.com and screenshot"

# OpenAI
as "open example.com and screenshot" --provider openai

# Ollama (local, no API key needed)
as "open example.com and screenshot" --provider ollama
```

agent-browser supports both local and cloud browser execution:
```bash
# Local Chrome (default, no extra key needed)
as "open example.com"

# Browser Use cloud
as "open example.com" -p browseruse

# Browserbase cloud
as "open example.com" -p browserbase

# Browserless cloud
as "open example.com" -p browserless
```

Environment variables:

```bash
ANTHROPIC_API_KEY=sk-ant-...   # Claude (default AI provider)
OPENAI_API_KEY=sk-...          # OpenAI
BROWSER_USE_API_KEY=...        # Browser Use cloud execution
```

agent-surf covers the full agent-browser command surface via a typed catalog:
| Group | Commands |
|---|---|
| Navigation | open, close, back, forward, reload |
| Interaction | click, dblclick, fill, type, press, hover, focus, select, check, uncheck, scroll, scrollintoview, drag, upload |
| Keyboard | keyboard type, keyboard inserttext, keydown, keyup |
| Capture | screenshot, pdf, snapshot |
| Wait | wait (element, ms, --text, --url, --load, --fn, --state) |
| Get Info | get text, get html, get value, get attr, get title, get url, get count, get box, get styles, get cdp-url |
| Check State | is visible, is enabled, is checked |
| Streaming | stream enable, stream status, stream disable |
| CDP | connect, eval |
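Several catalog commands span more than one word (`get text`, `keyboard type`, `stream enable`), and custom commands such as `"my custom command"` can too, so checking a plan step against the catalog has to match whole-word prefixes rather than only the first word. A hypothetical resolver (not agent-surf's actual code) that picks the longest catalog name matching the start of a step:

```typescript
// Hypothetical resolver, not agent-surf's implementation.
// Returns the longest catalog name that matches the leading words of `line`.
function resolveCommand(line: string, catalog: readonly string[]): string | null {
  const words = line.trim().split(/\s+/);
  let best: string | null = null;
  let bestLength = 0;
  for (const name of catalog) {
    const nameWords = name.split(" ");
    // Skip names that cannot beat the current match or are longer than the line.
    if (nameWords.length <= bestLength || nameWords.length > words.length) continue;
    let matches = true;
    for (let i = 0; i < nameWords.length; i++) {
      if (nameWords[i] !== words[i]) {
        matches = false;
        break;
      }
    }
    if (matches) {
      best = name;
      bestLength = nameWords.length;
    }
  }
  return best;
}

// A few names from the table above.
const CATALOG = ["open", "get text", "get title", "keyboard type", "is visible", "wait"];

console.log(resolveCommand("get text @e3", CATALOG)); // get text
console.log(resolveCommand("keyboard type hello", CATALOG)); // keyboard type
console.log(resolveCommand("get cookies", CATALOG)); // null
```

Preferring the longest match means that if both `get` and `get text` were registered, `get text @e3` would resolve to `get text`.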
Add custom commands that are automatically included in AI planning and validation:
```ts
import { extendCatalog } from "@ekaone/agent-surf";

extendCatalog({
  "my custom command": {
    description: "Does something custom",
    args: {
      target: { type: "string", required: true, description: "Target selector" },
    },
    flags: {
      "--option": { type: "boolean", required: false, description: "An option" },
    },
    // Joined into && chains by the runner, like open/click/fill
    executionKind: "chain",
  },
});
```

Use agent-surf as a library in your own tools:
```ts
import { generatePlan, runPlan, createProvider } from "@ekaone/agent-surf";

const provider = createProvider("claude");

// Generate a command plan from a plain-English goal
const { plan } = await generatePlan(
  "open example.com and take a screenshot",
  provider
);

// Execute the plan with agent-browser
const result = await runPlan(plan, {
  headed: true,
  session: "my-session",
});

console.log(result.success); // true when every step succeeded
```

Development:

```bash
pnpm install
pnpm dev "open example.com and screenshot"
pnpm test
pnpm build
```

MIT © Eka Prasetia
⭐ If this helps you, please consider giving it a star on GitHub!