@ekaone/agent-surf

AI-powered browser automation CLI. Write in plain English, AI generates a validated agent-browser command plan, and the runner executes it step by step.

Before using agent-surf, please install agent-browser first

Quick Start

as "open github.com, search for agent-surf, take a screenshot" --yes

◆ agent-surf
◒ Generating plan  [claude]
✔ Plan ready  (312 tokens)

┌─ Plan (5 steps) ──────────────────────────────────────────
│ 1. open https://github.com
│    Navigate to GitHub
│ 2. wait  --load networkidle
│    Wait for page to fully load
│ 3. snapshot  -i
│    Get interactive elements and @refs  [read]
│ 4. find placeholder "Search" fill "agent-surf"
│    Fill the search input
│ 5. screenshot result.png
│    Capture the result
└───────────────────────────────────────────────────────────

✔ Proceed? › yes

▶ Executing...
✔ Done — all 5 steps completed.

Installation

npm install -g @ekaone/agent-surf

pnpm install -g @ekaone/agent-surf

Requires agent-browser to be installed:

npm install -g agent-browser
agent-browser install   # download browser binaries

Setup

# Claude (default)
export ANTHROPIC_API_KEY=your_key_here

# OpenAI
export OPENAI_API_KEY=your_key_here

# Browser Use cloud execution (optional)
export BROWSER_USE_API_KEY=your_key_here

Windows PowerShell: $env:ANTHROPIC_API_KEY="your_key_here"

Usage

Single goal

agent-surf "open example.com and take a screenshot"
as "open example.com and take a screenshot"

Multi-step goals

Chain multiple browser actions in plain English using "then", "and", "after that":

as "go to github.com, find the search box, type json-cli, press enter, screenshot the results"

as "open localhost:3000, wait for the page to load, scroll down, take a full page screenshot"

as "open example.com, check if the login button is visible, click it, fill email and password, submit"

Full automation flows 🚀

as "open my app at localhost:3000, login with admin@test.com and password123, navigate to settings, take a screenshot"

◆ agent-surf
✔ Plan ready  (489 tokens)

┌─ Plan (8 steps) ──────────────────────────────────────────
│ 1. open http://localhost:3000
│    Navigate to local app
│ 2. wait  --load networkidle
│    Wait for page to load
│ 3. snapshot  -i
│    Discover interactive elements  [read]
│ 4. fill @e1 "admin@test.com"
│    Fill email field
│ 5. fill @e2 "password123"
│    Fill password field
│ 6. click @e3
│    Click login button
│ 7. wait  --load networkidle
│    Wait for dashboard to load
│ 8. screenshot dashboard.png
│    Capture dashboard
└───────────────────────────────────────────────────────────

✔ Proceed? › yes

More examples

# Scrape page content
as "open news.ycombinator.com, get the text of the first post title"

# Form interaction
as "open example.com/form, fill name with 'John', select country 'Indonesia', check the terms checkbox, submit"

# Scroll and capture
as "open example.com, scroll down 1000px, wait 2000, take a full page screenshot"

# Tab management
as "open github.com, click the first repo link in a new tab"

# Debug a page
as "open example.com, get the page title, check if the nav is visible, take an annotated screenshot"

# PDF export
as "open example.com/report, wait for load, save as report.pdf"

Options

agent-surf "<goal>" [options]
as "<goal>" [options]

Options
  --provider <n>            AI provider: claude | openai | ollama  (default: claude)
  -p, --browser-provider    agent-browser provider: browseruse | browserbase | browserless
  --session <n>             agent-browser session name
  --headed                  Show browser window (not headless)
  --yes, -y                 Skip confirmation prompt
  --dry-run                 Show plan without executing
  --debug                   Show system prompt and raw AI response
  --help, -h                Show this help message
  --version, -v             Show version

How it works

User Goal (plain English)
    │
    ▼
AI Provider              ← Claude / OpenAI / Ollama
    │                      extracts ALL intents, sequences them
    ▼
JSON Plan                ← validated by Zod schema (max 20 steps)
    │
    ▼
Catalog Check            ← whitelist prevents hallucinated commands
    │
    ▼
Confirm (y/n)            ← review the full plan before execution
    │
    ▼
Runner                   ← segment-aware execution
    │                      chain steps  → joined with &&  (efficient)
    │                      read steps   → run solo, output captured
    ▼
agent-browser            ← spawned per segment, streams output live

Chain vs Read steps

The runner is segment-aware — it groups steps intelligently:

Chain steps (open, click, fill, screenshot, ...) are joined with && in a single shell invocation. The agent-browser daemon persists across the chain, making this fast and efficient.
Read steps (snapshot, get text, is visible, ...) run solo so their output can be captured. A snapshot step discovers @ref handles (e.g. @e1, @e2) used by subsequent interaction steps.

open github.com && wait --load networkidle   ← single chain invocation
snapshot -i                                  ← solo (captures @refs)
fill @e1 "json-cli" && press Enter && screenshot result.png  ← chain again

AI Providers

# Claude (default)
as "open example.com and screenshot"

# OpenAI
as "open example.com and screenshot" --provider openai

# Ollama (local, no API key needed)
as "open example.com and screenshot" --provider ollama

Browser Providers

agent-browser supports cloud and local browser execution:

# Local Chrome (default, no extra key needed)
as "open example.com"

# Browser Use cloud
as "open example.com" -p browseruse

# Browserbase cloud
as "open example.com" -p browserbase

# Browserless cloud
as "open example.com" -p browserless

Environment variables

ANTHROPIC_API_KEY=sk-ant-...    # Claude (default AI provider)
OPENAI_API_KEY=sk-...           # OpenAI
BROWSER_USE_API_KEY=...         # Browser Use cloud execution

Supported commands

agent-surf covers the full agent-browser command surface via a typed catalog:

Group	Commands
Navigation	`open`, `close`, `back`, `forward`, `reload`
Interaction	`click`, `dblclick`, `fill`, `type`, `press`, `hover`, `focus`, `select`, `check`, `uncheck`, `scroll`, `scrollintoview`, `drag`, `upload`
Keyboard	`keyboard type`, `keyboard inserttext`, `keydown`, `keyup`
Capture	`screenshot`, `pdf`, `snapshot`
Wait	`wait` (element, ms, `--text`, `--url`, `--load`, `--fn`, `--state`)
Get Info	`get text`, `get html`, `get value`, `get attr`, `get title`, `get url`, `get count`, `get box`, `get styles`, `get cdp-url`
Check State	`is visible`, `is enabled`, `is checked`
Streaming	`stream enable`, `stream status`, `stream disable`
CDP	`connect`, `eval`

Extending the catalog

Add custom commands that are automatically included in AI planning and validation:

import { extendCatalog } from "@ekaone/agent-surf";

extendCatalog({
  "my custom command": {
    description: "Does something custom",
    args: {
      target: { type: "string", required: true, description: "Target selector" },
    },
    flags: {
      "--option": { type: "boolean", required: false, description: "An option" },
    },
    executionKind: "chain",
  },
});

Programmatic API

Use agent-surf as a library in your own tools:

import { generatePlan, runPlan, createProvider } from "@ekaone/agent-surf";

const provider = createProvider("claude");

const { plan } = await generatePlan(
  "open example.com and take a screenshot",
  provider
);

const result = await runPlan(plan, {
  headed: true,
  session: "my-session",
});

console.log(result.success); // true

Local development

pnpm install
pnpm dev "open example.com and screenshot"
pnpm test
pnpm build

License

Links

⭐ If this helps you, please consider giving it a star on GitHub!

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.github/workflows		.github/workflows
src		src
tests		tests
.gitignore		.gitignore
.npmignore		.npmignore
LICENSE		LICENSE
README.md		README.md
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
tsconfig.json		tsconfig.json
tsup.config.ts		tsup.config.ts
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

@ekaone/agent-surf

Quick Start

Installation

Setup

Usage

Single goal

Multi-step goals

Full automation flows 🚀

More examples

Options

How it works

Chain vs Read steps

AI Providers

Browser Providers

Environment variables

Supported commands

Extending the catalog

Programmatic API

Local development

License

Links

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

@ekaone/agent-surf

Quick Start

Installation

Setup

Usage

Single goal

Multi-step goals

Full automation flows 🚀

More examples

Options

How it works

Chain vs Read steps

AI Providers

Browser Providers

Environment variables

Supported commands

Extending the catalog

Programmatic API

Local development

License

Links

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages