⚡ Fast, reliable, low-latency web crawling for Rust 🕸️
- Fast: 100k+ pages in 1–10 minutes, benchmarked against the real internet.
- Low-latency: pages stream the moment they're fetched. No batching, no waiting.
- Reliable: battle-tested concurrency, automatic retries, proxy hedging, and graceful shutdown.
- Scales with you: one script to a fleet of workers, same API.
```toml
[dependencies]
spider = "2"
```

```rust
use spider::{tokio, website::Website};

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(16).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("{} {}", page.status_code, page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}
```

That's the whole program. Pages stream as they arrive. The crawler stops when there's nothing left to fetch.
Want JavaScript rendering? Add `features = ["chrome"]` and call `crawl_smart()` instead: Spider will use HTTP first and only spin up Chrome for pages that need it.
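A minimal sketch of that flow, assuming `spider = { version = "2", features = ["chrome"] }` in Cargo.toml:

```rust
use spider::{tokio, website::Website};

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(16).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("{}", page.get_url());
        }
    });

    // Plain HTTP first; upgrades to headless Chrome only for
    // pages that need JavaScript rendering.
    website.crawl_smart().await;
    website.unsubscribe();
}
```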
- Scrapers that turn live websites into Markdown, plain text, JSON, or WARC archives.
- Pipelines that ingest pages into search indexes, vector stores, or databases as they're fetched: no temp directories, no full-crawl waits (see the sketch after this list).
- AI agents that browse, click, fill forms, and reason about pages using OpenAI, Gemini, or any OpenAI-compatible API.
- Monitors that re-crawl on a cron schedule and emit only what changed.
- Headless browser automation at scale, locally with Chrome or WebDriver, or remotely via Spider Cloud's managed browser pool.
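For the pipeline pattern, here's a small sketch of streaming ingest; `index_page` is a hypothetical stand-in for whatever search index, vector store, or database client you use:

```rust
use spider::{tokio, website::Website};

// Hypothetical ingest hook: replace with your indexer or store client.
async fn index_page(url: &str, html: &str) {
    println!("indexed {} ({} bytes)", url, html.len());
}

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(64).unwrap();

    let ingest = tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            // Pages arrive as they are fetched: no temp directories,
            // no waiting for the full crawl to finish.
            index_page(page.get_url(), &page.get_html()).await;
        }
    });

    website.crawl().await;
    website.unsubscribe();
    let _ = ingest.await;
}
```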
Streams in real time. Subscribe once, get every page as it lands. Spider uses Tokio broadcast channels and event-driven wakeups under the hood, so your consumer is never starved and the crawler never blocks on a slow downstream.
Handles JavaScript when it has to. "Smart mode" starts with plain HTTP and transparently upgrades to headless Chrome the moment it detects a page needs it. You don't pay the Chrome tax on pages that don't.
Survives the modern web. Real-browser fingerprint emulation, header spoofing, proxy rotation, request hedging across proxies, and stealth Chrome, all built in. For sites that fight back hard, Spider Cloud's Smart mode auto-detects Cloudflare, Akamai, Imperva, and friends and switches to its unblocker without a config change.
Respects the rules when you want it to. Robots.txt, sitemaps, per-path budgets, depth limits, allow/deny lists, glob and regex filters, cron schedules: all one method call away.
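For instance, a sketch of sitemap-only crawling, assuming the `sitemap` feature flag and spider's `crawl_sitemap` entry point:

```rust
use spider::{tokio, website::Website};

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");

    // Fetch only the URLs the sitemap lists: no link discovery.
    website.crawl_sitemap().await;

    // The visited set is available once the crawl completes.
    for link in website.get_links() {
        println!("- {:?}", link.as_ref());
    }
}
```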
Stays out of your way. The whole crawler is one fluent builder. Defaults are good. There's nothing you have to configure to get started.
Scales when you need it. Adaptive concurrency, per-domain rate limiting, HTTP/2 multiplexing, io_uring on Linux, request coalescing, and an optional decentralized mode that splits work across worker processes over IPC.
Here's the builder in action:

```rust
use spider::website::Website;
use std::{collections::HashMap, time::Duration};

let mut website = Website::new("https://example.com")
    .with_limit(50)  // max pages to crawl
    .with_depth(10)  // how deep to follow links
    .with_delay(500) // polite pause between hits (ms)
    .with_request_timeout(Some(Duration::from_secs(30)))
    .with_respect_robots_txt(true)
    .with_subdomains(true)
    .with_user_agent(Some("MyBot/1.0"))
    .with_blacklist_url(Some(vec!["/admin".into()]))
    .with_whitelist_url(Some(vec!["/blog".into()]))
    .with_proxies(Some(vec!["http://proxy:8080".into()]))
    .with_budget(Some(HashMap::from([("/blog", 100), ("*", 1000)])))
    .with_caching(true)
    .with_stealth(true)
    .build()
    .unwrap();
```

Every option is documented in the Configuration API reference.
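Running it is the same as before: `crawl` streams pages to subscribers, while `scrape` buffers them in memory for retrieval afterwards. A short usage sketch (hedged: `scrape` and `get_pages` come from spider's docs, and availability can vary by version and feature flags):

```rust
// Usage sketch for the `website` configured above.
// `scrape` works like `crawl` but retains page contents in memory.
website.scrape().await;

if let Some(pages) = website.get_pages() {
    println!("scraped {} pages", pages.len());
    for page in pages.iter() {
        println!("{} ({} bytes)", page.get_url(), page.get_html().len());
    }
}
```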
If you don't want to run proxies, manage residential IPs, keep a Chrome pool warm, or chase Cloudflare update cycles, point Spider at Spider Cloud and the same code runs against a managed crawling backend. Free tier on signup.
```rust
use spider::configuration::{SpiderCloudConfig, SpiderCloudMode};

let cloud = SpiderCloudConfig::new("sk-...")
    .with_mode(SpiderCloudMode::Smart);

let mut website = Website::new("https://example.com")
    .with_spider_cloud_config(cloud)
    .build()?;
```

Five cloud modes (Proxy, Api, Unblocker, Fallback, Smart) let you trade cost for resilience without changing your crawler code.
| Use Spider as… | Command |
|---|---|
| A Rust library | `cargo add spider` |
| A command-line tool | `cargo install spider_cli` |
| A Node.js package | `npm i @spider-rs/spider-rs` |
| A Python package | `pip install spider_rs` |
| An MCP server (for Claude, Cursor, etc.) | `cargo install spider_mcp` |
| Managed crawling | Sign up at spider.cloud |
The workspace ships nine crates. Most users only need `spider` itself.

| Crate | What it's for |
|---|---|
| `spider` | The crawler. Start here. |
| `spider_cli` | A standalone command-line crawler. |
| `spider_worker` | A worker process for decentralized crawls. |
| `spider_agent` | An autonomous AI browsing agent (Chrome / WebDriver + LLMs). |
| `spider_mcp` | A Model Context Protocol server so AI tools can call Spider directly. |
| `spider_utils`, `spider_agent_types`, `spider_agent_html` | Supporting libraries. |
There are 50+ runnable examples in `examples/`: Chrome automation, screenshots, OpenAI extraction, WARC export, cron scheduling, decentralized crawling, sitemap-only mode, anti-bot setups, and more.
- API reference: https://docs.rs/spider
- Guides & recipes: https://spider.cloud/guides
- Examples: `./examples/`
- Development notes: `CLAUDE.md`
- 💬 Discord: questions, ideas, show-and-tell
- 🐛 GitHub Issues: bug reports and feature requests
- 🗞️ Spider blog: release notes and deep dives
Spider is open source and we love good pull requests. See CONTRIBUTING.md. For tests:

```bash
cargo test -p spider        # unit tests
RUN_LIVE_TESTS=1 cargo test # live network tests
```

MIT: use it for anything, commercial or otherwise.