A multi-agent RAG system that extracts information from documents, generates summaries, and fact-checks the output against the source material. Built with the Deep Agents framework.
```
document-rag-analysis/
├── agents/
│   ├── extractor.py       # analyze_document tool (index, retrieve, deduplicate, summarize)
│   ├── fact_checker.py    # verify_claims tool (cross-checks summary against source chunks)
│   └── orchestrator.py    # Deep Agent wiring
├── main.py                # CLI entry point
├── requirements.txt
└── .env.example
```
| Agent | Role | Model |
|---|---|---|
| Orchestrator | Calls `analyze_document` directly, then delegates to the fact-checker subagent | claude-sonnet-4-6 |
| fact-checker (subagent) | Verifies each claim in the summary against retrieved source passages | claude-haiku-4-5-20251001 |
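The orchestrator wiring is the glue between the extraction tool and the verification subagent. Below is a minimal sketch of what `orchestrator.py` could look like, assuming the `deepagents` package's `create_deep_agent` API; the subagent description and instruction strings are illustrative, not the repository's actual prompts.

```python
from deepagents import create_deep_agent

from agents.extractor import analyze_document
from agents.fact_checker import verify_claims

# Subagent spec: the orchestrator sees the description when deciding to delegate.
# The name matches the table above; the prompt text here is a placeholder.
fact_checker_subagent = {
    "name": "fact-checker",
    "description": "Verifies each claim in a summary against retrieved source passages.",
    "prompt": "You receive a summary. Check every claim against the source chunks ...",
}

# Orchestrator agent: calls analyze_document itself and hands the resulting
# summary to the fact-checker subagent for verification.
agent = create_deep_agent(
    tools=[analyze_document, verify_claims],
    instructions="Run analyze_document on the given path, then delegate fact-checking.",
    subagents=[fact_checker_subagent],
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Analyze ./docs/report.pdf"}]}
)
```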
```
Document path
    │
    ▼
analyze_document (extractor.py)
    ├── Index once → Chroma (cached across runs)
    ├── 2 broad similarity searches
    ├── Deduplicate chunks (Jaccard)
    └── Summarize in one LLM call
    │
    ▼
verify_claims (fact_checker.py)
    ├── Re-retrieve same chunks from Chroma
    └── Per-claim verdict: ✅ Supported / ⚠️ Partially / ❌ Not Found
    │
    ▼
Summary + Fact-check report
```
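The deduplication step is the only non-obvious part of the extractor. Here is a minimal sketch of Jaccard-based chunk deduplication over word-level token sets; the function names and the 0.8 threshold are illustrative, not taken from `extractor.py`.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two chunks."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def deduplicate(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a chunk only if it is not a near-duplicate of an already-kept chunk."""
    kept: list[str] = []
    for chunk in chunks:
        if all(jaccard(chunk, existing) < threshold for existing in kept):
            kept.append(chunk)
    return kept
```

Because the two broad similarity searches tend to return overlapping passages, deduplicating before the single summarize call keeps the prompt compact and avoids feeding the model the same text twice.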
OpenAI is used only for embeddings (text-embedding-3-small). All agent reasoning uses Claude. Vectorstore operations are excluded from LangSmith traces to stay under the 20 MB payload limit; LLM calls are fully traced.
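The indexing side of that split might be wired roughly as below, assuming the `langchain-openai` and `langchain-chroma` packages; the collection name and persist directory are illustrative.

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# OpenAI is used only here, for embeddings; all reasoning calls go to Claude.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# A persistent collection lets the index be reused across runs instead of rebuilt.
vectorstore = Chroma(
    collection_name="document-rag-analysis",  # illustrative name
    embedding_function=embeddings,
    persist_directory=".chroma",              # illustrative path
)

# vectorstore.add_documents(chunks)                 # once per document, then cached
# hits = vectorstore.similarity_search(query, k=8)  # the broad retrieval passes
```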
```bash
pip install -r requirements.txt
cp .env.example .env   # fill in ANTHROPIC_API_KEY + OPENAI_API_KEY
```

```bash
# Executive summary + fact-check (default)
python main.py ./docs/report.pdf

# Bullet summary with a custom extraction query
python main.py ./docs/ --query "What are the risk factors?" --summary-type bullet

# Detailed summary of a specific file
python main.py ./docs/paper.pdf --summary-type detailed
```

| Flag | Default | Description |
|---|---|---|
| `document_path` | required | Path to a PDF, .txt file, or directory of documents |
| `--query` | "Extract all key information..." | What to extract from the documents |
| `--summary-type` | `executive` | `executive`, `detailed`, or `bullet` |
| `--thread-id` | auto-generated | Session ID for conversation continuity |
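A minimal sketch of how `main.py` might expose these flags with `argparse` (the default query string is abbreviated in the table and kept abbreviated here; the actual parser may differ):

```python
import argparse
import uuid

parser = argparse.ArgumentParser(description="Summarize and fact-check documents.")
parser.add_argument("document_path",
                    help="Path to a PDF, .txt file, or directory of documents")
parser.add_argument("--query", default="Extract all key information...",
                    help="What to extract from the documents")
parser.add_argument("--summary-type", choices=["executive", "detailed", "bullet"],
                    default="executive")
parser.add_argument("--thread-id", default=None,
                    help="Session ID for conversation continuity")

args = parser.parse_args()
thread_id = args.thread_id or uuid.uuid4().hex  # auto-generated when omitted
```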
A download script is included to fetch a few public arXiv papers into ./docs/:
```bash
python download_test_docs.py
```

| File | Paper |
|---|---|
| attention_is_all_you_need.pdf | Attention Is All You Need (Transformer) |
| bert.pdf | BERT: Pre-training of Deep Bidirectional Transformers |
| gpt3.pdf | Language Models are Few-Shot Learners (GPT-3) |
| llama.pdf | LLaMA: Open and Efficient Foundation Language Models |
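The download script itself is not reproduced here; a sketch of what it could amount to, using the public arXiv IDs of the four papers (the ID-to-filename mapping is the only assumption beyond the table above):

```python
import urllib.request
from pathlib import Path

# arXiv IDs for the papers listed in the table above.
PAPERS = {
    "attention_is_all_you_need.pdf": "1706.03762",
    "bert.pdf": "1810.04805",
    "gpt3.pdf": "2005.14165",
    "llama.pdf": "2302.13971",
}

docs = Path("./docs")
docs.mkdir(exist_ok=True)
for filename, arxiv_id in PAPERS.items():
    target = docs / filename
    if not target.exists():
        urllib.request.urlretrieve(f"https://arxiv.org/pdf/{arxiv_id}", target)
        print(f"Downloaded {filename}")
```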
```bash
# Summarize a single paper
python main.py ./docs/attention_is_all_you_need.pdf --summary-type detailed

# Query across all four papers at once
python main.py ./docs/ --query "What architecture or training techniques are proposed?" --summary-type bullet
```

Copy `.env.example` to `.env` and fill in your keys. The `.env` file is gitignored and will never be committed.
```
ANTHROPIC_API_KEY=your-anthropic-key
OPENAI_API_KEY=your-openai-key

# LangSmith tracing (optional)
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-langsmith-key
LANGSMITH_PROJECT=document-rag-analysis
```