A multi-agent RAG system that extracts information from documents, generates summaries, and fact-checks the output against the source material. Built with the Deep Agents framework.
```
document-rag-analysis/
├── agents/
│   ├── extractor.py       # analyze_document tool (index, retrieve, deduplicate, summarize)
│   ├── fact_checker.py    # verify_claims tool (cross-checks summary against source chunks)
│   └── orchestrator.py    # Deep Agent wiring
├── main.py                # CLI entry point
├── requirements.txt
└── .env.example
```
| Agent | Role | Model |
|---|---|---|
| Orchestrator | Calls `analyze_document` directly, then delegates to the fact-checker subagent | claude-sonnet-4-6 |
| fact-checker (subagent) | Verifies each claim in the summary against retrieved source passages | claude-haiku-4-5-20251001 |
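The orchestrator wiring is the glue between the extraction tool and the verification subagent. Below is a minimal sketch of what `orchestrator.py` could look like, assuming the `deepagents` package's `create_deep_agent` API; the subagent description and instruction strings are illustrative, not the repository's actual prompts.

```python
from deepagents import create_deep_agent

from agents.extractor import analyze_document
from agents.fact_checker import verify_claims

# Subagent spec: the orchestrator sees the description when deciding to delegate.
# The name matches the table above; the prompt text here is a placeholder.
fact_checker_subagent = {
    "name": "fact-checker",
    "description": "Verifies each claim in a summary against retrieved source passages.",
    "prompt": "You receive a summary. Check every claim against the source chunks ...",
}

# Orchestrator agent: calls analyze_document itself and hands the resulting
# summary to the fact-checker subagent for verification.
agent = create_deep_agent(
    tools=[analyze_document, verify_claims],
    instructions="Run analyze_document on the given path, then delegate fact-checking.",
    subagents=[fact_checker_subagent],
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Analyze ./docs/report.pdf"}]}
)
```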
```
Document path
    │
    ▼
analyze_document (extractor.py)
    ├── Index once → Chroma (cached across runs)
    ├── 2 broad similarity searches
    ├── Deduplicate chunks (Jaccard)
    └── Summarize in one LLM call
    │
    ▼
verify_claims (fact_checker.py)
    ├── Re-retrieve same chunks from Chroma
    └── Per-claim verdict: ✅ Supported / ⚠️ Partially / ❌ Not Found
    │
    ▼
Summary + Fact-check report
```
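The deduplication step is the only non-obvious part of the extractor. Here is a minimal sketch of Jaccard-based chunk deduplication over word-level token sets; the function names and the 0.8 threshold are illustrative, not taken from `extractor.py`.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two chunks."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def deduplicate(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a chunk only if it is not a near-duplicate of an already-kept chunk."""
    kept: list[str] = []
    for chunk in chunks:
        if all(jaccard(chunk, existing) < threshold for existing in kept):
            kept.append(chunk)
    return kept
```

Because the two broad similarity searches tend to return overlapping passages, deduplicating before the single summarize call keeps the prompt compact and avoids feeding the model the same text twice.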
OpenAI is used only for embeddings (text-embedding-3-small). All agent reasoning uses Claude. Vectorstore operations are excluded from LangSmith traces to stay under the 20 MB payload limit; LLM calls are fully traced.
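The indexing side of that split might be wired roughly as below, assuming the `langchain-openai` and `langchain-chroma` packages; the collection name and persist directory are illustrative.

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# OpenAI is used only here, for embeddings; all reasoning calls go to Claude.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# A persistent collection lets the index be reused across runs instead of rebuilt.
vectorstore = Chroma(
    collection_name="document-rag-analysis",  # illustrative name
    embedding_function=embeddings,
    persist_directory=".chroma",              # illustrative path
)

# vectorstore.add_documents(chunks)                 # once per document, then cached
# hits = vectorstore.similarity_search(query, k=8)  # the broad retrieval passes
```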
```bash
pip install -r requirements.txt
cp .env.example .env   # fill in ANTHROPIC_API_KEY + OPENAI_API_KEY
```

```bash
# Executive summary + fact-check (default)
python main.py ./docs/report.pdf

# Bullet summary with a custom extraction query
python main.py ./docs/ --query "What are the risk factors?" --summary-type bullet

# Detailed summary of a specific file
python main.py ./docs/paper.pdf --summary-type detailed
```

| Flag | Default | Description |
|---|---|---|
| `document_path` | required | Path to a PDF, .txt file, or directory of documents |
| `--query` | "Extract all key information..." | What to extract from the documents |
| `--summary-type` | `executive` | `executive`, `detailed`, or `bullet` |
| `--thread-id` | auto-generated | Session ID for conversation continuity |
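A minimal sketch of how `main.py` might expose these flags with `argparse` (the default query string is abbreviated in the table and kept abbreviated here; the actual parser may differ):

```python
import argparse
import uuid

parser = argparse.ArgumentParser(description="Summarize and fact-check documents.")
parser.add_argument("document_path",
                    help="Path to a PDF, .txt file, or directory of documents")
parser.add_argument("--query", default="Extract all key information...",
                    help="What to extract from the documents")
parser.add_argument("--summary-type", choices=["executive", "detailed", "bullet"],
                    default="executive")
parser.add_argument("--thread-id", default=None,
                    help="Session ID for conversation continuity")

args = parser.parse_args()
thread_id = args.thread_id or uuid.uuid4().hex  # auto-generated when omitted
```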
A download script is included to fetch a few public arXiv papers into ./docs/:
```bash
python download_test_docs.py
```

| File | Paper |
|---|---|
| attention_is_all_you_need.pdf | Attention Is All You Need (Transformer) |
| bert.pdf | BERT: Pre-training of Deep Bidirectional Transformers |
| gpt3.pdf | Language Models are Few-Shot Learners (GPT-3) |
| llama.pdf | LLaMA: Open and Efficient Foundation Language Models |
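The download script itself is not reproduced here; a sketch of what it could amount to, using the public arXiv IDs of the four papers (the ID-to-filename mapping is the only assumption beyond the table above):

```python
import urllib.request
from pathlib import Path

# arXiv IDs for the papers listed in the table above.
PAPERS = {
    "attention_is_all_you_need.pdf": "1706.03762",
    "bert.pdf": "1810.04805",
    "gpt3.pdf": "2005.14165",
    "llama.pdf": "2302.13971",
}

docs = Path("./docs")
docs.mkdir(exist_ok=True)
for filename, arxiv_id in PAPERS.items():
    target = docs / filename
    if not target.exists():
        urllib.request.urlretrieve(f"https://arxiv.org/pdf/{arxiv_id}", target)
        print(f"Downloaded {filename}")
```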
```bash
# Summarize a single paper
python main.py ./docs/attention_is_all_you_need.pdf --summary-type detailed

# Query across all four papers at once
python main.py ./docs/ --query "What architecture or training techniques are proposed?" --summary-type bullet
```

Copy `.env.example` to `.env` and fill in your keys. The `.env` file is gitignored and will never be committed.
```
ANTHROPIC_API_KEY=your-anthropic-key
OPENAI_API_KEY=your-openai-key

# LangSmith tracing (optional)
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-langsmith-key
LANGSMITH_PROJECT=document-rag-analysis
```