A modular, pluggable Retrieval-Augmented Generation (RAG) framework with hybrid search, full pipeline observability, and a Streamlit developer studio.
Not sure what RAG is? RAG (Retrieval-Augmented Generation) lets you ask questions about your own documents. You upload files (PDFs, Word docs, web pages), and the system finds the most relevant passages and uses an LLM to write a grounded answer with citations.
- What You'll Need
- Installation
- Configuration
- Your First Run — Streamlit Studio
- Step-by-Step: Ingest → Query → Evaluate
- CLI Usage
- Docker
- MCP Server (Claude Integration)
- External Connectors
- Evaluation Results
- Project Structure
- Running Tests
- Troubleshooting
| Requirement | Version | Notes |
|---|---|---|
| Python | 3.11 or 3.12 | Download from python.org |
| OpenAI API key | — | Used for embeddings and answer generation |
| Git | Any | To clone this repo |
| Docker (optional) | 24+ | Only needed for the Docker setup path |
You do not need a GPU. Everything runs on CPU.
# 1. Clone the repo
git clone <repo-url>
cd rag-framework
# 2. Create a virtual environment
python -m venv .venv
# 3. Activate it
# On macOS / Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
# 4. Install the framework
pip install -e ".[dev]"
# 5. Install optional extras (needed for RAGAS answer quality evaluation)
pip install ragas langchain langchain-openai
# 6. Install optional document parsers (Word / Excel support)
pip install python-docx openpyxl

Prefer to skip manual installation? See Section 7 — Docker.

cp .env.example .env

Open .env and fill in your values:
# Required
OPENAI_API_KEY=sk-... # Your OpenAI API key
# Optional — only needed if you use Voyage reranking
VOYAGE_API_KEY=pa-...
# Optional — storage location (defaults to ./data)
RAG_DATA_DIR=./data
⚠️ Never commit your `.env` file. It is already in `.gitignore`.
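If you script against the framework directly, your process needs the variables from .env in its environment. As a stdlib-only sketch of what a .env loader does (real projects typically use the `python-dotenv` package for this):

```python
import os
import tempfile
from pathlib import Path

def load_env(path: str) -> dict:
    """Minimal .env parser: KEY=VALUE lines; blank lines and '#' comments
    are ignored. (Values that themselves contain '#' would be truncated.)"""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.split("#", 1)[0].strip()
    return values

# Demo with a throwaway file; in this project the real file is just ".env"
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("# secrets\nOPENAI_API_KEY=sk-test123\nRAG_DATA_DIR=./data\n")

env = load_env(f.name)
print(sorted(env))  # ['OPENAI_API_KEY', 'RAG_DATA_DIR']
os.environ.setdefault("OPENAI_API_KEY", env["OPENAI_API_KEY"])
```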
All pipeline settings live in configs/settings.yaml. You can edit them directly or create a profile override. The main settings you might want to change:
| Setting | Default | What it controls |
|---|---|---|
| `embedding.provider` | `openai` | Which embedding model to use (`openai` or `multilingual`) |
| `retrieval.bm25_top_k` | `20` | How many BM25 candidates to retrieve |
| `generation.llm_model` | `gpt-4o-mini` | Which OpenAI model generates answers |
| `ingestion.chunk_max_tokens` | `400` | Max tokens per chunk |
| `reranking.enabled` | `true` | Whether to rerank results |
The easiest way to use the framework is through the browser UI:
# Make sure your .env is set up, then:
streamlit run rag/app/studio/studio.py

Open http://localhost:8501 in your browser. You'll see 4 pages in the sidebar:
| Page | What it does |
|---|---|
| Ingest & Inspect | Upload documents and browse the chunks created from them |
| Query Traces | Ask questions and see the full retrieval pipeline step-by-step |
| Evaluation Panel | Run retrieval quality benchmarks and RAGAS answer quality scoring |
| Connector Sync | Connect Email, Slack, Notion, or Google Docs as document sources |
- Go to Ingest & Inspect in the sidebar
- Click Upload Files and select a PDF, Word doc, or text file
- Choose your embedding provider (`openai` recommended)
- Click Ingest — you'll see chunk count, token stats, and a success message
You can also ingest from a URL (GitHub, web pages) using the From URL tab.
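The token budget from `ingestion.chunk_max_tokens` works by packing paragraphs into chunks until the budget would be exceeded. An illustrative sketch only — the real splitter lives in `rag/infra/chunking/` and uses a proper tokenizer, whereas this approximates tokens by whitespace-separated words:

```python
def pack_paragraphs(paragraphs, max_tokens=400):
    """Greedily pack paragraphs into chunks under a token budget.
    Token counts are approximated by whitespace-separated words."""
    chunks, current, current_tokens = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk if adding this paragraph would bust the budget
        if current and current_tokens + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

docs = ["alpha " * 300, "beta " * 300, "gamma " * 50]
chunks = pack_paragraphs([p.strip() for p in docs], max_tokens=400)
print(len(chunks))  # → 2 (beta and gamma fit together; alpha stands alone)
```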
- Go to Query Traces
- Type your question in the query box
- Select the same embedding provider you used during ingestion
- Click Run Query
- You'll see:
- The generated answer with citations
- Which chunks were retrieved (BM25 vs vector vs hybrid)
- Reranking scores
- Latency and token usage
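A common way to fuse the BM25 and vector rankings shown in the trace into one hybrid list is reciprocal rank fusion. This is an illustrative sketch, not necessarily the fusion strategy this framework implements:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk IDs: each list contributes 1/(k + rank),
    so items ranked highly by several retrievers rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["c3", "c1", "c7"]    # keyword ranking
vector = ["c1", "c9", "c3"]  # embedding ranking
print(reciprocal_rank_fusion([bm25, vector]))  # → ['c1', 'c3', 'c9', 'c7']
```

`c1` wins because both retrievers rank it near the top, even though neither ranks it first.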
- Go to Evaluation Panel
- Select an evaluation suite:
- Example Queries — quick regression test (note: circular ground truth)
- Resume Gold Eval — 30 human-labeled queries, non-circular
- Click ▶ Run Evaluation
- See Recall@K, MRR, nDCG, and per-query results
- Still on Evaluation Panel, click the Answer Quality (RAGAS) tab
- Set embedding provider and LLM model
- Click ▶ Run RAGAS Evaluation
- See faithfulness, answer relevancy, and context precision scores
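The retrieval metrics the panel reports can be sketched for a single query as follows (simplified, binary-relevance versions; the framework's evaluator may differ in details):

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant IDs that appear in the top-k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def mrr(ranked, relevant):
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: DCG of this ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal

ranked = ["c4", "c2", "c9", "c1"]
relevant = {"c2", "c1"}
print(recall_at_k(ranked, relevant, 3))  # 0.5 — only c2 is in the top 3
print(mrr(ranked, relevant))             # 0.5 — first hit at rank 2
```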
You can also use the framework from the terminal without the UI:
# Ingest a single file
python -m rag.cli.ingest --path /path/to/document.pdf
# Ingest into a named collection (keeps corpora separate)
python -m rag.cli.ingest --path /path/to/resume.pdf --collection resumes
# Ask a question
python -m rag.cli.query --question "What is retrieval-augmented generation?"
# Run an evaluation suite
python -m rag.cli.eval --suite resume_qrels
# Run RAGAS answer quality evaluation
python -m rag.cli.eval --answer-quality

Docker lets you run the full system without installing Python dependencies manually.
Install Docker Desktop.
# 1. Set up your environment file
cp .env.example .env
# Edit .env with your API keys
# 2. Build and start
docker-compose up --build

This starts two services:
- Studio UI → http://localhost:8501
- MCP Server → http://localhost:8000
Your data (database + indexes) is stored in a Docker volume so it persists across restarts.
To stop the services, run docker-compose down; to rebuild and restart, run docker-compose up --build again.

The MCP server exposes the RAG pipeline as tools that Claude (or any MCP-compatible client) can call.
| Tool | What it does |
|---|---|
| `rag.ingest` | Ingest a file or URL into the knowledge base |
| `rag.query` | Ask a question and get a grounded answer with citations |
| `retrieve` | Get raw ranked chunks for a query (no LLM generation) |
| `retrieve_with_metadata` | Same as `retrieve`, but returns source file, page, and chunk ID |
| `list_collections` | List all document collections and their sizes |
| `rag.eval.run` | Run an evaluation suite |
| `rag.sync_source` | Sync documents from an external connector |
python -m rag.app.mcp_server.server

The server runs on http://localhost:8000.

When your corpus has multiple document types, use the `collection` parameter to restrict retrieval to a specific group:
# Only search resumes, not the knowledge base
retrieve(query="What is Rita's GPA?", collection="resumes")

The framework can automatically pull documents from external sources. Configure credentials in .env and use the Connector Sync page in the Studio.
| Connector | Required env vars |
|---|---|
| Email (IMAP) | RAG_EMAIL_SERVER, RAG_EMAIL_USER, RAG_EMAIL_PASSWORD |
| Slack | RAG_SLACK_TOKEN, RAG_SLACK_CHANNEL_IDS |
| Notion | RAG_NOTION_TOKEN, RAG_NOTION_DATABASE_IDS |
| Google Docs | RAG_GOOGLE_CREDENTIALS_PATH, RAG_GOOGLE_FOLDER_IDS |
| Suite | Method | Recall@10 | MRR | nDCG@10 |
|---|---|---|---|---|
| Resume Gold Eval (BM25 only) | keyword | 0.923 | 0.615 | 0.688 |
| Resume Gold Eval (hybrid + collection scope) | BM25 + FAISS | 0.964 | 0.622 | 0.708 |
| BEIR SciFact | BM25 + FAISS | — | — | 0.703 |
BEIR SciFact nDCG@10 = 0.703 beats the published ColBERT baseline (0.671).
rag-framework/
├── rag/
│ ├── core/
│ │ ├── contracts/ # Data models: Document, Chunk, Answer, Citation
│ │ ├── interfaces/ # Abstract base classes for all pluggable components
│ │ ├── registry/ # Component factories
│ │ └── utils/ # Hashing, batching, token counting
│ ├── pipelines/ # IngestPipeline, QueryPipeline, EvalPipeline
│ ├── infra/
│ │ ├── stores/ # SQLite document + trace store
│ │ ├── parsing/ # PDF, HTML, Markdown, Word, Excel parsers
│ │ ├── chunking/ # Paragraph splitter, chunk packer
│ │ ├── embedding/ # OpenAI, multilingual (local) providers
│ │ ├── indexes/ # FAISS vector index, BM25 keyword index
│ │ ├── rerank/ # Voyage, cross-encoder rerankers
│ │ ├── llm/ # OpenAI LLM client
│ │ ├── connectors/ # Email, Slack, Notion, Google Docs, Web
│ │ └── evaluation/ # RAGAS answer quality evaluator
│ └── app/
│ ├── studio/ # Streamlit UI (4 pages + components)
│ ├── mcp_server/ # MCP tool implementations + schemas
│ └── cli/ # CLI entry points
├── configs/ # YAML settings, prompt templates, profiles
├── tests/ # 657 tests (unit + integration + e2e)
├── scripts/ # BEIR evaluation runner
├── Dockerfile
├── docker-compose.yml
├── .env.example
└── pyproject.toml
# Run all unit and integration tests (fast, no API calls)
pytest tests/ -m "not e2e" -q
# Run everything including end-to-end tests
pytest tests/ -q
# Run a specific test file
pytest tests/test_bm25_local.py -v

OPENAI_API_KEY not found
Make sure you've created a .env file from .env.example and filled in your key. If running via terminal, also check that your shell loaded the file (source .env or restart your terminal).
KMP_DUPLICATE_LIB_OK error on macOS
This is an OpenMP conflict between FAISS and PyTorch. Fix it by adding this to your .env:
KMP_DUPLICATE_LIB_OK=TRUE
Or prefix your command: KMP_DUPLICATE_LIB_OK=TRUE streamlit run ...
RAGAS evaluation fails with "RAGAS is not installed"
pip install ragas langchain langchain-openai

Then restart Streamlit.
Chunks not found after re-ingesting documents
Chunk IDs are content-based hashes. If you change the parser or chunking settings and re-ingest, new chunk IDs are generated. You'll need to regenerate any evaluation fixtures that reference specific chunk IDs (like resume_qrels.json).
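"Content-based hash" means the ID is derived from the chunk's text itself, so identical content always maps to the same ID, and any parser or chunker change produces new IDs. A minimal sketch of the idea — the framework's actual scheme lives in `rag/core/utils/` and may hash additional fields:

```python
import hashlib

def chunk_id(text: str, source: str) -> str:
    """Derive a stable ID from chunk content: same text + source → same ID."""
    digest = hashlib.sha256(f"{source}\x00{text}".encode("utf-8")).hexdigest()
    return digest[:16]  # short prefix is enough for illustration

a = chunk_id("Retrieval-augmented generation...", "doc.pdf")
b = chunk_id("Retrieval-augmented generation...", "doc.pdf")
c = chunk_id("Retrieval augmented generation...", "doc.pdf")  # one char differs
print(a == b, a == c)  # True False
```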
Streamlit shows a blank page
Try a hard refresh (Cmd+Shift+R on macOS, Ctrl+Shift+R on Windows/Linux). If it persists, restart Streamlit.
This project was inspired in part by jerry-ai-dev/MODULAR-RAG-MCP-SERVER.