The problems are architectural, not instructional. You cannot prompt your way out of a stateless architecture.
```text
Session starts ──► Reconstruct context from logs ──► 10,000–30,000 tokens burned
                                                     before a single line of work runs

Cron job fires ──► Agent has no memory of last run ──► Repeats completed work
                                                       Loops. Bills stack up.

Add more guardrails ──► Longer prompts ──► More tokens
                    ──► More complexity ──► More failure surface
                    ──► More maintenance ──► Less time saved
```
The control paradox: the more control you try to add through prompts, the more expensive and fragile the system becomes. You end up spending more time fixing the automation than the automation saves.
```text
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  DXT       drag-and-drop install · 44 tools registered      │
│            automatically · no JSON editing                  │
│                                                             │
│  MCP       stateless on-demand tool invocation              │
│            no persistent process between runs               │
│            agent wakes, works, terminates cleanly           │
│                                                             │
│  Skill     ~150 tokens of scoped context injected           │
│  Files     only when relevant · unloaded when done          │
│            90% less context overhead per session            │
│                                                             │
│  VEKTOR ◄─ persistent memory graph · BM25 + vector RRF      │
│            recall · self-organising intelligence layer      │
│            state that actually survives between sessions    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
| Layer | Solves | Token impact |
|---|---|---|
| DXT | Setup friction, misconfigured tools | Surfaces only relevant tools per task |
| MCP | Persistent process requirement, cold starts | Stateless invocation on demand |
| Skill Files | Monster prompts, competing instructions | 150 tokens vs 8,000–20,000 |
| VEKTOR | Session amnesia, broken cron jobs, control paradox | 250–4,000 tokens regardless of DB size |
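The box above describes recall as BM25 + vector RRF. Reciprocal Rank Fusion scores each item by summing 1 / (k + rank) over every ranked list it appears in, so memories ranked well by both the keyword channel and the vector channel rise to the top. A minimal sketch of the general technique, assuming each channel returns a plain array of memory ids and using the commonly cited k = 60; this is not VEKTOR's internal code:

```javascript
/**
 * Reciprocal Rank Fusion over several ranked lists of ids.
 * score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
 */
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Hypothetical output of a keyword (BM25) channel and a vector channel
const bm25Hits = ['m3', 'm1', 'm7'];
const vectorHits = ['m1', 'm4', 'm3'];

const fused = reciprocalRankFusion([bm25Hits, vectorHits]);
console.log(fused.map((r) => r.id)); // → [ 'm1', 'm3', 'm4', 'm7' ]
```

Because RRF uses only ranks, the two channels never need comparable score scales, which is why it is a popular way to merge keyword and embedding results.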
```bash
npm install -g vektor-slipstream
npx vektor setup
```

Or drag `vektor-slipstream.dxt` directly into Claude Desktop. All 44 tools register automatically. No JSON editing. No path configuration.
```javascript
const { createMemory } = require('vektor-slipstream');

// require() means CommonJS, where top-level await is not allowed,
// so the calls run inside an async function.
async function main() {
  const memory = await createMemory({
    agentId: 'my-agent',
    licenceKey: process.env.VEKTOR_LICENCE_KEY,
  });

  // Store a memory
  await memory.remember('User prefers TypeScript. Deployed to prod on Friday.');

  // Recall by semantic similarity -- sub-1ms, fully local
  const results = await memory.recall('deployment preferences', 5);
  // → [{ content, score, id, timestamp }]

  // Traverse the associative memory graph
  const graph = await memory.graph('TypeScript', { hops: 2 });

  // What changed in the last 7 days?
  const delta = await memory.delta('project decisions', 7);

  // Morning briefing from recent memories
  const brief = await memory.briefing();
}

main().catch(console.error);
```

| | Without VEKTOR | With VEKTOR |
|---|---|---|
| Context cost per session | 15,000–50,000 tokens reconstructing history | 250–4,000 tokens for full semantic recall |
| Cron jobs | Agent repeats completed work -- no memory of last run | Recalls previous run outcome in one call |
| Configuration memory | Forgotten every session | Graph surfaces what worked last time automatically |
| Autonomy vs control | Either full autonomy (dangerous) or manual gates (slow) | Agent learns from outcome history when to proceed vs escalate |
| Between-session state | Persistent process required or state is lost | SQLite persists -- stateless invocation, stateful recall |
| Embedding cost | Cloud API call on every store and recall | $0 -- fully local ONNX, no API key required |
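The "stateless invocation, stateful recall" idea can be illustrated without VEKTOR at all. The sketch below swaps SQLite for a plain JSON file so it runs on Node's standard library alone; the `runOnce` helper, the file location, and the task id are hypothetical. Each call behaves like a fresh cron process that reads the previous outcome before deciding whether to do any work:

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Load persisted run history; a missing or unreadable file means "first run".
function loadState(file) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch {
    return { completed: {} };
  }
}

function saveState(state, file) {
  fs.writeFileSync(file, JSON.stringify(state, null, 2));
}

// One cron invocation. Nothing survives in memory between calls, so the
// only way to avoid repeating work is to read the previous outcome from disk.
function runOnce(taskId, doWork, file) {
  const state = loadState(file);
  const previous = state.completed[taskId];
  if (previous) {
    return { skipped: true, previous };
  }
  const outcome = doWork();
  state.completed[taskId] = { outcome, at: new Date().toISOString() };
  saveState(state, file);
  return { skipped: false, outcome };
}

// Simulate two cron firings for the same task (hypothetical state file).
const stateFile = path.join(os.tmpdir(), `cron-agent-demo-${process.pid}.json`);
fs.rmSync(stateFile, { force: true }); // start clean
const first = runOnce('weekly-report', () => 'sent', stateFile);
const second = runOnce('weekly-report', () => 'sent', stateFile);
console.log(first.skipped, second.skipped); // false true
fs.rmSync(stateFile, { force: true });
```

The second firing skips the work because the outcome of the first was persisted, which is the behaviour the "Cron jobs" row describes.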
```text
Task triggered (cron / webhook / user action)
      │
      ▼
Skill File injected based on task context         ~150 tokens
      │
      ▼
vektor_recall_rrf called                          ~800 tokens
Top-10 semantically relevant memories returned
      │
      ▼
Agent classifies situation from memory history
      │
 ┌────┴─────────────────────┐
 ▼                          ▼
 familiar pattern           novel / previously failed
 proceed autonomously       surface for human review
 │                          │
 └────┬─────────────────────┘
      ▼
Execute task via MCP tools
      │
      ▼
Result stored via vektor_store
Memory graph updated with outcome
      │
      ▼
Session ends · SQLite persists everything
      │
      ▼
Next invocation: same startup cost · full outcome history available
```
Total context overhead for a routine task: under 2,000 tokens. The same task with a monolithic system prompt and history reconstruction: 15,000–50,000 tokens, with no retention of outcome.
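The branch in the middle of the flow can be sketched as a pure function over recalled history. This is an illustration under stated assumptions: each recalled memory is assumed to carry a `pattern` label and an `outcome`, and any earlier failure is treated as grounds for escalation. The flow does not specify the exact rule VEKTOR-backed agents apply:

```javascript
// Hypothetical shape for recalled memories: { pattern, outcome }, where
// outcome is 'succeeded' or 'failed'. Real recall results carry more fields.
function classifySituation(recalled, pattern) {
  const history = recalled.filter((m) => m.pattern === pattern);
  if (history.length === 0) {
    return { action: 'escalate', reason: 'novel' };
  }
  if (history.some((m) => m.outcome === 'failed')) {
    return { action: 'escalate', reason: 'previously failed' };
  }
  return { action: 'proceed', reason: 'familiar pattern' };
}

const recalled = [
  { pattern: 'rotate-logs', outcome: 'succeeded' },
  { pattern: 'rotate-logs', outcome: 'succeeded' },
  { pattern: 'migrate-db', outcome: 'failed' },
];

console.log(classifySituation(recalled, 'rotate-logs').action); // proceed
console.log(classifySituation(recalled, 'migrate-db').reason); // previously failed
console.log(classifySituation(recalled, 'new-report').reason); // novel
```

Keeping the decision a pure function of recalled history means it behaves identically on every stateless invocation; only the stored outcomes change between runs.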
| Metric | Value |
|---|---|
| Recall latency | sub-1ms (local SQLite + ONNX) |
| Embedding cost | $0 -- fully local ONNX |
| Embedding latency | ~10ms GPU / ~25ms CPU |
| LoCoMo benchmark | 66.9% adjusted judge accuracy |
| Min tokens for full recall | 250 |
| Max tokens regardless of DB size | 4,000 |
| First run | ~2 min (downloads ~25MB model once) |
| Subsequent boots | <100ms |
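A ceiling that holds "regardless of DB size" means recall output has to be budgeted rather than grow with the store. One common way to enforce such a cap is greedy packing by relevance score. A minimal sketch, assuming scored memories and a rough 4-characters-per-token estimate; it is not VEKTOR's actual budgeting logic:

```javascript
// Rough heuristic: about 4 characters per token for English text.
// This is an assumption; a real tokenizer gives exact counts.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Greedily keep the highest-scoring memories that still fit in the budget.
// The output is capped at maxTokens no matter how many memories exist.
function packContext(memories, maxTokens = 4000) {
  const sorted = [...memories].sort((a, b) => b.score - a.score);
  const picked = [];
  let used = 0;
  for (const m of sorted) {
    const cost = estimateTokens(m.content);
    if (used + cost > maxTokens) continue; // skip items that would overflow
    picked.push(m);
    used += cost;
  }
  return { picked, used };
}

const memories = [
  { content: 'a'.repeat(8000), score: 0.9 }, // ~2,000 tokens
  { content: 'b'.repeat(12000), score: 0.8 }, // ~3,000 tokens, would overflow
  { content: 'c'.repeat(4000), score: 0.7 }, // ~1,000 tokens
];
const { picked, used } = packContext(memories);
console.log(picked.length, used); // 2 3000
```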
**LoCoMo benchmark results**
| Category | Accuracy |
|---|---|
| Multi-hop | 79.1% |
| Adversarial | 70.4% |
| Single-hop | 51.6% |
| Temporal | 46.2% |
| Adjusted total | 66.9% |
```bash
npx vektor chat                     # auto-detects Ollama
npx vektor chat --provider claude   # Anthropic Claude
npx vektor chat --provider groq --model llama-3.3-70b-versatile
npx vektor chat --provider openai
npx vektor chat --provider gemini
```

| Provider | Details |
|---|---|
| `ollama` | Default -- free, local, no API key. Auto-detects best model. |
| `claude` | Anthropic Claude -- set `ANTHROPIC_API_KEY` |
| `openai` | OpenAI GPT -- set `OPENAI_API_KEY` |
| `groq` | Groq LLaMA -- set `GROQ_API_KEY` (free tier available) |
| `gemini` | Google Gemini -- set `GEMINI_API_KEY` |
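The provider table maps each `--provider` value to the environment variable it needs. A hypothetical resolver applying the same mapping, defaulting to Ollama when no provider is given, might look like this; `resolveProvider` is illustrative only and is not part of the `vektor` CLI:

```javascript
// Env var required by each provider, as listed in the table above.
const PROVIDER_KEYS = {
  ollama: null, // local, no API key
  claude: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
  groq: 'GROQ_API_KEY',
  gemini: 'GEMINI_API_KEY',
};

function resolveProvider(requested, env = process.env) {
  const provider = requested || 'ollama'; // default when no --provider flag
  if (!Object.prototype.hasOwnProperty.call(PROVIDER_KEYS, provider)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  const keyVar = PROVIDER_KEYS[provider];
  if (keyVar && !env[keyVar]) {
    throw new Error(`${provider} requires ${keyVar} to be set`);
  }
  return { provider, keyVar };
}

console.log(resolveProvider(undefined, {}).provider); // ollama
console.log(resolveProvider('groq', { GROQ_API_KEY: 'test-key' }).keyVar); // GROQ_API_KEY
```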
In-chat commands:

| Command | Action |
|---|---|
| `/recall <query>` | Search memory mid-conversation |
| `/stats` | Node count, edges, pinned memories |
| `/briefing` | Generate memory briefing inline |
| `/exit` | Exit (Ctrl+C also works) |
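The in-chat commands follow the usual slash-command convention: a leading `/`, a command name, then an optional argument. A minimal parser for that convention, written as a sketch; the function name and return shapes are assumptions, not the chat client's real internals:

```javascript
const COMMANDS = new Set(['recall', 'stats', 'briefing', 'exit']);

// Split "/recall some query" into { type: 'command', name: 'recall', arg: 'some query' }.
// Anything that does not start with "/" is treated as a normal chat message.
function parseInput(line) {
  const trimmed = line.trim();
  if (!trimmed.startsWith('/')) return { type: 'message', text: trimmed };

  const spaceAt = trimmed.indexOf(' ');
  const name = (spaceAt === -1 ? trimmed.slice(1) : trimmed.slice(1, spaceAt)).toLowerCase();
  const arg = spaceAt === -1 ? '' : trimmed.slice(spaceAt + 1).trim();

  if (!COMMANDS.has(name)) return { type: 'unknown', name };
  if (name === 'recall' && !arg) return { type: 'error', message: '/recall needs a query' };
  return { type: 'command', name, arg };
}

console.log(parseInput('/recall database decision'));
console.log(parseInput('what did we ship on Friday?').type); // message
```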
One-liner commands:

```bash
# Store facts
npx vektor remember "I prefer TypeScript over JavaScript"
npx vektor remember "deadline is Friday" --importance 5
cat meeting-notes.txt | npx vektor remember

# Query
npx vektor ask "what stack am I using?"
npx vektor ask "what did we decide about the database?"

# Autonomous agent
npx vektor agent "summarise everything I know about project Alpha"
npx vektor agent "research AI memory tools" --steps 15 --provider groq
```

Install the `.dxt` extension for zero-config persistent memory in every Claude Desktop session.
Install: drag vektor-slipstream.dxt onto the Claude Desktop Extensions page.
Once installed, Claude automatically:
- Recalls relevant context at session start
- Stores facts and decisions during conversation
- Summarises and consolidates at session end
All 44 tools available. No configuration beyond your licence key.
Download: vektormemory.com/docs/dxt
| Tool | Function |
|---|---|
| `vektor_recall` | Semantic + BM25 + graph search across memory |
| `vektor_recall_rrf` | BM25+RRF dual-channel recall with cross-encoder rerank |
| `vektor_store` | Store memory with importance score |
| `vektor_ingest` | Batch ingest conversation turns with session date |
| `vektor_graph` | Traverse associative memory graph |
| `vektor_delta` | See what changed on a topic over time |
| `vektor_briefing` | Generate briefing from recent memories |
| `vektor_stats` | Memory DB stats -- node count, edges, entities |
| `vektor_timeline` | Query memories by date range |
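`vektor_timeline` and `vektor_delta` both come down to filtering memories by time. The sketch below assumes each memory carries a millisecond `timestamp`, and it swaps semantic topic matching for a simple case-insensitive substring test, so treat it as an outline of the idea rather than the tools' actual behaviour:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Memories with from <= timestamp < to, oldest first.
function timeline(memories, from, to) {
  return memories
    .filter((m) => m.timestamp >= from && m.timestamp < to)
    .sort((a, b) => a.timestamp - b.timestamp);
}

// Memories about `topic` from the last `days` days (inclusive of `now`).
// Substring matching stands in for semantic recall here.
function delta(memories, topic, days, now = Date.now()) {
  const needle = topic.toLowerCase();
  return timeline(memories, now - days * DAY_MS, now + 1)
    .filter((m) => m.content.toLowerCase().includes(needle));
}

const now = Date.UTC(2025, 0, 10);
const memories = [
  { content: 'Database: chose Postgres', timestamp: now - 2 * DAY_MS },
  { content: 'Database: evaluated MySQL', timestamp: now - 30 * DAY_MS },
  { content: 'Frontend: moved to Vite', timestamp: now - 1 * DAY_MS },
];

console.log(delta(memories, 'database', 7, now).map((m) => m.content));
// → [ 'Database: chose Postgres' ]
```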
| Tool | Function |
|---|---|
| `cloak_fetch` | Stealth headless browser fetch via Playwright |
| `cloak_fetch_smart` | Checks llms.txt first, falls back to stealth browser |
| `cloak_render` | Full CSS/DOM layout sensor |
| `cloak_diff` | Semantic diff of URL since last fetch |
| `cloak_diff_text` | Structural diff between two text blobs |
| `cloak_passport` | AES-256-GCM credential vault (get/set/delete/list) |
| `cloak_ssh_exec` | Execute commands on remote server via SSH |
| `cloak_ssh_upload` | Upload file to remote server via SFTP |
| `cloak_cortex` | Scan project directory into memory graph |
| `cloak_cortex_anatomy` | Get cached file anatomy without rescanning |
| `tokens_saved` | Token efficiency ROI calculator |
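`cloak_passport` is described as an AES-256-GCM credential vault with get/set/delete/list operations. The standard way to do that with Node's built-in `crypto` module is sketched below: a 256-bit key derived with scrypt, a fresh 12-byte IV per secret, and the GCM auth tag stored next to the ciphertext. The in-memory store, passphrase handling, and `createVault` API are assumptions, not `cloak_passport`'s implementation:

```javascript
const crypto = require('node:crypto');

// Minimal in-memory AES-256-GCM secret store (hypothetical API).
function createVault(passphrase, salt = crypto.randomBytes(16)) {
  const key = crypto.scryptSync(passphrase, salt, 32); // 256-bit key
  const records = new Map();

  return {
    set(name, secret) {
      const iv = crypto.randomBytes(12); // 96-bit nonce, unique per encryption
      const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
      const data = Buffer.concat([cipher.update(secret, 'utf8'), cipher.final()]);
      records.set(name, { iv, data, tag: cipher.getAuthTag() });
    },
    get(name) {
      const rec = records.get(name);
      if (!rec) return undefined;
      const decipher = crypto.createDecipheriv('aes-256-gcm', key, rec.iv);
      decipher.setAuthTag(rec.tag); // final() throws if data or tag was tampered with
      return Buffer.concat([decipher.update(rec.data), decipher.final()]).toString('utf8');
    },
    delete(name) {
      return records.delete(name);
    },
    list() {
      return [...records.keys()];
    },
  };
}

const vault = createVault('correct horse battery staple');
vault.set('github', 'ghp_example');
console.log(vault.get('github')); // ghp_example
console.log(vault.list()); // [ 'github' ]
```

A persistent vault would also need to store the salt so the same key can be re-derived on the next run.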
| Tool | Function |
|---|---|
| `cloak_identity_create` | Create persistent browser fingerprint identity |
| `cloak_identity_use` | Apply saved identity to a fetch call |
| `cloak_identity_list` | List saved identities with trust summary |
| `cloak_inject_behaviour` | Human mouse/scroll injection for reCAPTCHA/Cloudflare |
| `cloak_behaviour_stats` | List available patterns and categories |
| `cloak_load_pattern` | Load custom recorded behaviour pattern |
| `cloak_pattern_stats` | Self-improving pattern store tier breakdown |
| `cloak_pattern_list` | List patterns with scores and tier |
| `cloak_pattern_prune` | Remove stale/low-scoring patterns |
| `cloak_pattern_seed` | Seed store with built-in patterns |
| Tool | Function |
|---|---|
| `cloak_detect_captcha` | Detect CAPTCHA type and sitekey |
| `cloak_solve_captcha` | Solve via vision AI (Claude/GPT-4o/2captcha) |
| Tool | Function |
|---|---|
| `turbo_quant_compress` | PolarQuant vector compression (~75% smaller) |
| `turbo_quant_stats` | Compression ratio and savings stats |
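A 75% reduction is exactly what you get by storing 1-byte codes instead of 4-byte float32 values. PolarQuant is a different, more elaborate scheme; the sketch below swaps in plain min-max scalar quantization only to show where a ratio of that size comes from. The 384-dimension test vector matches the output size of bge-small-en-v1.5:

```javascript
// Scalar (min-max) 8-bit quantization: each float32 (4 bytes) becomes one
// uint8 (1 byte), plus a shared min and scale. This is a simple stand-in,
// NOT the PolarQuant algorithm used by turbo_quant_compress.
function quantize(vector) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of vector) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const scale = (max - min) / 255 || 1; // avoid divide-by-zero for constant vectors
  const codes = new Uint8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    codes[i] = Math.round((vector[i] - min) / scale);
  }
  return { codes, min, scale };
}

function dequantize({ codes, min, scale }) {
  return Float32Array.from(codes, (c) => min + c * scale);
}

const embedding = Float32Array.from({ length: 384 }, (_, i) => Math.sin(i));
const q = quantize(embedding);
// Ignores the few bytes for min/scale, which are shared by the whole vector.
const savedFraction = 1 - q.codes.byteLength / embedding.byteLength;
console.log(savedFraction); // 0.75

const restored = dequantize(q);
let maxError = 0;
for (let i = 0; i < embedding.length; i++) {
  maxError = Math.max(maxError, Math.abs(restored[i] - embedding[i]));
}
console.log(maxError < q.scale); // true: rounding error is about half a step at most
```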
| Tool | Function |
|---|---|
| `vektor_text` | Text generation (OpenAI/Claude/Groq/Gemini/NVIDIA NIM) |
| `vektor_image` | Image generation (DALL-E, Stability, NVIDIA) |
| `vektor_vision` | Image understanding and analysis |
| `vektor_speech` | Text-to-speech and transcription |
| `vektor_search` | Web search with memory integration |
| `vektor_providers` | List available providers and status |
| Tool | Function |
|---|---|
| `vektor_agent_run` | Autonomous goal executor with memory |
| `vektor_swarm` | Multi-agent swarm task |
| `vektor_watch` | File system watcher -- auto-ingest on change |
```bash
npx vektor setup      # First-run wizard -- licence, hardware, integrations
npx vektor activate   # Activate licence key on this machine
npx vektor test       # Test memory engine with progress bar
npx vektor status     # System health check
npx vektor mcp        # Start Claude Desktop MCP server
npx vektor rem        # Run REM dream cycle (memory consolidation)
npx vektor chat       # Persistent memory chat (all LLMs)
npx vektor remember   # Store a fact
npx vektor ask        # Query memory + LLM answer
npx vektor agent      # Autonomous goal executor
npx vektor help       # All commands
```

Add to `.claude/settings.json` in your project:
```json
{
  "mcpServers": {
    "vektor": {
      "command": "node",
      "args": ["/path/to/node_modules/vektor-slipstream/index.js"],
      "env": {
        "VEKTOR_LICENCE_KEY": "your-licence-key",
        "CLOAK_PROJECT_PATH": "/path/to/your/project"
      }
    }
  }
}
```

- 4-layer associative graph -- semantic, causal, temporal, entity edges
- bge-small-en-v1.5 bi-encoder + ms-marco cross-encoder reranker
- BM25 + stemmed BM25 + RRF fusion -- keyword + semantic dual-channel recall
- Persistent entity index -- guaranteed named-entity retrieval
- Foresight extraction -- future-tense statements stored with temporal metadata
- ADD-only contradiction detection -- full history preserved, no silent overwrites
- REM dream cycle -- up to 50:1 memory compression
- Sub-1ms recall -- local SQLite, no network required
- Local ONNX embeddings -- $0 embedding cost, no API key required
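The associative graph and calls such as `memory.graph('TypeScript', { hops: 2 })` suggest a hop-limited traversal over typed edges. A breadth-first sketch, assuming edges are stored as `{ from, to, type }` records and treating every edge as undirected for simplicity (causal edges in a real system would likely be directed); `traverse` is a hypothetical helper:

```javascript
// Hop-limited breadth-first traversal over typed edges.
// Edge shape { from, to, type } is an assumption; edges are treated as undirected.
function traverse(edges, start, { hops = 2, types = null } = {}) {
  const adjacency = new Map();
  const link = (a, b) => {
    if (!adjacency.has(a)) adjacency.set(a, []);
    adjacency.get(a).push(b);
  };
  for (const e of edges) {
    if (types && !types.includes(e.type)) continue; // optional edge-type filter
    link(e.from, e.to);
    link(e.to, e.from);
  }

  const depth = new Map([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    const d = depth.get(node);
    if (d === hops) continue; // reached the hop limit: do not expand further
    for (const next of adjacency.get(node) || []) {
      if (!depth.has(next)) {
        depth.set(next, d + 1);
        queue.push(next);
      }
    }
  }
  depth.delete(start);
  return depth; // Map of node -> hop distance, in BFS order
}

const edges = [
  { from: 'TypeScript', to: 'tsconfig strict mode', type: 'semantic' },
  { from: 'tsconfig strict mode', to: 'Friday deploy', type: 'causal' },
  { from: 'Friday deploy', to: 'rollback', type: 'temporal' },
];

console.log([...traverse(edges, 'TypeScript', { hops: 2 })]);
// → [ [ 'tsconfig strict mode', 1 ], [ 'Friday deploy', 2 ] ]
```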
| Module | Function |
|---|---|
| `recall-tune` | Adjusts retrieval weights based on which memories produced correct outcomes |
| `confidence` | Scores memories by reliability across corroborating sources |
| `dedup` | Removes semantic duplicates, keeps the graph clean |
| `selforg` | Reorganises memory clusters as new information accumulates |
| `rl-memory` | Reinforcement signals surface higher-quality memories preferentially |
| `briefing-scheduler` | Periodic summaries of memory activity |
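The `dedup` module removes semantic duplicates. A common baseline is to drop any memory whose embedding is within a cosine-similarity threshold of one already kept. A sketch under stated assumptions: embeddings are plain numeric arrays, the 0.95 threshold is illustrative rather than VEKTOR's setting, and the first occurrence wins:

```javascript
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0; // zero vectors count as dissimilar
}

// Keep a memory only if it is not too similar to one already kept.
function dedup(memories, threshold = 0.95) {
  const kept = [];
  for (const m of memories) {
    if (!kept.some((k) => cosine(k.embedding, m.embedding) >= threshold)) {
      kept.push(m);
    }
  }
  return kept;
}

const memories = [
  { id: 'a', embedding: [1, 0, 0] },
  { id: 'b', embedding: [0.99, 0.01, 0] }, // near-duplicate of a
  { id: 'c', embedding: [0, 1, 0] },
];
console.log(dedup(memories).map((m) => m.id)); // → [ 'a', 'c' ]
```

The pairwise check is O(n²); at larger scale an approximate nearest-neighbour index would normally replace the inner loop.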
- Claude Desktop -- DXT extension, 44 tools, auto-memory on every session
- Claude Code -- MCP server, all 44 tools
- CLI -- `chat`, `remember`, `ask`, `agent` commands
- LangChain -- v1 + v2 adapter included
- OpenAI Agents SDK -- drop-in integration
- Groq · Gemini · Ollama · NVIDIA NIM -- provider agnostic
Zero config. VEKTOR detects and uses the best available accelerator:
- NVIDIA CUDA -- GPU acceleration
- Apple Silicon -- CoreML
- CPU -- optimised fallback, works everywhere
| Variable | Default | Purpose |
|---|---|---|
| `VEKTOR_SUMMARIZE` | `false` | Enable LLM session summarisation on ingest |
| `VEKTOR_TRIPLES` | `true` | Enable batch triple extraction on ingest |
| `VEKTOR_FORESIGHT` | `true` | Extract future-tense foresight signals |
| `VEKTOR_TEMPORAL` | `true` | Enable temporal index and date boosting |
| `VEKTOR_CONTRADICT` | `true` | Enable ADD-only contradiction detection |
| `VEKTOR_DEBUG` | -- | Enable verbose retrieval debug output |
| `VEKTOR_MODEL` | `Xenova/bge-small-en-v1.5` | Swap embedding model |
| `VEKTOR_RERANK` | `true` | Enable cross-encoder reranking |
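A process could read the variables above into a typed config object as follows. The defaults are copied from the table; which strings count as true (`1`, `true`, `yes`, `on`) and treating the unlisted `VEKTOR_DEBUG` default as off are assumptions, since the table does not say how VEKTOR parses these values:

```javascript
// Boolean flags and defaults from the table. VEKTOR_DEBUG has no listed
// default, so it is assumed to be off unless set.
const FLAG_DEFAULTS = {
  VEKTOR_SUMMARIZE: false,
  VEKTOR_TRIPLES: true,
  VEKTOR_FORESIGHT: true,
  VEKTOR_TEMPORAL: true,
  VEKTOR_CONTRADICT: true,
  VEKTOR_DEBUG: false,
  VEKTOR_RERANK: true,
};

// Unset or empty means "use the default"; the accepted truthy spellings are an assumption.
function parseBool(value, fallback) {
  if (value === undefined || value === '') return fallback;
  return ['1', 'true', 'yes', 'on'].includes(value.trim().toLowerCase());
}

function readConfig(env = process.env) {
  const config = {};
  for (const [name, fallback] of Object.entries(FLAG_DEFAULTS)) {
    config[name] = parseBool(env[name], fallback);
  }
  config.VEKTOR_MODEL = env.VEKTOR_MODEL || 'Xenova/bge-small-en-v1.5';
  return config;
}

const cfg = readConfig({ VEKTOR_SUMMARIZE: 'true', VEKTOR_RERANK: 'false' });
console.log(cfg.VEKTOR_SUMMARIZE, cfg.VEKTOR_RERANK, cfg.VEKTOR_TRIPLES); // true false true
```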
Built on published research:
- MAGMA (arxiv:2601.03236) -- Multi-Graph Agentic Memory Architecture
- EverMemOS (arxiv:2601.02163) -- Self-Organizing Memory OS
- HippoRAG (arxiv:2405.14831) -- Neurobiologically Inspired Long-Term Memory (NeurIPS 2024)
- Mem0 (arxiv:2504.19413) -- Production-Ready Agent Memory
- LoCoMo Benchmark (arxiv:2402.17753) -- Long-Context Conversational Memory
| Plan | Price | Licences |
|---|---|---|
| Solo | $9/mo | 3 |
| Team | $35/mo | 5 |
| Studio | $59/mo | 10 |
| Enterprise | $99/mo | 25 |
Retrieval pipeline rebuilt from scratch:
- bge-small-en-v1.5 bi-encoder + ms-marco cross-encoder reranker (spec-decode architecture)
- BM25 + Porter-stemmed BM25 + named entity injection, fused via RRF
- MAGMA graph layer -- co-occurrence and temporal edges between entities in SQLite
- Persistent entity index (`vektor_entities`) for guaranteed named-entity recall
- Foresight extraction -- future-tense statements stored for temporal queries
- Question type classifier -- routes single-hop vs multi-hop to optimal retrieval path
- ADD-only contradiction detection -- conflicting facts survive with timestamps
- Agentic sufficiency check -- reformulates query if key entities missing from top results
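ADD-only contradiction detection means a conflicting fact is appended and flagged rather than overwriting the old one, so history survives with timestamps. A minimal sketch of that policy, assuming facts are `(subject, attribute, value)` triples added in chronological order; `createFactLog` and its methods are hypothetical:

```javascript
// Nothing is ever updated or deleted: a conflicting value is appended and
// flagged, so the full history stays queryable.
function createFactLog() {
  const log = [];
  const matching = (subject, attribute) =>
    log.filter((f) => f.subject === subject && f.attribute === attribute);

  return {
    add(subject, attribute, value, at = new Date().toISOString()) {
      const prior = matching(subject, attribute);
      const latest = prior[prior.length - 1];
      const contradicts = latest !== undefined && latest.value !== value;
      const entry = { subject, attribute, value, at, contradicts };
      log.push(entry);
      return entry;
    },
    // Most recently added value; assumes facts arrive in time order.
    current(subject, attribute) {
      const prior = matching(subject, attribute);
      return prior.length ? prior[prior.length - 1].value : undefined;
    },
    history(subject, attribute) {
      return matching(subject, attribute);
    },
  };
}

const facts = createFactLog();
facts.add('project Alpha', 'database', 'MySQL', '2025-01-02T09:00:00Z');
const change = facts.add('project Alpha', 'database', 'Postgres', '2025-01-09T09:00:00Z');
console.log(change.contradicts); // true
console.log(facts.current('project Alpha', 'database')); // Postgres
console.log(facts.history('project Alpha', 'database').length); // 2
```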
vektormemory.com · Docs · hello@vektormemory.com
Stop prompting like it's 2024. Build agents that remember.