feat(cli/extraction): pluggable LLM provider interface — Anthropic + gpt-5-nano default #463
Open
zsculac wants to merge 31 commits into
Conversation
… provider

Extracts the LLM extraction logic from llm-extractor.ts into a pluggable LlmProvider interface (llm-provider.ts), with the OpenAI implementation (providers/openai.ts) preserving byte-equivalent request/response shape. The tolerant N-Triples parser is lifted verbatim into its own module (parse-ntriples.ts) so future providers (Anthropic in issue 0002) can share it.

Fail-soft semantics now warn with an [openai] prefix on:
- missing apiKey
- non-2xx HTTP response
- AbortError / timeout
- malformed JSON body

LlmConfig gains an optional provider?: 'openai' field (widened to include 'anthropic' in issue 0002). The single caller at daemon/routes/memory.ts is untouched.

Test: fail-soft missing-apiKey emits an [openai]-prefixed warn.
Snapshots URL, method, headers, and JSON body fields against the chat-completions contract. Asserts model=gpt-4o-mini, temperature=0.1, max_tokens=4096, and the two-message [system,user] array. The system prompt is checked for load-bearing phrases rather than full-text snapshot so the test stays readable for reviewers.
Adds OpenAI fail-soft scenarios (HTTP 500, AbortError, malformed JSON) asserting empty result, default model returned, and [openai]-prefixed warning. Adds a sanity test for parseNTriples imported from its own module covering URI objects, literal objects with language tags, and tolerance for markdown fences and comments — the regression-proof seam both providers will share in issue 0002.
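The tolerant-parser behaviour the test pins can be sketched as follows. This is an illustrative reconstruction, not the project's parse-ntriples.ts: the `Triple` shape, regex, and skip rules are assumptions based on the commit messages (URI and literal objects, language tags, tolerance for markdown fences and comments).

```typescript
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Matches `<s> <p> <o> .` and `<s> <p> "literal"@lang .` on a single line.
const TRIPLE_RE =
  /^<([^>]+)>\s+<([^>]+)>\s+(<[^>]+>|"(?:[^"\\]|\\.)*"(?:@[a-zA-Z-]+)?)\s*\.\s*$/;

function parseNTriples(text: string): Triple[] {
  const triples: Triple[] = [];
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    // Tolerance: skip blank lines, markdown fences, and comments.
    if (line === '' || line.startsWith('```') || line.startsWith('#')) continue;
    const m = TRIPLE_RE.exec(line);
    if (!m) continue; // ignore anything that is not a well-formed triple
    // Strip angle brackets from URI objects; keep literals verbatim.
    const obj = m[3].startsWith('<') ? m[3].slice(1, -1) : m[3];
    triples.push({ subject: m[1], predicate: m[2], object: obj });
  }
  return triples;
}
```

Because malformed lines are silently skipped rather than thrown on, an LLM response wrapped in fences or padded with commentary still yields the valid triples it contains.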
Wires up Claude Sonnet 4.6 alongside the existing OpenAI provider for
Layer 2 semantic extraction. Provider precedence is:
DKG_EXTRACTION_PROVIDER env var > LlmConfig.provider > 'openai'.
Unknown values fall back to OpenAI with a console warning.
- New providers/anthropic.ts mirrors providers/openai.ts:
POST ${baseURL}/v1/messages, x-api-key + anthropic-version: 2023-06-01,
body { model, max_tokens, system, messages:[{role:'user',content}] }
with NO temperature field. Response read from data.content[0].text;
tokens = input_tokens + output_tokens. Same 60s timeout and
60_000-char markdown truncation as OpenAI.
- llm-extractor.ts dispatches via resolveProvider() and wraps invoke()
in an outer try/catch as a final fail-soft safety net.
- LlmConfig.provider widened to 'openai' | 'anthropic'.
- New test: Anthropic happy path verifies parsed triples + token sum.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
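The precedence rule above (env var > LlmConfig.provider > 'openai', with a warning fallback for unknown values) can be sketched like this. Function and variable names are illustrative, not the project's actual implementation:

```typescript
type ProviderName = 'openai' | 'anthropic';

function resolveProvider(configProvider?: string): ProviderName {
  // 1. Env var wins outright; 2. config.provider; 3. hard default.
  const fromEnv = process.env.DKG_EXTRACTION_PROVIDER;
  const candidate = fromEnv ?? configProvider ?? 'openai';
  if (candidate === 'openai' || candidate === 'anthropic') return candidate;
  // Unknown values fall back to OpenAI with a warning rather than throwing.
  console.warn(`[extraction] unknown provider '${candidate}', falling back to openai`);
  return 'openai';
}
```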
Snapshot-equivalent test (literal JSON.equal, not toMatchSnapshot) that locks the wire-level contract of AnthropicProvider:
- URL: https://api.anthropic.com/v1/messages
- x-api-key + anthropic-version: 2023-06-01 + Content-Type headers
- body { model, max_tokens, system, messages:[{role:'user',content}] }
- explicit assertion that no temperature field is sent
- tokens summed from input_tokens + output_tokens

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nfig

Pins the precedence rule (env > config > default). Stubs env to 'anthropic' with config.provider='openai' and verifies the request hits api.anthropic.com, not api.openai.com.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h warn

Stubs DKG_EXTRACTION_PROVIDER to a bogus value and asserts:
- the call still completes (routed to OpenAI)
- a console.warn names the unknown value and announces the fallback

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the existing OpenAI fail-soft matrix:
- missing apiKey
- HTTP 500
- AbortError (timeout)
- malformed JSON body

Each asserts an empty result with model='claude-sonnet-4-6' and a console.warn prefixed [anthropic]. The HTTP 500 case also checks the status code is surfaced in the warning; AbortError checks for 'timeout' or 'abort' in the message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Feeds identical 6-triple raw N-Triples text through both providers' response shapes (data.choices[0].message.content vs data.content[0].text) and asserts the resulting .triples arrays are deep-equal. Locks the contract that the shared parseNTriples() output is provider-agnostic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nstant The OpenAI snapshot and Anthropic shape tests previously asserted the system message via `body.messages[0].content` and `body.system` respectively — both sides of the toEqual would have become `undefined` if the provider stopped sending the prompt, and the test would have still passed. Import the shared `DOCUMENT_KG_PROMPT` constant from the provider module and use it as the expected value so a regression that drops or corrupts the system prompt now fails both assertions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…actionInput

The provider-facing invocation type and the public extractor input were defined as two structurally identical interfaces in separate modules, which the reviewer flagged as duplication. Make `LlmProviderInvocation` a type alias of `LlmExtractionInput`, the canonical public type defined in `llm-extractor.ts`.

Both providers (`openai.ts`, `anthropic.ts`) continue to import `LlmProviderInvocation` by name — the alias keeps their imports stable and preserves the internal naming convention without duplicating fields. The import added to `llm-provider.ts` is `import type` only; combined with the existing type-only import of `LlmExtractionOutput`, the cycle between the two modules remains erased at runtime.

No behavioural change. `llm-extractor.ts` is bit-identical (public exports untouched). All 15 extraction-llm tests pass; `tsc --noEmit` reports 0 errors.
…actionInput The LlmProviderInvocation alias was reduced to a one-liner equal to LlmExtractionInput. The reviewer flagged this as meaningless indirection. Remove the alias and its docstring; the LlmProvider interface and both providers now reference LlmExtractionInput directly. Pure refactor: zero behavioural change. The 15 existing tests in extraction-llm.test.ts pass unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a structural assertion that both providers expose a `defaultModel` property on the LlmProvider interface. This pins the model-name strings to the provider modules where they live (instead of having the dispatcher duplicate them as literals in its outer-catch fallback).

Currently RED: `openaiProvider.defaultModel` and `anthropicProvider.defaultModel` are both undefined. The follow-up refactor adds the property to the interface and to each provider object, and rewires `llm-extractor.ts`'s outer-catch to read `provider.defaultModel` instead of the hard-coded `'gpt-4o-mini'` / `'claude-sonnet-4-6'`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Model

The dispatcher's outer-catch fallback in `llm-extractor.ts` previously hard-coded `'gpt-4o-mini'` and `'claude-sonnet-4-6'` as literal strings, duplicating the `DEFAULT_MODEL` constants already defined in each provider module. Renaming a model would have been a multi-file change.

Surface each provider's default model on the `LlmProvider` interface as a readonly `defaultModel: string` property. Populate it in both provider objects from the existing `DEFAULT_MODEL` constants, so the source of truth stays inside each provider module. Rewire the outer-catch to `llmConfig.model ?? provider.defaultModel`, collapsing the provider-name conditional into a single property read.

All 16 extraction-llm tests pass (15 existing + 1 new structural assertion from the preceding commit). `tsc --noEmit` reports 0 errors. The literal model strings no longer appear in `llm-extractor.ts`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fold the LLM extraction subsystem write-up from the ephemeral .ai/architecture.md into the canonical root ARCHITECTURE.md as a new 'LLM Extraction Pipeline' section under the DKG V10 Architecture parent, sibling to 'Component Model'. The new section covers:
- A one-paragraph intro framing Layer 1 (structural, deterministic) and Layer 2 (semantic, LLM-assisted) and the sole caller in POST /api/memory/turn.
- A classDiagram in the style of the existing top-level diagrams, showing extractWithLlm as dispatcher, the LlmProvider interface, the OpenAI and Anthropic concrete providers, the shared DOCUMENT_KG_PROMPT, and the tolerant parseNTriples helper.
- Provider selection precedence table (env > config > default).
- Fail-soft contract paragraph.
- Module map for packages/cli/src/extraction/.

Update CONTEXT.md's cross-reference to point at the root ARCHITECTURE.md instead of the now-deleted .ai/architecture.md. The .ai/ directory was never tracked in git; deleting it leaves no stale references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update DEFAULT_MODEL in the OpenAI provider from gpt-4o-mini to gpt-5-nano. gpt-5-nano is cheaper and better-suited for the document knowledge-graph extraction prompt, so it becomes the new sensible default for operators who don't override LlmConfig.model.

This is a single coordinated rename: the test surface pins the model literal at eight callsites (defaultModel exposure, snapshot body, four OpenAI fail-soft cases, and the unknown-provider-fallback case), so the test edits and the implementation constant change together in one commit. Followed TDD: updated the pinned literals first, watched seven tests go RED with 'expected gpt-4o-mini to be gpt-5-nano', then flipped the constant and confirmed 16/16 green.

Also updated the config.ts JSDoc, the CONTEXT.md OpenAiProvider bullet, and the ARCHITECTURE.md class diagram and provider table so the docs match the runtime default. The historical design artifacts under .orchestrator/runs/ are intentionally left untouched to preserve the provenance of the original gpt-4o-mini choice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…regression guard

The default OpenAI model is now `gpt-5-nano`, a reasoning model that rejects `max_tokens` and `temperature`. The chat-completions request body must therefore differ by model family:
- Reasoning models (gpt-5*, o1*, o3*, o4*): `max_completion_tokens`, no `temperature`.
- Legacy chat models (e.g. gpt-4o-mini): `max_tokens` + `temperature` as before.

This commit updates the existing snapshot to pin the new reasoning-model shape (currently RED — impl still emits the legacy shape regardless of model) and adds a parallel snapshot for gpt-4o-mini as a regression guard for the legacy code path. The impl flip lands in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s legacy)
OpenAI's reasoning-model family rejects the legacy chat-completions
parameters: `max_tokens` must become `max_completion_tokens`, and
`temperature` must be omitted entirely. Without this fix the default
`gpt-5-nano` extraction call fails with:
Unsupported parameter: 'max_tokens' is not supported with this model.
Use 'max_completion_tokens' instead.
Introduce `isReasoningModel(model)`, a tiny `.startsWith` check over
the prefixes `gpt-5`, `o1`, `o3`, `o4` — conservative on purpose;
new families can be appended without restructuring. The body builder
switches on the result:
- Reasoning: `{ model, messages, max_completion_tokens }`
- Legacy: `{ model, messages, max_tokens, temperature: 0.1 }`
URL, headers, message structure, response parsing, timeout, fail-soft
behaviour, and truncation all stay byte-equivalent. The probe script
`extraction-debug-raw.mjs` is updated locally to match the new shape
so the diagnostic stays accurate (script is .gitignored, intentionally
not part of this commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
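The prefix check and the branch it drives can be sketched as below. The prefix list and the two body shapes come from the commit message; the function bodies and the `buildBodyFields` name are illustrative assumptions:

```typescript
// Conservative prefix check over the known reasoning-model families;
// new families can be appended without restructuring.
const REASONING_PREFIXES = ['gpt-5', 'o1', 'o3', 'o4'];

function isReasoningModel(model: string): boolean {
  return REASONING_PREFIXES.some((p) => model.startsWith(p));
}

// The body builder switches on the result (messages omitted for brevity).
function buildBodyFields(model: string, maxTokens: number): Record<string, string | number> {
  return isReasoningModel(model)
    ? { model, max_completion_tokens: maxTokens }          // no temperature allowed
    : { model, max_tokens: maxTokens, temperature: 0.1 };  // legacy shape
}
```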
…fort

The reasoning-model default body needs a much larger token budget than 4096 - hidden chain-of-thought consumes the budget before any visible output is emitted. A raw probe showed gpt-5-nano spending 4592 tokens producing zero visible characters.

Update the pinned-shape test to expect max_completion_tokens: 16000 and the new reasoning_effort: 'low' field. Add a new test asserting that an explicit input.maxTokens still wins over the new default. Tighten the legacy-model assertion to also forbid reasoning_effort on the body.
Raw probe against gpt-5-nano showed the model consuming the full 4096
max_completion_tokens on hidden chain-of-thought, producing zero visible
output. Structured RDF extraction does not need deep reasoning; what it
needs is a budget large enough that visible output is still emitted
after the CoT phase.
Two changes in openai.ts, both confined to the reasoning-model branch:
1. Default max_completion_tokens for reasoning models bumps to 16000
(legacy chat models stay at 4096 via max_tokens).
2. Reasoning-model bodies now include reasoning_effort: 'low'.
Factored into buildModelFamilyFields(model, maxTokensOverride) so the
caller's input.maxTokens still wins over either family default. Legacy
model bodies are byte-identical to before this commit.
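A sketch of `buildModelFamilyFields` as described: the signature and the defaults (16000 for reasoning models, 4096 for legacy, `reasoning_effort: 'low'`, caller override winning) follow the commit message, while the body itself is an assumed reconstruction:

```typescript
function buildModelFamilyFields(
  model: string,
  maxTokensOverride?: number,
): Record<string, string | number> {
  const reasoning = ['gpt-5', 'o1', 'o3', 'o4'].some((p) => model.startsWith(p));
  if (reasoning) {
    // Larger default budget so visible output survives the hidden CoT phase;
    // low effort because structured RDF extraction needs no deep reasoning.
    return {
      max_completion_tokens: maxTokensOverride ?? 16000,
      reasoning_effort: 'low',
    };
  }
  // Legacy chat models keep the original shape and budget.
  return { max_tokens: maxTokensOverride ?? 4096, temperature: 0.1 };
}
```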
packages/cli/scripts/extraction-smoke.mjs is an operator tool that
invokes extractWithLlm() against a real provider endpoint with a
small markdown fixture. Useful for verifying that a provider's
request/response shape is actually accepted by the live API — the
mocked vitest suite can't catch parameter rejections like the
max_tokens vs max_completion_tokens divergence between OpenAI's
legacy chat-completions models and the gpt-5 reasoning family.
Usage:
OPENAI_API_KEY=sk-... node packages/cli/scripts/extraction-smoke.mjs openai
ANTHROPIC_API_KEY=sk-... node packages/cli/scripts/extraction-smoke.mjs anthropic
Prerequisite is a fresh pnpm build so the script's dist imports are
in sync with src. Cost is roughly $0.001 per invocation. Stdout
emits a structured JSON summary; stderr carries progress.
Also adds .orchestrator/ to .gitignore — that directory is the
working tree of the agent-orchestrator Python backend, which scaffolds
state files alongside its design-phase artifacts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cks, inline helpers

Collapse the new extraction modules to their minimal expressive shape:
- Drop large file-header JSDocs; the file names + one-line headers carry intent
- Inline single-use helpers (buildModelFamilyFields) into the invoke body
- Drop redundant intermediate variables and constants that were only used once
- Hoist the N-Triples regex out of the parser loop; introduce a local Triple alias

Total src trim: 419 lines -> 243 lines (-42%). Behaviour byte-identical; all 18 vitest cases pass and tsc --noEmit stays clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ore .orchestrator/

CONTEXT.md at the repo root looked like a project-wide context file but only documented the extraction subsystem. Moved to packages/cli/src/extraction/CONTEXT.md where it belongs alongside the code it describes.

Also revert the .orchestrator/ gitignore entry from the previous slice — the agent-orchestrator working tree is local-only and never pushed; we do not need to advertise it in .gitignore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5e2a48d to 9aab188
added 2 commits
May 12, 2026 16:51
The 60_000-char slice silently dropped data past the cutoff and injected a "[... document truncated for extraction ...]" marker that diluted the prompt. Both providers now send input.markdown as-is; if the document exceeds the model's context window, the API returns a 4xx and the existing fail-soft path emits the standard "[provider] API returned ..." warn.
Both OpenAI and Anthropic providers duplicated ~25 lines each of the same scaffolding: apiKey validation, 60s AbortController timeout, fetch try/catch, non-2xx warn, malformed-JSON warn, NONE/empty handling, parseNTriples invocation, AbortError vs generic-error warn.

Factor this into runProvider() inside llm-provider.ts. Each provider now declares only its differences via a ProviderSpec object: name, defaultModel, buildRequest(input, config, model), parseResponse(data). createProvider(spec) wires the spec to the shared runner.

Wire-level behaviour is byte-identical (URL, headers, body shapes, fail-soft branches, tokensUsed semantics all preserved). The 18 pinned vitest cases continue to pass.

Note: the runner's malformed-response warn message generalised from the provider-specific "choices[0].message.content missing" / "content[0].text missing" to "response text missing". The existing fail-soft tests assert the `[provider]` prefix via regex, not the exact tail of the warn line, so no test changes were required.

src LOC:
- openai.ts 83 -> 47 lines
- anthropic.ts 75 -> 38 lines
- llm-provider.ts 27 -> 114 lines (now hosts the runner)
…ose invokeProvider
Drop the parallel ProviderSpec type and createProvider factory. The
LlmProvider interface now directly declares the two per-vendor methods
(buildRequest, parseResponse). Each provider implements the interface
literally:
export const openaiProvider: LlmProvider = {
  name: 'openai',
  defaultModel: 'gpt-5-nano',
  buildRequest(...) { ... },
  parseResponse(...) { ... },
};
The shared scaffolding (apiKey check, timeout, fetch, fail-soft, JSON
parse, NONE handling, parseNTriples) now lives in a free function
`invokeProvider(provider, input, config)` exported from llm-provider.ts.
The dispatcher in llm-extractor.ts calls invokeProvider directly
instead of provider.invoke().
One interface, one shape, one call path. No behavioural change — 18/18
vitest cases stay green and tsc --noEmit stays clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
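A hedged sketch of the `invokeProvider` free function described above. The overall shape (apiKey check, 60s timeout, fail-soft warns, NONE handling) follows the commit message; the types, field names, and the trivial parser stand-in are assumptions:

```typescript
interface LlmConfig {
  apiKey?: string;
  model?: string;
  baseURL?: string;
}

interface LlmProvider {
  name: string;
  defaultModel: string;
  buildRequest(
    input: { markdown: string },
    config: LlmConfig,
    model: string,
  ): { url: string; init: { method?: string; headers?: Record<string, string>; body?: string } };
  parseResponse(data: unknown): { text?: string; tokensUsed?: number };
}

type Triple = { subject: string; predicate: string; object: string };

// Trivial stand-in for the shared parse-ntriples.ts module.
const parseNTriples = (_text: string): Triple[] => [];

async function invokeProvider(
  provider: LlmProvider,
  input: { markdown: string },
  config: LlmConfig,
): Promise<{ triples: Triple[]; model: string; tokensUsed?: number }> {
  const model = config.model ?? provider.defaultModel;
  const empty = { triples: [] as Triple[], model };
  // Fail-soft: every branch warns with a [provider] prefix and returns empty.
  if (!config.apiKey) {
    console.warn(`[${provider.name}] missing apiKey; skipping extraction`);
    return empty;
  }
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 60_000);
  try {
    const { url, init } = provider.buildRequest(input, config, model);
    const res = await fetch(url, { ...init, signal: controller.signal });
    if (!res.ok) {
      console.warn(`[${provider.name}] API returned ${res.status}`);
      return empty;
    }
    const { text, tokensUsed } = provider.parseResponse(await res.json());
    if (!text) {
      console.warn(`[${provider.name}] response text missing`);
      return empty;
    }
    // The prompt answers NONE when the document yields no triples.
    if (text.trim() === 'NONE') return { ...empty, tokensUsed };
    return { triples: parseNTriples(text), model, tokensUsed };
  } catch (err) {
    const isAbort = err instanceof Error && err.name === 'AbortError';
    console.warn(`[${provider.name}] ${isAbort ? 'request timed out' : String(err)}`);
    return empty;
  } finally {
    clearTimeout(timer);
  }
}
```

Each provider object then only supplies `buildRequest` and `parseResponse`, keeping vendor differences in one place and the fail-soft contract in another.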
When DKG_EXTRACTION_PROVIDER flips the dispatcher away from config.provider, the configured model and baseURL belong to the OTHER provider and would corrupt the actual request — OpenAI baseURL becomes `https://api.openai.com/v1/v1/messages`, OpenAI model name leaks into Anthropic's body.model. This test pins the desired behaviour: when env overrides the provider, each provider must use its own default model + baseURL. Currently RED (URL is openai.com/v1/v1/messages, model is gpt-5-nano). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When `DKG_EXTRACTION_PROVIDER` overrides `config.provider`, the configured `model` and `baseURL` belong to the OTHER provider and silently corrupt the request — Anthropic was hitting `https://api.openai.com/v1/v1/messages` with body.model='gpt-5-nano' and 100% fail-soft'd to empty. The fix introduces `resolveEffectiveConfig()` which keeps only the `apiKey` when env-routing diverges from `config.provider`, letting each provider fall back to its own defaults (anthropic → claude-sonnet-4-6 + api.anthropic.com, openai → gpt-5-nano + api.openai.com). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
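The fix can be sketched like this; the `resolveEffectiveConfig` name comes from the commit message, the body is an assumed reconstruction:

```typescript
interface LlmConfig {
  provider?: string;
  apiKey?: string;
  model?: string;
  baseURL?: string;
}

function resolveEffectiveConfig(config: LlmConfig, effectiveProvider: string): LlmConfig {
  const configured = config.provider ?? 'openai';
  if (configured === effectiveProvider) return config;
  // Env-routing diverged: the configured model/baseURL belong to the other
  // provider, so keep only the apiKey and let the target provider's own
  // defaults (model + baseURL) take over.
  return { provider: effectiveProvider, apiKey: config.apiKey };
}
```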
The Anthropic Messages API returns `content` as an array of typed blocks. A response can include a leading thinking/tool_use block, multiple text blocks, or both. The current parseResponse reads only `content[0].text` — which silently drops everything when the first block is non-text and loses content past index 0. This test pins the desired behaviour: parseResponse must filter `type === 'text'` blocks, join their texts, and return undefined when there are no text blocks at all. Currently RED (test expects 2 triples, gets 0 because content[0] is the thinking block). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ocks `data.content` is an array of typed blocks. Reading only `content[0].text` silently drops everything when the first block is a thinking/tool_use block (and loses any text past index 0). Filter to `type === 'text'`, concatenate, and return undefined when there are no text blocks at all so invokeProvider's "response text missing" warn fires instead of silently swallowing the response as legitimate empty. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
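The block-filtering behaviour can be sketched as below; the block shape follows the Anthropic Messages API's typed content blocks, while the helper name is illustrative:

```typescript
interface ContentBlock {
  type: string; // 'text', 'thinking', 'tool_use', ...
  text?: string;
}

function extractText(content: ContentBlock[]): string | undefined {
  const texts = content
    .filter((b) => b.type === 'text' && typeof b.text === 'string')
    .map((b) => b.text as string);
  // No text blocks at all -> undefined, so the caller's
  // "response text missing" warn fires instead of a silent empty result.
  return texts.length > 0 ? texts.join('') : undefined;
}
```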
… round-trip
The node-ui settings form submits `{ apiKey, model?, baseURL? }` — no
`provider`. The previous `config.llm = llm` assignment silently stripped
any previously-configured provider, so a node configured for Anthropic
would quietly revert to the default 'openai' on the next save+restart.
Widen the callback's parameter type to accept the optional `provider`
field, and merge the incoming payload onto whatever provider is already
on disk when the caller omits it. The caller can still explicitly set or
change the provider by including it in the payload.
No unit test: `lifecycle.ts` doesn't have isolated unit-test
infrastructure for the inline `llmSettings` callback, and the change is
narrowly localised. Safety nets: `tsc --noEmit` passes, and no existing
test asserts the old stripping behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
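The merge described above can be sketched as follows; the types and the `mergeLlmSettings` name are illustrative, not the actual `lifecycle.ts` callback:

```typescript
interface LlmSettings {
  apiKey: string;
  provider?: string;
  model?: string;
  baseURL?: string;
}

function mergeLlmSettings(onDisk: LlmSettings | undefined, incoming: LlmSettings): LlmSettings {
  return {
    ...incoming,
    // Keep the stored provider when the caller omits it; an explicit
    // value in the payload still wins.
    provider: incoming.provider ?? onDisk?.provider,
  };
}
```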
Summary
Refactors the Layer 2 semantic extractor (`packages/cli/src/extraction/llm-extractor.ts`) from a hard-coded OpenAI `fetch` call into a pluggable `LlmProvider` interface, and ships two concrete providers:
- OpenAI (`providers/openai.ts`), default model `gpt-5-nano`
- Anthropic (`providers/anthropic.ts`), default model `claude-sonnet-4-6`
Operators select a provider via `LlmConfig.provider` ('openai' or 'anthropic') or override at runtime with `DKG_EXTRACTION_PROVIDER`. Unknown values fall back to OpenAI with a warning.
The public `extractWithLlm()` signature, `LlmExtractionInput`, and `LlmExtractionOutput` shapes are unchanged. The single caller at `packages/cli/src/daemon/routes/memory.ts:1047` is untouched.
Why
Layer 2 extraction was wired to OpenAI only — operators wanting to use Claude (or future vendors) had no path. The extraction surface is small and self-contained, so a clean refactor unlocks multi-vendor support without rippling.
Reasoning-model body shape
OpenAI's gpt-5* / o1* / o3* / o4* families reject the legacy chat-completions parameters `max_tokens` and arbitrary `temperature`. The provider now detects the family via a small `isReasoningModel()` prefix check and emits:
- Reasoning: `{ model, messages, max_completion_tokens, reasoning_effort: 'low' }`
- Legacy: `{ model, messages, max_tokens, temperature: 0.1 }`
Caller's `input.maxTokens` still wins over either default.
Fail-soft contract preserved
Every failure mode (missing apiKey / non-2xx response / AbortError / malformed body) returns `{ triples: [], model }` and `console.warn`s prefixed with `[openai]` or `[anthropic]`. Never throws. The daemon caller continues to wrap the call in try/catch as final safety.
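The dispatcher's outer-catch safety net mentioned above can be sketched like this; names and the callback shape are assumptions, not the actual `llm-extractor.ts` code:

```typescript
// Even if a provider module itself throws, the caller still receives the
// empty-result shape with the right model name instead of an exception.
async function extractWithLlmSafe(
  invoke: () => Promise<{ triples: unknown[]; model: string }>,
  fallbackModel: string,
): Promise<{ triples: unknown[]; model: string }> {
  try {
    return await invoke();
  } catch (err) {
    console.warn(`[extraction] provider threw unexpectedly: ${String(err)}`);
    return { triples: [], model: fallbackModel };
  }
}
```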
Tests
18 vitest cases in `packages/cli/test/extraction-llm.test.ts`, all passing. Coverage:
Live-API smoke test (manual, not in CI)
A new operator script at `packages/cli/scripts/extraction-smoke.mjs` invokes `extractWithLlm()` against the real provider endpoints with a tiny solar-panels fixture. Used to verify request/response shapes are accepted by the live APIs — something mocked vitest cannot do.
Out of scope (follow-ups)
Verification
```
cd packages/cli
npx tsc --noEmit # 0 errors
pnpm exec vitest run test/extraction-llm.test.ts # 18/18 pass in ~17ms
```
🤖 Generated with Claude Code