
feat(llmobs): prompt management SDK methods #18186

Draft
PROFeNoM wants to merge 10 commits into main from alex/MLOB-7524_prompt-crud-api
PROFeNoM commented May 20, 2026

Description

Adds prompt management methods to the LLMObs Python SDK, calling the new public API endpoints from MLOB-7523.

Blocked by: https://github.com/DataDog/dd-source/pull/443753 (public CRUD API routes)

New public API

# Write methods (require DD_API_KEY + DD_APP_KEY)
LLMObs.create_prompt(prompt_id, template, *, title, description, user_version, labels)
LLMObs.create_prompt_version(prompt_id, template, *, description, user_version, labels)
LLMObs.update_prompt(prompt_id, *, title, description)
LLMObs.update_prompt_version(prompt_id, version, *, labels, description)
LLMObs.delete_prompt(prompt_id)

# Read methods (require DD_API_KEY only)
LLMObs.list_prompts(*, ml_app)
LLMObs.list_prompt_versions(prompt_id)

Not covered: GET /prompts/{prompt_id}/versions/{version} is intentionally left out. I would rather add it once #18127 (hybrid prompt delivery) lands, since that PR changes the get_prompt signature. At that point we can decide whether to add version= as an optional parameter to get_prompt instead of adding a separate method.

New types (ddtrace/llmobs/types.py)

  • PromptLabel = Literal["development", "production"] - allowed label values
  • ChatMessage(TypedDict) - template message format (role + content only)
  • PromptResponse(TypedDict) - returned by create/update/list prompt operations
  • PromptVersionResponse(TypedDict) - returned by create/update/list version operations
  • DeletedPromptResponse(TypedDict) - returned by delete
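
As a rough illustration of the two types whose shape is fully stated above, here is a minimal sketch. PromptLabel and ChatMessage follow the description (two label values; role and content only). The `is_valid_label` helper is hypothetical and not part of this PR, and the response TypedDicts are omitted because their fields are not listed here.

```python
from typing import List, Literal, TypedDict, get_args

# Allowed label values, as listed in the PR description.
PromptLabel = Literal["development", "production"]


class ChatMessage(TypedDict):
    """One template message: role and content only."""

    role: str
    content: str


def is_valid_label(value: str) -> bool:
    """Hypothetical helper: runtime check against the Literal's members."""
    return value in get_args(PromptLabel)


# A template is a list of chat messages, as in the E2E script.
template: List[ChatMessage] = [
    {"role": "system", "content": "You are a {{persona}}."},
    {"role": "user", "content": "{{question}}"},
]
```

A Literal only constrains static type checking, so a runtime check like the one above is what would reject an unknown label before a request is sent.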

Exception hierarchy

PromptAPIError (base)
  PromptAuthError        - 401/403 (bad API/app key)
  PromptValidationError  - 400 (bad input)
  PromptNotFoundError    - 404
  PromptConflictError    - 409 (duplicate prompt_id)
  PromptServerError      - 5xx

Each method documents which exceptions it can raise.
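
The status-to-exception mapping above could be sketched as follows. The class names and status codes come from the hierarchy in this description, and the `status` attribute matches what the E2E script asserts on (with 0 for client-side validation). The constructor signature and the `error_for_status` helper are assumptions for illustration, not the code in types.py or manager.py.

```python
class PromptAPIError(Exception):
    """Base class; `status` is the HTTP status, or 0 for client-side errors."""

    def __init__(self, message: str, status: int = 0) -> None:
        super().__init__(message)
        self.status = status


class PromptAuthError(PromptAPIError):
    """401/403: bad API or app key."""


class PromptValidationError(PromptAPIError):
    """400: bad input."""


class PromptNotFoundError(PromptAPIError):
    """404."""


class PromptConflictError(PromptAPIError):
    """409: duplicate prompt_id."""


class PromptServerError(PromptAPIError):
    """5xx."""


_EXACT = {
    400: PromptValidationError,
    401: PromptAuthError,
    403: PromptAuthError,
    404: PromptNotFoundError,
    409: PromptConflictError,
}


def error_for_status(status: int, message: str) -> PromptAPIError:
    """Hypothetical helper: pick the most specific exception for a status."""
    if status in _EXACT:
        return _EXACT[status](message, status)
    if 500 <= status < 600:
        return PromptServerError(message, status)
    return PromptAPIError(message, status)
```

Because every class derives from PromptAPIError, callers can catch the base class for generic handling and a subclass where they need to react to a specific failure.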

Cache changes (cache.py)

  • WarmCache now uses per-prompt subdirectories: ~/.cache/datadog/llmobs/prompts/{prompt_id}/{label}.json
  • Added evict_prompt(prompt_id) - uses shutil.rmtree on the prompt directory
  • The previous flat-file layout had prefix-collision bugs (e.g., deleting "greeting" could also evict "greeting-v2" cache entries)
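
To show why the subdirectory layout avoids the collision, here is a minimal sketch under stated assumptions. It uses a temporary directory instead of ~/.cache/datadog/llmobs/prompts, and `cache_path` and this standalone `evict_prompt` are illustrative stand-ins, not the WarmCache methods in cache.py.

```python
import shutil
import tempfile
from pathlib import Path


def cache_path(root: Path, prompt_id: str, label: str) -> Path:
    """Per-prompt layout: {root}/{prompt_id}/{label}.json."""
    return root / prompt_id / f"{label}.json"


def evict_prompt(root: Path, prompt_id: str) -> None:
    """Remove one prompt's directory; other prompts are untouched."""
    shutil.rmtree(root / prompt_id, ignore_errors=True)


# Stand-in for the real cache root.
root = Path(tempfile.mkdtemp())

# Two prompts whose IDs share a prefix.
for pid in ("greeting", "greeting-v2"):
    path = cache_path(root, pid, "production")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("{}")

evict_prompt(root, "greeting")
remaining = sorted(p.name for p in root.iterdir())
print(remaining)
```

With a directory per prompt, eviction is an exact name match on the directory rather than a prefix match on file names, so "greeting-v2" survives the eviction of "greeting".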

Files changed

| File | Change |
| --- | --- |
| types.py | PromptLabel, ChatMessage, 3 response TypedDicts, 5 exception classes |
| cache.py | Per-prompt subdirectory layout, evict_prompt method |
| manager.py | _request helper, 7 CRUD methods, API key validation on get_prompt |
| _llmobs.py | 7 public classmethods delegating to manager |
| test_prompts.py | Exception mapping test, cache eviction test |

E2E validation (staging) - 13/13 pass

Tested against datad0g.com using the SDK. Save the script below, install the branch wheel, replace keys with your own staging credentials, and run with python test_sdk.py.

| # | Journey | SDK method | Expected |
| --- | --- | --- | --- |
| 1 | Create prompt | create_prompt() | Returns PromptResponse with matching prompt_id |
| 2 | Create duplicate | create_prompt() | Raises PromptConflictError (409) |
| 3 | List prompts | list_prompts() | Our prompt in returned list |
| 4 | Create version | create_prompt_version() | Returns PromptVersionResponse |
| 5 | List versions | list_prompt_versions() | >= 2 versions |
| 6 | Update prompt | update_prompt() | Returns updated PromptResponse |
| 7 | Update version | update_prompt_version() | Returns updated PromptVersionResponse |
| 8 | Get prompt (read path) | get_prompt() | source="registry" |
| 9 | Get prompt with label | get_prompt(label="development") | Matches labeled version |
| 10 | Validation error | update_prompt() with no fields | Raises PromptValidationError (client-side) |
| 11 | Delete prompt | delete_prompt() | Returns DeletedPromptResponse |
| 12 | Get deleted prompt | get_prompt() | Raises ValueError (no fallback) |
| 13 | Delete non-existent | delete_prompt() | Raises PromptNotFoundError (404) |

"""
Setup:
    uv venv && source .venv/bin/activate
    # Install the branch wheel (update build number as needed):
    uv pip install --reinstall \
        --find-links https://dd-trace-py-builds.s3.amazonaws.com/114547844/index.html \
        ddtrace==4.10.0rc1

Run:
    DD_API_KEY="<your-datad0g-api-key>" \
    DD_APP_KEY="<your-datad0g-app-key>" \
    DD_SITE=datad0g.com \
    python test_sdk.py
"""

import os
import sys
import time
import traceback

os.environ.setdefault("DD_API_KEY", "<your-datad0g-api-key>")
os.environ.setdefault("DD_APP_KEY", "<your-datad0g-app-key>")
os.environ.setdefault("DD_SITE", "datad0g.com")

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.types import (
    PromptAPIError,
    PromptAuthError,
    PromptConflictError,
    PromptNotFoundError,
    PromptServerError,
    PromptValidationError,
)

RUN_ID = f"e2e-{int(time.time())}"
PROMPT_ID = f"sdk-test-{RUN_ID}"

passed = 0
failed = 0


def test(name):
    def decorator(fn):
        global passed, failed
        try:
            fn()
            print(f"[PASS] {name}")
            passed += 1
        except Exception:
            print(f"[FAIL] {name}")
            traceback.print_exc()
            failed += 1
        return fn
    return decorator


print("=== Prompt CRUD E2E - SDK ===")
print(f"Run ID: {RUN_ID}")
print(f"Prompt ID: {PROMPT_ID}")
print()


@test("Create prompt")
def _():
    resp = LLMObs.create_prompt(
        PROMPT_ID,
        [
            {"role": "system", "content": "You are a {{persona}}."},
            {"role": "user", "content": "{{question}}"},
        ],
        title="E2E Test Prompt",
        description="Created by SDK e2e test",
    )
    print(f"  Response: {resp}")
    assert resp["prompt_id"] == PROMPT_ID


@test("Create duplicate raises PromptConflictError")
def _():
    try:
        LLMObs.create_prompt(PROMPT_ID, [{"role": "user", "content": "dup"}])
        assert False, "Should have raised"
    except PromptConflictError as e:
        assert e.status == 409


@test("List prompts")
def _():
    prompts = LLMObs.list_prompts()
    print(f"  Total: {len(prompts)}")
    assert PROMPT_ID in [p["prompt_id"] for p in prompts]


@test("Create prompt version")
def _():
    resp = LLMObs.create_prompt_version(
        PROMPT_ID,
        [
            {"role": "system", "content": "You are a helpful {{persona}}."},
            {"role": "user", "content": "Please answer: {{question}}"},
        ],
        description="v2 - improved",
        user_version="v2",
    )
    print(f"  Response: {resp}")


@test("List prompt versions")
def _():
    versions = LLMObs.list_prompt_versions(PROMPT_ID)
    print(f"  Count: {len(versions)}")
    assert len(versions) >= 2


@test("Update prompt metadata")
def _():
    resp = LLMObs.update_prompt(PROMPT_ID, title="Updated", description="Updated")
    print(f"  Response: {resp}")


@test("Update prompt version")
def _():
    versions = LLMObs.list_prompt_versions(PROMPT_ID)
    ver = str(versions[-1].get("version", 1))
    resp = LLMObs.update_prompt_version(
        PROMPT_ID, ver, description="Updated version", labels=["development"]
    )
    print(f"  Response: {resp}")


@test("Get prompt (read path)")
def _():
    prompt = LLMObs.get_prompt(PROMPT_ID)
    print(f"  id={prompt.id}, version={prompt.version}, source={prompt.source}")
    assert prompt.id == PROMPT_ID
    assert prompt.source == "registry"


@test("Get prompt with label=development")
def _():
    try:
        prompt = LLMObs.get_prompt(PROMPT_ID, label="development")
        print(f"  id={prompt.id}, version={prompt.version}, label={prompt.label}")
    except ValueError as e:
        print(f"  Label not found (expected if not set): {e}")


@test("Update with no fields raises PromptValidationError")
def _():
    try:
        LLMObs.update_prompt(PROMPT_ID)
        assert False, "Should have raised PromptValidationError"
    except PromptValidationError as e:
        assert e.status == 0


@test("Delete prompt")
def _():
    resp = LLMObs.delete_prompt(PROMPT_ID)
    print(f"  Response: {resp}")


@test("Get deleted prompt raises ValueError")
def _():
    LLMObs.clear_prompt_cache(hot=True, warm=True)
    try:
        LLMObs.get_prompt(PROMPT_ID)
        assert False, "Should have raised ValueError"
    except ValueError as e:
        assert "could not be fetched" in str(e)


@test("Delete non-existent raises PromptNotFoundError")
def _():
    try:
        LLMObs.delete_prompt("nonexistent-" + RUN_ID)
        assert False, "Should have raised PromptNotFoundError"
    except PromptNotFoundError as e:
        assert e.status == 404


print()
print(f"=== Results: {passed} passed, {failed} failed ===")
sys.exit(1 if failed > 0 else 0)

Add CRUD operations for the LLM Observability prompt registry to the Python SDK.

datadog-prod-us1-6 Bot commented May 20, 2026

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 17efe7f

PROFeNoM changed the title from "feat(llmobs): prompt management write methods [MLOB-7524]" to "feat(llmobs): prompt management write methods" on May 20, 2026

cit-pr-commenter-54b7da Bot commented May 20, 2026

Codeowners resolved as

ddtrace/llmobs/_prompts/manager.py                                      @DataDog/ml-observability

PROFeNoM added 3 commits May 20, 2026 15:15
list_prompts and list_prompt_versions only need DD_API_KEY since the
backend uses ValidReportingAPIUser for read endpoints.
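
The read/write credential split described in this commit could look roughly like the sketch below. DD-API-KEY and DD-APPLICATION-KEY are the header names Datadog's public API uses; the `build_headers` helper and its `write` flag are hypothetical and are not taken from manager.py.

```python
from typing import Dict, Mapping


def build_headers(env: Mapping[str, str], write: bool) -> Dict[str, str]:
    """Hypothetical helper: the API key is always required, the app key only for writes."""
    api_key = env.get("DD_API_KEY")
    if not api_key:
        raise ValueError("DD_API_KEY is required")
    headers = {"DD-API-KEY": api_key, "Content-Type": "application/json"}
    if write:
        app_key = env.get("DD_APP_KEY")
        if not app_key:
            raise ValueError("DD_APP_KEY is required for write methods")
        headers["DD-APPLICATION-KEY"] = app_key
    return headers


# Read methods (list_prompts, list_prompt_versions) need only the API key.
read_headers = build_headers({"DD_API_KEY": "api"}, write=False)
```

Passing the environment in as a mapping keeps the helper easy to test without touching os.environ.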
PROFeNoM changed the title from "feat(llmobs): prompt management write methods" to "feat(llmobs): prompt management SDK methods [MLOB-7524]" on May 20, 2026
PROFeNoM changed the title from "feat(llmobs): prompt management SDK methods [MLOB-7524]" to "feat(llmobs): prompt management SDK methods" on May 21, 2026
PROFeNoM added 3 commits May 21, 2026 09:22
_request() passes body= to conn.request(), mock signature needed to match.
…[MLOB-7524]

Handler expects filter[ml_app], SDK was sending ml_app.
…B-7524]

Plain JSON API returns bare arrays, not JSONAPI {"data": [...]} wrappers.
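
The last two fixes above (the filter[ml_app] query parameter and bare-array responses) can be sketched as follows. The relative path "/prompts" and both helper names are assumptions for illustration; the actual request building and parsing live in manager.py and may differ.

```python
import json
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def list_prompts_path(ml_app: Optional[str] = None) -> str:
    """Hypothetical helper: build the list path with the filter[ml_app] parameter."""
    path = "/prompts"  # assumed relative path; the real prefix is not shown in this PR
    if ml_app is None:
        return path
    # urlencode percent-encodes the brackets in the parameter name.
    return path + "?" + urlencode({"filter[ml_app]": ml_app})


def parse_list_body(raw: bytes) -> List[Dict[str, Any]]:
    """Hypothetical helper: expect a bare JSON array, not a {"data": [...]} wrapper."""
    body = json.loads(raw)
    if not isinstance(body, list):
        raise ValueError("expected a JSON array, got " + type(body).__name__)
    return body


print(list_prompts_path("my-app"))
print(parse_list_body(b'[{"prompt_id": "greeting"}]'))
```

Rejecting a non-array body explicitly means a future change in the response envelope fails loudly instead of returning an empty or malformed list.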
PROFeNoM force-pushed the alex/MLOB-7524_prompt-crud-api branch from cfd1734 to 17efe7f on May 21, 2026 16:45