Prevent duplicate system instructions in Python telemetry#5981
Prevent duplicate system instructions in Python telemetry#5981Copilot wants to merge 14 commits into
Conversation
|
@copilot The relevant rule in the OTEL semantic conventions is: Please make sure there are test case to ensure this is the case. Also test the case where the users input messages already contain a system message and system_insrtructions is also supplied, the span should contain: |
... Addressed in 5b12184. I added explicit coverage for separate framework instructions vs chat-history system messages, including the mixed case where |
| prepped = prepend_instructions_to_messages(normalize_messages(messages), system_instructions) | ||
| otel_messages: list[dict[str, Any]] = [] | ||
| normalized_messages = normalize_messages(messages) | ||
| prepped = prepend_instructions_to_messages(normalized_messages, system_instructions) |
There was a problem hiding this comment.
@copilot Is prepped used for anything else? Why do we still need to call prepend_instructions_to_messages if we intentionally skip over what it's adding?
There was a problem hiding this comment.
Updated in 016cc1a. prepend_instructions_to_messages is still needed for the logging path: we still emit prepended framework instructions as message log events, but _capture_messages now builds gen_ai.input.messages directly from the original normalized chat history. I also added a focused regression test for that split.
| for index, (message, otel_message) in enumerate( | ||
| zip(logging_messages[:prepended_count], prepended_messages, strict=True) | ||
| ): | ||
| # Reuse the otel message representation for logging instead of calling to_dict() | ||
| # to avoid expensive Pydantic serialization overhead | ||
| otel_message = _to_otel_message(message) | ||
| otel_messages.append(otel_message) | ||
| logger.info( | ||
| otel_message, | ||
| extra={ | ||
| OtelAttr.EVENT_NAME: OtelAttr.CHOICE if output else ROLE_EVENT_MAP.get(message.role), | ||
| OtelAttr.PROVIDER_NAME: provider_name, | ||
| MessageListTimestampFilter.INDEX_KEY: index, | ||
| }, | ||
| ) |
There was a problem hiding this comment.
Fixed in 84e5a62. _capture_messages now walks the logging message list once, logs each OTEL message once, and only appends the non-prepended entries to the span payload. I also added a regression test for the mixed framework-instructions + original system-message case so we keep the expected split without double-processing.
| otel_messages: list[dict[str, Any]] = [] | ||
| for index, message in enumerate(prepped): | ||
| normalized_messages = normalize_messages(messages) | ||
| logging_messages = prepend_instructions_to_messages(normalized_messages, system_instructions) |
There was a problem hiding this comment.
We should not be always logging the messages with prepended instructions. I would expect this to be split similar to the span messages
There was a problem hiding this comment.
Fixed in 5e039d1. _capture_messages now logs only the normalized chat history, so framework/options instructions stay in gen_ai.system_instructions instead of being prepended into the message logs. I also updated the focused regression tests for both the instructions-only and mixed system-message cases.
... Addressed in 5e039d1. The observability tests cover both cases now: separate framework/options instructions stay in |
| otel_messages: list[dict[str, Any]] = [] | ||
| for index, message in enumerate(prepped): | ||
| normalized_messages = normalize_messages(messages) | ||
| span_messages = [] |
There was a problem hiding this comment.
| span_messages = [] | |
| otel_messages: list[dict[str, Any]] = [] |
Motivation and Context
Python observability currently records system instructions twice on the same span: once in
gen_ai.system_instructionsand again ingen_ai.input.messages. This change keeps system instructions in their dedicated attribute and limits input messages to the actual conversation history.Description
Telemetry serialization
gen_ai.input.messages.gen_ai.system_instructionsemission.Behavioral scope
gen_ai.input.messagesreflects user/assistant/tool history.Coverage
Contribution Checklist