Prevent duplicate system instructions in Python telemetry #5981

Draft
Copilot wants to merge 14 commits into main from copilot/fix-duplicated-system-instructions

Contributor

Copilot AI commented May 20, 2026

Motivation and Context

Python observability currently records system instructions twice on the same span: once in gen_ai.system_instructions and again in gen_ai.input.messages. This change keeps system instructions in their dedicated attribute and limits input messages to the actual conversation history.

Description

  • Telemetry serialization

    • Update Python observability message capture to stop writing prepended system instructions into gen_ai.input.messages.
    • Preserve existing gen_ai.system_instructions emission.
  • Behavioral scope

    • Keep the logging path intact for prepended instruction messages.
    • Only narrow the span attribute payload so gen_ai.input.messages reflects user/assistant/tool history.
  • Coverage

    • Adjust focused observability tests for chat client and agent paths.
    • Add assertions that non-ASCII system instructions remain preserved without leaking into input messages.
# Before
gen_ai.system_instructions = [{"type": "text", "content": "Be helpful."}]
gen_ai.input.messages = [
    {"role": "system", "parts": [{"type": "text", "content": "Be helpful."}]},
    {"role": "user", "parts": [{"type": "text", "content": "hello"}]},
]

# After
gen_ai.system_instructions = [{"type": "text", "content": "Be helpful."}]
gen_ai.input.messages = [
    {"role": "user", "parts": [{"type": "text", "content": "hello"}]},
]
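The Before/After payloads above can be reproduced with a minimal sketch. The helper names `to_otel_message` and `build_span_attributes`, the plain-dict message shape, and the JSON-string serialization are all assumptions made for illustration; this is not the framework's actual `_capture_messages` implementation.

```python
import json
from typing import Any, Optional


def to_otel_message(message: dict[str, str]) -> dict[str, Any]:
    """Convert a simple {"role", "text"} dict into the message shape shown above."""
    return {
        "role": message["role"],
        "parts": [{"type": "text", "content": message["text"]}],
    }


def build_span_attributes(
    messages: list[dict[str, str]],
    system_instructions: Optional[str] = None,
) -> dict[str, str]:
    """Hypothetical helper: keep instructions and chat history in separate attributes."""
    attributes: dict[str, str] = {}
    if system_instructions:
        # ensure_ascii=False keeps non-ASCII instructions readable in the attribute.
        attributes["gen_ai.system_instructions"] = json.dumps(
            [{"type": "text", "content": system_instructions}], ensure_ascii=False
        )
    # Only the caller's own chat history is written to the input messages attribute;
    # the separately supplied instructions are never prepended here.
    attributes["gen_ai.input.messages"] = json.dumps(
        [to_otel_message(m) for m in messages], ensure_ascii=False
    )
    return attributes


attrs = build_span_attributes([{"role": "user", "text": "hello"}], "Be helpful.")
```

With this split, `attrs["gen_ai.input.messages"]` holds only the user message, and the instruction text appears once, under `gen_ai.system_instructions`.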

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI requested review from Copilot and removed request for Copilot May 20, 2026 21:17
Copilot AI requested review from Copilot and removed request for Copilot May 20, 2026 21:18
Copilot AI changed the title from "[WIP] Fix duplicated system instructions in telemetry" to "Prevent duplicate system instructions in Python telemetry" May 20, 2026
Copilot AI requested a review from alliscode May 20, 2026 21:19
@alliscode
Member

@copilot The relevant rule in the OTEL semantic conventions is:

gen_ai.system_instructions should be used when the provider/API allows system instructions to be provided separately from chat history. Instructions that are part of chat history should instead be recorded in gen_ai.input.messages.

Please make sure there are test cases to ensure this is the case. Also test the case where the user's input messages already contain a system message and system_instructions is also supplied. The span should contain:

gen_ai.system_instructions = framework/options instructions
gen_ai.input.messages      = original system message + user/assistant/tool history
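A test for the mixed case described above might look roughly like the following. `split_for_span` is a hypothetical stand-in for the capture logic, not a function from the codebase, and the message contents are made up for the example.

```python
from typing import Any, Optional


def split_for_span(
    messages: list[dict[str, Any]],
    system_instructions: Optional[str],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Hypothetical stand-in: return (system_instructions, input_messages) for a span."""
    instructions = (
        [{"type": "text", "content": system_instructions}] if system_instructions else []
    )
    # Chat history passes through untouched, including any system message it contains.
    return instructions, list(messages)


def test_mixed_system_message_and_instructions() -> None:
    history = [
        {"role": "system", "parts": [{"type": "text", "content": "Answer in French."}]},
        {"role": "user", "parts": [{"type": "text", "content": "hello"}]},
    ]
    instructions, input_messages = split_for_span(history, "Be helpful.")

    # Framework/options instructions go to the dedicated attribute.
    assert instructions == [{"type": "text", "content": "Be helpful."}]
    # The original system message stays in the chat history, in order.
    assert [m["role"] for m in input_messages] == ["system", "user"]
    # The framework instructions must not leak into the chat history.
    assert all(
        part["content"] != "Be helpful."
        for m in input_messages
        for part in m["parts"]
    )


test_mixed_system_message_and_instructions()
```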

Copilot AI requested review from Copilot and removed request for Copilot May 20, 2026 21:33
Contributor Author

Copilot AI commented May 20, 2026

@copilot The relevant rule in the OTEL semantic conventions is:

...

Addressed in 5b12184. I added explicit coverage for separate framework instructions vs chat-history system messages, including the mixed case where system_instructions is set while the original input already contains a system message, for both chat-client and agent spans.

prepped = prepend_instructions_to_messages(normalize_messages(messages), system_instructions)
otel_messages: list[dict[str, Any]] = []
normalized_messages = normalize_messages(messages)
prepped = prepend_instructions_to_messages(normalized_messages, system_instructions)
Member

@copilot Is prepped used for anything else? Why do we still need to call prepend_instructions_to_messages if we intentionally skip over what it's adding?

Contributor Author

Updated in 016cc1a. prepend_instructions_to_messages is still needed for the logging path: we still emit prepended framework instructions as message log events, but _capture_messages now builds gen_ai.input.messages directly from the original normalized chat history. I also added a focused regression test for that split.

Copilot AI requested review from Copilot and removed request for Copilot May 20, 2026 22:03
Copilot AI review requested due to automatic review settings May 20, 2026 22:04
Copilot AI requested review from Copilot and removed request for Copilot May 20, 2026 22:07
Copilot AI requested review from Copilot and removed request for Copilot May 20, 2026 22:08
Copilot AI requested a review from alliscode May 20, 2026 22:09
Comment on lines +2163 to +2175
for index, (message, otel_message) in enumerate(
    zip(logging_messages[:prepended_count], prepended_messages, strict=True)
):
    # Reuse the otel message representation for logging instead of calling to_dict()
    # to avoid expensive Pydantic serialization overhead
    otel_message = _to_otel_message(message)
    otel_messages.append(otel_message)
    logger.info(
        otel_message,
        extra={
            OtelAttr.EVENT_NAME: OtelAttr.CHOICE if output else ROLE_EVENT_MAP.get(message.role),
            OtelAttr.PROVIDER_NAME: provider_name,
            MessageListTimestampFilter.INDEX_KEY: index,
        },
    )
Member

@copilot Why are we doing this twice now?

Contributor Author

Fixed in 84e5a62. _capture_messages now walks the logging message list once, logs each OTEL message once, and only appends the non-prepended entries to the span payload. I also added a regression test for the mixed framework-instructions + original system-message case so we keep the expected split without double-processing.
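The single-pass shape described in this reply could be sketched as below. `capture_messages_once` and the logger name are hypothetical, messages are simplified to plain dicts, and the real code attaches extra event attributes to each log record that are omitted here.

```python
import logging
from typing import Any

logger = logging.getLogger("telemetry_sketch")


def capture_messages_once(
    logging_messages: list[dict[str, Any]],
    prepended_count: int,
) -> list[dict[str, Any]]:
    """Log every message exactly once; return only the non-prepended ones for the span."""
    span_messages: list[dict[str, Any]] = []
    for index, message in enumerate(logging_messages):
        # Each message is logged once, whether or not it was prepended.
        logger.info("message %d: %s", index, message)
        # The first `prepended_count` entries are framework instructions and are
        # kept out of the span payload.
        if index >= prepended_count:
            span_messages.append(message)
    return span_messages


prepended = [{"role": "system", "content": "Be helpful."}]
history = [
    {"role": "system", "content": "Answer in French."},
    {"role": "user", "content": "hello"},
]
span_payload = capture_messages_once(prepended + history, prepended_count=len(prepended))
```

Here `span_payload` equals `history`: the caller's own system message survives, while the prepended framework instruction is logged but excluded from the span.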

Copilot AI requested review from alliscode and Copilot and removed request for Copilot May 20, 2026 22:21
otel_messages: list[dict[str, Any]] = []
for index, message in enumerate(prepped):
normalized_messages = normalize_messages(messages)
logging_messages = prepend_instructions_to_messages(normalized_messages, system_instructions)
Member

We should not be always logging the messages with prepended instructions. I would expect this to be split similar to the span messages

Member

@copilot please look

Contributor Author

Fixed in 5e039d1. _capture_messages now logs only the normalized chat history, so framework/options instructions stay in gen_ai.system_instructions instead of being prepended into the message logs. I also updated the focused regression tests for both the instructions-only and mixed system-message cases.

Copilot AI requested review from Copilot and removed request for Copilot May 20, 2026 22:34
Contributor Author

Copilot AI commented May 20, 2026

@copilot The relevant rule in the OTEL semantic conventions is:

...

Addressed in 5e039d1. The observability tests cover both cases now: separate framework/options instructions stay in gen_ai.system_instructions, and when the original input already contains a system message it remains in gen_ai.input.messages alongside the rest of the chat history for both chat-client and agent spans.

Copilot AI requested a review from alliscode May 20, 2026 22:35
otel_messages: list[dict[str, Any]] = []
for index, message in enumerate(prepped):
normalized_messages = normalize_messages(messages)
span_messages = []
Member

Suggested change
- span_messages = []
+ otel_messages: list[dict[str, Any]] = []

@copilot

Contributor Author

Addressed in 589fad0.

Copilot AI requested review from Copilot and removed request for Copilot May 20, 2026 22:54
Copilot AI requested a review from alliscode May 20, 2026 22:55
Development

Successfully merging this pull request may close these issues.

Python: System instructions duplicated in telemetry: appears in both gen_ai.system_instructions and gen_ai.input_messages

2 participants