Summary
When a ContextProvider injects messages from a different session_id than the requesting one (the legitimate cross-session-memory use case in _harness/_memory.py, Mem0 with default scope, and any session-agnostic backend), the framework retains no provenance back to the originating session. Downstream observers — governance middleware, audit pipelines, behavioral baselines — cannot distinguish "fresh content for this session" from "content originally written in a prior session under different tool permissions."
This makes the stateful-agent-backdoor attack class documented in Dai et al., arXiv:2605.06158 (preprint May 2026) invisible to governance layers that hook into the context provider pipeline. The paper specifically names Microsoft Agent Framework in its §6.1 / Table 10 realism survey as satisfying the attack's two preconditions (session-level isolation via AgentSession; persistent storage shared across sessions by default).
Motivation
The Dai et al. attack chains sub-backdoors across sessions via persistent memory: session 1 (no sensitive tools) writes attack state; session 2 (file-system tools available) collects secrets and updates state; session 3 (network tools available) exfiltrates. Each session's individual actions are policy-permitted; the chain crosses session boundaries via persisted state. The paper reports 80–95% attack success rate across four base models under supply-chain SFT delivery.
The paper's primary suggested defense is cross-session behavioral pattern analysis (§6.3). That analysis isn't currently possible on MAF without forking or wrapping every concrete provider, because the framework doesn't surface the cross-session signal at the context layer where governance middleware operates.
Existing architecture — already mostly there
MAF's session model is already well-designed for this:
- SessionContext.session_id is first-class
- HistoryProvider.get_messages(session_id, ...) / save_messages(session_id, ...) are session-aware abstract methods (_sessions.py:458-493)
- Message.additional_properties["_attribution"] is already used to attribute injected context to its source provider (_sessions.py:233-248, :556-563)
- The long-term memory system at _harness/_memory.py already tracks MemoryTopicRecord.session_ids: list[str] per consolidated topic (_memory.py:353, :382). The cross-session provenance data is already captured, just not propagated to the attribution layer.
The gap is just that the existing _attribution payload ({"source_id", "source_type"}) doesn't include the originating session_id, so context observers can't see when a message they received came from a different session than the one they're running in.
Proposed approach (additive, backward-compatible)
Extend the existing _attribution payload with an optional origin_session_id: str | None key. Update built-in providers and the harness memory system to populate it when returning content from a different session than the requesting one. Ship a sample ContextProvider observer in samples/ demonstrating how a governance layer can detect cross-session injection.
Sketch:

    # _sessions.py: SessionContext.extend_messages
    attribution = {"source_id": source_id, "source_type": type(source).__name__}
    if origin_session_id is not None:
        attribution["origin_session_id"] = origin_session_id

    # Downstream observer (sample)
    class CrossSessionObserver(ContextProvider):
        async def before_run(self, *, agent, session, context, state):
            current = context.session_id
            for source_id, messages in context.context_messages.items():
                for msg in messages:
                    attrib = msg.additional_properties.get("_attribution", {})
                    origin = attrib.get("origin_session_id")
                    if origin and origin != current:
                        self._on_cross_session_access(source_id, origin, current, msg)

No changes to the abstract ContextProvider / HistoryProvider interfaces. No new public types. The default behavior is unchanged: if a provider doesn't populate origin_session_id, observers see no cross-session signal (the attribution-absent case is indistinguishable from same-session, preserving today's semantics).
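To make the detection logic checkable without framework internals, here is a minimal self-contained sketch. StubMessage and find_cross_session_messages are hypothetical stand-ins, not MAF types, and the provider names in the sample data are made up. Only the _attribution key and the proposed origin_session_id field come from this proposal.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StubMessage:
    """Hypothetical stand-in for the framework's Message type."""
    text: str
    additional_properties: dict[str, Any] = field(default_factory=dict)


def find_cross_session_messages(
    context_messages: dict[str, list[StubMessage]],
    current_session_id: str,
) -> list[tuple[str, str, StubMessage]]:
    """Return (source_id, origin_session_id, message) for cross-session content."""
    hits = []
    for source_id, messages in context_messages.items():
        for msg in messages:
            attribution = msg.additional_properties.get("_attribution", {})
            origin = attribution.get("origin_session_id")
            # A missing origin means "no information" and is not reported.
            if origin and origin != current_session_id:
                hits.append((source_id, origin, msg))
    return hits


# Illustrative data only: one cross-session message, one same-session
# message, and one legacy message without the proposed key.
context_messages = {
    "memory": [
        StubMessage(
            "note from an earlier session",
            {"_attribution": {"source_id": "memory",
                              "source_type": "StubMemoryProvider",
                              "origin_session_id": "session-1"}},
        ),
        StubMessage(
            "note from this session",
            {"_attribution": {"source_id": "memory",
                              "source_type": "StubMemoryProvider",
                              "origin_session_id": "session-2"}},
        ),
    ],
    "history": [
        StubMessage(
            "legacy message without the new key",
            {"_attribution": {"source_id": "history",
                              "source_type": "StubHistoryProvider"}},
        ),
    ],
}

hits = find_cross_session_messages(context_messages, current_session_id="session-2")
```

With this sample data, only the first memory message is flagged. The legacy message passes through silently, which is the backward-compatible behavior described above.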
Scope proposed for the accompanying draft PR
- Extend SessionContext.extend_messages to accept an optional origin_session_id parameter, threaded into the attribution dict
- Update the harness memory system (_harness/_memory.py) to populate the field when injecting consolidated memories or transcripts from prior sessions
- Sample observer in samples/ (Python), README citing Dai et al.
- Tests in test_sessions.py (attribution roundtrip) and test_harness_memory.py (origin populated correctly)
Estimated ~25-50 LOC core + sample + tests. Backward-compatible (additive). No new public APIs beyond the new attribution key.
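As a rough illustration of what the attribution roundtrip tests could assert, the sketch below uses a hypothetical build_attribution helper that mirrors the proposed logic. It is not the real SessionContext.extend_messages signature, and the actual tests would go through the framework's own types.

```python
from typing import Optional


def build_attribution(
    source_id: str,
    source_type: str,
    origin_session_id: Optional[str] = None,
) -> dict[str, str]:
    """Hypothetical helper mirroring the proposed attribution logic."""
    attribution = {"source_id": source_id, "source_type": source_type}
    # Add the key only when a provider supplies it, so existing payloads
    # keep exactly the shape they have today.
    if origin_session_id is not None:
        attribution["origin_session_id"] = origin_session_id
    return attribution


def test_attribution_unchanged_without_origin():
    # Backward compatibility: no new key unless explicitly provided.
    assert build_attribution("mem", "StubMemoryProvider") == {
        "source_id": "mem",
        "source_type": "StubMemoryProvider",
    }


def test_attribution_roundtrips_origin():
    attribution = build_attribution(
        "mem", "StubMemoryProvider", origin_session_id="session-1"
    )
    assert attribution["origin_session_id"] == "session-1"
    assert attribution["source_id"] == "mem"
```

The first test is the one that guards the "additive" claim: callers that never pass origin_session_id should see a payload identical to the current one.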
The accompanying PR will be marked draft explicitly to invite API-shape discussion before merge — per CONTRIBUTING's guidance on not surprising maintainers with new APIs. Happy to revise the shape based on any direction below.
Open API questions for maintainer input
- Attribution key naming. origin_session_id is one option; source_session_id mirrors source_id more closely. Preference?
- Threading mechanism. Should extend_messages take a parameter, or should providers set the field directly on msg.additional_properties["_attribution"] after they call extend_messages? Parameter is more discoverable; direct-set is simpler and matches how _split_service_call_messages reads attribution.
- Built-in providers. Should InMemoryHistoryProvider / FileHistoryProvider populate the field? They're session-scoped by construction so it should always equal session_id when set, but populating consistently helps observers reason about absence (no field = no info; field present and equal = same-session; field present and different = cross-session).
- Typed shape. Whether to extend the attribution dict in-place (current proposal) or introduce a typed MessageAttribution dataclass. Latter is cleaner long-term but a bigger surface change.
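For discussion of the last two questions, here is one possible shape for the typed option, combined with the three-state reading of the field. Everything in it is an assumption offered for comparison: the Provenance enum, the method names, and the frozen dataclass layout are illustrative and not an existing MAF API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Provenance(Enum):
    """Three-state reading of origin_session_id (hypothetical)."""
    UNKNOWN = "unknown"              # field absent: no information
    SAME_SESSION = "same_session"    # field present and equal
    CROSS_SESSION = "cross_session"  # field present and different


@dataclass(frozen=True)
class MessageAttribution:
    """Hypothetical typed alternative to the in-place attribution dict."""
    source_id: str
    source_type: str
    origin_session_id: Optional[str] = None

    @classmethod
    def from_dict(cls, payload: dict[str, Any]) -> "MessageAttribution":
        return cls(
            source_id=payload["source_id"],
            source_type=payload["source_type"],
            origin_session_id=payload.get("origin_session_id"),
        )

    def to_dict(self) -> dict[str, str]:
        # Omit the key when unset so the dict matches the current payload.
        payload = {"source_id": self.source_id, "source_type": self.source_type}
        if self.origin_session_id is not None:
            payload["origin_session_id"] = self.origin_session_id
        return payload

    def provenance(self, current_session_id: str) -> Provenance:
        if self.origin_session_id is None:
            return Provenance.UNKNOWN
        if self.origin_session_id == current_session_id:
            return Provenance.SAME_SESSION
        return Provenance.CROSS_SESSION


legacy = MessageAttribution.from_dict(
    {"source_id": "history", "source_type": "StubHistoryProvider"}
)
recalled = MessageAttribution("memory", "StubMemoryProvider", "session-1")
```

Because to_dict omits an unset origin, a dataclass like this could coexist with the dict-based payload during a transition, at the cost of the larger API surface noted above.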
Related
Disclosure note
The Dai et al. paper is publicly available on arXiv (preprint May 2026) and the authors' Ethics Statement (Appendix J) explicitly notes the work was published openly under a "supply-chain barrier" rationale rather than via coordinated disclosure (contrast: the concurrent LoopTrap paper, arXiv:2605.05846, explicitly disclosed to surveyed framework maintainers pre-submission). This issue is the first MAF-side reference to Dai et al. I could find via gh search; if there's existing coordination I missed, happy to redirect.
Surfaced during independent audit conducted by @finnoybu (Ken Tannenbaum, AEGIS Initiative); [MEDIUM, python/packages/core].