Purview: Parallelize PSPC cold-cache scope refresh #5832
Pull request overview
Adds opt-in parallel Purview protection-scope cache refresh for Python and .NET so cold-cache ProcessContent calls can proceed while scope data is warmed asynchronously.
Changes:
- Adds parallel protection scope retrieval settings and documentation.
- Adds background scope refresh paths and ProcessContent scope identifier invalidation.
- Extends tests for settings, models, cache invalidation, and parallel retrieval behavior.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| python/packages/purview/tests/purview/test_settings.py | Covers new Python setting behavior. |
| python/packages/purview/tests/purview/test_purview_models.py | Covers ProcessContent scope identifier deserialization. |
| python/packages/purview/tests/purview/test_processor.py | Adds Python processor tests for invalidation and parallel refresh. |
| python/packages/purview/README.md | Documents Python cache invalidation and parallel retrieval option. |
| python/packages/purview/agent_framework_purview/_settings.py | Adds Python settings key. |
| python/packages/purview/agent_framework_purview/_processor.py | Implements Python parallel refresh and scope-id invalidation. |
| python/packages/purview/agent_framework_purview/_models.py | Adds ProcessContent response scope identifier support. |
| dotnet/tests/Microsoft.Agents.AI.Purview.UnitTests/ScopedContentProcessorTests.cs | Adds .NET processor tests for invalidation and parallel retrieval. |
| dotnet/src/Microsoft.Agents.AI.Purview/ScopedContentProcessor.cs | Implements .NET parallel refresh queuing and foreground invalidation. |
| dotnet/src/Microsoft.Agents.AI.Purview/README.md | Documents .NET setting. |
| dotnet/src/Microsoft.Agents.AI.Purview/PurviewSettings.cs | Adds .NET setting. |
| dotnet/src/Microsoft.Agents.AI.Purview/PurviewClient.cs | Captures ProcessContent ETag as scope identifier. |
| dotnet/src/Microsoft.Agents.AI.Purview/Models/Responses/ProcessContentResponse.cs | Adds non-serialized scope identifier property. |
| dotnet/src/Microsoft.Agents.AI.Purview/Models/Jobs/ScopeRetrievalJob.cs | Adds background scope refresh job model. |
| dotnet/src/Microsoft.Agents.AI.Purview/BackgroundJobRunner.cs | Executes background scope retrieval jobs and caches responses. |
Comments suppressed due to low confidence (1)
python/packages/purview/agent_framework_purview/_processor.py:314
- Calling `_combine_policy_actions` with an empty `dlp_actions` list is not a no-op: the helper only keeps existing actions that have `a.action`, so a ProcessContent response containing a restriction-only policy action (for example `restrictionAction == block`) is dropped. The parallel cold path always reaches this line with an empty local action list, which can cause `process_messages` to miss a service-enforced block.
pc_resp.policy_actions = self._combine_policy_actions(pc_resp.policy_actions, dlp_actions)
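One way to avoid the dropped restriction-only action is to key the merge on both fields, as a later commit in this PR describes. The following is a minimal sketch of that idea, not the package's code: `PolicyAction` and `combine_policy_actions` are hypothetical stand-ins, and the real model and helper may have different fields and signatures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicyAction:
    """Hypothetical stand-in for the package's policy action model."""

    action: Optional[str] = None
    restriction_action: Optional[str] = None


def combine_policy_actions(existing, new):
    """Merge two action lists, keeping restriction-only entries and dropping exact repeats."""
    combined = []
    seen = set()
    for item in [*(existing or []), *(new or [])]:
        key = (item.action, item.restriction_action)
        if key == (None, None) or key in seen:
            continue  # skip empty entries and duplicates
        seen.add(key)
        combined.append(item)
    return combined


# A service response carrying only a restriction action, merged with an empty local list.
service_actions = [PolicyAction(restriction_action="block")]
merged = combine_policy_actions(service_actions, [])
blocked = any(a.restriction_action == "block" for a in merged)
```

With this keying, merging with an empty list returns the service's actions unchanged, so a restriction-only block stays visible to the caller.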
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
python/packages/purview/tests/purview/test_processor.py:347
- This test never exercises the cached-scope path: `_process_with_scopes` sees a cache miss, starts `get_protection_scopes` only in the background, and calls ProcessContent without the `scope_identifier` from `psResponse`. As written it only verifies cold-miss removal for a zero response, so it does not cover the documented case where cached scopes are stale; seed the cache with `psResponse` or assert the background behavior separately.
request = process_content_request_factory()
await processor._process_with_scopes(request)
await asyncio.gather(*list(processor._background_tasks))
cached = await cache.get(f"purview:payment_required:{request.tenant_id}")
assert isinstance(cached, PurviewPaymentRequiredError)
async def test_map_messages_with_user_id_in_additional_properties(self, mock_client: AsyncMock) -> None:
"""Test user_id extraction from message additional_properties."""
settings = PurviewSettings(
app_name="Test App",
tenant_id="12345678-1234-1234-1234-123456789012",
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (3)
python/samples/05-end-to-end/purview_agent/sample_purview_agent.py:189
- This also feeds the `AZURE_OPENAI_ENDPOINT` value into `project_endpoint`. Since `project_endpoint` must be a Foundry project endpoint, the chat middleware path will fail or target the wrong service when users follow the sample's environment variable names.
project_endpoint=endpoint,
python/samples/05-end-to-end/purview_agent/sample_purview_agent.py:236
- This custom-cache path has the same endpoint mismatch: the sample reads `AZURE_OPENAI_ENDPOINT` but now passes it as a Foundry `project_endpoint`. Use the Foundry project endpoint environment variable here as well so the sample remains runnable.
client = FoundryChatClient(model=deployment, project_endpoint=endpoint, credential=AzureCliCredential())
python/samples/05-end-to-end/purview_agent/sample_purview_agent.py:278
- This default-cache client is initialized with the value from `AZURE_OPENAI_ENDPOINT` even though `project_endpoint` expects a Foundry project endpoint. Keeping the old environment variable here makes this sample path fail for users with an Azure OpenAI endpoint configured.
client = FoundryChatClient(model=deployment, project_endpoint=endpoint, credential=AzureCliCredential())
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
dotnet/src/Microsoft.Agents.AI.Purview/BackgroundJobRunner.cs:87
- This payment-required cache write can also abort the background consumer if the cache provider throws a `SystemException`-derived failure, because the outer runner filter will not catch it. Since 402 caching is intended to be best-effort, this should be caught and logged locally so a cache outage does not stop all subsequent background jobs.
await this._cacheProvider.SetAsync(
new PaymentRequiredCacheKey(scopeRetrievalJob.Request.TenantId),
new PaymentRequiredCacheEntry(ex.Message),
CancellationToken.None).ConfigureAwait(false);
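The "catch and log locally" suggestion can be illustrated with a Python analogue. The reviewed code is C#, so this is only a sketch of the pattern under stated assumptions: `PaymentRequiredError`, `FlakyCache`, and `run_jobs` are hypothetical names and are not taken from either package.

```python
import asyncio
import logging

logger = logging.getLogger("purview.background")


class PaymentRequiredError(Exception):
    """Hypothetical stand-in for a 402 response from the service."""


class FlakyCache:
    """Hypothetical cache whose writes always fail, to simulate an outage."""

    async def set(self, key, value):
        raise ConnectionError("cache unavailable")


async def run_jobs(jobs, cache):
    """Run each job in turn; a failed cache write is logged, never propagated."""
    completed = []
    for tenant_id, job in jobs:
        try:
            await job()
        except PaymentRequiredError as ex:
            try:
                # Best-effort: remember the 402 so later calls can short-circuit.
                await cache.set(f"payment_required:{tenant_id}", str(ex))
            except Exception:
                logger.warning("Could not cache payment-required state for %s", tenant_id)
        completed.append(tenant_id)
    return completed


async def needs_payment():
    raise PaymentRequiredError("402 Payment Required")


async def ok():
    return None


completed = asyncio.run(run_jobs([("t1", needs_payment), ("t2", ok)], FlakyCache()))
```

Because the cache failure is handled inside the job's own error path, the second job still runs even though the write for the first tenant failed.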
Could we also check the cache-miss `contentActivities` call path?
Check the READMEs as well: four in total, all Purview (one for the sample and one for the actual package, in both Python and .NET).
Deduplicate combined policy actions by action and restriction action so restriction-only actions are preserved without duplicating identical entries. Cache tenant-level payment-required state from background scope refresh so subsequent calls short-circuit consistently.
…l and add unit tests for cache write failures
…tyJob when no applicable scopes are found and update related tests
…ct caching optimizations and policy enforcement scenarios
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
Improve Purview PSPC cold-cache behavior by allowing `ProcessContent` to run without waiting for a foreground `ProtectionScopes` lookup. This reduces user-visible latency on cache misses while still refreshing the `ProtectionScopes` cache for future requests.
The implementation aligns Agent Framework's Python and .NET Purview packages with the Purview SDK's parallel scope retrieval behavior.
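The cold-cache flow described above can be sketched with `asyncio`. This is an illustration under assumptions rather than the package's implementation: `ScopeCache`, `Processor`, the cache key format, and the simulated delays are all hypothetical.

```python
import asyncio


class ScopeCache:
    """Hypothetical in-memory cache; the real packages use their own cache provider."""

    def __init__(self):
        self._data = {}

    async def get(self, key):
        return self._data.get(key)

    async def set(self, key, value):
        self._data[key] = value


class Processor:
    def __init__(self, cache):
        self.cache = cache
        self.background_tasks = set()

    async def get_protection_scopes(self, tenant_id):
        await asyncio.sleep(0.05)  # simulated slow scope lookup
        return {"tenant": tenant_id, "scopes": ["uploadText"]}

    async def process_content(self, tenant_id, text):
        await asyncio.sleep(0.01)  # simulated service call
        return {"tenant": tenant_id, "allowed": True}

    async def _refresh_scopes(self, key, tenant_id):
        scopes = await self.get_protection_scopes(tenant_id)
        await self.cache.set(key, scopes)

    async def process(self, tenant_id, text):
        key = f"scopes:{tenant_id}"
        if await self.cache.get(key) is None:
            # Cold cache: warm it in the background instead of awaiting it here.
            task = asyncio.create_task(self._refresh_scopes(key, tenant_id))
            self.background_tasks.add(task)  # keep a reference until the task finishes
            task.add_done_callback(self.background_tasks.discard)
        return await self.process_content(tenant_id, text)


async def main():
    cache = ScopeCache()
    processor = Processor(cache)
    result = await processor.process("tenant-1", "hello")
    cold = await cache.get("scopes:tenant-1")  # refresh is still running here
    await asyncio.gather(*list(processor.background_tasks))
    warm = await cache.get("scopes:tenant-1")
    return result, cold, warm


result, cold, warm = asyncio.run(main())
```

The foreground call returns before the scope lookup completes, and the cache is populated once the background task finishes, so the next request for the same tenant would see a warm cache.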
Description
- … `ProtectionScopes` cache miss, refresh scopes in the background while `ProcessContent` runs in the foreground.
- … `ContentActivities`.
- … `Prefer: evaluateInline`; cached inline scopes set inline evaluation explicitly.
- … .NET `ScopeRetrievalJob` handling so background scope refreshes use the existing background job infrastructure.
- … `ContentActivities` in the background when a cold-miss scope refresh finds no applicable scopes, preserving the audit signal.
- … `Payment Required`) responses from background scope refreshes so subsequent requests for the tenant short-circuit before repeating Purview work.
- … `ProcessContent` reports modified protection-scope state.
- … `(action, restriction_action)` in Python and .NET.
- … `good -> expected block -> good` flow across agent middleware, chat middleware, custom cache, and default cache scenarios.
Validation
- … `36 passed`
- … `56 passed` on `net10.0`
- … `56 passed` on `net472`
- … `py_compile` passed
- … `good (cold cache) -> expected block -> good (warm cache)` orchestration, including cache miss/hit behavior and Purview prompt blocking with the configured `Prompt blocked by policy` message.
Contribution Checklist