docs: add policy-as-code tutorial (chapters 1-3) #808
harinarayansrivatsan wants to merge 4 commits into microsoft:main
Three-chapter tutorial teaching declarative policy governance for AI agents, progressing from basic allow/deny rules through role-based capability scoping to rate limiting with max_tool_calls and TokenBucket.
Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
🤖 AI Agent: contributor-guide
Welcome! 🎉
Hi there, and welcome to the microsoft/agent-governance-toolkit community! Thank you so much for your first contribution; we're thrilled to have you here. 🎉 Your pull request looks like a fantastic addition, and I'm excited to dive into it with you.
What You Did Well ✅
Suggestions for Improvement ✨
Project Conventions 📜
Here are a few conventions we follow in this project:
Next Steps 🚀
Once you've made these updates, push your changes to this branch, and our CI/CD pipeline will automatically re-run. If you have any questions or need help, feel free to ask; we're here to support you! Thank you again for your contribution. We're excited to see this tutorial become part of the project and help others learn about policy-as-code for AI agents. 😊
🤖 AI Agent: code-reviewer
Review Feedback for Pull Request: docs: add policy-as-code tutorial (chapters 1-3)
🔴 CRITICAL: Security Concerns
- Policy Enforcement in `max_tool_calls` Implementation:
  - The `max_tool_calls` limit is described as metadata that the application should enforce, but this approach is prone to developer error and could lead to security bypass if the enforcement logic is improperly implemented or omitted.
  - Actionable Recommendation: Integrate `max_tool_calls` enforcement directly into the `PolicyEvaluator` class. This ensures that the limit is always enforced, regardless of how the evaluator is used.
- Token Bucket Implementation:
  - The `TokenBucket` implementation appears to be used for rate limiting, but there is no mention of thread safety. If multiple threads or processes are consuming tokens concurrently, race conditions could allow agents to bypass rate limits.
  - Actionable Recommendation: Ensure that the `TokenBucket.consume()` and `TokenBucket.time_until_available()` methods are thread-safe. Use locks or atomic operations to prevent race conditions.
- Policy File Loading:
  - The `PolicyEvaluator.load_policies()` method is used to load YAML files, but there is no mention of validation against a schema or sanitization of the input. Malicious actors could craft policies that exploit vulnerabilities in the YAML parser or inject invalid rules.
  - Actionable Recommendation: Validate all loaded YAML files against the `PolicyDocument` schema using Pydantic or a similar library. Reject invalid or malformed policies with clear error messages.
- Default Policy Action:
  - The default action in the YAML policy is set to `allow`. This could lead to unintended security risks if a rule is missing or misconfigured.
  - Actionable Recommendation: Consider setting the default action to `deny` in production environments and explicitly allow actions as needed. This aligns with the principle of least privilege.
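The thread-safety recommendation for the token bucket can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's actual `TokenBucket`: the class name `LockedTokenBucket`, its constructor parameters, and the injectable `clock` are all hypothetical. It only shows the idea of doing refill and consumption under a single lock.

```python
import threading
import time


class LockedTokenBucket:
    """Hypothetical token bucket guarded by a lock (not the toolkit's class)."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)  # tokens added per second
        self._clock = clock                    # injectable for testing
        self._tokens = float(capacity)
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self):
        # Caller must hold self._lock.
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now

    def consume(self, tokens=1):
        """Take `tokens` if available; return True on success, False otherwise."""
        with self._lock:
            # Refill and the check-then-subtract step happen atomically.
            self._refill()
            if self._tokens >= tokens:
                self._tokens -= tokens
                return True
            return False
```

A lock like this only protects threads inside one process. Sharing a limit across processes would need a different mechanism, such as an external store.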
🟡 WARNING: Potential Breaking Changes
- PolicyEvaluator API Changes:
  - If the `max_tool_calls` enforcement is moved into the `PolicyEvaluator` class, it will change the behavior of the API. This could break existing code that relies on the current implementation.
  - Actionable Recommendation: Clearly document the change and provide a migration guide for users. Consider introducing a feature flag to toggle the new behavior.
- TokenBucket Thread Safety:
  - Adding thread safety mechanisms to `TokenBucket` may introduce performance overhead. If existing users rely on the current implementation for high-throughput scenarios, this change could impact performance.
  - Actionable Recommendation: Benchmark the thread-safe implementation and provide guidance on when to use it versus alternatives.
💡 Suggestions for Improvement
- Policy Testing:
  - The tutorial mentions policy testing in Chapter 6, but no testing framework or methodology is introduced in Chapters 1–3. Early introduction of testing practices would reinforce the importance of validating policies.
  - Actionable Recommendation: Add a section in Chapter 1 that demonstrates how to write unit tests for policies using `pytest`. Include examples of testing edge cases (e.g., missing fields, invalid operators).
- Environment-Specific Policies:
  - Chapter 3 briefly mentions conditional policies but does not provide any implementation details. This is a critical feature for production deployments.
  - Actionable Recommendation: Add a preview of conditional policies in Chapter 3, showing how to switch between development and production policies based on environment variables.
- Improved Documentation for Rate Limiting:
  - The `TokenBucket` example is clear but could benefit from additional context, such as how to handle bursty workloads or integrate with external APIs.
  - Actionable Recommendation: Expand the example to include a real-world use case, such as rate-limiting API calls to an external service.
- Backward Compatibility:
  - The tutorials introduce new features and patterns that may not be compatible with older versions of the library. Ensure backward compatibility is maintained.
  - Actionable Recommendation: Add a note in the README specifying the minimum version of the library required to follow the tutorials.
- Thread Safety in PolicyEvaluator:
  - The `PolicyEvaluator` class is used in examples but does not mention thread safety. If multiple agents are evaluated concurrently, race conditions could occur.
  - Actionable Recommendation: Document whether `PolicyEvaluator` is thread-safe. If not, provide guidance on how to use it safely in multi-threaded applications.
Summary
This pull request provides a well-structured and informative tutorial on policy-as-code for AI agents. However, several critical security concerns need to be addressed to ensure the robustness and safety of the implementation. Additionally, there are potential breaking changes that require careful documentation and migration strategies. Finally, there are opportunities to improve the tutorial by introducing testing practices, expanding on rate-limiting use cases, and addressing thread safety concerns.
Action Items:
- Implement `max_tool_calls` enforcement directly in `PolicyEvaluator`.
- Ensure thread safety in `TokenBucket`.
- Validate YAML policies against the `PolicyDocument` schema.
- Consider setting the default policy action to `deny` in production.
- Document and mitigate potential breaking changes.
- Add unit testing examples for policies in Chapter 1.
- Preview conditional policies in Chapter 3.
- Expand rate-limiting examples with real-world use cases.
- Specify minimum library version for tutorials.
- Address thread safety in `PolicyEvaluator`.
Let me know if you need further clarification or assistance with any of the feedback points!
🤖 AI Agent: security-scanner
Security Review of PR: `docs: add policy-as-code tutorial (chapters 1-3)`
| Finding | Rating | Recommendation |
|---|---|---|
| Prompt injection defense bypass | 🟠 HIGH | Add input validation for context and emphasize its importance in the tutorial. |
| Policy engine circumvention | 🟠 HIGH | Emphasize integration of PolicyEvaluator into the core execution flow. |
| Trust chain weaknesses | 🟠 HIGH | Verify the integrity of YAML policy files using digital signatures or checksums. |
| Credential exposure | 🔵 LOW | No immediate action required. |
| Sandbox escape | 🔵 LOW | No immediate action required. |
| Deserialization attacks | 🟠 HIGH | Ensure safe YAML parsing and validate YAML files before loading. |
| Race conditions | 🟡 MEDIUM | Ensure thread safety in PolicyEvaluator and TokenBucket. |
| Supply chain | 🟠 HIGH | Verify the integrity of agent-os-kernel and its dependencies. |
General Recommendations
- Documentation Updates: Update the tutorial to include sections on input validation, policy file integrity, and concurrency considerations.
- Code Enhancements: Ensure that the `PolicyEvaluator` and `TokenBucket` are thread-safe and that `PolicyDocument.from_yaml` uses safe YAML parsing methods.
- Supply Chain Security: Provide guidance on verifying the integrity of dependencies and consider integrating automated dependency scanning tools into the CI/CD pipeline.
Let me know if you need further clarification or assistance!
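The "trust chain" recommendation in the table (verifying policy files with checksums) could look roughly like the sketch below. It uses only the standard library and assumes the expected SHA-256 digest is pinned somewhere trusted, such as a reviewed config file. The function names `sha256_of` and `verify_policy_file` are hypothetical and not part of the toolkit.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def verify_policy_file(path, expected_sha256):
    """Read a policy file and return its text only if the digest matches.

    The bytes are read once, so the content that is hashed is exactly the
    content that is returned to the caller for parsing.
    """
    data = Path(path).read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(f"policy file {path} failed integrity check")
    return data.decode("utf-8")
```

A plain checksum only helps if the expected digest is stored somewhere an attacker cannot also modify. A keyed signature is the stronger option when that cannot be guaranteed.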
@microsoft-github-policy-service agree
Pull request overview
Adds a new documentation tutorial series under docs/tutorials/policy-as-code/ introducing YAML-based governance policies for agents (allow/deny, role-based capability scoping, and rate limiting) with runnable Python examples.
Changes:
- Adds a top-level tutorial README with installation and chapter index.
- Adds Chapters 1–3 as markdown guides with embedded YAML/Python snippets.
- Adds runnable Python example scripts and corresponding YAML policy files for each chapter.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| docs/tutorials/policy-as-code/README.md | Tutorial landing page: install instructions, chapter index, and related links |
| docs/tutorials/policy-as-code/01-your-first-policy.md | Chapter 1 narrative + examples for basic allow/deny |
| docs/tutorials/policy-as-code/02-capability-scoping.md | Chapter 2 narrative + examples for per-role policies |
| docs/tutorials/policy-as-code/03-rate-limiting.md | Chapter 3 narrative + examples for max_tool_calls + token bucket rate limiting |
| docs/tutorials/policy-as-code/examples/01_first_policy.py | Runnable Chapter 1 Python example |
| docs/tutorials/policy-as-code/examples/01_first_policy.yaml | Chapter 1 policy YAML |
| docs/tutorials/policy-as-code/examples/02_capability_scoping.py | Runnable Chapter 2 Python example |
| docs/tutorials/policy-as-code/examples/02_reader_policy.yaml | Reader role policy YAML |
| docs/tutorials/policy-as-code/examples/02_admin_policy.yaml | Admin role policy YAML |
| docs/tutorials/policy-as-code/examples/03_rate_limiting.py | Runnable Chapter 3 Python example |
| docs/tutorials/policy-as-code/examples/03_rate_limit_policy.yaml | Chapter 3 policy YAML |
- Load only Ch1 policy in 01_first_policy.py (was loading all YAMLs)
- Fix filename mismatch in Ch1 markdown (my-first-policy → 01_first_policy)
- Replace broken link to Ch4 with "coming soon" text
- Fix Python version prerequisite (3.10+ → 3.9+)
🤖 AI Agent: code-reviewer
Review Summary
This pull request introduces a three-chapter tutorial for teaching declarative policy governance for AI agents. The tutorial is well-structured and provides clear examples for YAML-based policy creation, role-based capability scoping, and rate limiting. While the content is educational and useful, there are several areas that require attention to ensure correctness, security, and maintainability.
🔴 CRITICAL: Security Issues
- Policy Evaluation Trust Boundary
  - The tutorial does not address how to validate the integrity of the YAML policy files. If an attacker modifies the policy files, they could bypass security controls or introduce malicious rules.
  - Actionable Fix: Implement cryptographic signing and verification for policy files. Use a secure mechanism (e.g., HMAC or digital signatures) to ensure that only authorized policies are loaded into the `PolicyEvaluator`.
- Rate Limiting Implementation
  - The `max_tool_calls` limit is described as metadata that must be enforced externally. This creates a risk of inconsistent enforcement across different parts of the application.
  - Actionable Fix: Integrate `max_tool_calls` enforcement directly into the `PolicyEvaluator` class. This ensures that rate limiting is applied uniformly and cannot be bypassed.
- TokenBucket Implementation
  - The `TokenBucket` example does not account for thread safety. If multiple threads or processes access the same bucket, race conditions could allow agents to exceed rate limits.
  - Actionable Fix: Use thread-safe mechanisms (e.g., `threading.Lock` or atomic operations) to protect the `TokenBucket` state during concurrent access.
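One way to read the `max_tool_calls` concern: if the limit stays as metadata, the application needs a single choke point that enforces it. The sketch below wraps an arbitrary decision function with a call budget. `BudgetedEvaluator`, `ToolCallLimitExceeded`, and the `context -> bool` callable are all hypothetical stand-ins; this is not how `PolicyEvaluator` works today, and the wrapper is not thread-safe as written.

```python
class ToolCallLimitExceeded(Exception):
    """Raised when an agent has used up its tool-call budget."""


class BudgetedEvaluator:
    """Hypothetical wrapper that enforces a max_tool_calls budget.

    `evaluate_fn` is any callable taking a context dict and returning
    True (allowed) or False (denied).
    """

    def __init__(self, evaluate_fn, max_tool_calls):
        self._evaluate_fn = evaluate_fn
        self._max_tool_calls = max_tool_calls
        self._used = 0

    @property
    def remaining(self):
        return self._max_tool_calls - self._used

    def evaluate(self, context):
        # Refuse outright once the budget is spent, before consulting policy.
        if self._used >= self._max_tool_calls:
            raise ToolCallLimitExceeded(
                f"limit of {self._max_tool_calls} tool calls reached"
            )
        allowed = self._evaluate_fn(context)
        if allowed:
            # Only calls that will actually run count against the budget.
            self._used += 1
        return allowed
```

Whether denied calls should also count against the budget is a design choice. This sketch does not count them.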
🟡 WARNING: Potential Breaking Changes
- Backward Compatibility of Policy Schema
  - The tutorial introduces YAML policies with new fields like `max_tool_calls`. If existing users rely on older schemas, this could break their workflows.
  - Actionable Fix: Ensure backward compatibility by versioning the schema. Add validation logic to detect unsupported fields and provide clear error messages.
- Default Behavior Changes
  - The tutorial suggests changing the default action from `allow` to `deny` in exercises. This could lead to unexpected behavior in production if users adopt these changes without understanding the implications.
  - Actionable Fix: Clearly document the risks of changing default actions and recommend testing in isolated environments before deployment.
💡 Suggestions for Improvement
- Type Safety and Validation
  - The `PolicyDocument.from_yaml` method should validate the YAML structure against the schema using Pydantic models. This ensures type safety and prevents runtime errors due to malformed policies.
  - Actionable Fix: Add Pydantic-based validation to `PolicyDocument.from_yaml`.
- OWASP Agentic Top 10 Compliance
  - The tutorial does not address sandboxing or preventing agents from escaping their execution environment.
  - Actionable Fix: Add a section on sandboxing techniques (e.g., containerization, syscall filtering) to ensure agents cannot perform unauthorized actions outside their scope.
- Thread Safety in PolicyEvaluator
  - The `PolicyEvaluator` class does not appear to be thread-safe. If multiple agents evaluate policies concurrently, race conditions could occur.
  - Actionable Fix: Use thread-safe data structures (e.g., `threading.Lock`) for managing the `policies` list.
- Testing Coverage
  - The tutorial examples are useful but lack automated tests to verify correctness. This could lead to regressions in future updates.
  - Actionable Fix: Add pytest-based unit tests for each example script, including edge cases like malformed YAML files and conflicting policies.
- Environment-Specific Policies
  - The tutorial mentions conditional policies in a future chapter but does not provide any implementation details.
  - Actionable Fix: Add a placeholder section with a brief explanation of how environment-specific policies could be implemented (e.g., using environment variables or configuration files).
- Documentation Enhancements
  - The tutorial is comprehensive but could benefit from diagrams illustrating policy evaluation flow and rate limiting mechanisms.
  - Actionable Fix: Add visual aids (e.g., flowcharts, sequence diagrams) to the documentation.
Final Recommendations
- Address the 🔴 CRITICAL security issues immediately, as they could lead to policy bypass or inconsistent enforcement.
- Ensure backward compatibility and document potential breaking changes clearly.
- Implement the 💡 suggestions to improve type safety, thread safety, and compliance with security best practices.
Once these issues are resolved, the tutorial will be a valuable resource for users looking to implement policy-as-code for AI agent governance.
imran-siddique left a comment
Well-structured tutorial with runnable examples. Two suggestions:
- Change 'Closes #706' to 'Ref #706' in the PR description since this only delivers chapters 1-3 of 7
- Add a brief comment in Chapter 3 explaining the PolicyEvaluator constructor pattern change
Approved - please resolve the automated review threads so this can merge.
🤖 AI Agent: code-reviewer
Review Summary
This pull request introduces a tutorial for policy-as-code, covering chapters 1–3. The tutorial is well-structured and provides clear examples for understanding policy governance concepts such as allow/deny rules, capability scoping, and rate limiting. The documentation is detailed and includes runnable Python examples, YAML policy files, and exercises for hands-on learning. However, there are some areas that require attention to ensure security, correctness, and maintainability.
🔴 CRITICAL: Security Concerns
- Policy Evaluation Trust Boundary
  - The `PolicyEvaluator` appears to evaluate agent actions based on the provided context and policy. However, the tutorial does not mention how the `context` dictionary is validated or sanitized before evaluation. If the `context` is user-controlled, it could lead to security vulnerabilities such as injection attacks or bypassing policy rules.
  - Actionable Recommendation: Ensure that the `context` dictionary is validated against a strict schema using Pydantic models or similar validation mechanisms to prevent malicious input.
- Rate Limiting Enforcement
  - The `max_tool_calls` limit is described as metadata that the application must enforce. This introduces a risk of inconsistent enforcement across different parts of the application, potentially allowing agents to bypass rate limits.
  - Actionable Recommendation: Integrate rate-limiting enforcement directly into the `PolicyEvaluator` or provide a wrapper that ensures consistent enforcement of rate limits across all agent actions.
- Role-Based Policy Loading
  - The tutorial mentions loading policies based on agent roles but does not address how roles are securely assigned or verified. If role assignment is compromised, an agent could gain unauthorized permissions.
  - Actionable Recommendation: Use SPIFFE/SVID or other cryptographic identity mechanisms to securely associate agents with roles and ensure that policy files are loaded based on verified identities.
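To make the `context` validation point concrete, here is a small sketch of strict validation before evaluation. The review suggests Pydantic; this version deliberately swaps in plain standard-library checks so it runs without extra dependencies. The allowed keys (`tool_name`, `token_count`) and the function name `validate_context` are hypothetical, chosen only for illustration.

```python
# Hypothetical context schema: key name -> required Python type.
CONTEXT_SCHEMA = {"tool_name": str, "token_count": int}


def validate_context(context):
    """Return a cleaned copy of `context`, or raise on anything unexpected."""
    if not isinstance(context, dict):
        raise TypeError("context must be a dict")

    # Reject unknown keys instead of silently passing them to the evaluator.
    unknown = set(context) - set(CONTEXT_SCHEMA)
    if unknown:
        raise ValueError(f"unexpected context keys: {sorted(unknown)}")

    clean = {}
    for key, expected_type in CONTEXT_SCHEMA.items():
        if key not in context:
            raise ValueError(f"missing context key: {key}")
        value = context[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(
                f"context[{key!r}] must be {expected_type.__name__}"
            )
        clean[key] = value
    return clean
```

The caller would pass only the returned dict to the evaluator, so nothing outside the schema ever reaches policy matching.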
🟡 WARNING: Potential Breaking Changes
- Backward Compatibility of Policy Schema
  - The tutorial introduces YAML-based policies with specific fields (`version`, `name`, `description`, `rules`, `defaults`). If future changes are made to the schema (e.g., adding new fields or modifying existing ones), it could break existing policies.
  - Actionable Recommendation: Implement schema versioning and validation using Pydantic models. Ensure backward compatibility by supporting older versions of the schema.
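A rough sketch of the versioning check described above, operating on a policy that has already been parsed into a dict. The top-level field names come from the list in this comment. The set of supported versions (`"1.0"`) and the function name `check_policy_dict` are assumptions made for the example, and the check is written in plain Python rather than Pydantic.

```python
# Top-level fields named in the review; anything else is flagged.
KNOWN_FIELDS = {"version", "name", "description", "rules", "defaults"}

# Assumption for illustration only: the real supported versions may differ.
SUPPORTED_VERSIONS = {"1.0"}


def check_policy_dict(policy):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    version = str(policy.get("version", ""))
    if version not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported schema version: {version!r}")

    # Sorted so the messages come out in a stable order.
    for field in sorted(set(policy) - KNOWN_FIELDS):
        problems.append(f"unknown top-level field: {field!r}")

    if not isinstance(policy.get("rules", []), list):
        problems.append("'rules' must be a list")

    return problems
```

Returning a list of problems rather than raising on the first one lets a loader report every issue in a policy file at once.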
💡 Suggestions for Improvement
- Thread Safety in Policy Evaluation
  - The `PolicyEvaluator` appends policies to its `policies` attribute, which could lead to race conditions in concurrent agent execution scenarios.
  - Actionable Recommendation: Ensure thread safety by using locks or thread-safe data structures for managing policies in `PolicyEvaluator`.
- Sandbox Escape Vectors
  - The tutorial does not address potential sandbox escape vectors, such as agents executing arbitrary code or accessing unauthorized resources.
  - Actionable Recommendation: Include a section in the tutorial that discusses best practices for sandboxing agents and preventing escape vectors.
- Type Safety
  - The `PolicyEvaluator` and `PolicyDocument` classes should enforce strict type safety for inputs and outputs. For example, the `evaluate` method should ensure that the `context` dictionary conforms to a predefined schema.
  - Actionable Recommendation: Use Pydantic models for type validation and enforce strict typing in all public methods.
- Policy Testing
  - While the tutorial mentions policy testing in future chapters, it would be beneficial to include basic testing strategies in the current chapters. For example, testing edge cases like overlapping rules, empty policies, or invalid YAML files.
  - Actionable Recommendation: Add a section on testing policies in each chapter, including examples of how to write unit tests for policies using `pytest`.
- Error Handling
  - The tutorial does not address error handling for scenarios like missing or invalid YAML files, malformed policies, or exceptions during evaluation.
  - Actionable Recommendation: Add error handling examples to the tutorial, including how to catch and handle exceptions raised by `PolicyEvaluator` or `PolicyDocument`.
- Documentation Improvements
  - The tutorial is well-written but could benefit from additional details on how to integrate policies into real-world applications. For example, how to dynamically load policies based on environment variables or external configuration.
  - Actionable Recommendation: Add a section on "Integrating Policies into Applications" with code snippets and best practices.
Summary of Actions
🔴 CRITICAL
- Validate and sanitize the `context` dictionary before policy evaluation.
- Enforce `max_tool_calls` directly within the `PolicyEvaluator` or provide a consistent enforcement mechanism.
- Implement secure role assignment and verification using cryptographic identity mechanisms like SPIFFE/SVID.
🟡 WARNING
- Add schema versioning and validation to ensure backward compatibility for policy files.
💡 SUGGESTION
- Ensure thread safety in `PolicyEvaluator` for concurrent agent execution.
- Address sandbox escape vectors in the tutorial.
- Enforce type safety using Pydantic models for inputs and outputs.
- Include basic policy testing strategies in the tutorial.
- Add error handling examples for common issues like invalid YAML files or malformed policies.
- Expand documentation to cover real-world integration of policies into applications.
Next Steps
Please address the critical security concerns as a priority. Once resolved, consider implementing the suggestions to improve the robustness and usability of the tutorial and the underlying policy engine.
Thanks Imran, I've added the constructor comment in Chapter 3 and updated the PR description.
🤖 AI Agent: code-reviewer
Review Summary
This pull request introduces a tutorial for the policy-as-code feature of the microsoft/agent-governance-toolkit. The tutorial spans three chapters, covering basic allow/deny rules, capability scoping, and rate limiting. It includes YAML policy files, Python examples, and detailed documentation. While the tutorial is well-written and provides clear examples, there are several areas that require attention to ensure security, correctness, and alignment with the repository's focus areas.
🔴 CRITICAL: Security Issues
- Policy Evaluation Trust Boundary
  - The `PolicyEvaluator` directly evaluates YAML-based policies without any validation of the input source. This could lead to policy injection attacks if the YAML file is tampered with or maliciously crafted.
  - Actionable Fix: Implement strict validation of the YAML input before loading it into `PolicyDocument`. Ensure that the YAML file is signed and verify its integrity using cryptographic methods (e.g., HMAC or digital signatures). This is especially important for production environments where policies may be loaded dynamically.
- Rate Limiting Enforcement
  - The tutorial mentions that `max_tool_calls` is "metadata" and does not enforce limits automatically. This creates a risk of rate-limiting bypass if developers forget to implement the enforcement logic.
  - Actionable Fix: Update the `PolicyEvaluator` to enforce `max_tool_calls` directly during evaluation. This ensures that rate limits are applied consistently and reduces the risk of developer error.
- Token Bucket Implementation
  - The `TokenBucket` implementation appears to rely on the client application to enforce rate limits. This could lead to race conditions in concurrent agent execution scenarios, where multiple threads or processes consume tokens simultaneously.
  - Actionable Fix: Ensure that the `TokenBucket` implementation is thread-safe. Use synchronization primitives like locks or semaphores to prevent race conditions during token consumption and refill.
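The HMAC option mentioned in the first item can be sketched with the standard library alone. The helper names `sign_policy` and `load_verified_policy` are hypothetical, and key management (where the secret lives, how it is rotated) is deliberately out of scope here. The sketch assumes the signature is stored alongside the policy as a hex string.

```python
import hashlib
import hmac


def sign_policy(policy_bytes, key):
    """Return a hex HMAC-SHA256 tag over the raw policy bytes."""
    return hmac.new(key, policy_bytes, hashlib.sha256).hexdigest()


def load_verified_policy(policy_bytes, signature_hex, key):
    """Return the policy text only if its HMAC tag matches.

    hmac.compare_digest performs a constant-time comparison, which avoids
    leaking how many leading characters of the tag were correct.
    """
    expected = sign_policy(policy_bytes, key)
    if not hmac.compare_digest(expected, signature_hex):
        raise ValueError("policy signature mismatch; refusing to load")
    return policy_bytes.decode("utf-8")
```

Only the verified text would then be handed to the YAML parser. HMAC uses a shared secret, so anyone who can verify can also sign. Asymmetric signatures avoid that, at the cost of an extra dependency.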
🟡 WARNING: Potential Breaking Changes
- Backward Compatibility of Policy Schema
  - The tutorial introduces new fields (`max_tool_calls`) in the policy schema. If these fields are not backward-compatible with existing policies, it could break applications relying on older versions of the library.
  - Actionable Fix: Ensure that the `PolicyDocument` model can handle policies without the `max_tool_calls` field gracefully. Add unit tests to verify backward compatibility.
💡 Suggestions for Improvement
- Type Safety
  - The `PolicyDocument.from_yaml()` method does not appear to enforce strict type validation for the YAML input. This could lead to runtime errors if the YAML file contains unexpected data types.
  - Actionable Fix: Use Pydantic models to validate the structure and types of the YAML input before loading it into the `PolicyDocument`.
- OWASP Agentic Top 10 Compliance
  - The tutorial does not address potential sandbox escape vectors, such as an agent executing arbitrary code via tools like `delete_database` or `write_file`.
  - Actionable Fix: Add a section to the tutorial emphasizing the importance of sandboxing and validating agent actions to prevent unauthorized code execution.
- Documentation Improvements
  - While the tutorial is comprehensive, it could benefit from additional emphasis on real-world security considerations. For example:
    - How to securely store and manage policy files.
    - How to handle policy updates in production environments.
    - How to integrate policy evaluation with SPIFFE/SVID for agent identity verification.
  - Actionable Fix: Add a "Security Best Practices" section to the tutorial.
- Testing Coverage
  - The tutorial does not include automated tests for the example scripts or the YAML policies.
  - Actionable Fix: Add pytest-based tests to validate the correctness of the example scripts and ensure that the YAML policies produce the expected results when evaluated.
- Thread Safety
  - The tutorial does not address thread safety in concurrent agent execution scenarios. This is particularly important for the `PolicyEvaluator` and `TokenBucket` components.
  - Actionable Fix: Add a section to the tutorial discussing how to use these components in multithreaded or multiprocessing environments. Include examples of using locks or other synchronization mechanisms.
- Policy Conflict Resolution
  - The tutorial mentions that rules are checked in priority order, but it does not explain how conflicts between rules are resolved (e.g., two rules with the same priority but conflicting actions).
  - Actionable Fix: Clarify the conflict resolution strategy in the documentation and ensure the `PolicyEvaluator` handles such cases deterministically.
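To illustrate what a deterministic conflict-resolution strategy might look like, the sketch below picks the highest-priority matching rule and, on a priority tie, lets `deny` win. Both the tie-break and the rule shape (dicts with `tool`, `priority`, `action`) are assumptions for the example. The thread does not say how `PolicyEvaluator` actually resolves ties, and the `deny` default here is also a choice made for the sketch.

```python
def resolve(rules, tool_name, default_action="deny"):
    """Pick one action for `tool_name` from possibly conflicting rules.

    Hypothetical rule shape: {"tool": str, "priority": int, "action": "allow"|"deny"}.
    """
    matching = [r for r in rules if r["tool"] == tool_name]
    if not matching:
        return default_action

    # Sort key: higher priority first; within the same priority, deny
    # sorts ahead of allow (False < True), so ties resolve to deny.
    ordered = sorted(
        matching,
        key=lambda r: (-r["priority"], r["action"] != "deny"),
    )
    return ordered[0]["action"]
```

Because the result depends only on the sort key and not on the order rules were loaded in, the same set of rules always gives the same decision.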
Additional Notes
- The tutorial is well-structured and provides clear, step-by-step instructions. It is a good starting point for users new to policy-as-code.
- The examples are runnable and provide immediate feedback, which is excellent for learning.
- The inclusion of YAML snippets and their breakdown is helpful for understanding the policy schema.
Summary of Actions
Security
- 🔴 Validate YAML input and enforce integrity checks (e.g., HMAC or digital signatures).
- 🔴 Enforce `max_tool_calls` directly in `PolicyEvaluator`.
- 🔴 Ensure `TokenBucket` implementation is thread-safe.
Backward Compatibility
- 🟡 Ensure backward compatibility for policies without `max_tool_calls`.
Improvements
- 💡 Use Pydantic for type validation of YAML input.
- 💡 Add a "Security Best Practices" section to the tutorial.
- 💡 Add automated tests for example scripts and YAML policies.
- 💡 Discuss thread safety and provide examples for concurrent execution.
- 💡 Clarify policy conflict resolution in the documentation.
Let me know if you need further clarification or assistance!
Summary
Test plan
- `python docs/tutorials/policy-as-code/examples/01_first_policy.py`
- `python docs/tutorials/policy-as-code/examples/02_capability_scoping.py`
- `python docs/tutorials/policy-as-code/examples/03_rate_limiting.py`

Ref #706