
docs: add policy-as-code tutorial (chapters 1-3)#808

Open
harinarayansrivatsan wants to merge 4 commits into microsoft:main from harinarayansrivatsan:docs/706-policy-as-code-tutorial

Conversation


@harinarayansrivatsan harinarayansrivatsan commented Apr 5, 2026

Summary

  • Three-chapter tutorial teaching declarative policy governance for AI agents
  • Chapter 1: allow/deny basics with YAML policies and PolicyEvaluator
  • Chapter 2: capability scoping with per-role policy files (reader vs admin)
  • Chapter 3: rate limiting with max_tool_calls and TokenBucket
  • Each chapter includes runnable Python examples and YAML policy files
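For reviewers skimming without checking out the branch, the Chapter 1 pattern can be sketched roughly as follows. The policy shape and the evaluate function here are a simplified stand-in written for illustration, not the tutorial's actual PolicyEvaluator API:

```python
# Hypothetical sketch of the Chapter 1 allow/deny pattern.
# The real tutorial loads this structure from a YAML file.
POLICY = {
    "name": "first-policy",
    "rules": [
        {"tool_name": "search_docs", "action": "allow"},
        {"tool_name": "delete_file", "action": "deny"},
    ],
    "defaults": {"action": "allow"},
}

def evaluate(policy: dict, context: dict) -> str:
    """Return 'allow' or 'deny' for a context; first matching rule wins,
    falling back to the policy's default action."""
    for rule in policy["rules"]:
        if rule["tool_name"] == context.get("tool_name"):
            return rule["action"]
    return policy["defaults"]["action"]

print(evaluate(POLICY, {"tool_name": "search_docs"}))  # allow
print(evaluate(POLICY, {"tool_name": "delete_file"}))  # deny
```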

Test plan

  • Run python docs/tutorials/policy-as-code/examples/01_first_policy.py
  • Run python docs/tutorials/policy-as-code/examples/02_capability_scoping.py
  • Run python docs/tutorials/policy-as-code/examples/03_rate_limiting.py
  • Verify all links between chapters work
  • Verify YAML snippets in markdown match the example files

Ref #706

Three-chapter tutorial teaching declarative policy governance for AI
agents, progressing from basic allow/deny rules through role-based
capability scoping to rate limiting with max_tool_calls and TokenBucket.
Copilot AI review requested due to automatic review settings April 5, 2026 14:06

github-actions bot commented Apr 5, 2026

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

@github-actions github-actions bot added the size/XL Extra large PR (500+ lines) label Apr 5, 2026

github-actions bot commented Apr 5, 2026

🤖 AI Agent: contributor-guide — Welcome! 🎉


Hi there, and welcome to the microsoft/agent-governance-toolkit community! Thank you so much for your first contribution — we’re thrilled to have you here. 🎉 Your pull request looks like a fantastic addition, and I’m excited to dive into it with you.


What You Did Well ✅

  1. Comprehensive Documentation: The tutorial is incredibly detailed and well-structured. Breaking it into chapters with clear learning objectives and exercises makes it accessible for both beginners and advanced users. Great job!
  2. Runnable Examples: Including Python scripts for each chapter is a fantastic touch. It makes the tutorial practical and easy to follow.
  3. Clear Test Plan: Your test plan is thorough and provides clear steps for reviewers to verify the functionality.
  4. Commit Message: You followed the Conventional Commits standard with docs: add policy-as-code tutorial (chapters 1-3). This is great for maintaining a clean commit history.

Suggestions for Improvement ✨

  1. File Placement for Tests:

    • In this project, we place tests in the packages/{name}/tests/ directory. While your examples are excellent for documentation purposes, it would be great to also include automated tests in the appropriate tests/ directory. This ensures that the examples are continuously tested as part of our CI pipeline.
    • Consider moving the test scripts to packages/agent-os/tests/ and integrating them into the test suite. You can still reference them in the documentation.
  2. Linting:

    • We use ruff for linting with specific rules (E, F, W). It would be great if you could run ruff locally to ensure your code adheres to the style guide. You can install it with pip install ruff and lint the repository with ruff check .
  3. Security Considerations:

    • Since this project deals with governance and policies, security is a top priority. While your examples look great, it’s important to ensure that any code interacting with policies (e.g., loading YAML files) is robust against potential security issues like YAML deserialization attacks. Consider using safe_load instead of load when parsing YAML files.
  4. Linking to Related Resources:

    • You’ve done a great job linking to related resources in your documentation. However, I’d recommend adding a link to our CONTRIBUTING.md file in the README.md of your tutorial folder. This will help guide users who want to contribute further.
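The safe_load point in suggestion 3 can be illustrated briefly. This assumes the examples parse YAML with PyYAML, which the review's load/safe_load wording implies:

```python
import yaml  # PyYAML

doc = """
version: 1
rules:
  - tool_name: delete_file
    action: deny
"""

# yaml.safe_load only constructs plain Python objects (dicts, lists,
# scalars). An unsafe yaml.load loader can instantiate arbitrary Python
# objects from tags like !!python/object, which is the deserialization
# risk the comment is warning about.
policy = yaml.safe_load(doc)
print(policy["rules"][0]["action"])  # deny
```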

Project Conventions 📜

Here are a few conventions we follow in this project:

  1. Linting: We use ruff for linting with rules E, F, and W. Please run ruff check . locally and fix any issues.
  2. Testing: Tests should be placed in packages/{name}/tests/. This ensures they are automatically picked up by our CI pipeline.
  3. Commit Messages: We follow the Conventional Commits standard. Your commit message is already great!
  4. Security: Since this project involves security-sensitive code, we give extra scrutiny to any code that handles policy evaluation, file parsing, or user input.

Next Steps 🚀

  1. Run Linting: Please run ruff check . locally and address any issues.
  2. Add Tests: Consider moving your example scripts to the packages/agent-os/tests/ directory and integrating them into the test suite.
  3. Security Review: Update YAML parsing to use safe_load for added security.
  4. Update Documentation: Add a link to CONTRIBUTING.md in your tutorial’s README.md.

Once you’ve made these updates, push your changes to this branch, and our CI/CD pipeline will automatically re-run. If you have any questions or need help, feel free to ask — we’re here to support you!


Thank you again for your contribution. We’re excited to see this tutorial become part of the project and help others learn about policy-as-code for AI agents. 😊


@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Feedback for Pull Request: docs: add policy-as-code tutorial (chapters 1-3)


🔴 CRITICAL: Security Concerns

  1. Policy Enforcement in max_tool_calls Implementation:

    • The max_tool_calls limit is described as metadata that the application should enforce, but this approach is prone to developer error and could lead to security bypass if the enforcement logic is improperly implemented or omitted.
    • Actionable Recommendation: Integrate max_tool_calls enforcement directly into the PolicyEvaluator class. This ensures that the limit is always enforced, regardless of how the evaluator is used.
  2. Token Bucket Implementation:

    • The TokenBucket implementation appears to be used for rate limiting, but there is no mention of thread safety. If multiple threads or processes are consuming tokens concurrently, race conditions could allow agents to bypass rate limits.
    • Actionable Recommendation: Ensure that the TokenBucket.consume() and TokenBucket.time_until_available() methods are thread-safe. Use locks or atomic operations to prevent race conditions.
  3. Policy File Loading:

    • The PolicyEvaluator.load_policies() method is used to load YAML files, but there is no mention of validation against a schema or sanitization of the input. Malicious actors could craft policies that exploit vulnerabilities in the YAML parser or inject invalid rules.
    • Actionable Recommendation: Validate all loaded YAML files against the PolicyDocument schema using Pydantic or a similar library. Reject invalid or malformed policies with clear error messages.
  4. Default Policy Action:

    • The default action in the YAML policy is set to allow. This could lead to unintended security risks if a rule is missing or misconfigured.
    • Actionable Recommendation: Consider setting the default action to deny in production environments and explicitly allow actions as needed. This aligns with the principle of least privilege.
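The thread-safety recommendation for the token bucket could look something like this. The class below is an illustrative sketch, not the tutorial's actual TokenBucket implementation:

```python
import threading
import time

class TokenBucket:
    """Hypothetical thread-safe token bucket; the tutorial's API may differ."""

    def __init__(self, capacity: int, refill_rate: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate        # tokens added per second
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self._lock = threading.Lock()         # serializes refill + consume

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.updated) * self.refill_rate,
        )
        self.updated = now

    def consume(self, n: int = 1) -> bool:
        # The check-then-decrement sequence is atomic under the lock, so
        # two threads cannot both spend the last token.
        with self._lock:
            self._refill()
            if self.tokens >= n:
                self.tokens -= n
                return True
            return False

bucket = TokenBucket(capacity=3, refill_rate=1.0)
print([bucket.consume() for _ in range(4)])  # [True, True, True, False]
```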

🟡 WARNING: Potential Breaking Changes

  1. PolicyEvaluator API Changes:

    • If the max_tool_calls enforcement is moved into the PolicyEvaluator class, it will change the behavior of the API. This could break existing code that relies on the current implementation.
    • Actionable Recommendation: Clearly document the change and provide a migration guide for users. Consider introducing a feature flag to toggle the new behavior.
  2. TokenBucket Thread Safety:

    • Adding thread safety mechanisms to TokenBucket may introduce performance overhead. If existing users rely on the current implementation for high-throughput scenarios, this change could impact performance.
    • Actionable Recommendation: Benchmark the thread-safe implementation and provide guidance on when to use it versus alternatives.

💡 Suggestions for Improvement

  1. Policy Testing:

    • The tutorial mentions policy testing in Chapter 6, but no testing framework or methodology is introduced in Chapters 1–3. Early introduction of testing practices would reinforce the importance of validating policies.
    • Actionable Recommendation: Add a section in Chapter 1 that demonstrates how to write unit tests for policies using pytest. Include examples of testing edge cases (e.g., missing fields, invalid operators).
  2. Environment-Specific Policies:

    • Chapter 3 briefly mentions conditional policies but does not provide any implementation details. This is a critical feature for production deployments.
    • Actionable Recommendation: Add a preview of conditional policies in Chapter 3, showing how to switch between development and production policies based on environment variables.
  3. Improved Documentation for Rate Limiting:

    • The TokenBucket example is clear but could benefit from additional context, such as how to handle bursty workloads or integrate with external APIs.
    • Actionable Recommendation: Expand the example to include a real-world use case, such as rate-limiting API calls to an external service.
  4. Backward Compatibility:

    • The tutorials introduce new features and patterns that may not be compatible with older versions of the library. Ensure backward compatibility is maintained.
    • Actionable Recommendation: Add a note in the README specifying the minimum version of the library required to follow the tutorials.
  5. Thread Safety in PolicyEvaluator:

    • The PolicyEvaluator class is used in examples but does not mention thread safety. If multiple agents are evaluated concurrently, race conditions could occur.
    • Actionable Recommendation: Document whether PolicyEvaluator is thread-safe. If not, provide guidance on how to use it safely in multi-threaded applications.
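The environment-specific-policies suggestion (item 2 above) might be previewed like this. The AGENT_ENV variable name and the file paths are hypothetical:

```python
import os

# Hypothetical mapping; the tutorial's file names and loader may differ.
POLICY_FILES = {
    "development": "policies/dev_policy.yaml",
    "production": "policies/prod_policy.yaml",
}

def select_policy_file() -> str:
    """Pick a policy file from AGENT_ENV, defaulting to production so a
    missing or unrecognized value fails closed rather than open."""
    env = os.environ.get("AGENT_ENV", "production")
    return POLICY_FILES.get(env, POLICY_FILES["production"])

os.environ["AGENT_ENV"] = "development"
print(select_policy_file())  # policies/dev_policy.yaml
```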

Summary

This pull request provides a well-structured and informative tutorial on policy-as-code for AI agents. However, several critical security concerns need to be addressed to ensure the robustness and safety of the implementation. Additionally, there are potential breaking changes that require careful documentation and migration strategies. Finally, there are opportunities to improve the tutorial by introducing testing practices, expanding on rate-limiting use cases, and addressing thread safety concerns.

Action Items:

  1. Implement max_tool_calls enforcement directly in PolicyEvaluator.
  2. Ensure thread safety in TokenBucket.
  3. Validate YAML policies against the PolicyDocument schema.
  4. Consider setting the default policy action to deny in production.
  5. Document and mitigate potential breaking changes.
  6. Add unit testing examples for policies in Chapter 1.
  7. Preview conditional policies in Chapter 3.
  8. Expand rate-limiting examples with real-world use cases.
  9. Specify minimum library version for tutorials.
  10. Address thread safety in PolicyEvaluator.

Let me know if you need further clarification or assistance with any of the feedback points!


github-actions bot commented Apr 5, 2026

🤖 AI Agent: security-scanner — Security Review of PR: `docs: add policy-as-code tutorial (chapters 1-3)`


This PR introduces documentation and examples for a policy-as-code tutorial, covering basic allow/deny rules, capability scoping, and rate limiting. While the changes are primarily documentation and examples, they interact with the core policy evaluation engine. Given the critical nature of this repository, I have reviewed the changes for potential security issues.


Findings

1. Prompt Injection Defense Bypass

  • Risk: The examples demonstrate how to use the PolicyEvaluator to evaluate agent actions based on a YAML policy. However, there is no mention of sanitizing or validating the context input (e.g., tool_name). If an attacker can craft a malicious context input, they may bypass policy checks by injecting unexpected values or exploiting edge cases in the policy evaluation logic.
  • Rating: 🟠 HIGH
  • Attack Vector: An attacker could craft a context dictionary with unexpected or malformed data (e.g., using special characters, escape sequences, or unexpected data types) to bypass policy rules or cause undefined behavior in the policy engine.
  • Recommendation:
    • Add explicit validation for the context input before passing it to the evaluate method. For example, ensure that tool_name is a string and matches a predefined set of allowed values.
    • Update the tutorial to include a section on input validation and emphasize its importance in preventing injection attacks.

2. Policy Engine Circumvention

  • Risk: The examples rely on the PolicyEvaluator to enforce policies, but there is no discussion of how to prevent unauthorized agents from bypassing the policy engine entirely. For example, an agent could directly invoke tools without going through the policy evaluation layer.
  • Rating: 🟠 HIGH
  • Attack Vector: If an attacker can bypass the PolicyEvaluator (e.g., by directly invoking tools or modifying the agent's behavior), they could perform actions that violate the intended policies.
  • Recommendation:
    • Include a section in the tutorial emphasizing the importance of integrating the PolicyEvaluator into the core execution flow of the agent, ensuring that all actions are evaluated against the policy.
    • Consider implementing a mechanism to enforce that all tool invocations are routed through the PolicyEvaluator, such as by using decorators or middleware.
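The decorator idea in that recommendation could be sketched as follows. The DENIED set is a placeholder for a real PolicyEvaluator lookup, and the tool functions are illustrative:

```python
import functools

DENIED = {"delete_file"}  # stand-in for a real policy evaluation

def policy_checked(tool_name: str):
    """Decorator that routes every tool invocation through a policy check,
    so callers cannot reach the tool without the evaluator seeing it."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            if tool_name in DENIED:  # real code would call PolicyEvaluator
                raise PermissionError(f"policy denies tool: {tool_name}")
            return fn(*args, **kwargs)
        return inner
    return wrap

@policy_checked("search_docs")
def search_docs(query: str) -> str:
    return f"results for {query}"

@policy_checked("delete_file")
def delete_file(path: str) -> str:
    return f"deleted {path}"

print(search_docs("rate limits"))  # results for rate limits
try:
    delete_file("/tmp/example")
except PermissionError as exc:
    print(exc)  # policy denies tool: delete_file
```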

3. Trust Chain Weaknesses

  • Risk: The examples do not address how to verify the integrity of the YAML policy files. If an attacker can tamper with the policy files, they could weaken or bypass security controls.
  • Rating: 🟠 HIGH
  • Attack Vector: An attacker with access to the file system could modify the YAML policy files to allow unauthorized actions or remove restrictions.
  • Recommendation:
    • Add a section in the tutorial on verifying the integrity of policy files, such as by using digital signatures or checksums.
    • Consider implementing a mechanism in the PolicyEvaluator to verify the authenticity and integrity of policy files before loading them.
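A checksum-based variant of that integrity check can be shown in a few lines. In practice the expected digest would be pinned at deploy time or replaced by a real signature scheme; this sketch only demonstrates the verify-before-load step:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def load_verified(data: bytes, expected_digest: str) -> bytes:
    """Refuse to use policy bytes whose digest does not match the pinned value."""
    if sha256_of(data) != expected_digest:
        raise ValueError("policy file failed integrity check")
    return data

policy_bytes = b"version: 1\nrules: []\n"
pinned = sha256_of(policy_bytes)  # recorded out-of-band at deploy time

print(load_verified(policy_bytes, pinned) == policy_bytes)  # True
```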

4. Credential Exposure

  • Risk: The examples do not include any sensitive information, and there is no evidence of credentials being exposed in logs or error messages.
  • Rating: 🔵 LOW
  • Recommendation: No immediate action is required, but ensure that future examples and documentation do not include sensitive information.

5. Sandbox Escape

  • Risk: The examples do not involve sandboxing or process isolation, so there is no immediate risk of a sandbox escape.
  • Rating: 🔵 LOW
  • Recommendation: No immediate action is required, but ensure that future examples and documentation address sandboxing if relevant.

6. Deserialization Attacks

  • Risk: The examples use PolicyDocument.from_yaml to load YAML files, but there is no mention of validating the YAML content. If the YAML parser used by PolicyDocument.from_yaml is vulnerable to deserialization attacks, this could lead to arbitrary code execution.
  • Rating: 🟠 HIGH
  • Attack Vector: An attacker could craft a malicious YAML file containing arbitrary code or unexpected data structures, which could be executed or mishandled by the YAML parser.
  • Recommendation:
    • Ensure that the YAML parser used by PolicyDocument.from_yaml is safe from deserialization attacks (e.g., use safe_load instead of load in PyYAML).
    • Add a note in the tutorial about the importance of validating and sanitizing YAML files before loading them.

7. Race Conditions

  • Risk: The examples do not demonstrate concurrent access to the PolicyEvaluator or TokenBucket. However, if these components are used in a multithreaded or distributed environment, there could be race conditions in policy evaluation or rate limiting.
  • Rating: 🟡 MEDIUM
  • Attack Vector: In a multithreaded or distributed system, multiple threads or processes could simultaneously evaluate policies or consume tokens, leading to inconsistent or incorrect enforcement of policies.
  • Recommendation:
    • Add a note in the tutorial about thread safety and concurrency considerations when using PolicyEvaluator and TokenBucket.
    • Ensure that PolicyEvaluator and TokenBucket are thread-safe or provide guidance on how to use them safely in multithreaded environments.

8. Supply Chain

  • Risk: The tutorial instructs users to install the agent-os-kernel package, but there is no mention of verifying the integrity of the package or its dependencies.
  • Rating: 🟠 HIGH
  • Attack Vector: An attacker could exploit dependency confusion or typosquatting to inject malicious code into the agent-os-kernel package or its dependencies.
  • Recommendation:
    • Add a section in the tutorial on verifying the integrity of the agent-os-kernel package, such as by checking its hash or signature.
    • Consider using tools like pip-audit to scan for vulnerabilities in the package and its dependencies.

Summary of Findings

| Finding | Rating | Recommendation |
| --- | --- | --- |
| Prompt injection defense bypass | 🟠 HIGH | Add input validation for context and emphasize its importance in the tutorial. |
| Policy engine circumvention | 🟠 HIGH | Emphasize integration of PolicyEvaluator into the core execution flow. |
| Trust chain weaknesses | 🟠 HIGH | Verify the integrity of YAML policy files using digital signatures or checksums. |
| Credential exposure | 🔵 LOW | No immediate action required. |
| Sandbox escape | 🔵 LOW | No immediate action required. |
| Deserialization attacks | 🟠 HIGH | Ensure safe YAML parsing and validate YAML files before loading. |
| Race conditions | 🟡 MEDIUM | Ensure thread safety in PolicyEvaluator and TokenBucket. |
| Supply chain | 🟠 HIGH | Verify the integrity of agent-os-kernel and its dependencies. |

General Recommendations

  1. Documentation Updates: Update the tutorial to include sections on input validation, policy file integrity, and concurrency considerations.
  2. Code Enhancements: Ensure that the PolicyEvaluator and TokenBucket are thread-safe and that PolicyDocument.from_yaml uses safe YAML parsing methods.
  3. Supply Chain Security: Provide guidance on verifying the integrity of dependencies and consider integrating automated dependency scanning tools into the CI/CD pipeline.

Let me know if you need further clarification or assistance!

@harinarayansrivatsan

@microsoft-github-policy-service agree


Copilot AI left a comment


Pull request overview

Adds a new documentation tutorial series under docs/tutorials/policy-as-code/ introducing YAML-based governance policies for agents (allow/deny, role-based capability scoping, and rate limiting) with runnable Python examples.

Changes:

  • Adds a top-level tutorial README with installation and chapter index.
  • Adds Chapters 1–3 as markdown guides with embedded YAML/Python snippets.
  • Adds runnable Python example scripts and corresponding YAML policy files for each chapter.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| docs/tutorials/policy-as-code/README.md | Tutorial landing page: install instructions, chapter index, and related links |
| docs/tutorials/policy-as-code/01-your-first-policy.md | Chapter 1 narrative + examples for basic allow/deny |
| docs/tutorials/policy-as-code/02-capability-scoping.md | Chapter 2 narrative + examples for per-role policies |
| docs/tutorials/policy-as-code/03-rate-limiting.md | Chapter 3 narrative + examples for max_tool_calls + token bucket rate limiting |
| docs/tutorials/policy-as-code/examples/01_first_policy.py | Runnable Chapter 1 Python example |
| docs/tutorials/policy-as-code/examples/01_first_policy.yaml | Chapter 1 policy YAML |
| docs/tutorials/policy-as-code/examples/02_capability_scoping.py | Runnable Chapter 2 Python example |
| docs/tutorials/policy-as-code/examples/02_reader_policy.yaml | Reader role policy YAML |
| docs/tutorials/policy-as-code/examples/02_admin_policy.yaml | Admin role policy YAML |
| docs/tutorials/policy-as-code/examples/03_rate_limiting.py | Runnable Chapter 3 Python example |
| docs/tutorials/policy-as-code/examples/03_rate_limit_policy.yaml | Chapter 3 policy YAML |

- Load only Ch1 policy in 01_first_policy.py (was loading all YAMLs)
- Fix filename mismatch in Ch1 markdown (my-first-policy → 01_first_policy)
- Replace broken link to Ch4 with "coming soon" text
- Fix Python version prerequisite (3.10+ → 3.9+)

@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces a three-chapter tutorial for teaching declarative policy governance for AI agents. The tutorial is well-structured and provides clear examples for YAML-based policy creation, role-based capability scoping, and rate limiting. While the content is educational and useful, there are several areas that require attention to ensure correctness, security, and maintainability.


🔴 CRITICAL: Security Issues

  1. Policy Evaluation Trust Boundary

    • The tutorial does not address how to validate the integrity of the YAML policy files. If an attacker modifies the policy files, they could bypass security controls or introduce malicious rules.
    • Actionable Fix: Implement cryptographic signing and verification for policy files. Use a secure mechanism (e.g., HMAC or digital signatures) to ensure that only authorized policies are loaded into the PolicyEvaluator.
  2. Rate Limiting Implementation

    • The max_tool_calls limit is described as metadata that must be enforced externally. This creates a risk of inconsistent enforcement across different parts of the application.
    • Actionable Fix: Integrate max_tool_calls enforcement directly into the PolicyEvaluator class. This ensures that rate limiting is applied uniformly and cannot be bypassed.
  3. TokenBucket Implementation

    • The TokenBucket example does not account for thread safety. If multiple threads or processes access the same bucket, race conditions could allow agents to exceed rate limits.
    • Actionable Fix: Use thread-safe mechanisms (e.g., threading.Lock or atomic operations) to protect the TokenBucket state during concurrent access.

🟡 WARNING: Potential Breaking Changes

  1. Backward Compatibility of Policy Schema

    • The tutorial introduces YAML policies with new fields like max_tool_calls. If existing users rely on older schemas, this could break their workflows.
    • Actionable Fix: Ensure backward compatibility by versioning the schema. Add validation logic to detect unsupported fields and provide clear error messages.
  2. Default Behavior Changes

    • The tutorial suggests changing the default action from allow to deny in exercises. This could lead to unexpected behavior in production if users adopt these changes without understanding the implications.
    • Actionable Fix: Clearly document the risks of changing default actions and recommend testing in isolated environments before deployment.

💡 Suggestions for Improvement

  1. Type Safety and Validation

    • The PolicyDocument.from_yaml method should validate the YAML structure against the schema using Pydantic models. This ensures type safety and prevents runtime errors due to malformed policies.
    • Actionable Fix: Add Pydantic-based validation to PolicyDocument.from_yaml.
  2. OWASP Agentic Top 10 Compliance

    • The tutorial does not address sandboxing or preventing agents from escaping their execution environment.
    • Actionable Fix: Add a section on sandboxing techniques (e.g., containerization, syscall filtering) to ensure agents cannot perform unauthorized actions outside their scope.
  3. Thread Safety in PolicyEvaluator

    • The PolicyEvaluator class does not appear to be thread-safe. If multiple agents evaluate policies concurrently, race conditions could occur.
    • Actionable Fix: Use thread-safe data structures (e.g., threading.Lock) for managing the policies list.
  4. Testing Coverage

    • The tutorial examples are useful but lack automated tests to verify correctness. This could lead to regressions in future updates.
    • Actionable Fix: Add pytest-based unit tests for each example script, including edge cases like malformed YAML files and conflicting policies.
  5. Environment-Specific Policies

    • The tutorial mentions conditional policies in a future chapter but does not provide any implementation details.
    • Actionable Fix: Add a placeholder section with a brief explanation of how environment-specific policies could be implemented (e.g., using environment variables or configuration files).
  6. Documentation Enhancements

    • The tutorial is comprehensive but could benefit from diagrams illustrating policy evaluation flow and rate limiting mechanisms.
    • Actionable Fix: Add visual aids (e.g., flowcharts, sequence diagrams) to the documentation.
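The testing-coverage suggestion (item 4 above) could start as small as this. The policy shape and evaluate function are hypothetical stand-ins for the tutorial's API, and the tests are written so they also run as a plain script:

```python
# Hypothetical pytest-style tests for a policy evaluator; names and the
# policy shape are illustrative, not the tutorial's actual API.
def evaluate(policy: dict, context: dict) -> str:
    for rule in policy.get("rules", []):
        if rule.get("tool_name") == context.get("tool_name"):
            return rule["action"]
    return policy.get("defaults", {}).get("action", "deny")

def test_explicit_deny_wins():
    policy = {
        "rules": [{"tool_name": "delete_file", "action": "deny"}],
        "defaults": {"action": "allow"},
    }
    assert evaluate(policy, {"tool_name": "delete_file"}) == "deny"

def test_empty_policy_falls_back_to_deny():
    # Edge case: no rules and no defaults should fail closed.
    assert evaluate({}, {"tool_name": "anything"}) == "deny"

test_explicit_deny_wins()
test_empty_policy_falls_back_to_deny()
print("ok")
```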

Final Recommendations

  • Address the 🔴 CRITICAL security issues immediately, as they could lead to policy bypass or inconsistent enforcement.
  • Ensure backward compatibility and document potential breaking changes clearly.
  • Implement the 💡 suggestions to improve type safety, thread safety, and compliance with security best practices.

Once these issues are resolved, the tutorial will be a valuable resource for users looking to implement policy-as-code for AI agent governance.

imran-siddique previously approved these changes Apr 5, 2026

@imran-siddique imran-siddique left a comment


Well-structured tutorial with runnable examples. Two suggestions:

  1. Change 'Closes #706' to 'Ref #706' in the PR description since this only delivers chapters 1-3 of 7
  2. Add a brief comment in Chapter 3 explaining the PolicyEvaluator constructor pattern change

Approved - please resolve the automated review threads so this can merge.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 5, 2026

@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces a tutorial for policy-as-code, covering chapters 1–3. The tutorial is well-structured and provides clear examples for understanding policy governance concepts such as allow/deny rules, capability scoping, and rate limiting. The documentation is detailed and includes runnable Python examples, YAML policy files, and exercises for hands-on learning. However, there are some areas that require attention to ensure security, correctness, and maintainability.


🔴 CRITICAL: Security Concerns

  1. Policy Evaluation Trust Boundary

    • The PolicyEvaluator appears to evaluate agent actions based on the provided context and policy. However, the tutorial does not mention how the context dictionary is validated or sanitized before evaluation. If the context is user-controlled, it could lead to security vulnerabilities such as injection attacks or bypassing policy rules.
    • Actionable Recommendation: Ensure that the context dictionary is validated against a strict schema using Pydantic models or similar validation mechanisms to prevent malicious input.
  2. Rate Limiting Enforcement

    • The max_tool_calls limit is described as metadata that the application must enforce. This introduces a risk of inconsistent enforcement across different parts of the application, potentially allowing agents to bypass rate limits.
    • Actionable Recommendation: Integrate rate-limiting enforcement directly into the PolicyEvaluator or provide a wrapper that ensures consistent enforcement of rate limits across all agent actions.
  3. Role-Based Policy Loading

    • The tutorial mentions loading policies based on agent roles but does not address how roles are securely assigned or verified. If role assignment is compromised, an agent could gain unauthorized permissions.
    • Actionable Recommendation: Use SPIFFE/SVID or other cryptographic identity mechanisms to securely associate agents with roles and ensure that policy files are loaded based on verified identities.

🟡 WARNING: Potential Breaking Changes

  1. Backward Compatibility of Policy Schema
    • The tutorial introduces YAML-based policies with specific fields (version, name, description, rules, defaults). If future changes are made to the schema (e.g., adding new fields or modifying existing ones), it could break existing policies.
    • Actionable Recommendation: Implement schema versioning and validation using Pydantic models. Ensure backward compatibility by supporting older versions of the schema.

💡 Suggestions for Improvement

  1. Thread Safety in Policy Evaluation

    • The PolicyEvaluator appends policies to its policies attribute, which could lead to race conditions in concurrent agent execution scenarios.
    • Actionable Recommendation: Ensure thread safety by using locks or thread-safe data structures for managing policies in PolicyEvaluator.
  2. Sandbox Escape Vectors

    • The tutorial does not address potential sandbox escape vectors, such as agents executing arbitrary code or accessing unauthorized resources.
    • Actionable Recommendation: Include a section in the tutorial that discusses best practices for sandboxing agents and preventing escape vectors.
  3. Type Safety

    • The PolicyEvaluator and PolicyDocument classes should enforce strict type safety for inputs and outputs. For example, the evaluate method should ensure that the context dictionary conforms to a predefined schema.
    • Actionable Recommendation: Use Pydantic models for type validation and enforce strict typing in all public methods.
  4. Policy Testing

    • While the tutorial mentions policy testing in future chapters, it would be beneficial to include basic testing strategies in the current chapters. For example, testing edge cases like overlapping rules, empty policies, or invalid YAML files.
    • Actionable Recommendation: Add a section on testing policies in each chapter, including examples of how to write unit tests for policies using pytest.
  5. Error Handling

    • The tutorial does not address error handling for scenarios like missing or invalid YAML files, malformed policies, or exceptions during evaluation.
    • Actionable Recommendation: Add error handling examples to the tutorial, including how to catch and handle exceptions raised by PolicyEvaluator or PolicyDocument.
  6. Documentation Improvements

    • The tutorial is well-written but could benefit from additional details on how to integrate policies into real-world applications. For example, how to dynamically load policies based on environment variables or external configuration.
    • Actionable Recommendation: Add a section on "Integrating Policies into Applications" with code snippets and best practices.

Summary of Actions

🔴 CRITICAL

  • Validate and sanitize the context dictionary before policy evaluation.
  • Enforce max_tool_calls directly within the PolicyEvaluator or provide a consistent enforcement mechanism.
  • Implement secure role assignment and verification using cryptographic identity mechanisms like SPIFFE/SVID.
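The max_tool_calls enforcement called for above could look like the following sketch. `CountingEvaluator` is a hypothetical stand-in, not the toolkit's `PolicyEvaluator`; the point is that the budget check lives inside `evaluate()`, so callers cannot forget it.

```python
class CountingEvaluator:
    """Illustrative sketch: the evaluator itself tracks and enforces max_tool_calls."""

    def __init__(self, max_tool_calls: int):
        self.max_tool_calls = max_tool_calls
        self._calls = 0

    def evaluate(self, tool: str) -> str:
        # Deny once the budget is exhausted, regardless of what the rules say.
        if self._calls >= self.max_tool_calls:
            return "deny"
        self._calls += 1
        # Rule matching against the policy document would happen here.
        return "allow"
```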

🟡 WARNING

  • Add schema versioning and validation to ensure backward compatibility for policy files.

💡 SUGGESTION

  • Ensure thread safety in PolicyEvaluator for concurrent agent execution.
  • Address sandbox escape vectors in the tutorial.
  • Enforce type safety using Pydantic models for inputs and outputs.
  • Include basic policy testing strategies in the tutorial.
  • Add error handling examples for common issues like invalid YAML files or malformed policies.
  • Expand documentation to cover real-world integration of policies into applications.
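The policy-testing suggestion might start from a pytest-style sketch like this. The rule semantics assumed here (ordered rules, first match wins, default deny) are illustrative assumptions, not the documented behavior of `PolicyEvaluator`.

```python
# Assumed semantics for illustration: rules are checked in order, the first
# matching rule wins, and an empty rule set denies by default.
def evaluate(rules, tool):
    for rule in rules:
        if tool in rule.get("tools", []):
            return rule["action"]
    return "deny"


def test_empty_policy_denies_by_default():
    assert evaluate([], "read_file") == "deny"


def test_overlapping_rules_first_match_wins():
    rules = [
        {"tools": ["read_file"], "action": "allow"},
        {"tools": ["read_file"], "action": "deny"},  # shadowed by the rule above
    ]
    assert evaluate(rules, "read_file") == "allow"


def test_unknown_tool_is_denied():
    rules = [{"tools": ["read_file"], "action": "allow"}]
    assert evaluate(rules, "delete_database") == "deny"
```

Placed in a `test_policies.py`, these run under a plain `pytest` invocation.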

Next Steps

Please address the critical security concerns as a priority. Once resolved, consider implementing the suggestions to improve the robustness and usability of the tutorial and the underlying policy engine.

@harinarayansrivatsan
Author

Well-structured tutorial with runnable examples. Two suggestions:

  1. Change 'Closes 🎓 Tutorial: Policy-as-Code for AI Agents — From Zero to Production #706' to 'Ref 🎓 Tutorial: Policy-as-Code for AI Agents — From Zero to Production #706' in the PR description since this only delivers chapters 1-3 of 7
  2. Add a brief comment in Chapter 3 explaining the PolicyEvaluator constructor pattern change

Approved - please resolve the automated review threads so this can merge.

Thanks Imran, I've added the constructor comment in Chapter 3 and updated the PR description.

@imran-siddique imran-siddique enabled auto-merge (squash) April 5, 2026 21:26

@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces a tutorial for the policy-as-code feature of the microsoft/agent-governance-toolkit. The tutorial spans three chapters, covering basic allow/deny rules, capability scoping, and rate limiting. It includes YAML policy files, Python examples, and detailed documentation. While the tutorial is well-written and provides clear examples, there are several areas that require attention to ensure security, correctness, and alignment with the repository's focus areas.


🔴 CRITICAL: Security Issues

  1. Policy Evaluation Trust Boundary

    • The PolicyEvaluator directly evaluates YAML-based policies without any validation of the input source. This could lead to policy injection attacks if the YAML file is tampered with or maliciously crafted.
    • Actionable Fix: Implement strict validation of the YAML input before loading it into PolicyDocument. Ensure that the YAML file is signed and verify its integrity using cryptographic methods (e.g., HMAC or digital signatures). This is especially important for production environments where policies may be loaded dynamically.
  2. Rate Limiting Enforcement

    • The tutorial mentions that max_tool_calls is "metadata" and does not enforce limits automatically. This creates a risk of rate-limiting bypass if developers forget to implement the enforcement logic.
    • Actionable Fix: Update the PolicyEvaluator to enforce max_tool_calls directly during evaluation. This ensures that rate limits are applied consistently and reduces the risk of developer error.
  3. Token Bucket Implementation

    • The TokenBucket implementation appears to rely on the client application to enforce rate limits. This could lead to race conditions in concurrent agent execution scenarios, where multiple threads or processes consume tokens simultaneously.
    • Actionable Fix: Ensure that the TokenBucket implementation is thread-safe. Use synchronization primitives like locks or semaphores to prevent race conditions during token consumption and refill.
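A minimal sketch of the thread-safety fix, assuming a simple capacity/refill-rate bucket. This is not the toolkit's `TokenBucket`, just the locking pattern the fix describes: refill and consume happen atomically under one lock, so concurrent callers cannot double-spend tokens.

```python
import threading
import time


class ThreadSafeTokenBucket:
    """Illustrative bucket: refill and consume happen atomically under one lock."""

    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._tokens = float(capacity)
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def try_consume(self, n: int = 1) -> bool:
        with self._lock:
            # Refill based on elapsed time, capped at capacity.
            now = time.monotonic()
            elapsed = now - self._last
            self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
            self._last = now
            if self._tokens >= n:
                self._tokens -= n
                return True
            return False
```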

🟡 WARNING: Potential Breaking Changes

  1. Backward Compatibility of Policy Schema
    • The tutorial introduces new fields (max_tool_calls) in the policy schema. If these fields are not backward-compatible with existing policies, it could break applications relying on older versions of the library.
    • Actionable Fix: Ensure that the PolicyDocument model can handle policies without the max_tool_calls field gracefully. Add unit tests to verify backward compatibility.
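The backward-compatibility fix could look like this dataclass sketch, where `max_tool_calls` defaults to `None` so pre-existing policy files without the field still load. The field names other than `max_tool_calls` are assumptions; the real `PolicyDocument` model may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PolicyDocument:
    """Illustrative model: max_tool_calls is optional so older files still load."""

    name: str
    rules: list = field(default_factory=list)
    max_tool_calls: Optional[int] = None  # absent in pre-existing policy files

    @classmethod
    def from_dict(cls, raw: dict) -> "PolicyDocument":
        return cls(
            name=raw["name"],
            rules=raw.get("rules", []),
            max_tool_calls=raw.get("max_tool_calls"),  # tolerate the missing key
        )
```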

💡 Suggestions for Improvement

  1. Type Safety

    • The PolicyDocument.from_yaml() method does not appear to enforce strict type validation for the YAML input. This could lead to runtime errors if the YAML file contains unexpected data types.
    • Actionable Fix: Use Pydantic models to validate the structure and types of the YAML input before loading it into the PolicyDocument.
  2. OWASP Agentic Top 10 Compliance

    • The tutorial does not address potential sandbox escape vectors, such as an agent executing arbitrary code via tools like delete_database or write_file.
    • Actionable Fix: Add a section to the tutorial emphasizing the importance of sandboxing and validating agent actions to prevent unauthorized code execution.
  3. Documentation Improvements

    • While the tutorial is comprehensive, it could benefit from additional emphasis on real-world security considerations. For example:
      • How to securely store and manage policy files.
      • How to handle policy updates in production environments.
      • How to integrate policy evaluation with SPIFFE/SVID for agent identity verification.
    • Actionable Fix: Add a "Security Best Practices" section to the tutorial.
  4. Testing Coverage

    • The tutorial does not include automated tests for the example scripts or the YAML policies.
    • Actionable Fix: Add pytest-based tests to validate the correctness of the example scripts and ensure that the YAML policies produce the expected results when evaluated.
  5. Thread Safety

    • The tutorial does not address thread safety in concurrent agent execution scenarios. This is particularly important for the PolicyEvaluator and TokenBucket components.
    • Actionable Fix: Add a section to the tutorial discussing how to use these components in multithreaded or multiprocessing environments. Include examples of using locks or other synchronization mechanisms.
  6. Policy Conflict Resolution

    • The tutorial mentions that rules are checked in priority order, but it does not explain how conflicts between rules are resolved (e.g., two rules with the same priority but conflicting actions).
    • Actionable Fix: Clarify the conflict resolution strategy in the documentation and ensure the PolicyEvaluator handles such cases deterministically.
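One deterministic strategy for the conflict case is sketched below. The tie-break rule ("deny wins" at equal priority) and the default-deny fallback are assumptions chosen for illustration, not documented behavior of the evaluator.

```python
def resolve(rules, tool):
    """Deterministic resolution sketch: higher priority wins; on a tie, deny wins."""
    matches = [r for r in rules if tool in r["tools"]]
    if not matches:
        return "deny"  # assumed fail-closed default
    top = max(r["priority"] for r in matches)
    tied_actions = [r["action"] for r in matches if r["priority"] == top]
    # Same priority, conflicting actions: resolve toward the safer outcome.
    return "deny" if "deny" in tied_actions else "allow"
```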

Additional Notes

  • The tutorial is well-structured and provides clear, step-by-step instructions. It is a good starting point for users new to policy-as-code.
  • The examples are runnable and provide immediate feedback, which is excellent for learning.
  • The inclusion of YAML snippets and their breakdown is helpful for understanding the policy schema.

Summary of Actions

Security

  • 🔴 Validate YAML input and enforce integrity checks (e.g., HMAC or digital signatures).
  • 🔴 Enforce max_tool_calls directly in PolicyEvaluator.
  • 🔴 Ensure TokenBucket implementation is thread-safe.
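The integrity check in the first bullet can be sketched with the standard library's `hmac` module; the function name and the choice of SHA-256 are illustrative, not part of the toolkit. The signature would be generated at policy-publish time with the same shared key.

```python
import hashlib
import hmac


def verify_policy_bytes(policy_bytes: bytes, signature_hex: str, key: bytes) -> bool:
    """Return True only if the policy file's HMAC matches the expected signature."""
    expected = hmac.new(key, policy_bytes, hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels on the comparison.
    return hmac.compare_digest(expected, signature_hex)
```

Only a policy whose bytes verify against a trusted signature should ever reach `PolicyDocument` loading.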

Backward Compatibility

  • 🟡 Ensure backward compatibility for policies without max_tool_calls.

Improvements

  • 💡 Use Pydantic for type validation of YAML input.
  • 💡 Add a "Security Best Practices" section to the tutorial.
  • 💡 Add automated tests for example scripts and YAML policies.
  • 💡 Discuss thread safety and provide examples for concurrent execution.
  • 💡 Clarify policy conflict resolution in the documentation.
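The thread-safety item above can be sketched with a lock around the shared policy list. `ThreadSafePolicyEvaluator` is a hypothetical stand-in for the real class, and the deny-list rule shape is an assumption; the pattern (mutate under the lock, evaluate against a snapshot) is what matters.

```python
import threading


class ThreadSafePolicyEvaluator:
    """Illustrative stand-in: guards the shared policy list with a lock."""

    def __init__(self):
        self._policies = []
        self._lock = threading.Lock()

    def add_policy(self, policy: dict) -> None:
        # All mutations of the shared list happen under the lock.
        with self._lock:
            self._policies.append(policy)

    def evaluate(self, tool: str) -> str:
        # Take a snapshot so evaluation never races with add_policy.
        with self._lock:
            policies = list(self._policies)
        for policy in policies:
            if tool in policy.get("deny", []):
                return "deny"
        return "allow"
```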

Let me know if you need further clarification or assistance!


Labels

documentation Improvements or additions to documentation size/XL Extra large PR (500+ lines)
