
Python: feat: add agent-framework-monty (Monty-backed CodeAct provider)#5915

Open
eavanvalkenburg wants to merge 7 commits into microsoft:main from eavanvalkenburg:monty_codeact_provider

@eavanvalkenburg
Member

Motivation and Context

Inspired by anthonychu/maf-codeact-monty-python.

CodeAct currently has one backend in the Python repo: agent-framework-hyperlight. Hyperlight depends on a WASM micro-VM that is only published for linux/x86_64 and win/AMD64 on Python <3.14, so users on macOS, arm64, or Python 3.14 have no CodeAct backend at all.

This PR adds a second backend — agent-framework-monty — that wraps pydantic-monty, a Rust-based Python interpreter, behind the same *CodeActProvider / *ExecuteCodeTool shape as Hyperlight, so users can swap providers with minimal churn. Monty runs cross-platform (no hypervisor or WASM backend), validates LLM-generated code against tool signatures with ty before any host tool fires, and supports Monty-native ResourceLimits for CPU / memory / output caps.

Description

New alpha package agent-framework-monty (python/packages/monty/).

Public API (mirrors Hyperlight names where they apply):

  • MontyCodeActProvider: a ContextProvider that injects a run-scoped execute_code tool plus dynamic CodeAct instructions.
  • MontyExecuteCodeTool: a standalone FunctionTool for mixed-tool agents or manual static wiring.
  • FileMount / FileMountInput / MountMode: public types. FileMount shares its first two fields with the Hyperlight version and adds a Monty-native mode ("read-only"/"read-write"/"overlay") and write_bytes_limit.

Constructor kwargs: tools, approval_mode, workspace_root (auto-mounted at /input, matching Hyperlight), file_mounts, plus a Monty-only resource_limits forwarding to Monty.start(limits=...).

Filesystem flow mirrors Hyperlight's /output capture: files written under any read-write mount during execution are scanned post-run and returned as Content.from_data(...) items with a path annotation. Mounts in overlay mode buffer writes in memory (nothing escapes the sandbox), and read-only mounts reject writes.
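The post-run capture can be pictured as a before/after snapshot diff over each writable mount. The sketch below is a minimal illustration, not the package's code: `snapshot` and `written_files` are hypothetical stand-ins for the PR's `_snapshot_writable_mounts` and `_capture_written_files`, and change detection by (size, mtime) is an assumption based on the later symlink commit. It already skips symlinks so a link cannot surface host files.

```python
import os
from pathlib import Path
from typing import Dict, Tuple

# Relative POSIX path -> (size in bytes, mtime in ns).
FileState = Dict[str, Tuple[int, int]]


def snapshot(root: Path) -> FileState:
    """Record (size, mtime_ns) for every regular, non-symlink file under root."""
    state: FileState = {}
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath, name)
            if path.is_symlink():
                continue  # a link may point outside the mount; never report it
            st = path.lstat()
            state[path.relative_to(root).as_posix()] = (st.st_size, st.st_mtime_ns)
    return state


def written_files(root: Path, before: FileState) -> Dict[str, bytes]:
    """Return the bytes of every file that is new or changed since `before`."""
    after = snapshot(root)
    return {
        rel: (root / rel).read_bytes()
        for rel, meta in sorted(after.items())
        if before.get(rel) != meta
    }
```

Taking `snapshot(root)` before the run and calling `written_files(root, before)` afterwards yields only the files the sandboxed code created or modified; each entry would then become one Content.from_data(...) item.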

Internals:

  • _monty_bridge.InlineCodeBridge ports the inline (non-durable) pause/resume bridge from the reference repo; dispatches direct typed tool calls + the call_tool fallback; forwards mount / limits to Monty.start(...).
  • generate_type_stubs builds per-tool stubs so ty rejects bad calls before any host tool fires.
  • Approval-mode propagation: if any registered host tool is always_require, the whole execute_code is gated.
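A minimal sketch of the approval propagation rule, assuming approval modes are the strings "always_require" and "never_require" and that an explicit approval_mode="always_require" on execute_code also gates it. `ToolInfo` and `execute_code_approval_mode` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Literal

ApprovalMode = Literal["always_require", "never_require"]


@dataclass(frozen=True)
class ToolInfo:
    # Hypothetical stand-in for a registered host tool; only what this rule needs.
    name: str
    approval_mode: ApprovalMode = "never_require"


def execute_code_approval_mode(
    tools: List[ToolInfo], configured: ApprovalMode = "never_require"
) -> ApprovalMode:
    """Gate the whole execute_code call if any reachable host tool needs approval."""
    if configured == "always_require":
        return "always_require"
    if any(t.approval_mode == "always_require" for t in tools):
        return "always_require"
    return "never_require"
```

Gating the whole call is the conservative choice: a single code block can reach any registered tool, so which tools will actually fire is only known once the code runs.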

Alpha-policy compliance (per python-package-management skill):

  • Added agent-framework-monty = { workspace = true } to root python/pyproject.toml.
  • Added row to python/PACKAGE_STATUS.md.
  • Added monty entry under Experimental in python/AGENTS.md.
  • Not added to core[all], and no agent_framework.monty lazy-loading shim. Both are deferred until beta promotion, so samples use `from agent_framework_monty import ...` directly.

Samples (3 sets):

  1. samples/02-agents/context_providers/code_act/monty_code_act.py (provider pattern) + updated local README pointing at both providers.
  2. samples/02-agents/tools/monty_code_interpreter/ — standalone + manual-wiring + README.
  3. samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/ — full hosted-agent layout with a uv-based pyproject.toml + Dockerfile, Azure Monitor wiring (APPLICATIONINSIGHTS_CONNECTION_STRING + enable_instrumentation() in main.py), ENABLE_INSTRUMENTATION and ENABLE_SENSITIVE_DATA env vars. The alpha wheel is vendored into ./wheels/ (gitignored) via vendor-wheel.sh. New row added to the parent Responses-API README.

Tests:

  • 28 hermetic unit tests stubbing pydantic_monty for speed and to keep CI working without the dep.
  • 18 integration tests marked @pytest.mark.integration, auto-skipped when pydantic_monty is unimportable. They exercise the real Monty runtime: print round-trip, last-expression value, direct typed dispatch, call_tool fallback, async host tool, asyncio.gather parallelism, ty type-check rejection, OS-blocked-by-default, workspace_root read + write capture, read-only / overlay mount semantics, resource_limits.max_duration_secs aborting a busy loop, approval gating end-to-end, full Agent run with a scripted chat client.

Out of scope (deliberately, for the alpha)

  • Durable execution — the reference repo's DurableCodeBridge, register_durable_codeact, wait_for_external_event, and per-tool external-event approval. Tracked as a follow-up.
  • Custom OSAccess (fully synthetic VFS) — flagged as a future escape hatch in AGENTS.md.
  • URL allow-list — Monty has no networking primitive; documented pattern is "expose a fetch_url host tool with your own allow-list check".
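The suggested pattern keeps network policy entirely on the host. A minimal sketch, where `check_url_allowed`, `fetch_url`, and `ALLOWED_HOSTS` are hypothetical names and the https-only rule is an extra assumption of this example; `fetch_url` is what you would register as a host tool.

```python
import urllib.request
from typing import FrozenSet
from urllib.parse import urlsplit

# Hypothetical allow-list; in practice this comes from your own configuration.
ALLOWED_HOSTS: FrozenSet[str] = frozenset({"api.example.com", "docs.example.com"})


def check_url_allowed(url: str, allowed_hosts: FrozenSet[str] = ALLOWED_HOSTS) -> str:
    """Return the URL if it is https and its host is allow-listed, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"only https URLs are allowed, got {parts.scheme!r}")
    host = parts.hostname or ""  # .hostname is already lower-cased
    if host not in allowed_hosts:
        raise ValueError(f"host {host!r} is not on the allow-list")
    return url


def fetch_url(url: str) -> str:
    """Hypothetical host tool: fetch text, but only from allow-listed hosts."""
    check_url_allowed(url)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Because the check runs in the host process, code inside the sandbox cannot bypass it; it can only ask for a URL and get a ValueError back.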

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? No — additive only (new package + new sample folders).

New alpha package that wraps pydantic-monty (a Rust-based Python
interpreter) behind the same CodeAct API surface as
agent-framework-hyperlight, so users can swap providers with minimal
code change.

Public API (agent_framework_monty):
- MontyCodeActProvider — ContextProvider that injects a run-scoped
  execute_code tool plus dynamic CodeAct instructions.
- MontyExecuteCodeTool — standalone FunctionTool for mixed-tool agents
  or manual static wiring.
- FileMount / FileMountInput / MountMode — public types mirroring the
  Hyperlight names, with Monty's mode (read-only/read-write/overlay)
  and write_bytes_limit on FileMount.

Constructor kwargs (both classes) mirror Hyperlight where possible:
tools, approval_mode, workspace_root, file_mounts; plus a Monty-only
resource_limits forwarding ResourceLimits to Monty.start().

Filesystem flow:
- workspace_root auto-mounts at /input (read-write), matching Hyperlight.
- file_mounts accepts string shorthand, (host, mount) tuple, or
  FileMount with mode + write cap.
- Files written under read-write mounts are scanned post-execution and
  returned as Content.from_data items (mirrors Hyperlight /output).
- overlay mounts buffer writes in-memory; read-only mounts reject writes.
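The three accepted mount shapes can be coerced into one type up front. This is a sketch only: the field names `host_path`/`mount_path`, the "read-only" default, and `to_file_mount` are assumptions, and the string shorthand here uses a simplified POSIX conversion where the PR's `_normalize_mount_path` also rewrites Windows drive letters.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Literal, Optional, Tuple, Union

MountMode = Literal["read-only", "read-write", "overlay"]


@dataclass(frozen=True)
class FileMount:
    # Field names and defaults are assumptions for this sketch.
    host_path: str
    mount_path: str
    mode: MountMode = "read-only"  # assumed default, chosen conservatively
    write_bytes_limit: Optional[int] = None


FileMountInput = Union[str, Tuple[str, str], FileMount]


def to_file_mount(spec: FileMountInput) -> FileMount:
    """Coerce a string, (host, mount) tuple, or FileMount into a FileMount."""
    if isinstance(spec, FileMount):
        return spec
    if isinstance(spec, str):
        # Shorthand: mount the host path at its own POSIX-style path.
        return FileMount(host_path=spec, mount_path=PurePath(spec).as_posix())
    if isinstance(spec, tuple) and len(spec) == 2:
        host, mount = spec
        return FileMount(host_path=host, mount_path=mount)
    raise TypeError(f"unsupported file mount spec: {spec!r}")
```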

Internals:
- _monty_bridge.InlineCodeBridge ports the inline (non-durable) bridge
  from anthonychu/maf-codeact-monty-python; handles FunctionSnapshot /
  FutureSnapshot pause/resume, dispatches direct typed calls + the
  call_tool fallback, forwards mount/limits to Monty.start(...).
- generate_type_stubs emits per-tool stubs so Monty's `ty` type-checker
  rejects bad calls before any host tool runs.
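To make the stub idea concrete, here is a minimal sketch that renders host tools as stub source from their Python signatures. It is not the package's `generate_type_stubs`: `make_stub` and `generate_stubs` are hypothetical, rendering each tool as `async def` is an assumption (the description does not say whether stubs are sync or async), only ordinary positional-or-keyword parameters are handled, and any non-class annotation degrades to `Any`.

```python
import inspect
from typing import Any, Callable, Dict


def _annotation_src(tp: Any) -> str:
    """Tiny annotation renderer: plain classes by name, everything else Any."""
    if tp is inspect.Parameter.empty:
        return "Any"
    if tp is None or tp is type(None):
        return "None"
    return tp.__name__ if isinstance(tp, type) else "Any"


def make_stub(name: str, func: Callable[..., Any]) -> str:
    """Render one host tool as an `async def` stub line from its signature."""
    sig = inspect.signature(func)
    params = []
    for p in sig.parameters.values():
        default = "" if p.default is inspect.Parameter.empty else " = ..."
        params.append(f"{p.name}: {_annotation_src(p.annotation)}{default}")
    ret = _annotation_src(sig.return_annotation)
    return f"async def {name}({', '.join(params)}) -> {ret}: ..."


def generate_stubs(tools: Dict[str, Callable[..., Any]]) -> str:
    """Concatenate one stub per tool into a module a type checker can read."""
    lines = ["from typing import Any", ""]
    lines += [make_stub(name, fn) for name, fn in tools.items()]
    return "\n".join(lines) + "\n"
```

A type checker given these stubs can reject a call such as `get_weather(city=3)` before any host tool runs.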

Alpha-policy compliance (per python-package-management skill):
- Added agent-framework-monty = { workspace = true } to root
  pyproject.toml.
- Added row to python/PACKAGE_STATUS.md.
- Added monty entry under Experimental in python/AGENTS.md.
- NOT added to core[all]; NO agent_framework.monty lazy shim (deferred
  to beta promotion).

Samples (three sets, import from agent_framework_monty directly):
- samples/02-agents/context_providers/code_act/monty_code_act.py
  (provider pattern) + updated local README.
- samples/02-agents/tools/monty_code_interpreter/ (standalone +
  manual-wiring + README).
- samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/
  (full hosted-agent layout with uv-based pyproject.toml + Dockerfile,
  Azure Monitor wiring via APPLICATIONINSIGHTS_CONNECTION_STRING +
  enable_instrumentation, ENABLE_INSTRUMENTATION and
  ENABLE_SENSITIVE_DATA env vars). The alpha wheel is vendored into
  ./wheels/ (gitignored) via vendor-wheel.sh; new row added to the
  parent Responses-API README.

Tests:
- 28 hermetic unit tests (stubbed pydantic_monty).
- 18 integration tests marked @pytest.mark.integration, auto-skipped
  when pydantic_monty is unimportable; exercise the real Monty
  runtime: print round-trip, last-expression value, direct typed
  tool dispatch, call_tool fallback, async tool, asyncio.gather
  parallelism, ty type-check rejection, OS blocked by default,
  workspace_root read+write capture, read-only / overlay mount
  semantics, resource_limits.max_duration_secs abort, approval
  gating end-to-end, full Agent run with a scripted chat client.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 18, 2026 10:04
@moonbox3 moonbox3 added documentation Improvements or additions to documentation python labels May 18, 2026
@moonbox3
Contributor

moonbox3 commented May 18, 2026

Python Test Coverage

Python Test Coverage Report

File                                   Stmts    Miss   Cover   Missing
packages/monty/agent_framework_monty
   _execute_code_tool.py                 262      76     70%   63, 88, 92, 94, 101, 108–112, 146, 148–149, 230, 273, 289, 293, 298, 349–350, 434, 437, 442–443, 461–477, 492–503, 522–537, 543–544, 547–551
   _instructions.py                       39       1     97%   40
   _monty_bridge.py                      195      49     74%   29–39, 48, 51, 53, 66, 82–90, 93, 127–131, 162, 165–166, 169–172, 190–191, 222, 240, 242, 258–260, 303, 308, 326–327
   _provider.py                           34       4     88%   69, 73, 77, 81
   _types.py                              11       0    100%
TOTAL                                  34986    4070     88%

Python Unit Test Overview

Tests    Skipped    Failures    Errors    Time
6948     30 💤      0 ❌        0 🔥      1m 53s ⏱️


Copilot AI left a comment


Pull request overview

Adds a new Python alpha package, agent-framework-monty, providing a Monty-backed CodeAct implementation (provider + standalone execute_code tool) alongside samples and test coverage, enabling a cross-platform CodeAct option beyond Hyperlight.

Changes:

  • Introduces agent_framework_monty package (provider/tool/types, instruction generation, Monty bridge, file-mount + output capture support).
  • Adds unit + integration tests for the Monty CodeAct surface, plus multiple samples (context provider, standalone tool, Foundry-hosted Responses agent).
  • Registers the new workspace package in Python packaging metadata and lockfiles, and updates package status/docs.

Reviewed changes

Copilot reviewed 28 out of 30 changed files in this pull request and generated 6 comments.

File Description
python/uv.lock Adds workspace member + locks pydantic-monty and the new monty package.
python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/vendor-wheel.sh Script to build/vendor the alpha wheel for offline uv sync in Docker.
python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/README.md Explains the hosted Responses sample and how Monty CodeAct works.
python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/pyproject.toml Sample-local uv project config (including vendored wheel source).
python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/main.py Hosted agent entrypoint wiring Foundry client + Monty provider + telemetry.
python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/Dockerfile Docker build using uv sync and a vendored wheel.
python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/agent.yaml Hosted-agent config for local/Foundry runs (env vars + resources).
python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/agent.manifest.yaml Foundry manifest describing the hosted Monty CodeAct sample.
python/samples/04-hosting/foundry-hosted-agents/README.md Adds a row documenting the new Monty CodeAct hosted sample.
python/samples/02-agents/tools/monty_code_interpreter/README.md Documents local standalone/manual-wiring Monty tool samples.
python/samples/02-agents/tools/monty_code_interpreter/monty_code_interpreter.py Standalone MontyExecuteCodeTool sample.
python/samples/02-agents/tools/monty_code_interpreter/monty_code_interpreter_manual_wiring.py Manual static wiring sample (instructions + sandbox tool).
python/samples/02-agents/context_providers/code_act/README.md Expands CodeAct docs to cover both Hyperlight and Monty providers.
python/samples/02-agents/context_providers/code_act/monty_code_act.py Provider-driven Monty CodeAct sample with middleware logging.
python/pyproject.toml Adds agent-framework-monty to the Python workspace dependencies.
python/packages/monty/tests/monty/test_monty_codeact.py Hermetic unit tests with a fake pydantic_monty runtime.
python/packages/monty/tests/monty/test_monty_codeact_integration.py Integration tests exercising the real Monty runtime (skipped if unavailable).
python/packages/monty/README.md Package readme describing the Monty CodeAct API and usage patterns.
python/packages/monty/pyproject.toml Defines the new alpha distribution, deps, tooling config, and tasks.
python/packages/monty/LICENSE MIT license for the new package.
python/packages/monty/AGENTS.md Package-level agent/dev guide and architecture notes.
python/packages/monty/agent_framework_monty/py.typed Marks the package as typed for type checkers.
python/packages/monty/agent_framework_monty/_types.py Public file-mount types (mode, mount input shapes).
python/packages/monty/agent_framework_monty/_provider.py MontyCodeActProvider implementation (run-scoped tool + instructions).
python/packages/monty/agent_framework_monty/_monty_bridge.py Inline Monty execution bridge + stub generation for ty.
python/packages/monty/agent_framework_monty/_instructions.py Dynamic instructions + execute_code description builders.
python/packages/monty/agent_framework_monty/_execute_code_tool.py MontyExecuteCodeTool implementation (mounts, approval gating, output capture).
python/packages/monty/agent_framework_monty/__init__.py Public exports and version wiring.
python/PACKAGE_STATUS.md Registers agent-framework-monty as alpha.
python/AGENTS.md Adds monty under Experimental packages list.

Comment thread python/packages/monty/agent_framework_monty/_execute_code_tool.py Outdated
Comment thread python/packages/monty/agent_framework_monty/_execute_code_tool.py
Comment thread python/packages/monty/agent_framework_monty/_monty_bridge.py
Comment thread python/packages/monty/agent_framework_monty/_monty_bridge.py
Comment thread python/packages/monty/agent_framework_monty/_instructions.py Outdated
…IX path

The shorthand string mount goes through _normalize_mount_path, which
rewrites Windows drive letters like 'C:\\Users\\...' into
'/C:/Users/...' (POSIX-style). The Windows CI runners surfaced this
because tmp_path resolves to a backslashed Windows path; the test was
comparing against the raw str(host_a) instead of the normalized form.

Compare against _normalize_mount_path(str(host_a)) so the assertion is
platform-independent.
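The normalization the commit describes ('C:\\Users\\...' becomes '/C:/Users/...') can be sketched with `pathlib.PureWindowsPath`. This is an illustrative reimplementation, not `_normalize_mount_path` itself; leaving non-drive-letter paths (POSIX or UNC) unchanged is an assumption.

```python
from pathlib import PureWindowsPath


def normalize_mount_path(path: str) -> str:
    """Rewrite a Windows drive-letter path into POSIX style; leave others unchanged."""
    win = PureWindowsPath(path)
    # A drive letter looks like "C:"; UNC drives ("\\\\server\\share") are longer.
    if len(win.drive) == 2 and win.drive[1] == ":":
        return "/" + win.as_posix()  # C:\Users\me -> /C:/Users/me
    return path
```

This is why the fixed test compares against the normalized value: on a Windows runner `str(tmp_path)` contains a drive letter and backslashes, so the raw string never equals the mount path.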

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@github-actions github-actions Bot left a comment


Automated Code Review

Reviewers: 3 | Confidence: 90%

✓ Correctness

No actionable issues found in this dimension.

✓ Security Reliability

No actionable issues found in this dimension.

✗ Design Approach

I found two design issues. First, the Monty-specific instructions injected into provider/tool runs document a print-only result contract that contradicts the runtime behavior asserted by the new integration tests, so the recommended provider path teaches the model to avoid a supported output path and can steer generations away from the API this PR actually introduces. Second, the new Monty hosted-agent sample has a documentation/design mismatch that breaks the advertised local-run path: its README defers to the shared hosted-agent setup flow, but this sample is packaged around pyproject.toml plus a vendored wheel and does not fit the parent requirements.txt install step.

Flagged Issues

  • The sample README tells readers to follow the parent local-run instructions (python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/README.md:60-62), but that flow installs dependencies with uv pip install -r requirements.txt (python/samples/04-hosting/foundry-hosted-agents/README.md:160-163). This sample instead declares dependencies in pyproject.toml and resolves agent-framework-monty from a vendored wheel (python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/pyproject.toml:1-20), so following the documented path fails before the host can start.

Automated review by eavanvalkenburg's agents

eavanvalkenburg and others added 4 commits May 18, 2026 12:18
- _execute_code_tool docstring: clarify that the Monty backend supports
  scoped filesystem access via workspace_root / file_mounts (blocked by
  default).
- _to_monty_mount: import pydantic_monty lazily through load_monty so
  missing-dependency errors surface as the same actionable RuntimeError
  the rest of the package raises (not a bare ImportError at module load).
  Renamed _load_monty -> load_monty for the same reason.
- _python_type_repr: emit None for type(None) instead of Any, and
  normalize both typing.Union[...] and PEP-604 X | Y to PEP-604 syntax
  so Optional[X] / Union[..., None] / -> None signatures round-trip
  correctly through ty validation. Added a regression test.
- _PrintCollector: track a running character count instead of
  recomputing sum(len(c) for c in self.chunks) per callback. Eliminates
  the O(n^2) cost on print-heavy code.
- Instructions: mention that the value of the final expression is also
  returned alongside captured stdout (matches actual behavior).
- 11_monty_codeact Dockerfile: pin ghcr.io/astral-sh/uv to 0.11.6
  instead of :latest for reproducible builds.
- 11_monty_codeact README: replace the bare "see parent README" pointer
  with sample-specific steps (./vendor-wheel.sh + uv sync + uv run),
  since the sample uses pyproject.toml + a vendored wheel rather than
  requirements.txt.
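The `_python_type_repr` fix can be illustrated with `typing.get_origin` / `typing.get_args`. This minimal sketch (`type_repr` is a hypothetical name, not the package's function) renders `type(None)` as `None`, maps both `typing.Union[...]` and PEP 604 `X | Y` to PEP 604 syntax, handles builtin generics, and falls back to `Any` for anything else.

```python
import types
import typing
from typing import Any, Union, get_args, get_origin

# typing.Union[...] and PEP 604 `X | Y` have different origins; treat both as unions.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def type_repr(tp: Any) -> str:
    """Render an annotation as stub source; unsupported forms degrade to Any."""
    if tp is None or tp is type(None):
        return "None"
    if tp is Ellipsis:
        return "..."
    origin = get_origin(tp)
    if origin in _UNION_ORIGINS:
        return " | ".join(type_repr(arg) for arg in get_args(tp))
    if isinstance(origin, type) and origin.__module__ == "builtins":
        # Builtin generics such as list[int], dict[str, X], tuple[int, ...].
        args = ", ".join(type_repr(arg) for arg in get_args(tp))
        return f"{origin.__name__}[{args}]" if args else origin.__name__
    if origin is None and isinstance(tp, type):
        return tp.__name__
    return "Any"
```

With this, `Optional[int]`, `Union[int, None]` and `int | None` all render as `int | None`, and a `-> None` return stays `None` instead of widening to `Any`.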

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
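The `_PrintCollector` change replaces a per-callback `sum(len(c) for c in self.chunks)` with a running total, so each callback costs O(1) instead of O(n). A minimal sketch of that idea follows; `PrintCollector`, its `write` method, and the `max_chars` cap with truncation are assumptions of this example (the per-callback length check suggests a cap, but the commit does not describe one).

```python
from typing import List, Optional


class PrintCollector:
    """Hypothetical stdout collector that tracks its size incrementally."""

    def __init__(self, max_chars: Optional[int] = None) -> None:
        self.chunks: List[str] = []
        self.total_chars = 0  # running count, updated in O(1) per write
        self.max_chars = max_chars
        self.truncated = False

    def write(self, text: str) -> None:
        if self.max_chars is not None:
            room = self.max_chars - self.total_chars
            if room <= 0:
                self.truncated = True
                return
            if len(text) > room:
                text = text[:room]
                self.truncated = True
        self.chunks.append(text)
        self.total_chars += len(text)

    def getvalue(self) -> str:
        return "".join(self.chunks)
```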
…PyPI

Drop the vendored-wheel scaffolding now that agent-framework-monty is on
PyPI as an alpha (1.0.0a*) release:

- pyproject.toml: remove [tool.uv.sources] override; keep [tool.uv]
  prerelease = "allow" so uv pulls the alpha automatically.
- Dockerfile: drop the COPY wheels/ step.
- README: drop the ./vendor-wheel.sh setup step and the
  not-yet-on-PyPI warning.
- Delete vendor-wheel.sh and the gitignored wheels/ directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…k escape

Same class of issue as the MSRC-reported Hyperlight finding: the
post-execution capture walked workspace_root with Path.rglob() +
is_file() + read_bytes() - all of which follow symlinks. An attacker
who controls the workspace (cloned repo, extracted archive, shared
workspace) could pre-place `workspace/leak.txt -> /etc/passwd` or
`workspace/outside_dir -> /etc/` and have host files surface as
captured Content items.

Monty's mount layer already rejects symlink reads from inside the
sandbox across all three modes (verified empirically), so the runtime
path was safe. This commit closes the post-execution scan path.

Changes:
- New `_iter_real_files(root)` walker that uses iterdir() +
  is_symlink() to skip symlinks at every directory level and yields
  only real files. Replaces the previous `host_root.rglob("*")` calls
  in both `_snapshot_writable_mounts` and `_capture_written_files`.
- Use `Path.lstat()` instead of `Path.stat()` so size/mtime can never
  be taken from a symlink target.
- Three new integration tests reproducing the MSRC attack shape
  against the workspace_root flow: symlink-to-file outside workspace,
  symlink-to-directory outside workspace, and a guard ensuring
  legitimate sandbox writes are still captured when symlinks are
  present.
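A symlink-safe walker in the spirit of `_iter_real_files` can be sketched with `iterdir()`, `is_symlink()` and `lstat()`. `iter_real_files` and `file_sizes` below are illustrative names, not the package's code; the key point is that links are skipped before any call that would follow them.

```python
from pathlib import Path
from typing import Dict, Iterator


def iter_real_files(root: Path) -> Iterator[Path]:
    """Yield regular files under root, never following symlinks at any level."""
    for entry in sorted(root.iterdir()):
        if entry.is_symlink():
            continue  # skip links to files *and* links to directories
        if entry.is_dir():
            yield from iter_real_files(entry)
        elif entry.is_file():
            yield entry


def file_sizes(root: Path) -> Dict[str, int]:
    # lstat() describes the entry itself, so a size never comes from a link target.
    return {p.relative_to(root).as_posix(): p.lstat().st_size for p in iter_real_files(root)}
```

With `workspace/leak.txt -> /etc/passwd` present, the walker simply never yields `leak.txt`, so no host file can surface as captured Content.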

Per user request, hyperlight is untouched in this commit (separate fix).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Apply the same Windows-CI safety guard as the hyperlight fix in PR microsoft#5919:
the three symlink integration tests create symlinks via Path.symlink_to(),
which fails with OSError / NotImplementedError on unprivileged Windows
runners. Add a local _symlinks_supported helper (mirroring the one in
packages/core/tests/core/test_skills.py) and pytest.skip when symlinks
aren't available, so the tests no longer fail for environment reasons.
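A capability probe of this kind just tries to create a symlink in a scratch directory. The sketch below shows the idea; `symlinks_supported` is an illustrative name and not the helper from test_skills.py.

```python
import tempfile
from pathlib import Path


def symlinks_supported() -> bool:
    """Return True if this process can create symlinks (often False on unprivileged Windows)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp, "target.txt")
        target.write_text("x")
        try:
            Path(tmp, "link.txt").symlink_to(target)
        except (OSError, NotImplementedError):
            return False
        return True
```

A test would then call `pytest.skip("symlinks not supported")` when this returns False, so an environment limitation shows up as a skip rather than a failure.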

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread python/packages/monty/agent_framework_monty/_monty_bridge.py
Comment thread python/packages/monty/agent_framework_monty/_monty_bridge.py
- _invoke_tool: drop the inspect.iscoroutinefunction(...) branch and
  always `await self.tool_map[name](**kwargs)`. Every entry in
  tool_map is `partial(FunctionTool.invoke, skip_parsing=True)` and
  FunctionTool.invoke is `async def`, so the branching was dead code -
  and on Python versions affected by cpython#98590,
  iscoroutinefunction(partial(bound_async_method, ...)) returns False,
  causing the bridge to take the asyncio.to_thread path, return an
  unawaited coroutine, and surface it as a JSON-serialization failure
  for every tool call. Added a regression test
  test_invoke_tool_awaits_partial_wrapped_async_method.
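The fix amounts to trusting the shape of `tool_map` and awaiting unconditionally. A minimal sketch, where `EchoTool` is a hypothetical stand-in for FunctionTool (only its `async def invoke(..., skip_parsing=...)` matters) and `Bridge` is a simplified stand-in for the inline bridge:

```python
import asyncio
from functools import partial
from typing import Any, Awaitable, Callable, Dict, List


class EchoTool:
    # Hypothetical stand-in for FunctionTool: only an async invoke() matters here.
    def __init__(self, name: str) -> None:
        self.name = name

    async def invoke(self, *, skip_parsing: bool = False, **kwargs: Any) -> Dict[str, Any]:
        await asyncio.sleep(0)  # pretend to do async I/O
        return {"tool": self.name, "skip_parsing": skip_parsing, "args": kwargs}


class Bridge:
    def __init__(self, tools: List[EchoTool]) -> None:
        # Same shape the commit describes: partial(<async invoke>, skip_parsing=True).
        self.tool_map: Dict[str, Callable[..., Awaitable[Any]]] = {
            t.name: partial(t.invoke, skip_parsing=True) for t in tools
        }

    async def invoke_tool(self, name: str, **kwargs: Any) -> Any:
        # Every entry returns an awaitable, so await it directly instead of
        # branching on inspect.iscoroutinefunction().
        return await self.tool_map[name](**kwargs)
```

Because the map only ever holds partials over an `async def`, there is no sync branch to get wrong, and no coroutine object can leak out unawaited.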

- generate_type_stubs: skip tools whose name is not a valid Python
  identifier or is a Python keyword. FunctionTool.name has no upstream
  validation, so a name like "weird-name" produced a syntax error in
  the stubs and a name like "broken\n    pass\nasync def injected"
  would inject arbitrary stub source. Non-identifier names stay
  reachable via `call_tool("weird-name", ...)` at runtime; they just
  don't get type-checked stubs. Added regression test
  test_generate_type_stubs_skips_non_identifier_tool_names.
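The guard can be expressed with `str.isidentifier()` and `keyword.iskeyword()`. A minimal sketch; `is_stub_safe_name` and `stub_names` are illustrative names, not the package's API:

```python
import keyword
from typing import List


def is_stub_safe_name(name: str) -> bool:
    """True if `name` can appear verbatim as a function name in generated stub source."""
    return name.isidentifier() and not keyword.iskeyword(name)


def stub_names(tool_names: List[str]) -> List[str]:
    # Names that fail the check get no stub; they stay callable via call_tool("...").
    return [n for n in tool_names if is_stub_safe_name(n)]
```

Both the syntax-error case ("weird-name") and the injection case (a name containing newlines and code) fail `isidentifier()`, so neither can reach the stub source.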

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

documentation Improvements or additions to documentation python

Projects

Status: In Progress


3 participants