Skip to content

fix: correct AB test execution role IAM policy and promote stability#1120

Merged
jariy17 merged 16 commits intomainfrom
fix/ab-test-role-evaluator-permission
May 5, 2026
Merged

fix: correct AB test execution role IAM policy and promote stability#1120
jariy17 merged 16 commits intomainfrom
fix/ab-test-role-evaluator-permission

Conversation

@jariy17
Copy link
Copy Markdown
Contributor

@jariy17 jariy17 commented May 4, 2026

Summary

  • IAM policy — Aligns the auto-created AB test execution role with the official docs, fixing a 400 "Access denied when validating evaluator" error on every deploy with a custom evaluator
  • Promote stabilitypromote ab-test waits for executionStatus === 'RUNNING' before stopping, avoiding 409 errors when promote runs immediately after resume; throws a clear error if the test never reaches RUNNING
  • IAM — DescribeLogGroups — Split into its own statement with Resource: "*" (AWS requirement)
  • E2E coverage — Added AB test reaches RUNNING status after deploy to ab-test-target-based.test.ts to catch IAM permission regressions post-deploy
  • Status test — Fixed incorrect http-gateway resourceType assertion; replaced with ab-test + invocationUrl check
  • Local e2e script — Added scripts/run-e2e-local.sh to replicate the GitHub Actions e2e workflow locally

IAM changes

Trust policy — added SourceAccount + SourceArn conditions to prevent confused deputy attacks:

"Condition": {
  "StringEquals": { "aws:SourceAccount": "<accountId>" },
  "ArnLike": { "aws:SourceArn": "arn:aws:bedrock-agentcore:*:<accountId>:ab-test/*" }
}

Permissions — 3-statement policy aligned with docs:

  • AgentCoreResources — all bedrock-agentcore actions scoped to account with ResourceAccount condition (adds GetEvaluator, GetGatewayTarget, ListGatewayTargets, ListConfigurationBundleVersions)
  • CloudWatchLogsDescribelogs:DescribeLogGroups with Resource: "*" (required by AWS)
  • CloudWatchLogs — log read/write actions scoped to evaluation and spans log groups

Running E2E tests locally

export E2E_ROLE_ARN=arn:aws:iam::<account>:role/<role>
export E2E_SECRET_ARN=arn:aws:secretsmanager:<region>:<account>:secret:<name>
./scripts/run-e2e-local.sh e2e-tests/ab-test-target-based.test.ts

Test plan

  • e2e-tests/ab-test-target-based.test.ts — all steps pass including RUNNING check
  • e2e-tests/ab-test-config-bundle.test.ts — passes (RUNNING check deferred to follow-up once real config bundles are used)
  • Unit tests src/cli/operations/deploy/__tests__/post-deploy-ab-tests.test.ts — 23 passing
  • Unit tests src/cli/commands/pause/__tests__/promote.test.ts — 4 passing

The AB test API validates evaluator ARNs server-side using the execution
role. The auto-created role policy was missing bedrock-agentcore:GetEvaluator,
causing a 400 "Access denied when validating evaluator" error on every deploy
that included a target-based AB test with a custom evaluator.
@jariy17 jariy17 requested a review from a team May 4, 2026 23:26
@github-actions github-actions Bot added size/xs PR size: XS agentcore-harness-reviewing AgentCore Harness review in progress labels May 4, 2026
HTTP gateways are not surfaced as top-level resources in agentcore status —
they are only used internally to build AB test invocation URLs. The test was
asserting on resourceType 'http-gateway' which never appears; fix it to
assert on the 'ab-test' resource instead.
@github-actions github-actions Bot added size/s PR size: S and removed size/xs PR size: XS labels May 4, 2026
@agentcore-cli-automation
Copy link
Copy Markdown

The new EvaluatorReadStatement is scoped to Object.values(deployedResources?.evaluators ?? {}), which only contains custom evaluators deployed by this project. This will leave the original 400 error in place whenever the AB test's online-eval config references evaluators that aren't in deployedResources.evaluators:

  • Builtin evaluators (Builtin.Faithfulness, Builtin.GoalSuccessRate, etc.) — these are first-class evaluator references in OnlineEvalConfig.evaluators (see src/schema/schemas/primitives/online-eval-config.ts:23-24 and the handling in AgentCoreOnlineEvaluationConfig.ts:143-161). If the service validates evaluator ARNs uniformly using the execution role, Builtins will fail the same check the PR is trying to fix.
  • External evaluator ARNsOnlineEvalConfig.evaluators also accepts raw ARNs pointing at evaluators outside this project / another account. Those are never in deployedResources.evaluators either.

Separately, even for the custom-evaluator case, the policy grants GetEvaluator on every evaluator in the project rather than just the ones this specific AB test transitively references via its onlineEvalConfigArns. That's a least-privilege concern but not a functional blocker.

A couple of ways to address this:

  1. Resolve evaluators transitively from the referenced online eval configs. For each online eval config referenced by testSpec.evaluationConfig, look up its evaluators list in projectSpec.onlineEvalConfigs, and for each entry:

    • Builtin.* → construct the builtin ARN (arn:${partition}:bedrock-agentcore:::evaluator/Builtin.*)
    • arn:... → use as-is
    • otherwise → look up in deployedResources.evaluators

    Pass that de-duplicated list as evaluatorArns. This fixes both the Builtin/external case and the over-broad scoping.

  2. If tightening the scope is risky, at minimum add arn:${partition}:bedrock-agentcore:::evaluator/Builtin.* to the Resource list (and optionally arn:${partition}:bedrock-agentcore:${region}:${accountId}:evaluator/*) so users with builtin evaluators don't hit the same 400.

Could you also confirm whether you tested the fix with a Builtin evaluator in the online eval config, or only with a custom one? The test plan only mentions ab-test-target-based.test.ts / ab-test-config-bundle.test.ts — knowing which evaluator types those exercise would help.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 4, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 43.11% 9016 / 20912
🔵 Statements 42.4% 9575 / 22582
🔵 Functions 39.96% 1555 / 3891
🔵 Branches 39.98% 5808 / 14527
Generated in workflow #2425 for commit 5b72ab3 by the Vitest Coverage Report Action

The resource cast was missing invocationUrl, causing a TypeScript error
when asserting the AB test's gateway invocation URL was present.
@github-actions github-actions Bot added size/s PR size: S and removed size/s PR size: S labels May 4, 2026
- Include stdout in promote failure message so the JSON error is visible
- Add scripts/run-e2e-local.sh to replicate the GitHub Actions e2e workflow locally
@github-actions github-actions Bot added size/m PR size: M and removed size/s PR size: S labels May 5, 2026
When promote is called immediately after resume, the AB test may still be
in UPDATING state and reject the STOPPED transition with a 409. Retry up
to 6 times with a 10s delay to wait for the transition to complete.
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
Poll getABTest until executionStatus is no longer UPDATING before
attempting to stop. The service rejects updates with 409 while a
state transition is in progress (e.g. after resume).
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
Poll getABTest until executionStatus === 'RUNNING' before issuing the
STOPPED transition. The service 409s if the test is still UPDATING
(e.g. transitioning from a prior resume).
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
…figs

Previously GetEvaluator was scoped to all project evaluators. Now it's
scoped to only the evaluators referenced by the specific online eval
configs this AB test uses, handling all three cases:
- Custom evaluators: looked up from deployedResources.evaluators
- Builtin.* evaluators: ARN constructed from the builtin ID
- External ARN references: used as-is
@github-actions github-actions Bot removed the size/m PR size: M label May 5, 2026
@jariy17
Copy link
Copy Markdown
Contributor Author

jariy17 commented May 5, 2026

Addressed in 44bc3f2. The evaluator ARN resolution is now transitive from the online eval configs the AB test references — handling all three cases:

  • Custom evaluators — looked up from deployedResources.evaluators
  • Builtin.* evaluators — ARN constructed as arn:${partition}:bedrock-agentcore:::evaluator/Builtin.*
  • External ARN references — used as-is

The policy is now scoped to only the evaluators this specific AB test transitively references (not all project evaluators), addressing the least-privilege concern.

The e2e tests exercise a custom evaluator (ABTestEvaluator in ab-test-target-based.test.ts). We haven't explicitly tested with Builtin evaluators, but the ARN construction follows the same pattern used in the L3 CDK constructs (AgentCoreOnlineEvaluationConfig.ts:143-161).

Replace the hand-rolled per-resource policy with the canonical policy
from https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/ab-testing-prereqs.html:

- Trust policy: add SourceAccount + SourceArn conditions
- Permissions: single AgentCoreResources statement scoped to
  arn:aws:bedrock-agentcore:*:${accountId}:* with ResourceAccount condition
- Add missing actions: GetGatewayTarget, ListGatewayTargets,
  ListConfigurationBundleVersions
- CloudWatch logs scoped to account via PrincipalAccount pattern
- Remove per-evaluator/per-resource ARN tracking (no longer needed)
@github-actions github-actions Bot added the size/m PR size: M label May 5, 2026
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
…-utils

Extract waitForRunningThenStop into promote-utils.ts so it can be unit
tested without pulling in React/ink. Add 4 tests covering: immediate
RUNNING, polling until RUNNING, timeout throws, and error message content.
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
Comment thread src/cli/operations/deploy/post-deploy-ab-tests.ts
padmak30
padmak30 previously approved these changes May 5, 2026
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
@jariy17 jariy17 changed the title fix: correct AB test execution role IAM policy and promote stability fix: align AB test execution role IAM policy and fix promote stability May 5, 2026
…N/version format in config-bundle e2e test
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
@jariy17 jariy17 changed the title fix: align AB test execution role IAM policy and fix promote stability fix: correct AB test execution role IAM policy and promote stability May 5, 2026
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 5, 2026
@notgitika notgitika self-requested a review May 5, 2026 20:59
@notgitika
Copy link
Copy Markdown
Contributor

waitForRunningThenStop fixes the CLI promote race condition, but if there is a TUI path that calls updateABTest({ executionStatus: "STOPPED" }) directly, it has the same bug right?

Comment thread scripts/run-e2e-local.sh
@jariy17 jariy17 merged commit 9f231d0 into main May 5, 2026
25 checks passed
@jariy17 jariy17 deleted the fix/ab-test-role-evaluator-permission branch May 5, 2026 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/m PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants