fix: correct AB test execution role IAM policy and promote stability by jariy17 · Pull Request #1120 · aws/agentcore-cli

jariy17 · 2026-05-04T23:26:25Z

Summary

IAM policy — Aligns the auto-created AB test execution role with the official docs, fixing a 400 "Access denied when validating evaluator" error on every deploy with a custom evaluator
Promote stability — promote ab-test waits for executionStatus === 'RUNNING' before stopping, avoiding 409 errors when promote runs immediately after resume; throws a clear error if the test never reaches RUNNING
IAM — DescribeLogGroups — Split into its own statement with Resource: "*" (AWS requirement)
E2E coverage — Added AB test reaches RUNNING status after deploy to ab-test-target-based.test.ts to catch IAM permission regressions post-deploy
Status test — Fixed incorrect http-gateway resourceType assertion; replaced with ab-test + invocationUrl check
Local e2e script — Added scripts/run-e2e-local.sh to replicate the GitHub Actions e2e workflow locally

IAM changes

Trust policy — added SourceAccount + SourceArn conditions to prevent confused deputy attacks:

"Condition": {
  "StringEquals": { "aws:SourceAccount": "<accountId>" },
  "ArnLike": { "aws:SourceArn": "arn:aws:bedrock-agentcore:*:<accountId>:ab-test/*" }
}

Permissions — 3-statement policy aligned with docs:

AgentCoreResources — all bedrock-agentcore actions scoped to account with ResourceAccount condition (adds GetEvaluator, GetGatewayTarget, ListGatewayTargets, ListConfigurationBundleVersions)
CloudWatchLogsDescribe — logs:DescribeLogGroups with Resource: "*" (required by AWS)
CloudWatchLogs — log read/write actions scoped to evaluation and spans log groups

Running E2E tests locally

export E2E_ROLE_ARN=arn:aws:iam::<account>:role/<role>
export E2E_SECRET_ARN=arn:aws:secretsmanager:<region>:<account>:secret:<name>
./scripts/run-e2e-local.sh e2e-tests/ab-test-target-based.test.ts

Test plan

e2e-tests/ab-test-target-based.test.ts — all steps pass including RUNNING check
e2e-tests/ab-test-config-bundle.test.ts — passes (RUNNING check deferred to follow-up once real config bundles are used)
Unit tests src/cli/operations/deploy/__tests__/post-deploy-ab-tests.test.ts — 23 passing
Unit tests src/cli/commands/pause/__tests__/promote.test.ts — 4 passing

The AB test API validates evaluator ARNs server-side using the execution role. The auto-created role policy was missing bedrock-agentcore:GetEvaluator, causing a 400 "Access denied when validating evaluator" error on every deploy that included a target-based AB test with a custom evaluator.

HTTP gateways are not surfaced as top-level resources in agentcore status — they are only used internally to build AB test invocation URLs. The test was asserting on resourceType 'http-gateway' which never appears; fix it to assert on the 'ab-test' resource instead.

agentcore-cli-automation · 2026-05-04T23:30:35Z

The new EvaluatorReadStatement is scoped to Object.values(deployedResources?.evaluators ?? {}), which only contains custom evaluators deployed by this project. This will leave the original 400 error in place whenever the AB test's online-eval config references evaluators that aren't in deployedResources.evaluators:

Builtin evaluators (Builtin.Faithfulness, Builtin.GoalSuccessRate, etc.) — these are first-class evaluator references in OnlineEvalConfig.evaluators (see src/schema/schemas/primitives/online-eval-config.ts:23-24 and the handling in AgentCoreOnlineEvaluationConfig.ts:143-161). If the service validates evaluator ARNs uniformly using the execution role, Builtins will fail the same check the PR is trying to fix.
External evaluator ARNs — OnlineEvalConfig.evaluators also accepts raw ARNs pointing at evaluators outside this project / another account. Those are never in deployedResources.evaluators either.

Separately, even for the custom-evaluator case, the policy grants GetEvaluator on every evaluator in the project rather than just the ones this specific AB test transitively references via its onlineEvalConfigArns. That's a least-privilege concern but not a functional blocker.

A couple of ways to address this:

Resolve evaluators transitively from the referenced online eval configs. For each online eval config referenced by testSpec.evaluationConfig, look up its evaluators list in projectSpec.onlineEvalConfigs, and for each entry:
- Builtin.* → construct the builtin ARN (arn:${partition}:bedrock-agentcore:::evaluator/Builtin.*)
- arn:... → use as-is
- otherwise → look up in deployedResources.evaluators
Pass that de-duplicated list as evaluatorArns. This fixes both the Builtin/external case and the over-broad scoping.
If tightening the scope is risky, at minimum add arn:${partition}:bedrock-agentcore:::evaluator/Builtin.* to the Resource list (and optionally arn:${partition}:bedrock-agentcore:${region}:${accountId}:evaluator/*) so users with builtin evaluators don't hit the same 400.

Could you also confirm whether you tested the fix with a Builtin evaluator in the online eval config, or only with a custom one? The test plan only mentions ab-test-target-based.test.ts / ab-test-config-bundle.test.ts — knowing which evaluator types those exercise would help.

github-actions · 2026-05-04T23:32:19Z

Coverage Report

Status	Category	Percentage	Covered / Total
🔵	Lines	43.11%	9016 / 20912
🔵	Statements	42.4%	9575 / 22582
🔵	Functions	39.96%	1555 / 3891
🔵	Branches	39.98%	5808 / 14527

Generated in workflow #2425 for commit 5b72ab3 by the Vitest Coverage Report Action

The resource cast was missing invocationUrl, causing a TypeScript error when asserting the AB test's gateway invocation URL was present.

- Include stdout in promote failure message so the JSON error is visible - Add scripts/run-e2e-local.sh to replicate the GitHub Actions e2e workflow locally

When promote is called immediately after resume, the AB test may still be in UPDATING state and reject the STOPPED transition with a 409. Retry up to 6 times with a 10s delay to wait for the transition to complete.

Poll getABTest until executionStatus is no longer UPDATING before attempting to stop. The service rejects updates with 409 while a state transition is in progress (e.g. after resume).

Poll getABTest until executionStatus === 'RUNNING' before issuing the STOPPED transition. The service 409s if the test is still UPDATING (e.g. transitioning from a prior resume).

…figs Previously GetEvaluator was scoped to all project evaluators. Now it's scoped to only the evaluators referenced by the specific online eval configs this AB test uses, handling all three cases: - Custom evaluators: looked up from deployedResources.evaluators - Builtin.* evaluators: ARN constructed from the builtin ID - External ARN references: used as-is

jariy17 · 2026-05-05T15:57:50Z

Addressed in 44bc3f2. The evaluator ARN resolution is now transitive from the online eval configs the AB test references — handling all three cases:

Custom evaluators — looked up from deployedResources.evaluators
Builtin.* evaluators — ARN constructed as arn:${partition}:bedrock-agentcore:::evaluator/Builtin.*
External ARN references — used as-is

The policy is now scoped to only the evaluators this specific AB test transitively references (not all project evaluators), addressing the least-privilege concern.

The e2e tests exercise a custom evaluator (ABTestEvaluator in ab-test-target-based.test.ts). We haven't explicitly tested with Builtin evaluators, but the ARN construction follows the same pattern used in the L3 CDK constructs (AgentCoreOnlineEvaluationConfig.ts:143-161).

Replace the hand-rolled per-resource policy with the canonical policy from https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/ab-testing-prereqs.html: - Trust policy: add SourceAccount + SourceArn conditions - Permissions: single AgentCoreResources statement scoped to arn:aws:bedrock-agentcore:*:${accountId}:* with ResourceAccount condition - Add missing actions: GetGatewayTarget, ListGatewayTargets, ListConfigurationBundleVersions - CloudWatch logs scoped to account via PrincipalAccount pattern - Remove per-evaluator/per-resource ARN tracking (no longer needed)

…-utils Extract waitForRunningThenStop into promote-utils.ts so it can be unit tested without pulling in React/ink. Add 4 tests covering: immediate RUNNING, polling until RUNNING, timeout throws, and error message content.

…suites

…N/version format in config-bundle e2e test

…follow-up with real bundles)

notgitika · 2026-05-05T22:39:12Z

waitForRunningThenStop fixes the CLI promote race condition, but if there is a TUI path that calls updateABTest({ executionStatus: "STOPPED" }) directly, it has the same bug right?

…1120) (#1126)

jariy17 requested a review from a team May 4, 2026 23:26

github-actions Bot added size/xs PR size: XS agentcore-harness-reviewing AgentCore Harness review in progress labels May 4, 2026

github-actions Bot added size/s PR size: S and removed size/xs PR size: XS labels May 4, 2026

github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 4, 2026

fix: add invocationUrl to status resource type cast in e2e test

1b441ba

The resource cast was missing invocationUrl, causing a TypeScript error when asserting the AB test's gateway invocation URL was present.

github-actions Bot added size/s PR size: S and removed size/s PR size: S labels May 4, 2026

fix: improve promote error message and add local e2e run script

10704f2

- Include stdout in promote failure message so the JSON error is visible - Add scripts/run-e2e-local.sh to replicate the GitHub Actions e2e workflow locally

github-actions Bot added size/m PR size: M and removed size/s PR size: S labels May 5, 2026

fix: retry promote stop on 409 UPDATING status

15c2d6f

When promote is called immediately after resume, the AB test may still be in UPDATING state and reject the STOPPED transition with a 409. Retry up to 6 times with a 10s delay to wait for the transition to complete.