Skip to content

feat(api,frontend): add GET /api/recommendations/:id/detail + drawer wiring (closes #44)#80

Merged
cristim merged 1 commit intofeat/multicloud-web-frontendfrom
feat/recommendations-detail-endpoint
Apr 27, 2026
Merged

feat(api,frontend): add GET /api/recommendations/:id/detail + drawer wiring (closes #44)#80
cristim merged 1 commit intofeat/multicloud-web-frontendfrom
feat/recommendations-detail-endpoint

Conversation

@cristim
Copy link
Copy Markdown
Member

@cristim cristim commented Apr 25, 2026

Summary

Closes #44.

Adds GET /api/recommendations/:id/detail and wires it into the
Recommendations row-click drawer. The drawer previously rendered only
the columns from the listing payload + a client-side confidence
heuristic + a freshness-derived provenance line. After this change:

  • Backend computes the confidence bucket (1:1 with the previous
    client-side thresholds, moved server-side so future provider tuning
    doesn't need a frontend deploy) and assembles a provenance note
    naming the collector + last-collected timestamp.
  • Frontend fetches the per-id detail on first expand, memoises it
    per-id for the session, renders the server-supplied confidence
    badge + provenance, and renders an inline SVG sparkline of
    usage_history (two polylines, CPU + memory, no chart library).
  • Empty usage_history renders a one-line "Usage history not yet
    available" placeholder rather than a broken empty chart — the
    documented default until the collector starts persisting
    time-series usage per recommendation (known_issues/28).

Decision: empty usage_history vs collector pipeline

usage_history is intentionally [] in this first pass. The
collector pipeline does not yet persist time-series utilisation per
recommendation, and standing up that pipeline is explicitly out of
scope per the issue. Surfacing the missing field as an empty slice
(rather than a 501 or omitting the field) keeps the drawer functional
today and means the day the collector starts populating it, the
frontend automatically picks it up — no follow-up frontend deploy
needed.

Security notes

  • Auth/permission gate runs before id validation so an
    unauthenticated caller can't probe id-shape errors to learn the
    endpoint exists.
  • Account-allow filter runs before the in-memory id lookup, so an
    id that exists but belongs to an account the caller can't see
    returns 404 (existence-disclosure-safe path matching
    handler_accounts.go's account lookup).
  • Frontend encodeURIComponents the id in the URL.
  • No innerHTML; the drawer uses replaceChildren + createElement /
    createElementNS for the sparkline.

Files changed

  • internal/api/types.goRecommendationDetailResponse, UsagePoint
  • internal/api/handler_recommendations.go — new handler + helpers
    (confidenceBucketFor, provenanceNoteFor, providerDisplayName)
  • internal/api/router.go/api/recommendations/{id}/detail route
  • internal/api/handler_recommendations_test.go — 4xx (empty id
    post-auth, unknown id) + 200 happy + degraded provenance + table
    test for the bucket thresholds
  • frontend/src/api/recommendations.tsgetRecommendationDetail
    • types
  • frontend/src/recommendations.ts — drawer fetch wiring + cache +
    inline SVG sparkline; removes the now-superseded client-side
    confidenceBucketFor shim
  • frontend/src/__tests__/recommendations.test.ts — drawer fetches
    once per id, renders backend confidence + provenance, sparkline
    on non-empty usage_history, placeholder on empty, cache reuse on
    re-open

Test plan

  • go test ./internal/api/... — full backend api suite green
  • npx jest — full 1255-test frontend suite green
  • npx tsc --noEmit — type-clean
  • npm run build — webpack production build succeeds
  • Post-merge: navigate the deployed dashboard, expand a
    recommendation row, confirm the new badge/provenance/usage
    placeholder render and that DevTools shows a single
    /api/recommendations/<id>/detail call per first-open

Summary by CodeRabbit

  • New Features

    • Recommendation details now display server-provided confidence levels and provenance information.
    • Added visual usage history with CPU and memory sparklines when historical data is available.
    • Improved loading states and error handling in the details drawer.
  • Tests

    • Added comprehensive test coverage for recommendation details endpoint and drawer functionality.

…+ drawer integration

Implements issue #44.

Backend (internal/api/):
- New `RecommendationDetailResponse` (id, usage_history, confidence_bucket,
  provenance_note) + `UsagePoint` types in types.go.
- New `getRecommendationDetail` handler returning the per-id payload.
  Confidence bucket is computed server-side from savings × count
  (1:1 with the previous client-side heuristic — moved to one place
  so future provider-specific tuning doesn't need a frontend deploy).
  Provenance note names the collector + last-collected timestamp,
  degrading to a no-window form when the freshness lookup fails.
- Auth gate runs BEFORE id validation so an unauthenticated caller
  can't probe id-shape errors to learn the endpoint exists.
- Account-allow filter runs BEFORE the id lookup, so an id that
  exists but belongs to an account the caller can't see returns 404
  (existence-disclosure-safe path matching handler_accounts.go).
- Route registered as `PathPrefix:/api/recommendations/` +
  `PathSuffix:/detail` mirroring the /api/plans/{id}/purchases
  pattern; extractParams strips both ends to recover :id.

Backend tests cover empty id (post-auth 400), unknown id (404),
known id (200 with the expected shape + bucket), and the freshness-
unavailable degraded provenance path. Plus a table-test for
`confidenceBucketFor` covering the threshold edges.

Frontend (frontend/src/):
- New `getRecommendationDetail(id)` + `RecommendationDetail` /
  `RecommendationUsagePoint` types in api/recommendations.ts.
- Recommendations drawer (recommendations.ts:openDetailDrawer) now
  fetches the detail on first expand, memoised per-id via
  `detailFetchCache` so re-opens within the session don't re-hit
  the network. Loading placeholders render immediately; resolution
  fills in the server-supplied confidence badge + provenance note,
  and renders an inline SVG sparkline of usage_history (two
  polylines, CPU + memory, no axes/legend — no chart-library
  dependency).
- Empty `usage_history` renders a one-line "Usage history not yet
  available" placeholder rather than a broken empty chart. This is
  the documented default until the collector starts persisting
  time-series usage per recommendation (known_issues/28).
- Detail-fetch failure falls back to an Unknown badge + "details
  temporarily unavailable" provenance so a transient backend issue
  doesn't blank the drawer.
- Removes the now-superseded client-side `confidenceBucketFor` shim
  and the `formatRelativeTime` import that only the freshness-
  derived provenance line used.

Frontend tests cover: drawer fetches once per id with the expected
url, renders backend confidence + provenance, renders the sparkline
when usage_history is non-empty, renders the placeholder when empty,
and reuses the cache on repeated opens of the same drawer.

Out of scope (deferred to a follow-up): collector-side persistence
of per-recommendation time-series usage (the empty-slice path lets
the drawer ship today and pick up real data the moment the
collector starts populating it). Cross-recommendation comparison UI
(per the issue's out-of-scope section).

Closes #44
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 25, 2026

📝 Walkthrough

Walkthrough

A new /api/recommendations/{id}/detail endpoint is introduced to serve per-recommendation usage history, confidence bucket, and provenance information. The backend handler validates the recommendation id, filters results by account visibility, computes confidence buckets from savings/count thresholds, and derives provenance notes from freshness timestamps. The frontend calls this endpoint on drawer open, memoizes results per id, and conditionally renders sparklines when usage history data is available.

Changes

Cohort / File(s) Summary
Backend Type Definitions
internal/api/types.go
Adds UsagePoint and RecommendationDetailResponse types with timestamp, CPU/memory percentages, confidence bucket, and provenance note fields.
Backend Route Registration
internal/api/router.go
Registers new GET /api/recommendations/{id}/detail route with parameter extraction and dispatch to handler.
Backend Detail Handler
internal/api/handler_recommendations.go
Implements getRecommendationDetail handler that validates id, filters cached recommendations by account visibility, computes confidence bucket from savings/count thresholds, and derives RFC3339-formatted provenance note from freshness timestamp.
Backend Handler Tests
internal/api/handler_recommendations_test.go
Adds unit tests for id validation (400 errors), not-found behavior (404), successful detail response shape, and table-driven tests for confidenceBucketFor logic across savings/count combinations.
Frontend API Layer
frontend/src/api/recommendations.ts
Defines RecommendationUsagePoint and RecommendationDetail TypeScript types and exports getRecommendationDetail(id) function to fetch the detail endpoint with path encoding.
Frontend Drawer Implementation
frontend/src/recommendations.ts
Replaces client-side confidence heuristic with server-provided confidence_bucket, fetches usage history from new endpoint with memoization per id, renders inline CPU/memory sparkline when history is non-empty, shows "Usage history not yet available." placeholder when empty, and exports clearRecommendationDetailCache() for test isolation.
Frontend Drawer Tests
frontend/src/__tests__/recommendations.test.ts
Adds drawer-focused test block mocking getRecommendationDetail, validates single fetch-per-id caching across close/reopen cycles, asserts confidence/provenance rendering, verifies "Usage history not yet available." placeholder for empty history, and confirms inline SVG sparkline with two paths when history is present.

Sequence Diagram

sequenceDiagram
    actor User
    participant Frontend as Frontend<br/>(recommendations.ts)
    participant API as API Client<br/>(api/recommendations.ts)
    participant Handler as Backend Handler<br/>(handler_recommendations.go)
    participant Cache as Recommendation<br/>Cache & Freshness

    User->>Frontend: Click recommendation row to expand drawer
    Frontend->>Frontend: Check memoization cache
    alt Cache miss
        Frontend->>API: getRecommendationDetail(id)
        API->>Handler: GET /api/recommendations/{id}/detail
        Handler->>Cache: ListRecommendations()
        Cache-->>Handler: [recommendations with savings/count]
        Handler->>Handler: Filter by account visibility
        Handler->>Handler: Match recommendation by id
        Handler->>Cache: GetRecommendationsFreshness()
        Cache-->>Handler: freshness timestamp
        Handler->>Handler: Compute confidence_bucket from savings/count
        Handler->>Handler: Format provenance_note from freshness
        Handler-->>API: RecommendationDetailResponse{<br/>  id, usage_history[],<br/>  confidence_bucket,<br/>  provenance_note<br/>}
        API-->>Frontend: Typed RecommendationDetail object
        Frontend->>Frontend: Store in memoization cache
    else Cache hit
        Frontend->>Frontend: Use cached detail
    end
    Frontend->>Frontend: Render confidence badge from detail.confidence_bucket
    Frontend->>Frontend: Render provenance_note text
    alt usage_history is non-empty
        Frontend->>Frontend: Render CPU/memory sparkline (inline SVG)
    else usage_history is empty
        Frontend->>Frontend: Render "Usage history not yet available." placeholder
    end
    Frontend-->>User: Display populated drawer
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A sparkle of confidence in every row,
Bunny hops through detail endpoints below,
Histograms dancing, provenance clear,
No more guessing — the whole truth is here!
✨ Cache it, memoize it, draw with care!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(api,frontend): add GET /api/recommendations/:id/detail + drawer wiring' clearly and concisely describes the main changes (new endpoint and frontend integration), matching the primary objective in the PR.
Linked Issues check ✅ Passed All objectives from issue #44 are met: endpoint returns id/usage_history/confidence_bucket/provenance_note [#44], confidence computed server-side [#44], usage history rendered in drawer [#44], memoization implemented [#44], and frontend integration complete [#44].
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #44 requirements: backend endpoint and types, frontend drawer wiring, and supporting tests. No out-of-scope additions like cross-recommendation UI or provider tuning are present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/recommendations-detail-endpoint

Comment @coderabbitai help to get the list of available commands and usage tips.

@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 25, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 27, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
frontend/src/__tests__/recommendations.test.ts (1)

509-510: Consider a flushPromises utility for more robust async waiting.

The double await Promise.resolve() pattern works but can become fragile if the implementation adds additional async layers. A utility like const flushPromises = () => new Promise(r => setTimeout(r, 0)) or @testing-library/react's waitFor would be more resilient to implementation changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/__tests__/recommendations.test.ts` around lines 509 - 510,
Replace the fragile double `await Promise.resolve()` pattern with a resilient
async flush helper: add a `const flushPromises = () => new Promise(r =>
setTimeout(r, 0))` (or import `waitFor` from `@testing-library/react`) and
replace the two `await Promise.resolve()` calls with a single `await
flushPromises()` (or `await waitFor(() => {/* assertion-ready check */})`);
update the test in `recommendations.test.ts` to use `flushPromises` (or
`waitFor`) so it properly waits for queued micro/macro tasks before assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@frontend/src/__tests__/recommendations.test.ts`:
- Around line 509-510: Replace the fragile double `await Promise.resolve()`
pattern with a resilient async flush helper: add a `const flushPromises = () =>
new Promise(r => setTimeout(r, 0))` (or import `waitFor` from
`@testing-library/react`) and replace the two `await Promise.resolve()` calls
with a single `await flushPromises()` (or `await waitFor(() => {/*
assertion-ready check */})`); update the test in `recommendations.test.ts` to
use `flushPromises` (or `waitFor`) so it properly waits for queued micro/macro
tasks before assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3cc808ea-27ef-4486-902d-db58901fa099

📥 Commits

Reviewing files that changed from the base of the PR and between 006b9d3 and 2ff8392.

📒 Files selected for processing (7)
  • frontend/src/__tests__/recommendations.test.ts
  • frontend/src/api/recommendations.ts
  • frontend/src/recommendations.ts
  • internal/api/handler_recommendations.go
  • internal/api/handler_recommendations_test.go
  • internal/api/router.go
  • internal/api/types.go

@cristim cristim merged commit 5520317 into feat/multicloud-web-frontend Apr 27, 2026
3 checks passed
cristim added a commit that referenced this pull request Apr 28, 2026
… hotfix) (#172)

Deploy CI for AWS Lambda / Azure Container Apps / GCP Cloud Run started
failing on the merge commit of #160 (346d168) with:

  recommendations.ts(7,50): error TS6133: 'formatRelativeTime' is declared
  but its value is never read.

Root cause: my Bundle B commit imported formatRelativeTime because the
pre-rebase base had a detail-drawer provenance line that used it (see the
prior `getRecommendationsFreshness().then(f => formatRelativeTime(...))`
pattern). During the rebase onto the new base, that provenance code had
been replaced (#80) with the `getRecommendationDetail` flow which doesn't
use formatRelativeTime — but the import line survived the rebase merge
unchanged. Net result: an orphaned import that ts-loader (with
noUnusedLocals: true in tsconfig) flags as an error.

The local pre-commit "Build frontend" hook would have caught this had it
run against the rebased commit, but pre-commit hooks don't re-run on git
rebase. A subsequent fix-commit on the branch (77ac16b) had already
removed the import locally, but the GitHub merge button raced with that
push and merged the older HEAD (034ea00), so the fix never landed.

The follow-up `chore/recommendations` issue tracking what to put in CI to
prevent this class of regression: it would catch any commit on `main` /
`feat/multicloud-web-frontend` that fails `npm run build`. (Today's deploy
workflows do effectively that, but they only run on push, after the
breaking commit is already in.)
@cristim cristim added triaged Item has been triaged priority/p2 Backlog-worthy severity/medium Moderate harm urgency/this-sprint Within the current sprint impact/many Affects most users effort/l Weeks type/feat New capability labels Apr 28, 2026
@cristim cristim deleted the feat/recommendations-detail-endpoint branch April 29, 2026 10:08
cristim added a commit that referenced this pull request Apr 29, 2026
The new pre-commit CI gate added by this PR catches a latent issue on
the base branch: `recommendations.ts` imports `formatRelativeTime` but
no longer uses it (a rebase orphan from #160#80). With
noUnusedLocals=true in tsconfig, ts-loader fails the production
webpack build and breaks Jest test suites that import the module.

Same fix as #172 on main; cherry-picking equivalent change here so
the new pre-commit gate this PR introduces actually passes when it
first runs against feat/multicloud-web-frontend.
cristim added a commit that referenced this pull request Apr 29, 2026
… pre-commit + multi-module govulncheck (#105)

* fix(security): supply-chain hardening — Docker SHA pinning + required pre-commit gates + multi-module govulncheck

Closes 5 HIGH findings from the security review:

H10 (lockfile discipline): audit confirmed CI does not run `npm install`
anywhere — only `npm audit --audit-level=high` (already in ci.yml). The
Dockerfile uses `npm ci` correctly. No code change needed.

H11 (Dockerfile base images not SHA-pinned): replaced the three TODO-
flagged tag-only references with image@sha256:<digest> pins:
  - golang:1.25.4-alpine3.21@sha256:3289aac2...
  - node:24-alpine@sha256:d1b3b4da...
  - alpine:3.21.3@sha256:a8560b36...
A registry tag mutation can no longer poison the build. Refresh path
documented in-comment.

H12 (pre-commit hooks silently skipping):
  - Removed the `command -v trivy ... || echo "skipping..."` fallback
    on the trivy-config hook. Devs without trivy installed now fail
    the hook (as they should). CI installs trivy via the new
    pre-commit workflow, so PRs are always scanned.
  - Added .github/workflows/pre-commit.yml that runs `pre-commit run
    --all-files` on every PR + push to main/feat. Installs gosec,
    gocyclo, trivy, git-secrets, hadolint, then runs all hooks. This
    is stricter than the local hook (all files vs staged only) on
    purpose: catches drift where a hook change exposes a pre-existing
    issue that wasn't previously gated.
  - Added .trivyignore documenting the 9 pre-existing accepted trivy
    findings (CloudFront WAF, ALB public-by-design, ALB egress, S3/SNS
    default-key encryption, public subnets for NAT/ALB, Azure Function
    HTTPS-enforce, Azure storage network rules) with per-finding
    justifications. Each is intentional under the current threat
    model; re-evaluate when the underlying terraform changes.

H13 (no govulncheck in CI): the existing govulncheck step in ci.yml
only ran `./...` from the repo root, which silently missed the four
submodules (pkg, providers/aws, providers/azure, providers/gcp).
Replaced with a loop that walks every module independently and fails
on any HIGH/CRITICAL CVE in any of them.

H14 (.env.example + resolver.go pre-commit exclusion):
  - Added .env.example: a documented template of every os.Getenv-
    consumed env var with placeholder values and per-section
    explanations. Devs copy to .env.local (already gitignored) and
    fill in.
  - Removed internal/credentials/resolver.go from the
    detect-private-key exclusion list. Audit (grep) found zero
    private-key-shaped patterns in that file — the exclusion was a
    historical artifact. Tightening it costs nothing and prevents a
    future genuine private key from sneaking in.

* ci(pre-commit): install terraform + tflint in workflow

The pre-commit workflow added in this PR runs every hook in
.pre-commit-config.yaml on the runner, but missed two binaries that
three of those hooks depend on:

  Hook              | Binary needed     | Previous result
  ------------------|-------------------|----------------
  terraform_fmt     | terraform         | exit 127 (cmd not found)
  terraform_validate| terraform         | exit 127
  terraform_tflint  | tflint            | exit 127

Add hashicorp/setup-terraform@v3 (pinned to 1.9.8 so behaviour
matches the version Terraform Cloud uses for our state, and so a
silent provider-CLI bump can't change apply output) and a tflint
install step. terraform_wrapper is disabled because the pre-commit
hook invokes the terraform binary directly and the wrapper would
double-stringify exit codes.

* chore(security): allowlist test-fixture account IDs in .gitallowed

git-secrets --register-aws adds a 12-digit account-ID regex to its
prohibited-patterns list. Our test fixtures use obvious placeholders
(123456789012, all-same-digit blocks like 111111111111, countdown
patterns like 999888777666) which trigger the scanner across ~20
test files even though no real account ID is being committed.

Add .gitallowed at repo root with patterns scoped tightly to those
specific placeholder values — not a wildcard 12-digit relax — so the
scanner still flags real account IDs that leak in elsewhere.

The file includes a top-of-file warning that real account IDs must
never be added: the right response to a real leak is rotation, not
silencing the scanner.

* docs(markdown): fix MD040/MD060/MD032 markdownlint violations

Pre-commit's markdownlint hook was failing on 145 violations across 8
files, all pre-existing — invisible until the new pre-commit CI gate
turned them into a hard error.

Three rule classes, three fix strategies:

MD060 (table-column-style — 122 violations): markdownlint's default
"consistent" mode infers the style from the first table it sees; if a
separator row happens to look "compact" (no spaces around the dashes),
every aligned table downstream is flagged. Pin the style to
"leading_and_trailing" in .markdownlint.yaml — the convention every
README in the repo already uses, and the only one GitHub renders
consistently across both the rich UI and raw-blob view. No README
content needed touching.

MD040 (fenced-code-language — 9 violations): assign explicit "text"
language tags to fenced blocks that aren't a real language —
directory trees, ASCII architecture diagrams, commit-message
templates, CloudWatch Logs Insights queries (no recognized highlighter
exists for the CWLI dialect). "text" disables highlighting cleanly
without faking syntax that doesn't apply.

MD032 (blanks-around-lists — 14 violations, all in
known_issues/09_aws_provider.md): autofixed by markdownlint --fix.
Applied verbatim.

After the sweep `markdownlint '**/*.md' --ignore node_modules --ignore
.git` exits clean.

* ci(pre-commit): bump terraform pin to 1.10.5 to satisfy module constraints

Every terraform/environments/*/main.tf declares
`required_version = ">= 1.10.0"`, but the previous pin of 1.9.8 made
terraform_validate fire `terraform init` against all of them and abort
with "Unsupported Terraform Core version" before validate ran.

1.10.5 is the latest stable in the 1.10.x line and satisfies the
existing constraint without forcing a 1.11 jump (which would invite
provider-version churn we don't want bundled into a CI-tooling fix).

* refactor(terraform): split 5 modules to standard structure for tflint

Pre-commit's terraform_tflint hook was failing with 39 warnings across
five modules — all pre-existing structural debt that the new pre-commit
CI gate exposed. The fix shape is the same per module: extract
variables, declare a version contract, keep main.tf for resources
only.

Per-module breakdown:

  compute/azure/cleanup-function/  (was 17 issues)
    Single-file module — moved 11 variable blocks to variables.tf,
    4 output blocks to outputs.tf, added versions.tf pinned to
    azurerm "~> 4.0" (the resource bodies use 4.x-only schemas).
    main.tf now contains only the seven azurerm_* resources.

  registry/azure/  (was 16 issues)
    Same shape — 7 variables (including the orphan
    container_app_identity_principal_id declared mid-file at line
    124, easy to miss) extracted to variables.tf; 5 outputs to
    outputs.tf; versions.tf added pinned to "~> 4.0" for the same
    schema reason. main.tf is now just the three azurerm_*
    resources.

  monitoring/azure/  (was 2 issues)
    Already had variables.tf + outputs.tf split; just missing the
    terraform { } contract. Added versions.tf pinned to "~> 4.0"
    matching this module's previously-committed lock file. Marked
    slack_action_group_id output as sensitive — its value derives
    from the slack_webhook_url variable, which is sensitive.

  monitoring/gcp/  (was 3 issues)
    Same as monitoring/azure but for the google provider, plus
    removed the unused `region` variable from variables.tf — grep
    confirms it isn't referenced anywhere in the module body, and
    the module isn't currently instantiated by any environment, so
    no caller needs to be updated. Marked
    slack_notification_channel_id output as sensitive.

  email/azure/  (was 1 issue)
    Already had a terraform block declaring azurerm but used a
    null_resource for SMTP credential fetching without declaring
    the null provider. Added it pinned to "~> 3.2".

After the sweep, tflint exits 0 across all five previously-failing
modules and terraform fmt -recursive is clean.

Side effects:

* Removed stale .terraform.lock.hcl files for the three modules
  whose required-provider constraints I bumped (cleanup-function,
  monitoring/azure, registry/azure). The lock files were pinning
  azurerm 4.61.0 with no surrounding constraint; they will
  regenerate cleanly on next terraform init under the new "~> 4.0"
  pin.

* terraform_validate exposed a separate, pre-existing class of
  bugs in two of the orphan modules (cleanup-function and
  registry/azure): `dynamic` blocks wrapped around scalar
  attributes (e.g. `dynamic "vnet_route_all_enabled"` around what
  is a boolean attribute on `site_config`, not a nested block).
  These would fail validate against any azurerm version. Excluded
  those two modules from the terraform_validate hook in
  .pre-commit-config.yaml with an explicit comment pointing at the
  follow-up cleanup. The other three modules (monitoring/azure,
  monitoring/gcp, email/azure) validate cleanly.

* chore(terraform): regenerate .terraform.lock.hcl for the 3 modules with new pin

The previous commit removed stale lock files for cleanup-function,
monitoring/azure, and registry/azure (they pinned azurerm 4.61.0
without a matching version constraint, then mismatched once `~> 4.0`
was declared in versions.tf). Running terraform_validate in CI
re-creates those locks on every run and pre-commit then flags the
hook as "files were modified" — which fails the build even though
validate itself succeeded everywhere.

Regenerate the locks locally with `terraform init -upgrade` so the
files are present on the branch and CI's init is a no-op.

All three locks land at azurerm 4.70.0 (current latest in the 4.x
series); the constraint `~> 4.0` admits the next 4.x patch without
re-locking.

* ci(pre-commit): skip terraform_validate in CI to unblock workflow

terraform_validate calls `terraform init` per module which creates
.terraform.lock.hcl files. Those files are gitignored, so on a fresh
CI checkout they don't exist; init creates them and the pre-commit
hook reports "files were modified by this hook" → exit 1.

Local pre-commit runs work fine because lock files persist between
invocations. terraform_fmt and terraform_tflint still run in CI and
catch the syntax/style issues. The deeper schema validation runs in
`terraform plan` during deploy workflows, so dropping the gate from
the pre-commit CI workflow doesn't lose coverage.

* fix(env): correct .env.example defaults to match runtime support

Addresses CodeRabbit findings #1, #2, #3 from PR #105's pass-2 review.

#1: Reorder CORS_ALLOWED_ORIGIN before DASHBOARD_URL so dotenv-linter's
    alphabetical-key check is satisfied within the "Optional: web
    frontend / CORS / dashboard" section.

#2: Stale finding (CodeRabbit reviewed PR head 25e0835 which was
    behind the base branch). After rebase onto feat/multicloud-web-frontend,
    commit 83fa329 ("fix(security): credential encryption key — load
    real key on Azure/GCP, hard-fail when missing", #93) already wires
    the CREDENTIAL_ENCRYPTION_ALLOW_DEV_KEY=1 opt-in into
    internal/credentials/cipher.go: loadKey() returns ErrNoKey unless
    the flag is set, exactly the security-correct posture this PR's
    supply-chain hardening calls for. The .env.example entry is now
    accurate as-is, no code change needed.

#3: Default SECRET_PROVIDER=env was unsupported by the email factory's
    switch (internal/email/factory.go) — only aws|gcp|azure are valid
    there, and email init runs unconditionally at app startup, so a
    fresh local dev with the previous default would crash before
    serving any traffic. Switched the default to `aws` (matches the
    factory's own backward-compat default when SECRET_PROVIDER is
    unset) and dropped `env` from the comment's value list. Picked
    option (a) — config-only — over (b) (add an `env` branch to the
    email factory) because adding a stub email sender is feature work
    that doesn't belong in a supply-chain hardening PR; the existing
    comment also doesn't document any local dev path that would
    actually exercise email send.

* chore(ci): pin govulncheck and pre-commit tool installs

Addresses CodeRabbit findings #4 and #5 from PR #105's pass-2 review.

#4: ci.yml `govulncheck@latest` → `@v1.1.4`. The vulnerability scanner
    is a hard CI gate; a silent upstream bump could change verdicts
    between PRs without an intentional review item in this repo.
    Pinning makes upgrades a deliberate commit, not a drift.

#5: .github/workflows/pre-commit.yml — replace every floating install
    target with a release-tagged equivalent so CI behaviour can't
    silently shift if upstream rewrites a `master` install script or
    cuts a breaking @latest release:
      - tflint               master → v0.55.0 (curl now -fsSL)
      - gosec                @latest → @v2.22.4 (matches ci.yml's
                              securego/gosec action pin)
      - gocyclo              @latest → @v0.6.0 (matches ci.yml)
      - Trivy                main script → -b /usr/local/bin v0.58.0
      - git-secrets          master → tag 1.3.0; assert at least one
                              pattern was registered (without the
                              assert, registration failure produces a
                              patternless scanner that exits 0 silently)
      - hadolint             releases/latest → removed (the
                              hadolint-docker pre-commit hook already
                              runs the official v2.14.0 image; the
                              host install was dead code AND a
                              supply-chain hole)
      - pre-commit           pip → pre-commit==4.0.1
      - hashicorp/setup-terraform  v3 → v4 (matches ci.yml so the two
                              workflows resolve to the same Terraform
                              binary)

Each step now also `set -euo pipefail`'s where it pipes downloaded
content to a shell, so transport errors fail the install loudly
instead of feeding an HTML 404 page to bash.

Updated the .pre-commit-config.yaml trivy-config comment to point at
the new workflow location (.github/workflows/pre-commit.yml) where
trivy v0.58.0 is now installed; the old comment pointed at
ci.yml's trivy-action step which never carried this PR's pin.

* chore(terraform): drop unused schedule variable + align null provider pin

Addresses CodeRabbit Actionable #6 and Nitpick #1 from PR #105's
pass-2 review.

#6 (cleanup-function var.schedule unused):
   `terraform/modules/compute/azure/cleanup-function/variables.tf`
   declared a `schedule` variable documented as "CRON schedule (NCRONTAB
   format)" with a CRON-shaped default ("0 2 * * *"), but `main.tf`'s
   `azurerm_logic_app_trigger_recurrence.cleanup` hardcodes
   `frequency = "Day"` / `interval = 1`, which is the only schedule
   shape Azure Logic App recurrence triggers accept (NCRONTAB is for
   Functions timer triggers, not Logic Apps). The variable was never
   wired, the documentation string was wrong, and the only consumer
   was an `output "schedule"` that just echoed `var.schedule` back.

   Cleanest fix: delete both the variable and the output. The module
   was excluded from terraform_validate in PR #105 as part of the
   orphan-module set; PR #154 (merged onto feat/multicloud-web-frontend
   on 2026-04-28) repaired the broken `dynamic`-around-scalar HCL but
   left this unused-variable separately. Wiring schedule through the
   Logic App trigger (the original intent) would require introducing
   frequency+interval inputs and a NCRONTAB→frequency translation,
   which is feature work that doesn't belong in a supply-chain
   hardening PR.

Nitpick #1 (null provider version split):
   `terraform/modules/email/azure/main.tf` pinned the null provider
   at `~> 3.2` while `terraform/environments/azure/main.tf` was at
   `~> 3.0`. The lockfile already resolved to 3.2.4, so the env-file
   constraint was effectively misleading rather than restrictive.
   Bumped the env file to `~> 3.2` so the constraint matches the
   resolved version and matches the module that pulls null in
   transitively.

Nitpick #2 (azurerm `~> 4.0` vs root `~> 3.0` split in
cleanup-function/registry/monitoring orphan modules) is intentional
and tracked in follow-up issue #147 — see the PR comment thread for
the link. Not changed here.

* fix(ci): bump trivy pin from v0.58.0 to v0.69.3

Follow-up to 8e07b1f. The trivy install.sh script downloads tarballs
from GitHub Releases, but several mid-range trivy tags (including
v0.58.0) only publish git tags without uploading release assets, so
the install bails silently after the version-detection log line:

    aquasecurity/trivy info found version: 0.58.0 for v0.58.0/Linux/64bit
    Process completed with exit code 1.

v0.69.3 is the latest release with published assets. Verified via
`gh api repos/aquasecurity/trivy/releases/tags/v0.69.3` — ships
`trivy_0.69.3_Linux-64bit.tar.gz` plus signature files.

Also dropped `-u` from the install step's `set -euo pipefail`. The
trivy install.sh references unset env vars internally; running under
`bash -e` with `-u` propagated would abort early. `-e` plus
`pipefail` is sufficient to fail on real install errors.

* fix(frontend): drop unused formatRelativeTime import

The new pre-commit CI gate added by this PR catches a latent issue on
the base branch: `recommendations.ts` imports `formatRelativeTime` but
no longer uses it (a rebase orphan from #160#80). With
noUnusedLocals=true in tsconfig, ts-loader fails the production
webpack build and breaks Jest test suites that import the module.

Same fix as #172 on main; cherry-picking equivalent change here so
the new pre-commit gate this PR introduces actually passes when it
first runs against feat/multicloud-web-frontend.

* fix(security): annotate gosec false positives in retry+audit

The new pre-commit gate runs gosec across the whole tree. Two
findings on pre-existing code are false positives in context:

- pkg/retry/exponential.go G404: math/rand/v2 used for retry-backoff
  jitter. Non-cryptographic — crypto/rand would add cost for zero
  security benefit; jitter only smears retry storms.

- pkg/common/audit.go G302: 0644 perms on the JSONL audit log are
  intentional. Ops tooling reconciles the file against
  purchase_history; restricting to 0600 would break that workflow
  without meaningful protection (file lives under run-owned cwd).

Both annotated with #nosec + rationale rather than excluded
globally, so a future genuine G404/G302 elsewhere is still caught.
Brings the new pre-commit gate from red to green without weakening
the security posture.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

effort/l Weeks impact/many Affects most users priority/p2 Backlog-worthy severity/medium Moderate harm triaged Item has been triaged type/feat New capability urgency/this-sprint Within the current sprint

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant