Skip to content

perf(azure): batched SKU catalogue lookup eliminates N+1 in recommendation converters#81

Merged
cristim merged 1 commit intofeat/multicloud-web-frontendfrom
perf/azure-batched-sku-lookup
Apr 27, 2026
Merged

perf(azure): batched SKU catalogue lookup eliminates N+1 in recommendation converters#81
cristim merged 1 commit intofeat/multicloud-web-frontendfrom
perf/azure-batched-sku-lookup

Conversation

@cristim
Copy link
Copy Markdown
Member

@cristim cristim commented Apr 25, 2026

Summary

Adds a lazy, single-shot SKU catalogue lookup to three of the four Azure recommendation converters (cache, cosmos, database) so the previously-deferred Details fields populate without N+1 SDK calls.

  • Cache: CacheDetails.Shards from armredis.Properties.ShardCount.
  • Cosmos: NoSQLDetails.APIType (sql / mongodb / cassandra / gremlin / table) from the dominant Cosmos account Kind + Capabilities in the subscription.
  • Database: DatabaseDetails.EngineVersion from armsql.CapabilitiesClient.ListByLocation traversal.

Each converter calls a per-service cachedSKULookup (or cachedAPIType on cosmos). The catalogue is fetched ONCE per client lifetime via sync.Once; many converter calls in the same GetRecommendations run hit the in-memory map. A failed catalogue fetch leaves the cache nil and converters fall back to the previous empty-Details behaviour with a one-time WARN log — the conversion itself never fails.

Compute scoped back: common.ComputeDetails (in pkg/common/types.go) currently exposes only InstanceType / Platform / Tenancy / Scope. The two enrichments compute would supply (vCPU, MemoryGB) have no struct field to write into. Adding fields to a shared pkg/common type would touch all 3 cloud providers, the frontend, and the matchers — out of scope for this perf change. Tracked as a narrower follow-up in known_issues/10_azure_provider.md.

Test plan

New per-service tests in the existing *_test.go:

  • _PopulatesShardsFromSKUCache / _PopulatesAPIType (5 sub-cases) / _PopulatesEngineVersion — cache hit populates the previously-deferred field.
  • _PagerErrorFallsBack / _CapabilitiesErrorFallsBack — fetch error leaves the field empty and conversion still succeeds (graceful-degradation contract).
  • _CachedSKULookup_FetchedOnce / _CachedAPIType_FetchedOnce — many converter calls share a single catalogue fetch (the N+1 invariant pinned in tests).
  • _AmbiguousAPIType — multi-API-type Cosmos subscription leaves APIType empty rather than guessing.
  • Pre-existing _PopulatesAllFields tests now inject empty mock pagers so they don't hit the real Azure API on first lookup.
  • cd providers/azure && go test ./... — all green.
  • go test ./... (root) — all green.
  • go vet ./... — clean.
  • Pre-commit hooks (gofmt, gocyclo, gosec, etc.) — all green.

Closes #49 for cache / cosmos / database. Compute follow-up filed as a narrower issue in the known-issues doc.

…ation converters

The four Azure recommendation converters (compute, database, cache,
cosmosdb) previously left SKU-derived `Details` fields empty because
populating them inline would have triggered an N+1 SDK call per
recommendation. This change wires three of them to a lazily-cached,
single-shot SKU catalogue lookup, gated by `sync.Once` and held for the
client's lifetime.

Per-service:

- **Cache** (`services/cache`): caches `Properties.ShardCount` per SKU
  from `armredis.Client.NewListBySubscriptionPager`. Converter populates
  `CacheDetails.Shards` for Premium-tier clustered caches.
- **Cosmos** (`services/cosmosdb`): caches the dominant Cosmos APIType
  (mongodb / cassandra / gremlin / table / sql) across the
  subscription's accounts via `armcosmos.DatabaseAccountsClient.NewListPager`,
  mapping `account.Kind` + `Capabilities`. Converter populates
  `NoSQLDetails.APIType`. Multi-API-type subscriptions leave APIType
  empty rather than guessing.
- **Database** (`services/database`): caches engine version per SKU
  from `armsql.CapabilitiesClient.ListByLocation`, traversing
  ServerVersion → Editions → ServiceLevelObjectives. Converter
  populates `DatabaseDetails.EngineVersion`.

Each converter calls a per-service `cachedSKULookup` (or `cachedAPIType`
on cosmos). The catalogue is fetched ONCE per client lifetime; many
converter calls in the same `GetRecommendations` run hit the in-memory
map. A failed catalogue fetch leaves the cache nil and converters fall
back to the previous empty-Details behaviour with a one-time WARN log —
the conversion itself never fails, preserving the
graceful-degradation contract.

**Compute scoped back**: `common.ComputeDetails` (in `pkg/common/types.go`)
exposes only InstanceType/Platform/Tenancy/Scope. The two enrichments
the SKU catalogue would supply for compute are vCPU and MemoryGB —
neither has a struct field to write into. Adding fields to the shared
type would touch all 3 cloud providers' converters, the frontend, and
the matchers, well outside this perf change's scope. Documented as a
narrower follow-up in `known_issues/10_azure_provider.md`.

Tests per service in the existing `*_test.go`:

- `_PopulatesShardsFromSKUCache` / `_PopulatesAPIType` /
  `_PopulatesEngineVersion` — cache hit populates the previously-deferred
  field.
- `_PagerErrorFallsBack` / `_CapabilitiesErrorFallsBack` — fetch error
  leaves the field empty and conversion still succeeds.
- `_FetchedOnce` — many converter calls share a single catalogue fetch
  (the N+1 invariant).
- The pre-existing `_PopulatesAllFields` tests now inject empty mock
  pagers so they don't hit the real Azure API on first lookup.

Refs #49
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 25, 2026

Warning

Rate limit exceeded

@cristim has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 31 minutes and 52 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 31 minutes and 52 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6d7c96a5-6222-434c-83cc-0581b1bff0ec

📥 Commits

Reviewing files that changed from the base of the PR and between 96dccb8 and b3fe6e2.

📒 Files selected for processing (8)
  • known_issues/10_azure_provider.md
  • providers/azure/services/cache/client.go
  • providers/azure/services/cache/client_test.go
  • providers/azure/services/compute/client.go
  • providers/azure/services/cosmosdb/client.go
  • providers/azure/services/cosmosdb/client_test.go
  • providers/azure/services/database/client.go
  • providers/azure/services/database/client_test.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/azure-batched-sku-lookup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 25, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 27, 2026

Two follow-up GitHub issues filed for the deferrals this PR explicitly scopes back:

Both reference back here and propose mirroring the cachedAPIType / cachedSKULookup patterns this PR establishes.

@cristim cristim merged commit 5e3b54d into feat/multicloud-web-frontend Apr 27, 2026
3 checks passed
@cristim cristim added triaged Item has been triaged priority/p3 Polish / idea / may never ship severity/low Minor harm urgency/this-quarter Within the quarter impact/few Limited audience effort/l Weeks type/feat New capability labels Apr 28, 2026
@cristim cristim deleted the perf/azure-batched-sku-lookup branch April 29, 2026 10:08
cristim added a commit that referenced this pull request Apr 30, 2026
Adds optional VCPU (int) and MemoryGB (float64) fields to
common.ComputeDetails so per-provider catalogue lookups can enrich
compute recommendations with sizing data. Both fields use json
omitempty — converters that don't yet wire a catalogue leave them at
the zero value and the API payload stays clean.

GetDetailDescription appends " (N vCPU / M GB)" when both values are
> 0, otherwise returns the existing "platform/tenancy" form. Format
uses %g so 16 GB renders as "16 GB" (not "16.000000 GB") while 0.5 GB
SKUs render as "0.5 GB".

Scope decision — schema-only foundation:

- Azure: PR #81 (perf/azure-batched-sku-lookup) introduces the
  cachedSKULookup helper for cache/cosmos/database. Compute was
  scoped back from #81 because these fields didn't exist. With the
  schema in place, a follow-up PR can wire compute on top of #81 once
  it merges (filed as a separate issue post-merge to avoid an
  unmerged-PR dependency in this changeset).
- AWS: ce.EC2InstanceDetails (Cost-Explorer) has no vCPU/Memory
  fields; populating would require a new ec2:DescribeInstanceTypes
  caller + cache + mocks. Substantial net-new code, tracked as a
  separate follow-up.
- GCP: no compute recommendation converter exists today. N/A.
- Frontend: api.Recommendation has no `details` field and the detail
  drawer renders a hard-coded list. Surfacing the new fields needs
  separate API + UI work; tracked as follow-up once at least one
  provider populates the fields.

Tests: extend pkg/common/types_test.go::TestComputeDetails_GetDetail
Description with 4 new table rows — VCPU-only, MemoryGB-only,
integer-GB, and fractional-GB (Azure 0.5 GB SKU shape).

Refs #82
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

effort/l Weeks impact/few Limited audience priority/p3 Polish / idea / may never ship severity/low Minor harm triaged Item has been triaged type/feat New capability urgency/this-quarter Within the quarter

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant