
fix(cse): resolve error code collisions and add missing definition #8241

Open
surajssd wants to merge 2 commits into main from suraj/fix-cse-error-code-collisions


@surajssd surajssd commented Apr 3, 2026

What this PR does / why we need it:

Fixes CSE error code collisions and a missing error code definition in parts/linux/cloud-init/artifacts/cse_helpers.sh.

Two pairs of error codes each shared a numeric value, which made it impossible to tell the corresponding failure modes apart during node provisioning. Additionally, ERR_NVIDIA_DCGM_INSTALL was used in cse_config.sh but never defined, so it expanded to an empty string at runtime.
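The effect of an undefined code can be reproduced in isolation. The following is a minimal sketch, assuming the code is consumed as an unquoted `exit $ERR_...` and that `set -u` is not in effect; the function and variable names are hypothetical and do not come from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical demo; none of these names exist in cse_helpers.sh or cse_config.sh.

# Each function runs in a subshell so that `exit` does not end the caller.
exit_with_undefined_code() {
    (
        unset ERR_DEMO_UNDEFINED
        false                       # stand-in for a failed install step (status 1)
        # shellcheck disable=SC2086
        exit $ERR_DEMO_UNDEFINED    # expands to a bare `exit`, which reuses status 1
    )
}

exit_with_defined_code() {
    (
        ERR_DEMO_DEFINED=235
        false
        exit $ERR_DEMO_DEFINED      # exits with the intended, distinct code
    )
}

rc=0; exit_with_undefined_code || rc=$?
echo "undefined code -> exit status $rc"

rc=0; exit_with_defined_code || rc=$?
echo "defined code   -> exit status $rc"
```

In this sketch the undefined case exits with status 1 (the status of the preceding failed command) and the defined case exits with 235, so the specific failure is only identifiable once the constant exists. With `set -u` enabled, the undefined expansion would instead abort with an unbound variable error.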

Changes

  • ERR_PULL_POD_INFRA_CONTAINER_IMAGE: 225 -> 233 (previously collided with ERR_NVIDIA_GPG_KEY_DOWNLOAD_TIMEOUT=225)
  • ERR_IMDS_FETCH_FAILED: 231 -> 234 (previously collided with ERR_ORAS_PULL_SYSEXT_FAIL=231)
  • ERR_NVIDIA_DCGM_INSTALL=235: added missing definition (used by installNvidiaManagedExpPkgFromCache in cse_config.sh)

Which issue(s) this PR fixes:

Fixes #7466
Fixes #7467

- Change `ERR_PULL_POD_INFRA_CONTAINER_IMAGE` from `225` to `233` to
  avoid collision with `ERR_NVIDIA_GPG_KEY_DOWNLOAD_TIMEOUT`
- Change `ERR_IMDS_FETCH_FAILED` from `231` to `234` to avoid
  collision with `ERR_ORAS_PULL_SYSEXT_FAIL`
- Add missing `ERR_NVIDIA_DCGM_INSTALL=235` definition used by
  `cse_config.sh`

Fixes #7466, fixes #7467

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>

Copilot AI left a comment


Pull request overview

This PR updates Linux CSE helper error code definitions to eliminate numeric collisions and adds a missing error code constant so provisioning failures can be distinguished reliably.

Changes:

  • Reassign ERR_PULL_POD_INFRA_CONTAINER_IMAGE from 225 to 233 to avoid colliding with ERR_NVIDIA_GPG_KEY_DOWNLOAD_TIMEOUT=225.
  • Reassign ERR_IMDS_FETCH_FAILED from 231 to 234 to avoid colliding with ERR_ORAS_PULL_SYSEXT_FAIL=231.
  • Add missing ERR_NVIDIA_DCGM_INSTALL=235 definition (used by managed GPU experience install path).
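A missing definition like the third item can also be detected by comparing the names a script references with the names the helper file defines. This is a minimal sketch under the assumption that codes are defined as `ERR_NAME=...` at the start of a line and referenced as `$ERR_NAME` or `${ERR_NAME}`; the function name `find_undefined_codes` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper.
# $1: file holding the ERR_* definitions
# $2: file that references the codes
# Prints every referenced ERR_* name that has no definition in $1.
find_undefined_codes() {
    awk '
        # First file: remember every name that appears as ERR_NAME=...
        FILENAME == ARGV[1] {
            if (match($0, /^[ \t]*ERR_[A-Z0-9_]+=/)) {
                name = substr($0, RSTART, RLENGTH - 1)   # drop the trailing "="
                sub(/^[ \t]*/, "", name)
                defined[name] = 1
            }
            next
        }
        # Second file: collect every $ERR_NAME or ${ERR_NAME reference
        {
            rest = $0
            while (match(rest, /[$][{]?ERR_[A-Z0-9_]+/)) {
                ref = substr(rest, RSTART, RLENGTH)
                sub(/^[$][{]?/, "", ref)
                if (!(ref in defined)) missing[ref] = 1
                rest = substr(rest, RSTART + RLENGTH)
            }
        }
        END { for (ref in missing) print ref }
    ' "$1" "$2" | sort
}

# Usage (paths are placeholders):
#   find_undefined_codes path/to/cse_helpers.sh path/to/cse_config.sh
```

The sketch only looks at two files, so a name defined somewhere else would be reported as missing; its output is a list of candidates to review rather than a definitive verdict.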

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 3, 2026 23:45

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.



Development

Successfully merging this pull request may close these issues.

  • CSE: Error used without definition
  • CSE Error Codes Collision

2 participants