MGMT-24194: add OSAC pod log gathering to E2E workflow#78714
@omer-vishlitzky: This pull request references MGMT-24194, which is a valid Jira issue.
Warning: The referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
No actionable comments were generated in the recent review. 🎉
Walkthrough
Adds a new CI gather step "osac-project-gather" (ref, metadata, script, OWNERS): a bash script that SSHes to a remote ci_machine to collect OpenShift logs and resource descriptions for the E2E namespace and optional namespaces. It also inserts the step into the baremetal workflow post-steps, and adds …
Changes: OSAC Gather Step Implementation
```mermaid
sequenceDiagram
    participant Runner as CI Runner
    participant SSH as SSH Controller
    participant Remote as ci_machine (remote shell)
    participant OpenShift as OpenShift API
    Runner->>SSH: start remote gather script (with timeout)
    SSH->>Remote: execute osac-project-gather-commands.sh (heredoc)
    Remote->>OpenShift: oc get/describe/events/deployments/jobs/statefulsets
    Remote->>OpenShift: oc logs (per-pod per-container, --previous for app containers)
    Remote-->>SSH: write artifacts to remote ARTIFACT_DIR
    SSH->>Runner: scp artifacts back (timeout, || true)
    Runner-->>Runner: step completes
```
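The diagram wraps both the remote gather and the `scp` copy-back in a timeout and tolerates failure (`|| true`). A minimal sketch of that pattern, assuming GNU coreutils `timeout` is available; `run_best_effort` is a hypothetical helper name, not something the step as submitted defines:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an external command under a time limit and never
# fail the caller, mirroring the "with timeout" and "|| true" arrows above.
run_best_effort() {
  local limit="$1"
  shift
  local rc=0
  # timeout exits with 124 when the limit is hit; any other non-zero code
  # comes from the command itself.
  timeout "$limit" "$@" || rc=$?
  if [[ $rc -eq 124 ]]; then
    echo "WARN: '$*' timed out after ${limit}" >&2
  elif [[ $rc -ne 0 ]]; then
    echo "WARN: '$*' failed with exit code ${rc}" >&2
  fi
  return 0
}

# Usage (host and paths are hypothetical placeholders):
# run_best_effort 15m ssh "root@${HOST}" bash -s < gather.sh
# run_best_effort 5m scp -r "root@${HOST}:/tmp/artifacts/" "${ARTIFACT_DIR}/"
```

Returning 0 unconditionally keeps the caller's exit status tied to the tests rather than to the gather, while the warning on stderr still records what went wrong.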
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ 12 passed
/pj-rehearse pull-ci-osac-project-osac-installer-main-e2e-metal-vmaas-virtual-network-lifecycle
@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@ci-operator/step-registry/osac-project/gather/osac-project-gather-commands.sh`:
- Around line 44-55: The gather currently collects only pods, events and logs
for the optional namespaces (ns variable values "keycloak" and "ansible-aap");
update the loop that iterates over ns to also capture full diagnostics by
running and saving oc describe all resources, oc get deployments, oc get jobs,
oc get statefulsets (and optionally oc get daemonsets/replicasets) into files
under "${ARTIFACT_DIR}/${ns}" (e.g. describe-all.txt, deployments.txt, jobs.txt,
statefulsets.txt) and ensure each oc command redirects stdout/stderr and uses ||
true like the existing oc calls so the script continues even on failures; keep
using the same ns/pod/container variables and naming convention so outputs are
colocated with pods.txt, events.txt and pod-*.log.
- Around line 17-21: The current assignment KUBECONFIG=$(find ${KUBECONFIG}
-type f -print -quit 2>/dev/null) will expand an unset KUBECONFIG and can
trigger nounset; guard the find call by testing the variable first (use
${KUBECONFIG:-} in a test or [[ -n "${KUBECONFIG:-}" ]]) and only run the find
when KUBECONFIG is non-empty, e.g. check [[ -n "${KUBECONFIG:-}" ]] then run the
find into KUBECONFIG, then keep the existing if [[ -z "${KUBECONFIG}" ]]; then
... fi logic. Ensure references to the KUBECONFIG variable and the find
assignment line are updated accordingly.
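The first item asks that the optional namespaces get the same resource diagnostics as the E2E namespace. A minimal sketch of that loop, assuming `oc` already targets the cluster and `ARTIFACT_DIR` is set; the output file names follow the review's proposal (`workloads-other.txt` is an added, hypothetical name), and `gather_namespace_diagnostics` is a hypothetical function:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: collect resource-level diagnostics for one namespace.
# Every oc call captures stdout and stderr and ignores failures, so a missing
# namespace or resource type does not abort the rest of the gather.
gather_namespace_diagnostics() {
  local ns="$1"
  local out="${ARTIFACT_DIR}/${ns}"
  mkdir -p "$out"

  oc describe all -n "$ns" > "${out}/describe-all.txt" 2>&1 || true
  oc get deployments -n "$ns" -o wide > "${out}/deployments.txt" 2>&1 || true
  oc get jobs -n "$ns" -o wide > "${out}/jobs.txt" 2>&1 || true
  oc get statefulsets -n "$ns" -o wide > "${out}/statefulsets.txt" 2>&1 || true
  # Optional extras mentioned in the review.
  oc get daemonsets,replicasets -n "$ns" -o wide > "${out}/workloads-other.txt" 2>&1 || true
}

# Usage with the optional namespaces named in the review:
# for ns in keycloak ansible-aap; do gather_namespace_diagnostics "$ns"; done
```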
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 0cf9204f-57ad-40c2-9f37-828cc677c4b9
📒 Files selected for processing (5)
- ci-operator/step-registry/osac-project/gather/OWNERS
- ci-operator/step-registry/osac-project/gather/osac-project-gather-commands.sh
- ci-operator/step-registry/osac-project/gather/osac-project-gather-ref.metadata.json
- ci-operator/step-registry/osac-project/gather/osac-project-gather-ref.yaml
- ci-operator/step-registry/osac-project/ofcir/baremetal/osac-project-ofcir-baremetal-workflow.yaml
```bash
KUBECONFIG=$(find ${KUBECONFIG} -type f -print -quit 2>/dev/null)
if [[ -z "${KUBECONFIG}" ]]; then
  echo "No kubeconfig found, skipping log collection"
  exit 0
fi
```
Guard the kubeconfig lookup before nounset bites.
As written, ${KUBECONFIG} is expanded before the empty-path check, so a missing kubeconfig will abort the remote shell instead of cleanly skipping collection.
💡 Suggested fix
```diff
-KUBECONFIG=$(find ${KUBECONFIG} -type f -print -quit 2>/dev/null)
-if [[ -z "${KUBECONFIG}" ]]; then
+if [[ -z "${KUBECONFIG:-}" ]]; then
   echo "No kubeconfig found, skipping log collection"
   exit 0
 fi
+KUBECONFIG=$(find "${KUBECONFIG}" -type f -print -quit 2>/dev/null)
```
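The suggested fix moves the emptiness check ahead of `find`, so an empty `find` result (a path that exists but contains no regular file) is no longer caught, whereas the review prompt asks to keep the existing check. A minimal sketch that guards both cases, assuming the remote shell runs with `set -u` as the review implies and that `KUBECONFIG` holds a single file or directory path (quoting it also disables any glob expansion the unquoted original may rely on); `resolve_kubeconfig` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first regular file under $KUBECONFIG, or
# exit 1 if the variable is unset/empty or nothing is found.
# The body is a subshell, so `set -u` and any variables stay local to it.
resolve_kubeconfig() (
  set -u
  # ${KUBECONFIG:-} expands to "" instead of tripping nounset when unset.
  if [[ -z "${KUBECONFIG:-}" ]]; then
    exit 1
  fi
  found=$(find "${KUBECONFIG}" -type f -print -quit 2>/dev/null)
  # Keep the original post-find check: the path may hold no file at all.
  if [[ -z "${found}" ]]; then
    exit 1
  fi
  printf '%s\n' "${found}"
)

# Usage inside the gather script:
# if ! KUBECONFIG=$(resolve_kubeconfig); then
#   echo "No kubeconfig found, skipping log collection"
#   exit 0
# fi
# export KUBECONFIG
```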
Branch updated from 86885d7 to 63b0055.
/pj-rehearse pull-ci-osac-project-osac-installer-main-e2e-metal-vmaas-virtual-network-lifecycle
@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
Add osac-project-gather step that collects pod logs, events, and resource descriptions from OSAC namespaces (osac-e2e-ci, keycloak, ansible-aap) after E2E tests complete. Logs are written to ARTIFACT_DIR for Prow artifact archival. Runs as best_effort so gather failures do not mask test failures.
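The commit message and the sequence diagram describe per-pod, per-container log collection, with `--previous` logs as well, written under `ARTIFACT_DIR`. A minimal sketch of that loop, assuming `oc` already targets the cluster; the `osac-logs/<namespace>/` layout and file names are illustrative, and `gather_pod_logs` is a hypothetical function. Note that `oc logs --previous` only succeeds when a container has a prior (restarted) instance:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: dump pods, events, and current/previous logs for every
# container of every pod in one namespace. Failures are tolerated so one bad
# pod does not stop the loop.
gather_pod_logs() {
  local ns="$1"
  local out="${ARTIFACT_DIR}/osac-logs/${ns}"
  mkdir -p "$out"

  oc get pods -n "$ns" -o wide > "${out}/pods.txt" 2>&1 || true
  oc get events -n "$ns" --sort-by=.lastTimestamp > "${out}/events.txt" 2>&1 || true

  local pod container
  for pod in $(oc get pods -n "$ns" -o jsonpath='{.items[*].metadata.name}' 2>/dev/null || true); do
    for container in $(oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].name}' 2>/dev/null || true); do
      oc logs "$pod" -c "$container" -n "$ns" > "${out}/pod-${pod}-${container}.log" 2>&1 || true
      # --previous fails when there is no earlier instance; drop the empty file.
      if ! oc logs "$pod" -c "$container" -n "$ns" --previous \
          > "${out}/pod-${pod}-${container}-previous.log" 2>/dev/null; then
        rm -f "${out}/pod-${pod}-${container}-previous.log"
      fi
    done
  done
}

# Usage: for ns in osac-e2e-ci keycloak ansible-aap; do gather_pod_logs "$ns"; done
```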
Branch updated from 63b0055 to 9901326.
/pj-rehearse pull-ci-osac-project-osac-installer-main-e2e-metal-vmaas-virtual-network-lifecycle
@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
🧹 Nitpick comments (1)
ci-operator/step-registry/osac-project/gather/OWNERS (1)
Lines 1-7: ⚡ Quick win. Sync step metadata owners with this OWNERS file.
Lines 1-7 add `assisted-cicd`, but `ci-operator/step-registry/osac-project/gather/osac-project-gather-ref.metadata.json` (lines 1-15 in the provided context) still lists only `osac-cicd`. Please update the metadata owners too, to prevent ownership drift across review and automation surfaces.
Proposed metadata sync:
```diff
--- a/ci-operator/step-registry/osac-project/gather/osac-project-gather-ref.metadata.json
+++ b/ci-operator/step-registry/osac-project/gather/osac-project-gather-ref.metadata.json
@@
   "owners": {
     "approvers": [
-      "osac-cicd"
+      "osac-cicd",
+      "assisted-cicd"
     ],
     "reviewers": [
-      "osac-cicd"
+      "osac-cicd",
+      "assisted-cicd"
     ]
   }
 }
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@ci-operator/step-registry/osac-project/gather/OWNERS`:
- Around line 1-7: Update the step metadata to match the OWNERS file by adding
"assisted-cicd" to the approvers and reviewers arrays in the metadata file named
osac-project-gather-ref.metadata.json; locate the approvers and reviewers keys
in that JSON and insert "assisted-cicd" alongside "osac-cicd" so the metadata
and OWNERS remain in sync.
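The nitpick is about keeping two owner lists in sync. A rough drift check, assuming the OWNERS file lists names as `- name` items (as in `approvers:`/`reviewers:` blocks) and using plain `grep` on the metadata JSON rather than a real JSON parser; `check_owner_sync` is a hypothetical helper, not part of the repository's tooling:

```shell
#!/usr/bin/env bash
# Hypothetical drift check: every "- name" entry in OWNERS must appear as a
# quoted string somewhere in the step's metadata JSON. Crude by design: it
# does not distinguish approvers from reviewers inside the JSON.
check_owner_sync() {
  local owners_file="$1" metadata_file="$2"
  local name missing=0
  while read -r name; do
    [[ -z "$name" ]] && continue
    if ! grep -qF "\"${name}\"" "$metadata_file"; then
      echo "missing from ${metadata_file}: ${name}" >&2
      missing=1
    fi
  done < <(sed -n 's/^[[:space:]]*-[[:space:]]*//p' "$owners_file" | tr -d "\"'" | sort -u)
  return "$missing"
}

# Usage:
# check_owner_sync OWNERS osac-project-gather-ref.metadata.json
```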
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: eaea55dc-92e0-4033-880a-c9225160fbb4
📒 Files selected for processing (2)
- ci-operator/step-registry/osac-project/gather/OWNERS
- ci-operator/step-registry/osac-project/ofcir/baremetal/OWNERS
✅ Files skipped from review due to trivial changes (1)
- ci-operator/step-registry/osac-project/ofcir/baremetal/OWNERS
Branch updated from 92e799a to ae0d357.
/pj-rehearse pull-ci-osac-project-osac-installer-main-e2e-metal-vmaas-virtual-network-lifecycle
@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
Branch updated from ae0d357 to 9901326.
[REHEARSALNOTIFIER]
Interacting with pj-rehearse: Once you are satisfied with the results of the rehearsals, comment: …
/approve
/pj-rehearse pull-ci-osac-project-osac-installer-main-e2e-metal-vmaas-virtual-network-lifecycle
@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: danmanor, eranco74, jhernand, omer-vishlitzky.
/pj-rehearse ack
@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
@omer-vishlitzky: all tests passed!
Summary
https://redhat.atlassian.net/browse/MGMT-24194
Add pod log gathering to the OSAC E2E test workflow. Currently, when tests fail, there are zero application logs in the CI artifacts — only OFCIR infrastructure metadata. This makes debugging failures extremely difficult.
Changes
New step: `osac-project-gather`
- Collects pod logs, events, and resource descriptions from the OSAC E2E namespace (`osac-e2e-ci`), plus the `keycloak` and `ansible-aap` namespaces
- Runs as `best_effort` with a 15m timeout so gather failures don't mask test failures
- Writes logs to `$ARTIFACT_DIR/osac-logs/` for Prow artifact archival
Modified: `osac-project-ofcir-baremetal` workflow
- Adds `osac-project-gather` as the first post step (before `ofcir-gather` and `ofcir-release`)
Test plan
- … `osac-logs/`
Summary by CodeRabbit
New Features
Chores
Integration