fix(training): resolve Azure ML RL submission path and runtime dependency failures #320

@fbeltrao

Description

Training Framework

SKRL

Issue Type

Training script error

Python Version

3.11

GPU Model

A10

Isaac Sim/Lab Version

Isaac Lab 2.3.2 / Isaac Sim container nvcr.io/nvidia/isaac-lab:2.3.2

Issue Description

Submitting the RL Azure ML job from training/rl currently fails when using the repository implementation in training/rl/scripts/submit-azureml-training.sh.

The first observed failure is an entrypoint path mismatch. Azure ML accepts the job submission, but the run fails immediately because the submitted command points to training/scripts/train.sh, while the repository entrypoint exists at training/rl/scripts/train.sh.
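A minimal preflight sketch for catching this mismatch locally, before Azure ML accepts the job. The function name `check_entrypoint` is hypothetical and is not part of `submit-azureml-training.sh`; it only tests whether the entrypoint path exists relative to whatever directory is uploaded as the job snapshot.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the repository.
# Usage: check_entrypoint <snapshot_root> <entrypoint_path_relative_to_root>
check_entrypoint() {
  local root="$1" entry="$2"
  # The job command is resolved relative to the uploaded snapshot,
  # so the file must exist at exactly this relative path.
  if [ ! -f "${root}/${entry}" ]; then
    echo "entrypoint not found in snapshot: ${entry}" >&2
    return 1
  fi
  echo "entrypoint ok: ${entry}"
}
```

Called with the repository root and `training/scripts/train.sh`, this would fail the same way the run does; with `training/rl/scripts/train.sh` it would pass.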

The next constraint discovered during validation is that a narrow upload rooted only at training/rl is not sufficient by itself. The runtime launched by training/rl/scripts/train.sh imports from the top-level training package, including training.rl, training.utils, and training.stream. That means a training/rl-only snapshot would fix the shell path problem but then fail on Python imports unless the uploaded payload also preserves the parent training/ package layout.
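The layout requirement can be sketched as a second local check. The helper name `check_package_layout` is hypothetical, and it assumes each dotted module maps to either a directory or a `.py` file under the snapshot root; the real package structure may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the repository.
# Usage: check_package_layout <snapshot_root> <dotted.module> [...]
check_package_layout() {
  local root="$1"
  shift
  local missing=0 mod path
  for mod in "$@"; do
    # Convert a dotted module name (training.utils) to a path (training/utils).
    path="${root}/$(printf '%s' "$mod" | tr . /)"
    # Assumption: a module is either a package directory or a single .py file.
    if [ ! -d "$path" ] && [ ! -f "${path}.py" ]; then
      echo "missing from snapshot: ${mod}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_package_layout . training.rl training.utils training.stream` run from the intended snapshot root would report any of the three imports that a `training/rl`-only upload leaves out.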

Additionally, the entire repository is uploaded as part of the AML job because the .amlignore file is not in the expected location.
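A small sketch for confirming where the ignore file sits, assuming Azure ML reads `.amlignore` from the root of the directory it uploads as the snapshot. The helper name `check_amlignore` is hypothetical and not part of the repository scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the repository.
# Usage: check_amlignore <snapshot_root>
check_amlignore() {
  local root="$1"
  if [ -f "${root}/.amlignore" ]; then
    echo "found ${root}/.amlignore"
    return 0
  fi
  echo "no .amlignore at snapshot root: ${root}" >&2
  # List any copies lower in the tree; under the assumption above,
  # these would not apply at the snapshot root.
  find "$root" -name .amlignore -type f >&2
  return 1
}
```

Running it against the directory passed as the job's code path shows whether the ignore rules can take effect for that upload.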

Training Configuration

Run from `training/rl`: `./scripts/submit-azureml-training.sh --task Isaac-Velocity-Rough-Anymal-C-v0`

Error Traceback

{"NonCompliant":"Execution failed. User process 'bash' exited with status code 127. Please check log file 'user_logs/std_log.txt' for error details. Error: bash: training/scripts/train.sh: No such file or directory\n"}
{"code": "ExecutionFailed", "target": "", "category": "UserError", "error_details": [{"key": "exit_codes", "value": "127"}]}

Checklist

  • I have verified my environment is synced with uv sync
  • I have tested with a minimal configuration
