SPOWL (Safe Planning and Policy Optimization via World Model Learning) is a framework for safe reinforcement learning that unifies world model learning and policy optimization. It leverages latent-space dynamics modeling and constrained optimization to achieve safe and efficient learning in complex continuous control environments.
- Python
- miniconda/conda
Get started with SPOWL:
-
Create a conda environment:
conda create -n spowl python==3.10
-
Activate the environment:
conda activate spowl
-
Install Safety Gymnasium:
wget https://github.com/PKU-Alignment/safety-gymnasium/archive/refs/heads/main.zip
unzip main.zip
rm -rf main.zip
pip install -e safety-gymnasium-main
-
Install jax:
pip install --no-cache-dir --upgrade pip
pip install --no-cache-dir --upgrade "jax[cuda12]"
-
Install other requirements:
pip install --no-cache-dir hydra-core tabulate wandb tqdm moviepy equinox optax
-
Install Mesa for 'osmesa' rendering:
conda install -c conda-forge mesalib
-
Fix dependencies:
pip install --no-cache-dir gymnasium-robotics==1.2.3 numpy==1.25.0
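After the installation steps above, it can help to confirm that the packages actually landed in the active environment. The following is a minimal sketch, not part of SPOWL itself: the helper name `installed_versions` is hypothetical, and the distribution names are assumed to match the names used in the pip commands above. It relies only on the standard library, so it runs even if some packages are missing.

```python
from importlib import metadata

# Distribution names assumed from the pip commands above; adjust if yours differ.
PACKAGES = [
    "jax",
    "safety-gymnasium",
    "gymnasium-robotics",
    "numpy",
    "hydra-core",
    "equinox",
    "optax",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package so missing installs are easy to spot.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:20s} {version or 'NOT INSTALLED'}")
```

If the pinned packages from the last step report versions other than `gymnasium-robotics` 1.2.3 and `numpy` 1.25.0, rerunning the "Fix dependencies" command is a reasonable first thing to try.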
Run the training script to display all available options and configurations with:
python train.py --help
Run the training script to train the default SPOWL configuration:
python train.py
If you use SPOWL in your research, please cite:
@article{latyshev2025spowl,
title={Safe Planning and Policy Optimization via World Model Learning},
author={Latyshev, Artem and Gorbov, Gregory and Panov, Aleksandr I.},
journal={arXiv preprint arXiv:2506.04828},
year={2025}
}





