
[codex] add TorchRL Flame DQN example#453

Merged
k82cn merged 1 commit into
xflops:mainfrom
k82cn:codex/torchrl-flame-dqn-example
May 14, 2026

Conversation

Contributor

@k82cn k82cn commented May 13, 2026

Summary

  • Add a TorchRL DQN example under examples/rl/torchrl_dqn based on the upstream TorchRL tutorial loop.
  • Wire distributed collection through Flame Runner services and replay storage through patch_object.
  • Add configurable replay modes, including sharded replay with parallel sampling controls.
  • Document local and distributed usage, heavier discrete environments, and validation notes.
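The sharded replay mode mentioned above can be sketched in plain Python. Note this is an illustrative assumption about the design, not the example's actual implementation: the `ShardedReplay` name, the round-robin write policy, and the even batch split are all hypothetical.

```python
import random


class ShardedReplay:
    """Illustrative sharded replay buffer: transitions are spread
    round-robin across shards, and a sample request is split into
    per-shard sub-requests that could be issued in parallel."""

    def __init__(self, num_shards: int):
        self.shards = [[] for _ in range(num_shards)]
        self._next = 0  # round-robin cursor for writes

    def add(self, transition):
        self.shards[self._next].append(transition)
        self._next = (self._next + 1) % len(self.shards)

    def sample(self, batch_size: int):
        # Split the request as evenly as possible across shards,
        # then draw each sub-batch from its own shard.
        per_shard, extra = divmod(batch_size, len(self.shards))
        batch = []
        for i, shard in enumerate(self.shards):
            want = per_shard + (1 if i < extra else 0)
            batch.extend(random.sample(shard, min(want, len(shard))))
        return batch


buf = ShardedReplay(num_shards=2)
for t in range(10):
    buf.add({"step": t})
print(len(buf.sample(4)))  # 4 transitions drawn across both shards
```

In the real example the per-shard sub-requests would go through Flame's `patch_object`-backed replay storage rather than local lists.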

Validation

  • python3 -m py_compile examples/rl/torchrl_dqn/main.py examples/rl/torchrl_dqn/model.py examples/rl/torchrl_dqn/collector.py examples/rl/torchrl_dqn/replay_buffer.py
  • python3 examples/rl/torchrl_dqn/main.py --help
  • sdk/python/.venv/bin/ruff check examples/rl/torchrl_dqn
  • sdk/python/.venv/bin/ruff format --check examples/rl/torchrl_dqn
  • uv run main.py --local --env acrobot --iterations 1 --collections 1 --frames-per-collection 2 --batch-size 1 --warmup-frames 1 --replay simple --metrics-json /tmp/torchrl-main-simple-smoke.json
  • uv run main.py --local --env acrobot --iterations 1 --collections 1 --frames-per-collection 2 --batch-size 1 --warmup-frames 1 --replay sharded --replay-shards 2 --sample-work 8 --sample-parallelism 2 --metrics-json /tmp/torchrl-main-sharded-smoke.json
  • git diff --check

Distributed Flame runtime smoke was not run because it requires an active Flame cluster.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a new reinforcement learning example, examples/rl/torchrl_dqn, which adapts the TorchRL CartPole DQN tutorial to the Flame Runner framework. The implementation includes a distributed rollout collector, a sharded replay buffer using Flame's ObjectRef and patch_object capabilities, and a local training mode for validation. Feedback on the code changes highlights opportunities to improve performance by moving imports out of the hot collection loop and suggests removing an unused helper function in the main entry point.

Comment thread examples/rl/torchrl_dqn/collector.py Outdated
Comment on lines +51 to +55
import random

import torch
from model import flatten_observation
from tensordict import TensorDict
Contributor


Severity: medium

These imports are performed inside the _select_action method, which is called for every environment step (e.g., 100 times per collection). Although Python caches imports in sys.modules, repeated lookups in a performance-sensitive reinforcement learning loop add unnecessary overhead. Moving these imports to the top of the file is more efficient and adheres to PEP 8 guidelines.

References
  1. Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants. (link)

Comment thread examples/rl/torchrl_dqn/main.py Outdated
Comment on lines +49 to +55
def _sample_request_sizes(batch_size: int, sample_parallelism: int) -> list[int]:
    if sample_parallelism < 1:
        raise ValueError("sample_parallelism must be at least 1")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    return split_batch(batch_size, sample_parallelism)
Contributor


Severity: medium

The _sample_request_sizes function is defined but never used in the script. _sample_shard_plan (line 58) implements its own logic using split_batch directly. Removing this dead code improves the maintainability of the example.
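For context, a `split_batch` helper of the kind both functions rely on typically divides `batch_size` as evenly as possible across `sample_parallelism` requests. This sketch assumes that contract; the real helper in the example may differ in details such as how it handles more parts than items.

```python
def split_batch(batch_size: int, parts: int) -> list[int]:
    """Split batch_size into `parts` near-equal chunks.

    The first (batch_size % parts) chunks get one extra item,
    and empty chunks are dropped when parts > batch_size.
    """
    base, extra = divmod(batch_size, parts)
    sizes = [base + (1 if i < extra else 0) for i in range(parts)]
    return [s for s in sizes if s > 0]


print(split_batch(10, 3))  # [4, 3, 3]
print(split_batch(2, 4))   # [1, 1]
```

Since `_sample_shard_plan` calls `split_batch` directly with the same validation implied, the wrapper adds nothing and can be deleted.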


codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@k82cn k82cn force-pushed the codex/torchrl-flame-dqn-example branch 2 times, most recently from 46529e8 to 7f8e3f1 Compare May 13, 2026 23:55
@k82cn k82cn force-pushed the codex/torchrl-flame-dqn-example branch from 7f8e3f1 to fb06d43 Compare May 14, 2026 00:02
@k82cn k82cn merged commit 62d3783 into xflops:main May 14, 2026
6 checks passed
@k82cn k82cn deleted the codex/torchrl-flame-dqn-example branch May 14, 2026 01:32

Labels

None yet

Projects

None yet


1 participant