A single-host diagnostic daemon that records NVIDIA GPU utilization to SQLite and produces a retrospective report separating active use from allocated-but-idle ("idle-held") and truly idle (no process at all).
Conventional dashboards collapse the latter two. Surfacing
idle-held as its own number is the entire point. Someone left a
Jupyter notebook open with an 8 GB tensor on the GPU and went to
lunch — nvidia-smi will show 1% utilization, but the card is
unusable by anyone else. This tool measures that.
Status: v0.4.1 — `gua` exposes the new auto-runtime command surface as safe placeholders. `daemon` still runs on a real NVIDIA host, `demo` runs anywhere (no GPU required), and `report` reads either. The Go v0.1.0 implementation remains downloadable at tag `v0.1.0` / branch `go-archive`.
The recommended install path is PyPI via uv. The package has no core runtime dependencies.
Requires uv. In normal online environments, uv creates the isolated tool environment and manages the needed Python runtime. If Python downloads are disabled by local policy, install Python 3.12+ first.
uv tool install gpu-usage-audit
gua doctor
gua start --dry-run
gpu-usage-audit demo

In v0.4.1 the `gua` commands are intentionally read-only placeholders:
they print what is not implemented yet and make no system, service,
cluster, or database changes. Use `gpu-usage-audit daemon`/`report`/`demo`
for the existing compatibility workflow.
Available `gua` subcommands in v0.4.1: `doctor`, `start`, `status`,
`report`, `stop`, and `uninstall`.
Update or remove the installed tool with uv:
uv tool upgrade gpu-usage-audit
uv tool uninstall gpu-usage-audit

`uv tool uninstall gpu-usage-audit` removes the installed Python tool and
its `gua` / `gpu-usage-audit` commands. `gua uninstall` is different: it
is reserved for future runtime cleanup and is a no-op placeholder in
v0.4.1.
GitHub Release assets are also available for manual download:
BASE="https://github.com/AI-Ocean/gpu-usage-audit/releases/download/v0.4.1"
WHEEL="gpu_usage_audit-0.4.1-py3-none-any.whl"
curl -fsSLO "$BASE/$WHEEL"
curl -fsSLO "$BASE/SHA256SUMS"
sha256sum -c SHA256SUMS --ignore-missing
uvx --from "./$WHEEL" gua doctor

$ gpu-usage-audit report --db /var/lib/gua/gua.db --since 1h
gpu-usage-audit — lab-a100 (bare, driver 560.35.05) Window: 1:00:00
§1 Headline
█████████▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░
active █ 15.7%
idle-held ▒ 45.1% ← this is the number conventional tools miss
truly-idle ░ 39.2%
(51 samples)
§2 Waste
~0.43 GPU-hours idle, ~2.53 GPUs equivalently unused
§3 Per-GPU
GPU-0 active 47.1% idle-held 35.3% truly-idle 17.6%
GPU-1 active 0.0% idle-held 100.0% truly-idle 0.0%
GPU-2 active 0.0% idle-held 0.0% truly-idle 100.0%
§4 Top identities
identity gpu-hours idle-held
alice 0.42 42.9%
bob 0.28 100.0%
§5 Time-of-day heatmap (UTC)
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
Mon .
The 3-bar collapses every card × every tick over the window into the
active / idle-held / truly-idle split. idle-held rows are the
embarrassing category: a process is holding GPU memory but the SM
utilization is below 10%.
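As a concrete sketch of that aggregation (illustrative names, not the tool's internal code), collapsing a list of per-(card, tick) states into the §1 percentages is a simple count; the 8/23/20 split below reproduces the 51-sample example above:

```python
from collections import Counter

def three_bar(states: list[str]) -> dict[str, float]:
    """Collapse per-(card, tick) states into the §1 percentage split."""
    counts = Counter(states)
    total = sum(counts.values()) or 1
    return {s: round(100.0 * counts[s] / total, 1)
            for s in ("active", "idle-held", "truly-idle")}

# 8 active + 23 idle-held + 20 truly-idle = 51 samples
print(three_bar(["active"] * 8 + ["idle-held"] * 23 + ["truly-idle"] * 20))
# {'active': 15.7, 'idle-held': 45.1, 'truly-idle': 39.2}
```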
The demo subcommand records 30 ticks of fake telemetry and prints the
report — all in one process, no second shell needed.
gpu-usage-audit demo

The bundled `FakeTier` produces a deterministic 5-tick workload — active
learning → idle-held memory → cleanup — so the output is the same every
run. Adjust the shape with `--ticks N` and `--interval D`.
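The shape of such a deterministic feed, sketched here with made-up numbers (illustrative only, not the actual `FakeTier` implementation), is just a fixed per-tick pattern cycled for `--ticks`:

```python
from itertools import cycle

# Illustrative (util_pct, mem_used_mb) pattern: active -> idle-held -> cleanup.
PHASES = [(90, 8200), (85, 8200), (2, 8200), (1, 8200), (0, 40)]

def fake_samples(ticks: int = 30):
    """Yield `ticks` deterministic samples by cycling the 5-tick phase pattern."""
    phase = cycle(PHASES)
    for _ in range(ticks):
        yield next(phase)

print(list(fake_samples(5)))
```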
On an NVIDIA host, install the NVML Python bindings into the tool environment and run `daemon`:

# Add the NVML Python package to the tool environment.
uv tool install --force --with nvidia-ml-py gpu-usage-audit

gpu-usage-audit daemon --db /tmp/gua.db --interval 30s

Run the report from another shell:

gpu-usage-audit report --db /tmp/gua.db --since 1h --interval 30s

The daemon requires the NVIDIA driver and `libnvidia-ml.so.1`; on a driver-less host it exits with `NVML Shared Library Not Found`. Use `demo` instead on such a box.
`gpu-usage-audit` has three commands sharing one SQLite file:

| Command | What it does |
|---|---|
| `daemon` | Long-running background process. Samples real NVML telemetry on every tick and appends to the database. Stop with Ctrl+C (SIGINT) or `systemctl stop`. NVIDIA host required. |
| `report` | One-shot read against the accumulated database. Safe to run while the daemon is still writing — SQLite WAL mode handles the concurrency. |
| `demo` | Self-contained showcase. Records N fake ticks and immediately prints the report. No GPU, no second shell, no operational meaning — just to see the output shape. |
gpu-usage-audit daemon --db PATH [--interval D]

- `--db PATH` — SQLite file to write to. Created if missing. WAL mode enabled automatically.
- `--interval D` (default `30s`) — how often to sample. Accepts `30s`, `1m`, `200ms`, etc.
Each tick prints a one-line summary to stdout; on shutdown the cumulative row count is printed.
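For orientation, here is a rough sketch of what a single sampling tick looks like against NVML via `nvidia-ml-py` (an approximation for illustration, not the daemon's actual code, which keeps NVML initialized across ticks):

```python
import pynvml  # installed as nvidia-ml-py

def sample_tick() -> list[dict]:
    """One tick: util_pct per card, mem_used_mb per (card, pid)."""
    pynvml.nvmlInit()
    try:
        rows = []
        for card in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(card)
            util_pct = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
            procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
            rows.append({
                "card": card,
                "util_pct": util_pct,
                "procs": [(p.pid, (p.usedGpuMemory or 0) // 2**20) for p in procs],
            })
        return rows
    finally:
        pynvml.nvmlShutdown()
```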
gpu-usage-audit report --db PATH [--since D] [--interval D] [--width N]

- `--db PATH` — same SQLite file the daemon writes to.
- `--since D` (default `1h`) — the report window. No upper bound — `--since 365d` is accepted. The effective window is min(`--since`, age of oldest sample), so passing a huge `--since` is the same as "all data". Units: `ms`, `s`, `m`, `h`, `d` (no `w`; use `7d`).
- `--interval D` (default `30s`) — must match what the daemon used. This is how §2 (Waste) and §4 (Top identities) convert tick counts to GPU-hours (see the sketch below). Mismatched intervals → wrong GPU-hours.
- `--width N` (default `60`) — width of the §1 three-bar in characters.
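The GPU-hours figures are plain tick arithmetic, which is why `report` has to be told the daemon's interval; a minimal sketch (function name illustrative):

```python
def gpu_hours(sample_count: int, interval_seconds: float) -> float:
    """Each (card, tick) sample stands for interval_seconds of one GPU's time."""
    return sample_count * interval_seconds / 3600.0

print(gpu_hours(120, 30))  # 120 idle samples at 30 s      -> 1.0 GPU-hour
print(gpu_hours(120, 60))  # same ticks reported at 60 s   -> 2.0 (doubled)
```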
gpu-usage-audit demo [--db PATH] [--ticks N] [--interval D]

- `--db PATH` (optional) — if omitted, a fresh temporary database is created and its path is printed to stderr.
- `--ticks N` (default `30`) — how many fake ticks to record before printing the report.
- `--interval D` (default `1s`) — tick spacing.
- Same `--interval` on both sides. If you ran the daemon with `--interval 30s`, run `report --interval 30s` too.
- Let it run for a while. §1/§3 are meaningful after one tick; §4 (Top identities) needs hours; §5 (Heatmap) needs days.
- WAL leaves sidecar files (`gua.db-wal`, `gua.db-shm`). They are cleaned up automatically when the last connection closes.
- DB size: ~50 MB per host per 30 days at 12 GPUs (extrapolated from Go v0.1.0; not yet re-measured for the Python rewrite).
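If you want to poke at the database directly while the daemon is writing, a read-only connection is the safe way to do it; for example, listing the tables (using the `/tmp/gua.db` path from the quick-start above, and inspecting rather than assuming table names):

```python
import sqlite3

# Read-only URI connection: cannot block or modify the daemon's WAL writes.
con = sqlite3.connect("file:/tmp/gua.db?mode=ro", uri=True)
for (name,) in con.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)
con.close()
```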
For a long-running deployment, drop a unit file in
`/etc/systemd/system/gpu-usage-audit.service`:
[Unit]
Description=gpu-usage-audit daemon
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/gpu-usage-audit daemon --db /var/lib/gua/gua.db --interval 30s
Restart=on-failure
User=gua
[Install]
WantedBy=multi-user.target

Then `systemctl enable --now gpu-usage-audit`.
Each tick of the daemon records:
- per-card: `util_pct` (SM utilization)
- per-process: `mem_used_mb` per (card, pid) (see the schema sketch below)
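A minimal sketch of what that per-tick shape could look like as SQLite tables (the table and column names here are assumptions for illustration; the shipped schema may differ):

```python
import sqlite3

con = sqlite3.connect("gua-sketch.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS card_sample (    -- one row per card per tick
    ts        INTEGER NOT NULL,             -- tick timestamp, unix seconds
    card      INTEGER NOT NULL,
    util_pct  INTEGER NOT NULL              -- SM utilization, 0-100
);
CREATE TABLE IF NOT EXISTS proc_sample (    -- one row per (card, pid) per tick
    ts          INTEGER NOT NULL,
    card        INTEGER NOT NULL,
    pid         INTEGER NOT NULL,
    mem_used_mb INTEGER NOT NULL
);
""")
con.close()
```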
The report aggregates per card × per tick:
util >= 10 → active (compute is happening)
util < 10 AND mem > 100 → idle-held (memory is held, SM is cold)
util < 10 AND mem <= 100 → truly-idle (the card is genuinely free)
The 100 MB threshold absorbs the PyTorch/TF runtime baseline so importing torch doesn't count as "holding the GPU".
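Expressed as code, the per-sample rule is a two-threshold decision; a minimal sketch (thresholds as documented above, names illustrative):

```python
def classify(util_pct: float, mem_used_mb: float) -> str:
    """Bucket one (card, tick) sample using the 10% SM / 100 MB thresholds."""
    if util_pct >= 10:
        return "active"        # compute is happening
    if mem_used_mb > 100:
        return "idle-held"     # memory is held, SM is cold
    return "truly-idle"        # the card is genuinely free

assert classify(87, 8200) == "active"
assert classify(1, 8200) == "idle-held"   # the lunch-break notebook
assert classify(0, 60) == "truly-idle"    # only a tiny runtime baseline, if that
```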
Requires uv (uv pins the Python version
automatically; requires-python = ">=3.12").
git clone https://github.com/AI-Ocean/gpu-usage-audit
cd gpu-usage-audit
uv sync # create .venv, install dev deps
uv run pytest # run the test suite
uv run ruff check # lint
uv run mypy # type-check (strict)
uv run gpu-usage-audit demo # see the report shape locally

CI runs ruff + format check + mypy + pytest, then builds and smoke-tests
the wheel on every push and PR. Tag pushes (v*) rerun the same checks,
build sdist + wheel, smoke-test the wheel, and create a GitHub Release
with auto-generated notes. Release tags also publish the wheel and sdist
to PyPI through Trusted Publishing.
This is a single-host retrospective tool. Live dashboards, multi-host aggregation, quotas, and pod-name resolution are out of scope — those belong above the host layer. If this tool surfaces enough idle-held to make scheduling worth solving, see ocean-all.
Apache License 2.0 — see LICENSE.