MRB-860 multi panel lead time plots #152

Draft
Louis-Frey wants to merge 9 commits into main from MRB-860-Multi-Panel-Lead-Time-Plots

Contributor

@Louis-Frey Louis-Frey commented May 13, 2026

Summary

Adds a configurable multi-panel metric-vs-lead-time plot, driven by a new multipanel_plots: section in the experiment YAML, plus quality-of-life improvements that also benefit the existing single-panel plots and the dashboard.

What you get

  • New multipanel_plots: config section. Each named entry declares rows, cols, optional figsize/title, and a row-major list of panels. Each panel is {metric, param} plus optional region, season, init_hour, title, ylim. Validated by MultipanelPlotSpec (Pydantic): rows*cols must equal len(panels), and unknown fields are forbidden.
  • New workflow rule verification_metrics_multipanel_plot producing output/results/<experiment>/multipanel/<name>.png. Aggregator rule verification_metrics_multipanel_plot_all expands over every named layout. Wired into experiment_all, so plain evalml experiment builds the layouts when the section is present and is a no-op otherwise.
  • Units on y-axis labels. Auto-derived from (metric, param) for matplotlib plots (single-panel and multipanel). Backed by a small PARAM_UNITS map in src/plotting/units.py. Known limitation: the verification netCDFs don't carry units attrs, so this is a hard-coded dict — see follow-ups below.
  • Panel labels (a), b), c), ...) rendered as bold, left-aligned titles at the same height as the centred title.
  • Stable, bijective source → color mapping shared with the dashboard. Both surfaces use Vega-Lite's tableau10 over the alphabetically-sorted full source list. Pinning Vega's color.scale.domain to that list also fixes a longstanding dashboard glitch where toggling sources reshuffled the remaining colors.
  • Refactor: shared verification-plotting building blocks. src/verification/loading.py (load_long_df, subset_df, _ensure_unique_lead_time, _select_best_sources) and src/plotting/metric_lead_time_panel.py (plot_panel). The existing single-panel script and the dashboard script were updated to consume these; no behaviour change.
  • Tests. 5 new tests in tests/unit/test_config.py covering panel-level defaults, rows*cols validation, extra="forbid", and a ConfigModel round-trip with multipanel_plots populated.
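
For reviewers who want a feel for the units bullet above, here is a minimal sketch of how a helper like metric_units could behave. It is not the code in src/plotting/units.py: the unit strings in PARAM_UNITS, the empty-string fallback for unknown params, and the default_ylabel helper are assumptions for illustration, and the metric name is used as-is rather than decoded.

```python
# Hypothetical stand-in for src/plotting/units.py.
# The unit strings below are assumptions, not copied from the PR.
PARAM_UNITS = {
    "T_2M": "K",
    "PMSL": "Pa",
}

# Metrics the PR describes as carrying no units.
UNITLESS_METRICS = {"CORR", "R2"}


def metric_units(metric: str, param: str) -> str:
    """Return '' for unitless metrics, else the param's units ('' if unknown)."""
    if metric.upper() in UNITLESS_METRICS:
        return ""
    # Assumption: unknown params fall back to an empty unit string.
    return PARAM_UNITS.get(param, "")


def default_ylabel(metric: str, param: str) -> str:
    """Build '<metric> [<units>]', dropping the brackets when there are no units."""
    units = metric_units(metric, param)
    return f"{metric} [{units}]" if units else metric
```

With this sketch, an explicit ylabel would still take precedence in plot_panel; the helper only supplies the default.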

Example

config/multipanel_example.yaml is a working config with two layouts: bias_overview (BIAS by season — all vs JJA — for T_2M and PMSL) and rmse_overview (RMSE by init hour — 0, 12 — for T_2M and PMSL). Run with evalml experiment config/multipanel_example.yaml, which produces single-panel plots, dashboard, and multipanel PNGs all in one go.
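
To illustrate the shape of a layout and the two validation rules (rows*cols must match the panel count, unknown fields rejected), here is a dependency-free sketch. The PR itself uses Pydantic models (MultipanelPanelSpec, MultipanelPlotSpec); the dataclasses below are a plain-Python stand-in, and the class names, the None defaults, and the exact panel ordering of bias_overview are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class PanelSpec:
    """Stand-in for one panel: metric and param are required, the rest optional."""
    metric: str
    param: str
    region: Optional[str] = None      # defaults here are assumptions
    season: Optional[str] = None
    init_hour: Optional[int] = None
    title: Optional[str] = None
    ylim: Optional[Tuple[float, float]] = None


@dataclass
class LayoutSpec:
    """Stand-in for one named layout with a row-major list of panels."""
    rows: int
    cols: int
    panels: List[PanelSpec]
    figsize: Optional[Tuple[float, float]] = None
    title: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirrors the rows*cols == len(panels) rule described in the PR.
        if self.rows * self.cols != len(self.panels):
            raise ValueError(
                f"rows*cols = {self.rows * self.cols}, "
                f"but {len(self.panels)} panels were given"
            )


def _build(cls: type, raw: Dict[str, Any]) -> Any:
    """Construct cls from a dict, rejecting keys the dataclass does not declare."""
    unknown = set(raw) - {f.name for f in fields(cls)}
    if unknown:
        raise ValueError(f"unknown fields for {cls.__name__}: {sorted(unknown)}")
    return cls(**raw)


def parse_layout(raw: Dict[str, Any]) -> LayoutSpec:
    """Parse one layout entry, validating panels first and then the grid size."""
    data = dict(raw)
    data["panels"] = [_build(PanelSpec, p) for p in data.get("panels", [])]
    return _build(LayoutSpec, data)


# Hypothetical 2x2 layout in the spirit of bias_overview (panel order is assumed).
bias_overview = {
    "rows": 2,
    "cols": 2,
    "panels": [
        {"metric": "BIAS", "param": "T_2M", "season": "all"},
        {"metric": "BIAS", "param": "T_2M", "season": "JJA"},
        {"metric": "BIAS", "param": "PMSL", "season": "all"},
        {"metric": "BIAS", "param": "PMSL", "season": "JJA"},
    ],
}
layout = parse_layout(bias_overview)
```

Dropping one panel from the list, or misspelling a key such as season, makes parse_layout raise a ValueError in this sketch; the real models report such problems through Pydantic's own validation errors.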

[Image: rmse_overview]

Test plan

  • pytest tests/unit — 44 passing (5 new).
  • Pre-commit pydantic-schema hook clean (schema regenerated).
  • Dry-run with --rerun-triggers mtime against a fresh multipanel_plots config: only multipanel/plot/dashboard jobs planned, no spurious inference reruns.
  • Visual check of generated PNGs and dashboard.html; colors agree across the three surfaces; dashboard source toggle preserves the remaining sources' colors.

Caveats / follow-ups

  • PARAM_UNITS in src/plotting/units.py is a hand-maintained dict because the metric computation in src/verification/__init__.py:verify drops the units attrs that the input data carries. A cleaner fix is to propagate units through the verify → aggregate chain so the netCDF carries them; left as a follow-up.
  • Color palette caps at 10 sources (Vega's tableau10); beyond that we wrap and lose bijectivity. Easy to bump to tableau20 later, or to a deterministic HSV ramp, if we ever have >10 sources.
  • Suptitle and legend placement on the multipanel figure are minimally tuned (left-aligned panel labels can crowd the suptitle on wide layouts). Worth revisiting if we settle on a publication-ready layout.
  • I tested locally by symlinking pre-MRB-820 aggregates into the new run-id paths to avoid re-running inference for a large pre-existing experiment. That is a local hack: output/ is gitignored, so a fresh clone needs a full evalml experiment run. A small experiment is fine for development; a full experiment on truly independent test data (2026) can follow once we polish for the paper.
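
The 10-source palette caveat above can be made concrete with a small sketch of the sorted-source color assignment. This is an illustration of the idea rather than the contents of src/plotting/source_colors.py: the hex values are Vega's tableau10 scheme as I understand it, and the is_bijective helper and the source names are hypothetical.

```python
from typing import Dict, Iterable

# Vega's "tableau10" categorical scheme (hex values assumed to match Vega's definition).
TABLEAU10 = [
    "#4c78a8", "#f58518", "#e45756", "#72b7b2", "#54a24b",
    "#eeca3b", "#b279a2", "#ff9da6", "#9d755d", "#bab0ac",
]


def source_color_map(sources: Iterable[str]) -> Dict[str, str]:
    """Assign palette colors over the alphabetically sorted, de-duplicated sources.

    Sorting makes the mapping independent of input order; past 10 sources
    the palette wraps and two sources share a color.
    """
    ordered = sorted(set(sources))
    return {src: TABLEAU10[i % len(TABLEAU10)] for i, src in enumerate(ordered)}


def is_bijective(color_map: Dict[str, str]) -> bool:
    """Hypothetical helper: True when no two sources share a color."""
    return len(set(color_map.values())) == len(color_map)


# Hypothetical source names; input order does not affect the result.
colors = source_color_map(["model_b", "analysis", "model_a"])
eleven = source_color_map(f"source_{i:02d}" for i in range(11))
```

Here is_bijective(colors) holds, while is_bijective(eleven) does not, which is the wrap-around case the caveat describes. Pinning the dashboard's color scale domain to the same sorted list is what keeps the two surfaces in agreement.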

…modules

Move the verif-netCDF loading helpers (_ensure_unique_lead_time,
_select_best_sources, the long-form DataFrame builder, subset_df) into
src/verification/loading.py, and the per-axes metric-vs-lead-time
plotter into src/plotting/metric_lead_time_panel.py. Update
verification_plot_metrics.py and report_experiment_dashboard.py to
import from the new locations; the sys.path hack in the dashboard
script is no longer needed. No behavior change.

New rule verification_metrics_multipanel_plot builds one PNG per named
layout under results/<experiment>/multipanel/<name>.png, driven by a
JSON spec passed inline. Aggregator rule
verification_metrics_multipanel_plot_all expands over every entry of
the new optional config field multipanel_plots (no-op when absent, so
existing configs are unaffected). Each layout specifies rows, cols,
optional figsize and figure title, and a row-major list of panels
(metric, param, optional region/season/init_hour/title/ylim). The
script reuses load_long_df and plot_panel, draws all panels with
sharex=True and independent y-axes, and emits a single deduped legend
at the bottom of the figure.

Expand experiment_all's inputs over the multipanel_plots config entries
so `evalml experiment` produces the layouts declared in the YAML. No-op
when the section is absent.

* New src/plotting/units.py: PARAM_UNITS mapping + metric_units(metric,
  param) helper that yields '' for CORR/R2 and the param's canonical
  units otherwise.
* plot_panel now accepts `param` and auto-builds the y-axis label as
  "<decoded metric> [<units>]" when no explicit ylabel is given. It
  also accepts `panel_label` (e.g. "a)") rendered as a bold,
  left-aligned title at the same height as the centred title.
* verification_plot_metrics.py passes `param` so single-panel plots
  pick up units automatically.
* verification_plot_metrics_multipanel.py: numbers panels a), b), ...,
  in row-major order; replaces tight_layout with explicit
  subplots_adjust margins so inter-panel hspace/wspace are honoured
  and the bottom legend has guaranteed room.

Pick up MultipanelPanelSpec, MultipanelPlotSpec, and the new
multipanel_plots field on ConfigModel. Generated via
`python src/evalml/config.py workflow/tools/config.schema.json`.

Source -> color is now bijective and stable across the dashboard, the
single-panel plots, and the multipanel plots.

* src/plotting/source_colors.py: TABLEAU10 palette and a
  source_color_map() helper that assigns colors over the
  alphabetically-sorted full source list. Wraps past 10 sources;
  switch palettes if that becomes an issue in practice.
* plot_panel grows an optional color_map arg. Both single-panel and
  multipanel scripts build one map from all_df["source"].unique() and
  pass it down. The matplotlib-only "analysis = black" override is
  dropped to match the dashboard.
* resources/report/dashboard/script.js pins the color scale's domain
  to the full source list and uses Vega-Lite's "tableau10" scheme, so
  toggling sources in the UI no longer reshuffles colors.

Cover panel-level defaults (region/season/init_hour, title, ylim),
the rows*cols == len(panels) validator, the extra="forbid" guard on
both models, and that ConfigModel.multipanel_plots defaults to an
empty dict and round-trips a named layout.

Working config exercising the new multipanel_plots feature: stage_E_realch1
vs stage_E_icon_1km_cutoff_edges_subgrid_horography against ICON-CH1/CH2
baselines, with a BIAS-by-season and an RMSE-by-init-hour 2x2 layout.
Serves as a copy-paste starting point for new layouts.

@Louis-Frey Louis-Frey requested a review from radiradev May 13, 2026 11:21