MRB-860 multi panel lead time plots by Louis-Frey · Pull Request #152 · MeteoSwiss/evalml

Louis-Frey · 2026-05-13T11:21:48Z

Summary

Adds a configurable multi-panel metric-vs-lead-time plot, driven by a new multipanel_plots: section in the experiment YAML, plus quality-of-life improvements that also benefit the existing single-panel plots and the dashboard.

What you get

New multipanel_plots: config section. Each named entry declares rows, cols, optional figsize/title, and a row-major list of panels. Each panel is {metric, param} plus optional region, season, init_hour, title, ylim. Validated by MultipanelPlotSpec (Pydantic): rows*cols must equal len(panels), unknown fields are forbidden.
New workflow rule verification_metrics_multipanel_plot producing output/results/<experiment>/multipanel/<name>.png. Aggregator rule verification_metrics_multipanel_plot_all expands over every named layout. Wired into experiment_all, so plain evalml experiment builds the layouts when the section is present and is a no-op otherwise.
Units on y-axis labels. Auto-derived from (metric, param) for matplotlib plots (single-panel and multipanel). Backed by a small PARAM_UNITS map in src/plotting/units.py. Known limitation: the verification netCDFs don't carry units attrs, so this is a hard-coded dict — see follow-ups below.
Panel labels (a), b), c), ...) rendered as bold, left-aligned titles at the same height as the centred title.
Stable, bijective source → color mapping shared with the dashboard. Both surfaces use Vega-Lite's tableau10 over the alphabetically-sorted full source list. Pinning Vega's color.scale.domain to that list also fixes a longstanding dashboard glitch where toggling sources reshuffled the remaining colors.
Refactor: shared verification-plotting building blocks. src/verification/loading.py (load_long_df, subset_df, _ensure_unique_lead_time, _select_best_sources) and src/plotting/metric_lead_time_panel.py (plot_panel). The existing single-panel script and the dashboard script were updated to consume these; no behaviour change.
Tests. 5 new tests in tests/unit/test_config.py covering panel-level defaults, rows*cols validation, extra="forbid", and a ConfigModel round-trip with multipanel_plots populated.
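For reviewers who want the validation rules in one place, here is a minimal sketch of the two checks described above (rows*cols must equal len(panels), unknown fields rejected). It uses stdlib dataclasses as a stand-in for the Pydantic models, so it is not the code in this PR: PanelSpec, LayoutSpec, parse_layout and _build are hypothetical names, and the None defaults are placeholders rather than the real panel-level defaults.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PanelSpec:
    # Required per panel, as described in the PR.
    metric: str
    param: str
    # Optional per panel; None is a placeholder default (assumption).
    region: Optional[str] = None
    season: Optional[str] = None
    init_hour: Optional[int] = None
    title: Optional[str] = None
    ylim: Optional[tuple] = None


@dataclass
class LayoutSpec:
    rows: int
    cols: int
    panels: list
    figsize: Optional[tuple] = None
    title: Optional[str] = None

    def __post_init__(self):
        # The grid must be completely filled, in row-major order.
        if self.rows * self.cols != len(self.panels):
            raise ValueError(
                f"rows*cols = {self.rows * self.cols}, "
                f"but {len(self.panels)} panels were given"
            )


def _build(cls, raw: dict):
    # Mimics extra="forbid": reject any key the model does not declare.
    allowed = {f.name for f in fields(cls)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown fields for {cls.__name__}: {sorted(unknown)}")
    return cls(**raw)


def parse_layout(raw: dict) -> LayoutSpec:
    raw = dict(raw)  # do not mutate the caller's dict
    panels = [_build(PanelSpec, p) for p in raw.pop("panels", [])]
    return _build(LayoutSpec, {**raw, "panels": panels})


layout = parse_layout(
    {
        "rows": 1,
        "cols": 2,
        "panels": [
            {"metric": "BIAS", "param": "T_2M"},
            {"metric": "BIAS", "param": "PMSL", "season": "JJA"},
        ],
    }
)
```

A layout with a misspelled key (for example "seasn") or with fewer panels than rows*cols raises ValueError in this sketch, which mirrors the failure modes the new unit tests cover.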

Example

config/multipanel_example.yaml is a working config with two layouts: bias_overview (BIAS by season — all vs JJA — for T_2M and PMSL) and rmse_overview (RMSE by init hour — 0, 12 — for T_2M and PMSL). Run with evalml experiment config/multipanel_example.yaml, which produces single-panel plots, dashboard, and multipanel PNGs all in one go.

Test plan

pytest tests/unit — 44 passing (5 new).
Pre-commit pydantic-schema hook clean (schema regenerated).
Dry-run with --rerun-triggers mtime against a fresh multipanel_plots config: only multipanel/plot/dashboard jobs planned, no spurious inference reruns.
Visual check of generated PNGs and dashboard.html; colors agree across the three surfaces; dashboard source toggle preserves the remaining sources' colors.

Caveats / follow-ups

PARAM_UNITS in src/plotting/units.py is a hand-maintained dict because the metric computation in src/verification/__init__.py:verify drops the units attrs that the input data carries. A cleaner fix is to propagate units through the verify → aggregate chain so the netCDF carries them; left as a follow-up.
Color palette caps at 10 sources (Vega's tableau10); beyond that we wrap and lose bijectivity. Easy to bump to tableau20 later, or to a deterministic HSV ramp, if we ever have >10 sources.
Suptitle and legend placement on the multipanel figure are minimally tuned (left-aligned panel labels can crowd the suptitle on wide layouts). Worth revisiting if we settle on a publication-ready layout.
I tested locally by symlinking pre-MRB-820 aggregates into the new run-id paths to avoid re-running inference for a large pre-existing experiment. That is a local hack: output/ is gitignored, so a fresh clone needs a full evalml experiment run. A small experiment is fine for development; a full experiment on truly independent test data (2026) can follow once we polish for the paper.

…modules

Move the verif-netCDF loading helpers (_ensure_unique_lead_time, _select_best_sources, the long-form DataFrame builder, subset_df) into src/verification/loading.py, and the per-axes metric-vs-lead-time plotter into src/plotting/metric_lead_time_panel.py. Update verification_plot_metrics.py and report_experiment_dashboard.py to import from the new locations; the sys.path hack in the dashboard script is no longer needed. No behavior change.

New rule verification_metrics_multipanel_plot builds one PNG per named layout under results/<experiment>/multipanel/<name>.png, driven by a JSON spec passed inline. Aggregator rule verification_metrics_multipanel_plot_all expands over every entry of the new optional config field multipanel_plots (no-op when absent, so existing configs are unaffected). Each layout specifies rows, cols, optional figsize and figure title, and a row-major list of panels (metric, param, optional region/season/init_hour/title/ylim). The script reuses load_long_df and plot_panel, draws all panels with sharex=True and independent y-axes, and emits a single deduped legend at the bottom of the figure.
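The row-major panel placement and the single deduped legend can be sketched without matplotlib. This is an illustration under assumptions, not the script's code: panel_position and dedupe_legend are hypothetical helpers, and the (handle, label) pairs stand in for what each axes returns from ax.get_legend_handles_labels().

```python
def panel_position(index: int, cols: int) -> tuple:
    """Map a row-major panel index to its (row, col) slot in the grid."""
    return divmod(index, cols)


def dedupe_legend(entries):
    """Keep the first handle seen for each label, preserving order.

    `entries` is an iterable of (handle, label) pairs gathered from all
    panels. Sources that appear in several panels then show up only once
    in the figure-level legend.
    """
    first_seen = {}
    for handle, label in entries:
        first_seen.setdefault(label, handle)
    return list(first_seen.values()), list(first_seen.keys())


# Strings stand in for matplotlib line handles here.
handles, labels = dedupe_legend(
    [("h1", "ICON-CH1"), ("h2", "ICON-CH2"), ("h3", "ICON-CH1")]
)
```

The resulting lists could then be passed to a figure-level call such as fig.legend(handles, labels, loc="lower center"), which is how a single bottom legend is typically drawn.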

Expand experiment_all's inputs over the multipanel_plots config entries so `evalml experiment` produces the layouts declared in the YAML. No-op when the section is absent.
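As a rough sketch of that expansion, assuming the config is available as a plain dict and using the output path pattern from the PR description: multipanel_targets is a hypothetical helper, not the function in the workflow.

```python
def multipanel_targets(config: dict, experiment: str) -> list:
    """List one PNG target per named layout; empty when the section is absent."""
    # A missing or empty multipanel_plots section yields no targets,
    # which is what makes the aggregator a no-op for existing configs.
    layouts = config.get("multipanel_plots") or {}
    return [
        f"output/results/{experiment}/multipanel/{name}.png"
        for name in layouts
    ]


# The experiment name "demo" is made up for illustration.
targets = multipanel_targets(
    {"multipanel_plots": {"bias_overview": {}, "rmse_overview": {}}},
    experiment="demo",
)
```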

* New src/plotting/units.py: PARAM_UNITS mapping + metric_units(metric, param) helper that yields '' for CORR/R2 and the param's canonical units otherwise.
* plot_panel now accepts `param` and auto-builds the y-axis label as "<decoded metric> [<units>]" when no explicit ylabel is given. It also accepts `panel_label` (e.g. "a)") rendered as a bold, left-aligned title at the same height as the centred title.
* verification_plot_metrics.py passes `param` so single-panel plots pick up units automatically.
* verification_plot_metrics_multipanel.py: numbers panels a), b), ..., in row-major order; replaces tight_layout with explicit subplots_adjust margins so inter-panel hspace/wspace are honoured and the bottom legend has guaranteed room.
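A minimal sketch of the units lookup and panel numbering described in this commit. The PARAM_UNITS entries (K for T_2M, Pa for PMSL) are assumptions for illustration, as are default_ylabel, panel_labels, the raw metric name standing in for the decoded one, and the choice to drop the brackets when a metric is unitless.

```python
import string

# Assumed entries; the real map lives in src/plotting/units.py.
PARAM_UNITS = {"T_2M": "K", "PMSL": "Pa"}

# Metrics that carry no physical units, per the commit message.
UNITLESS_METRICS = {"CORR", "R2"}


def metric_units(metric: str, param: str) -> str:
    """Return '' for unitless metrics, else the param's units ('' if unknown)."""
    if metric.upper() in UNITLESS_METRICS:
        return ""
    return PARAM_UNITS.get(param, "")


def default_ylabel(metric: str, param: str) -> str:
    """Build '<metric> [<units>]'; the brackets are dropped when units are empty."""
    units = metric_units(metric, param)
    return f"{metric} [{units}]" if units else metric


def panel_labels(n: int) -> list:
    """Labels 'a)', 'b)', ... for n panels in row-major order (at most 26)."""
    return [f"{letter})" for letter in string.ascii_lowercase[:n]]
```

For a 2x2 layout, panel_labels(4) gives one label per panel in the same order as the panels list in the YAML.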

Pick up MultipanelPanelSpec, MultipanelPlotSpec, and the new multipanel_plots field on ConfigModel. Generated via `python src/evalml/config.py workflow/tools/config.schema.json`.

Source -> color is now bijective and stable across the dashboard, the single-panel plots, and the multipanel plots.

* src/plotting/source_colors.py: TABLEAU10 palette and a source_color_map() helper that assigns colors over the alphabetically-sorted full source list. Wraps past 10 sources; switch palettes if that becomes an issue in practice.
* plot_panel grows an optional color_map arg. Both single-panel and multipanel scripts build one map from all_df["source"].unique() and pass it down. The matplotlib-only "analysis = black" override is dropped to match the dashboard.
* resources/report/dashboard/script.js pins the color scale's domain to the full source list and uses Vega-Lite's "tableau10" scheme, so toggling sources in the UI no longer reshuffles colors.
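The mapping idea can be sketched as follows. This is an illustration, not the contents of src/plotting/source_colors.py: the hex values are Vega's tableau10 scheme as I understand it, and Python's default string sort (by code point, so uppercase sorts before lowercase) stands in for "alphabetically sorted".

```python
# Vega's tableau10 scheme, in order (values assumed from the Vega docs).
TABLEAU10 = [
    "#4c78a8", "#f58518", "#e45756", "#72b7b2", "#54a24b",
    "#eeca3b", "#b279a2", "#ff9da6", "#9d755d", "#bab0ac",
]


def source_color_map(sources) -> dict:
    """Assign palette colors over the sorted, de-duplicated source list.

    Sorting makes the result independent of input order, so every plot
    built from the same full source list agrees. Past 10 sources the
    palette wraps and two sources share a color.
    """
    ordered = sorted(set(sources))
    return {
        source: TABLEAU10[i % len(TABLEAU10)]
        for i, source in enumerate(ordered)
    }


colors = source_color_map(["stage_E_realch1", "ICON-CH2", "ICON-CH1"])
```

Because the map is built once from the full source list rather than from the sources visible in a given panel, hiding a source does not shift the colors of the others.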

Cover panel-level defaults (region/season/init_hour, title, ylim), the rows*cols == len(panels) validator, the extra="forbid" guard on both models, and that ConfigModel.multipanel_plots defaults to an empty dict and round-trips a named layout.

Working config exercising the new multipanel_plots feature: stage_E_realch1 vs stage_E_icon_1km_cutoff_edges_subgrid_horography against ICON-CH1/CH2 baselines, with a BIAS-by-season and an RMSE-by-init-hour 2x2 layout. Serves as a copy-paste starting point for new layouts.

Louis-Frey added 8 commits May 11, 2026 18:10

Build multipanel plots as part of experiment_all

72cf176

Regenerate config JSON schema for multipanel_plots

e3d019e

Louis-Frey requested a review from radiradev May 13, 2026 11:21

Apply pre-commit fixes (trailing whitespace, ruff format)

eb9f3d7

Louis-Frey wants to merge 9 commits into main from MRB-860-Multi-Panel-Lead-Time-Plots
