MRB-860 multi panel lead time plots #152
Draft
Louis-Frey wants to merge 9 commits into
…modules Move the verif-netCDF loading helpers (_ensure_unique_lead_time, _select_best_sources, the long-form DataFrame builder, subset_df) into src/verification/loading.py, and the per-axes metric-vs-lead-time plotter into src/plotting/metric_lead_time_panel.py. Update verification_plot_metrics.py and report_experiment_dashboard.py to import from the new locations; the sys.path hack in the dashboard script is no longer needed. No behavior change.
New rule verification_metrics_multipanel_plot builds one PNG per named layout under results/<experiment>/multipanel/<name>.png, driven by a JSON spec passed inline. Aggregator rule verification_metrics_multipanel_plot_all expands over every entry of the new optional config field multipanel_plots (no-op when absent, so existing configs are unaffected). Each layout specifies rows, cols, optional figsize and figure title, and a row-major list of panels (metric, param, optional region/season/init_hour/title/ylim). The script reuses load_long_df and plot_panel, draws all panels with sharex=True and independent y-axes, and emits a single deduped legend at the bottom of the figure.
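A minimal sketch of the legend deduplication described above, assuming each panel exposes its own `(handles, labels)` pair. The helper name `dedupe_legend` is hypothetical and is not taken from the PR; plain strings stand in for matplotlib artists so the snippet runs without a plotting backend.

```python
from typing import Any, Dict, Iterable, List, Tuple


def dedupe_legend(
    per_axes: Iterable[Tuple[List[Any], List[str]]],
) -> Tuple[List[Any], List[str]]:
    """Merge per-axes (handles, labels) pairs into one legend.

    The first handle seen for a label wins; label order follows first appearance.
    """
    seen: Dict[str, Any] = {}
    for handles, labels in per_axes:
        for handle, label in zip(handles, labels):
            seen.setdefault(label, handle)
    merged_labels = list(seen)
    return [seen[label] for label in merged_labels], merged_labels


# Strings stand in for matplotlib line artists (assumption for illustration).
panels = [
    (["h1", "h2"], ["ICON-CH1", "ICON-CH2"]),
    (["h3", "h4"], ["ICON-CH2", "stage_E_realch1"]),
]
handles, labels = dedupe_legend(panels)
print(labels)
```

With matplotlib, the input would typically come from `ax.get_legend_handles_labels()` for each axes, and the result could be passed to `fig.legend(handles, labels, loc="lower center")`.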
Expand experiment_all's inputs over the multipanel_plots config entries so `evalml experiment` produces the layouts declared in the YAML. No-op when the section is absent.
* New src/plotting/units.py: PARAM_UNITS mapping + metric_units(metric, param) helper that yields '' for CORR/R2 and the param's canonical units otherwise.
* plot_panel now accepts `param` and auto-builds the y-axis label as "<decoded metric> [<units>]" when no explicit ylabel is given. It also accepts `panel_label` (e.g. "a)") rendered as a bold, left-aligned title at the same height as the centred title.
* verification_plot_metrics.py passes `param` so single-panel plots pick up units automatically.
* verification_plot_metrics_multipanel.py: numbers panels a), b), ..., in row-major order; replaces tight_layout with explicit subplots_adjust margins so inter-panel hspace/wspace are honoured and the bottom legend has guaranteed room.
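A rough sketch of how the units lookup, the automatic y-axis label, and the row-major panel lettering could fit together. The unit strings in `PARAM_UNITS` are assumptions for illustration, as are the helper names `default_ylabel`, `panel_label`, and `grid_position`. The real `plot_panel` also decodes the metric name, which this sketch skips.

```python
from string import ascii_lowercase
from typing import Tuple

# Illustrative subset only; the unit strings are assumed, not copied from the PR.
PARAM_UNITS = {"T_2M": "K", "PMSL": "Pa"}
UNITLESS_METRICS = {"CORR", "R2"}


def metric_units(metric: str, param: str) -> str:
    """Return '' for dimensionless metrics, else the parameter's units."""
    if metric.upper() in UNITLESS_METRICS:
        return ""
    return PARAM_UNITS.get(param, "")


def default_ylabel(metric: str, param: str) -> str:
    """Build '<metric> [<units>]', dropping the brackets when there are no units."""
    units = metric_units(metric, param)
    return f"{metric} [{units}]" if units else metric


def panel_label(index: int) -> str:
    """Letter panels a), b), ... by their position in the row-major list."""
    return f"{ascii_lowercase[index]})"


def grid_position(index: int, cols: int) -> Tuple[int, int]:
    """Map a row-major panel index to (row, col) in the subplot grid."""
    return divmod(index, cols)


for i, (metric, param) in enumerate([("RMSE", "T_2M"), ("CORR", "PMSL")]):
    print(panel_label(i), grid_position(i, cols=2), default_ylabel(metric, param))
```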
Pick up MultipanelPanelSpec, MultipanelPlotSpec, and the new multipanel_plots field on ConfigModel. Generated via `python src/evalml/config.py workflow/tools/config.schema.json`.
Source -> color is now bijective and stable across the dashboard, the single-panel plots, and the multipanel plots.
* src/plotting/source_colors.py: TABLEAU10 palette and a source_color_map() helper that assigns colors over the alphabetically-sorted full source list. Wraps past 10 sources; switch palettes if that becomes an issue in practice.
* plot_panel grows an optional color_map arg. Both single-panel and multipanel scripts build one map from all_df["source"].unique() and pass it down. The matplotlib-only "analysis = black" override is dropped to match the dashboard.
* resources/report/dashboard/script.js pins the color scale's domain to the full source list and uses Vega-Lite's "tableau10" scheme, so toggling sources in the UI no longer reshuffles colors.
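A sketch of the color assignment, assuming the palette is a plain list of hex strings indexed by position in the sorted source list. The hex values below are assumed to match Vega's "tableau10" scheme, and the list in the PR may differ. Python's `sorted` orders by code point, so uppercase names come before lowercase ones.

```python
from typing import Dict, Iterable

# Assumed to match Vega's "tableau10" scheme; treat the exact values as illustrative.
TABLEAU10 = [
    "#4c78a8", "#f58518", "#e45756", "#72b7b2", "#54a24b",
    "#eeca3b", "#b279a2", "#ff9da6", "#9d755d", "#bab0ac",
]


def source_color_map(sources: Iterable[str]) -> Dict[str, str]:
    """Assign one color per source over the sorted, deduplicated source list.

    Past ten sources the palette wraps, so two sources share a color.
    """
    ordered = sorted(set(sources))
    return {src: TABLEAU10[i % len(TABLEAU10)] for i, src in enumerate(ordered)}


all_sources = ["stage_E_realch1", "ICON-CH2", "ICON-CH1", "ICON-CH2"]
color_map = source_color_map(all_sources)

# Plotting only a subset reuses the same map, so the remaining colors do not move.
subset = {src: color_map[src] for src in ["ICON-CH2", "stage_E_realch1"]}
print(color_map)
print(subset)
```

Building the map once from the full source list, rather than from whatever subset a panel happens to show, is what keeps a source's color fixed across figures.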
Cover panel-level defaults (region/season/init_hour, title, ylim), the rows*cols == len(panels) validator, the extra="forbid" guard on both models, and that ConfigModel.multipanel_plots defaults to an empty dict and round-trips a named layout.
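To illustrate the two checks these tests exercise, here is a standard-library stand-in. It uses dataclasses rather than the Pydantic models from the PR, so the class names `PanelSpec` and `PlotSpec`, the `from_dict` helper, and the `None` defaults are all assumptions.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class PanelSpec:
    metric: str
    param: str
    # Defaults are assumed; the real models may use different values.
    region: Optional[str] = None
    season: Optional[str] = None
    init_hour: Optional[int] = None
    title: Optional[str] = None
    ylim: Optional[Tuple[float, float]] = None

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "PanelSpec":
        """Reject unknown keys, mimicking Pydantic's extra="forbid"."""
        unknown = set(raw) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown panel fields: {sorted(unknown)}")
        return cls(**raw)


@dataclass
class PlotSpec:
    rows: int
    cols: int
    panels: List[PanelSpec]
    figsize: Optional[Tuple[float, float]] = None
    title: Optional[str] = None

    def __post_init__(self) -> None:
        # The grid must be filled exactly: no empty cells, no overflow.
        if self.rows * self.cols != len(self.panels):
            raise ValueError(
                f"rows*cols = {self.rows * self.cols}, "
                f"but {len(self.panels)} panels were given"
            )


panels = [PanelSpec.from_dict({"metric": "BIAS", "param": "T_2M"}) for _ in range(4)]
spec = PlotSpec(rows=2, cols=2, panels=panels)
print(spec.rows, spec.cols, len(spec.panels))
```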
Working config exercising the new multipanel_plots feature: stage_E_realch1 vs stage_E_icon_1km_cutoff_edges_subgrid_horography against ICON-CH1/CH2 baselines, with a BIAS-by-season and an RMSE-by-init-hour 2x2 layout. Serves as a copy-paste starting point for new layouts.
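For orientation, a sketch of what one layout entry might look like once the YAML is loaded, and how it could be serialized into the inline JSON spec that the rule receives. The cell order, the figure title, and the helper names `layout_names` and `inline_spec` are assumptions. The field names follow the layout description in this PR.

```python
import json
from typing import Any, Dict, List

# Python mirror of one multipanel_plots entry; panel order is row-major.
bias_overview: Dict[str, Any] = {
    "rows": 2,
    "cols": 2,
    "title": "BIAS by season",  # hypothetical figure title
    "panels": [
        {"metric": "BIAS", "param": "T_2M", "season": "all"},
        {"metric": "BIAS", "param": "T_2M", "season": "JJA"},
        {"metric": "BIAS", "param": "PMSL", "season": "all"},
        {"metric": "BIAS", "param": "PMSL", "season": "JJA"},
    ],
}
config: Dict[str, Any] = {"multipanel_plots": {"bias_overview": bias_overview}}


def layout_names(cfg: Dict[str, Any]) -> List[str]:
    """List the layouts to build; empty when the section is absent (no-op)."""
    return sorted(cfg.get("multipanel_plots", {}))


def inline_spec(cfg: Dict[str, Any], name: str) -> str:
    """Serialize one named layout to a JSON string for passing inline."""
    return json.dumps(cfg["multipanel_plots"][name], sort_keys=True)


print(layout_names(config))
print(inline_spec(config, "bias_overview"))
```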
Summary
Adds a configurable multi-panel metric-vs-lead-time plot, driven by a new `multipanel_plots:` section in the experiment YAML, plus quality-of-life improvements that also benefit the existing single-panel plots and the dashboard.

What you get
* New `multipanel_plots:` config section. Each named entry declares `rows`, `cols`, optional `figsize`/`title`, and a row-major list of panels. Each panel is `{metric, param}` plus optional `region`, `season`, `init_hour`, `title`, `ylim`. Validated by `MultipanelPlotSpec` (Pydantic): `rows*cols` must equal `len(panels)`, and unknown fields are forbidden.
* New rule `verification_metrics_multipanel_plot` producing `output/results/<experiment>/multipanel/<name>.png`. Aggregator rule `verification_metrics_multipanel_plot_all` expands over every named layout. Wired into `experiment_all`, so plain `evalml experiment` builds the layouts when the section is present and is a no-op otherwise.
* Y-axis units derived from `(metric, param)` for matplotlib plots (single-panel and multipanel). Backed by a small `PARAM_UNITS` map in `src/plotting/units.py`. Known limitation: the verification netCDFs don't carry `units` attrs, so this is a hard-coded dict; see follow-ups below.
* Stable source colors: `tableau10` over the alphabetically-sorted full source list. Pinning Vega's `color.scale.domain` to that list also fixes a longstanding dashboard glitch where toggling sources reshuffled the remaining colors.
* Shared helpers moved into `src/verification/loading.py` (`load_long_df`, `subset_df`, `_ensure_unique_lead_time`, `_select_best_sources`) and `src/plotting/metric_lead_time_panel.py` (`plot_panel`). The existing single-panel script and the dashboard script were updated to consume these; no behaviour change.
* Unit tests in `tests/unit/test_config.py` covering panel-level defaults, `rows*cols` validation, `extra: "forbid"`, and a `ConfigModel` round-trip with `multipanel_plots` populated.

Example
`config/multipanel_example.yaml` is a working config with two layouts: `bias_overview` (BIAS by season, `all` vs `JJA`, for T_2M and PMSL) and `rmse_overview` (RMSE by init hour, `0` and `12`, for T_2M and PMSL). Run with `evalml experiment config/multipanel_example.yaml`, which produces single-panel plots, dashboard, and multipanel PNGs all in one go.

Test plan
* `pytest tests/unit`: 44 passing (5 new).
* `pydantic-schema` hook clean (schema regenerated).
* `--rerun-triggers mtime` against a fresh `multipanel_plots` config: only multipanel/plot/dashboard jobs planned, no spurious inference reruns.
* `dashboard.html`: colors agree across the three surfaces; dashboard source toggle preserves the remaining sources' colors.

Caveats / follow-ups
* `PARAM_UNITS` in `src/plotting/units.py` is a hand-maintained dict because the metric computation in `src/verification/__init__.py:verify` drops the `units` attrs that the input data carries. A cleaner fix is to propagate `units` through the verify → aggregate chain so the netCDF carries them; left as a follow-up.
* The palette has 10 colors (`tableau10`); beyond that we wrap and lose bijectivity. Easy to bump to `tableau20` later, or to a deterministic HSV ramp, if we ever have >10 sources.
* `output/` is gitignored, so a fresh clone needs a full `evalml experiment` run. A small experiment is fine for development; a full experiment on truly independent test data (2026) can follow once we polish for the paper.