Conversation
Batch conversion runs over large archives commonly hit a handful of
per-message defects (e.g. producer-side metadata such as
numberOfForecastsInEnsemble=0 while number!=0, or fields that no
recipe can encode). Previously any single such message aborted the
whole run.
- Add --on-error {abort|log-and-skip|skip}, defaulting to
log-and-skip, wrapping the per-message conversion body in a
try/catch and reporting a skipped-message tally at the end.
- Add --default-ensemble-size <N>: when numberOfForecastsInEnsemble
is 0 but number!=0, substitute N instead of throwing. Default 0
preserves the previous strict behaviour.
- Thread defaultEnsembleSize through mapGrib1ToGrib2 as a new
trailing parameter (default 0) so existing call sites are
unaffected.
These flags make the tool usable in large parallel conversion
pipelines while keeping the safe strict default available.
If misc.lengthOfTimeWindow is not set from the standard keys, read the GRIB1 local-section key lengthOf4DvarWindow (hours) from the input handle and use it, skipping the 0xFFFF missing sentinel. Fixes cases where 4D-Var analysis fields produced GRIB2 with a missing lengthOf4DvarWindow.
For mars type=eme (4D-var model errors, EDA long-window 4Dvar system) the matching GRIB2 local definition 39 requires numberOfComponents and modelErrorType in addition to the standard window keys. These two values have no MARS equivalent and must be forwarded from the input GRIB1 handle (which already carries local definition 39). Adds the corresponding misc EntryDefs and MiscRecord slots, and populates them from the input handle in mapGrib1ToGrib2 when mars.type == "eme". The metkit encoder now consumes them via the new AnalysisType::ModelErrors variant.
Hybrid-level messages whose source GRIB carried no vertical-coordinate
parameters (NV=0, e.g. lnsp on ml) previously round-tripped with a
spurious PV137 (1002-double) array because the metkit encoder's
resolve_PvArray_or_throw fallback fires on an empty par dict.
Forward an explicit suppression signal:
- Add PVPresent to MiscRecord so it round-trips through
dumpUnscopedRecord into par["PVPresent"].
- In mapGrib1ToGrib2, when the input handle has no pv key and reports
NV=0, set misc.pvPresent=false. The matching metkit change honours
this flag in LevelOp StageAllocate and emits PVPresent=0 with no
pv key.
Verified end-to-end on an:ml:12 lnsp sh: output NV=0 preserved (was
NV=1002). No regression on hybrid messages with valid pv (NV=276).
NOTE (revisit): pairs with the metkit-side commit honouring
par["PVPresent"]=false. Once both land we can revisit whether
resolve_PvArray_or_throw should still default to PV137 on an empty par.
…erator=9.9 sentinel Some IFS-produced spectral_complex messages (notably SPP random fields, type=al, paramId 213131-213160) carry laplacianOperator=9.9 as an IFS sentinel without actually pre-laplacian-scaling the stored coefficients. Re-encoding such messages causes ecCodes' grib_ieee_to_long to assert because (T*(T+1))^9.9 multiplied by the raw coefficients overflows IEEE32 (FLT_MAX). The assertion is unrecoverable from inside ecCodes and bypasses the existing log-and-skip path, aborting the whole tool. Reject these messages up front with eckit::BadValue. The existing top-level catch in grib1-to-grib2.cc treats BadValue as recoverable: - --on-error=abort: terminates with "Bad value:" prefix - --on-error=log-and-skip: reports "skipped 1 message(s)", exits 0 SeriousBug would have been semantically wrong here: the precondition violation is both expected (legacy IFS encoding bug) and recoverable. Verified by encoding fc:al:3 reproducer (180 msgs): 90 real laplacian messages (paramIds 213101-213130) succeed; 90 sentinel messages are skipped cleanly under --on-error=log-and-skip. NOTE (revisit): upstream IFS encoding bug. Once IFS stops emitting sentinel laplacianOperator=9.9 (or sets it to 0 when no scaling was actually applied), this guard becomes dead code.
The existing ensemble branch requires numberOfForecastsInEnsemble (absent for eme) to set mars.number. The encoder's analysis ModelErrors path needs mars.number to deduce componentIndex; forward it unconditionally inside the eme block.
There was a problem hiding this comment.
Pull request overview
This PR enhances the grib1-to-grib2 conversion tool’s resilience by adding per-message error handling and additional metadata/validation logic to avoid hard failures in known problematic GRIB encodings, plus new GRIB sample templates used by multio test/config tooling.
Changes:
- Add per-message error handling (
--on-error) and a pre-encode guard for knownspectral_complexlaplacian-scaling overflow cases. - Improve metadata forwarding/extraction (explicit PV absence via
pvPresent, support fortype=emelocal definition 39 keys, and alengthOf4DvarWindowfallback). - Add additional unstructured GRIB template samples under
tests/multio/config/grib-samples/samples.unstructured/.
Reviewed changes
Copilot reviewed 3 out of 12 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/multio/tools/grib1-to-grib2.cc |
Adds per-message error handling, overflow validation, and multiple metadata extraction/forwarding improvements. |
src/multio/datamod/MarsMiscGeo.h |
Extends MiscRecord with additional optional entries (numberOfComponents, modelErrorType, pvPresent). |
src/multio/datamod/GribKeys.h |
Introduces optional GRIB key defs for local definition 39 (numberOfComponents, modelErrorType). |
tests/multio/config/grib-samples/samples.unstructured/unstr_latlon.tmpl |
Adds new unstructured GRIB sample template. |
tests/multio/config/grib-samples/samples.unstructured/unstr_inst_pf.tmpl |
Adds new unstructured GRIB sample template. |
tests/multio/config/grib-samples/samples.unstructured/unstr_inst_fc.tmpl |
Adds new unstructured GRIB sample template. |
tests/multio/config/grib-samples/samples.unstructured/unstr_inst_an.tmpl |
Adds new unstructured GRIB sample template. |
tests/multio/config/grib-samples/samples.unstructured/unstr_avg_tpa.tmpl |
Adds new unstructured GRIB sample template. |
tests/multio/config/grib-samples/samples.unstructured/unstr_avg_pf.tmpl |
Adds new unstructured GRIB sample template. |
tests/multio/config/grib-samples/samples.unstructured/unstr_avg_fc.tmpl |
Adds new unstructured GRIB sample template. |
tests/multio/config/grib-samples/samples.unstructured/unstr_avg_an.tmpl |
Adds new unstructured GRIB sample template. |
tests/multio/config/grib-samples/samples.unstructured/unstr_avg.tmpl |
Adds new unstructured GRIB sample template. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| } | ||
|
|
||
| if (skippedCount > 0) { |
There was a problem hiding this comment.
--on-error=skip is documented as “silently skip failing messages”, but the tool still emits the end-of-run summary ("grib1-to-grib2: skipped ...") whenever skippedCount > 0. Either suppress this summary when onErrorHandling_ == OnErrorHandling::Skip, or update the option help text to state that a summary will still be printed.
| if (skippedCount > 0) { | |
| if (skippedCount > 0 && onErrorHandling_ != OnErrorHandling::Skip) { |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #267 +/- ##
===========================================
- Coverage 61.40% 61.26% -0.14%
===========================================
Files 320 320
Lines 21854 21708 -146
Branches 1655 1674 +19
===========================================
- Hits 13419 13300 -119
+ Misses 8435 8408 -27 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
… for type=me Extend the existing type=eme forwarding block to also fire for type=me (model errors). Both types use GRIB1 local definition 39 with componentIndex (= mars.number), numberOfComponents and modelErrorType. The metkit encoder's analysis ModelErrors path now handles both.
…ations for type=4i
For type=4i (4D-var analysis iterations) the GRIB1 input carries local
definition 38 with iterationNumber and totalNumberOfIterations.
iterationNumber is now deduced by metkit's IterationConcept from
mars.iteration (= ECMWF MARS keyword "iteration"), so this tool only
needs to populate that mars key. totalNumberOfIterations has no MARS
representation and is forwarded via the parameter dictionary.
Concretely:
* mars.iteration = dm::parseEntry(dm::ITERATION, h) is set after
mars.anoffset.
* The misc data model loses the IterationNumber EntryDef
(GribKeys.h) and the iterationNumber field (MarsMiscGeo.h);
multio no longer forwards iterationNumber explicitly.
* The new type=4i block in grib1-to-grib2.cc forwards only
totalNumberOfIterations.
Pairs with metkit commit
mars2grib: Refactor type=4i/eme/me handling into orthogonal Iteration,
ModelErrors and Analysis concepts
which adds the IterationConcept that consumes mars.iteration.
fc10c74 to
99b2261
Compare
Description
Contributor Declaration
By opening this pull request, I affirm the following:
🌈🌦️📖🚧 Documentation 🚧📖🌦️🌈
https://sites.ecmwf.int/docs/dev-section/multio/pull-requests/PR-267