gsi.x aborts when processing 2025040712 or later #4028

@RussTreadon-NOAA

What is wrong?

gsi.x aborts when processing data for 2025040712 or later because the CRTM coefficients for abi_g19 are not available.

What should have happened?

gsi.x should successfully run to completion when abi_g19 data is available.

What machines are impacted?

All or N/A

What global-workflow hash are you using?

1644a53

Steps to reproduce

  1. clone and install g-w develop at 1644a53
  2. set up an experiment that includes 2025040712
  3. start the parallel and cycle to 2025040712

The 2025040712 gdas_anal will fail with an error similar to the one below:

131:  CRTM_SpcCoeff_Load(FAILURE) : Error reading SpcCoeff file #1, ./crtm_coeffs/abi_g19.SpcCoeff.bin; Process ID: 131
131:  CRTM_Init(FAILURE) : Error loading SpcCoeff data; Process ID: 131
131:  crtm_interface*init_crtm:  ***ERROR*** crtm_init error_status=           3
131:     TERMINATE PROGRAM EXECUTION
131: Abort(71) on node 131 (rank 131 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 71) - process 131
  0: slurmstepd: error: *** STEP 15558125.1 ON h2c06 CANCELLED AT 2025-09-06T09:54:54 ***

Additional information

Operations began dumping abi_g19 data at 2025040712. The global_satinfo.txt associated with g-w gsi_enkf.fd @ 9ae4063 enables assimilation of abi_g19 data.

When abi_g19 data is present, gsi.x attempts to assimilate it. CRTM initialization for abi_g19 then aborts with:

131:  CRTM_SpcCoeff_Load(FAILURE) : Error reading SpcCoeff file #1, ./crtm_coeffs/abi_g19.SpcCoeff.bin; Process ID: 131
131:  CRTM_Init(FAILURE) : Error loading SpcCoeff data; Process ID: 131
131:  crtm_interface*init_crtm:  ***ERROR*** crtm_init error_status=           3
131:     TERMINATE PROGRAM EXECUTION
131: Abort(71) on node 131 (rank 131 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 71) - process 131
  0: slurmstepd: error: *** STEP 15558125.1 ON h2c06 CANCELLED AT 2025-09-06T09:54:54 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: h17c19: tasks 72-75: Killed
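
For triage, the coefficient file named in a failure like the one above can be pulled out of the job log automatically. The following is only a sketch: the function names are hypothetical, and the pattern assumes the CRTM_SpcCoeff_Load message keeps the exact wording shown in this log.

```python
import re

# Matches the CRTM_SpcCoeff_Load failure line as it appears in the log above.
# The file path sits between "SpcCoeff file #<n>, " and the next ";".
# Assumption: the message wording is the same across CRTM builds.
_FAILURE = re.compile(
    r"CRTM_SpcCoeff_Load\(FAILURE\) : Error reading SpcCoeff file #\d+, ([^;\s]+);"
)


def missing_spccoeff_files(log_text):
    """Return the unique SpcCoeff paths named in load-failure lines, in log order."""
    seen = []
    for match in _FAILURE.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


def sensor_from_path(path):
    """Derive the sensor id from a path of the form '<dir>/<sensor>.SpcCoeff.bin'."""
    name = path.rsplit("/", 1)[-1]
    return name.split(".SpcCoeff", 1)[0]
```

Applied to the log excerpt above, this would report ./crtm_coeffs/abi_g19.SpcCoeff.bin and the sensor id abi_g19.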

Do you have a proposed solution?

WCOSS2 has module crtm/2.4.0.2. The ${CRTM_FIX} associated with this module includes abi_g19 coefficients.

Is crtm/2.4.0.2 or crtm-fix/2.4.0.2 available on NOAA RDHPCS machines?
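
To check whether a given ${CRTM_FIX} on another machine contains the abi_g19 coefficients, something like the sketch below could be used. It rests on assumptions not confirmed in this issue: that the directory is flat, and that each sensor needs both a `<sensor>.SpcCoeff.bin` and a `<sensor>.TauCoeff.bin` file (only the SpcCoeff name appears in the error above). The function name is hypothetical.

```python
import os

# Assumption: one SpcCoeff and one TauCoeff file per sensor, directly under
# the fix directory. Only the SpcCoeff name is confirmed by the error message.
SUFFIXES = (".SpcCoeff.bin", ".TauCoeff.bin")


def missing_coefficients(fix_dir, sensors):
    """Return the expected coefficient file paths that do not exist in fix_dir."""
    missing = []
    for sensor in sensors:
        for suffix in SUFFIXES:
            path = os.path.join(fix_dir, sensor + suffix)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


# Example use, with the directory taken from the CRTM_FIX environment variable:
#   missing_coefficients(os.environ["CRTM_FIX"], ["abi_g19"])
# An empty list means both files were found for every listed sensor.
```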
