fix crash when debugging#173
Conversation
In debug mode, Intel Fortran raises error 65 (floating invalid) when comparing NaN values (e.g. NaN >= 0.). Input scaling parameter files use NaN as a no-data indicator, which causes comparisons like sclprm_std_obs >= 0. to raise FP exceptions before returning false. Fix: Add NaN guards using the standard Fortran idiom (val == val is .false. for NaN) before the existing range checks at all 3 guard sites in clsm_ensupd_read_obs.F90 (~line 9755, ~9989, ~10384).
When N_files=1, lon_min_vec(2) and lon_max_vec(2) remain initialized to MAPL_UNDEF (1.0e15). Whole-array operations like: start_ind = (lon_min_vec - CMG_ll_lon) / CMG_dlon produce ~2e16 which overflows a 32-bit integer, causing forrtl: error (65): floating invalid in debug mode. Fix: Restrict array operations to (1:N_files) elements only: start_ind(1:N_files) = (lon_min_vec(1:N_files) - CMG_ll_lon)/CMG_dlon last_ind(1:N_files) = (lon_max_vec(1:N_files) - CMG_ll_lon)/CMG_dlon N_lon_vec(1:N_files) = last_ind(1:N_files) - start_ind(1:N_files) + 1
gmao-rreichle
left a comment
There was a problem hiding this comment.
@weiyuan-jiang, many thanks for the PR. I have a couple of comments/questions below.
| ! when in debug mode, nint(VEGCLS) with 1.0e15 may crash | ||
| allocate(tmpR(N_catl)) | ||
| tmpR = VEGCLS(:) | ||
| where(tmpR > 1.0e10) tmpR = nodata_generic | ||
| mwRTM_param(:)%vegcls = nint(tmpR(:)) | ||
| tmpR = SOILCLS(:) | ||
| where(tmpR > 1.0e10) tmpR = nodata_generic | ||
| mwRTM_param(:)%soilcls = nint(tmpR(:)) | ||
|
|
There was a problem hiding this comment.
@weiyuan-jiang : I'm not sure if we need this, and if we do, I'm not sure it's the best fix. First, I'm pretty confident that vegcls and soilcls here should never be no-data. So a better way of addressing this might be to check for (native) no-data-values first and stop if any are encountered.
If I'm wrong and no-data-values for the "integer" fields could happen, we might need to figure out a more robust solution. The current solution depends on "nodata_generic" being -9999. I was hoping that at some point in the future we can use MAPL_UNDEF instead of LDAS having its own nodata-value. But if nodata_generic turns into 1e15, the currently implemented solution won't work, I think.
There was a problem hiding this comment.
When I debugged, I printed the values and I did see 1.0e15. When it is converted to int, it becomes the most negative integer. Here our nondata_generic is -9999.0
| if ( sclprm_mean_obs(ind)==sclprm_mean_obs(ind) .and. & | ||
| sclprm_mean_mod(ind)==sclprm_mean_mod(ind) .and. & | ||
| sclprm_std_obs(ind) ==sclprm_std_obs(ind) .and. & | ||
| sclprm_std_mod(ind) ==sclprm_std_mod(ind) .and. & |
There was a problem hiding this comment.
Why do we need these four lines, which should always be true? Not sure where they're coming from and what they might be meant to do.
Or do they evaluate to "false" if a value is NaN? If this is the intent, we should at least add a comment to clarify. Although I don't think the reader for the "sclprm_*" values would produce NaNs.
Alternatively, perhaps check more explicitly for nodata values?
I guess all of this depends on the nodata-value here being -9999. If the nodata-value is 1e15, it wouldn't be identified by any of the if-conditions.
There was a problem hiding this comment.
I think that comparison is for Nan. If Nan == Nan is always .false. . In debugging mode, comparing Nan may lead to crash
| if ( sclprm_mean_obs(j_ind, i_ind)==sclprm_mean_obs(j_ind, i_ind) .and. & | ||
| sclprm_mean_mod(j_ind, i_ind)==sclprm_mean_mod(j_ind, i_ind) .and. & | ||
| sclprm_std_obs(j_ind, i_ind) ==sclprm_std_obs(j_ind, i_ind) .and. & | ||
| sclprm_std_mod(j_ind, i_ind) ==sclprm_std_mod(j_ind, i_ind) .and. & |
| if ( sclprm_mean_obs(ind)==sclprm_mean_obs(ind) .and. & | ||
| sclprm_mean_mod(ind)==sclprm_mean_mod(ind) .and. & | ||
| sclprm_std_obs( ind)==sclprm_std_obs( ind) .and. & | ||
| sclprm_std_mod( ind)==sclprm_std_mod( ind) .and. & |
|
This PR is being prevented from merging because you have added one of our blocking labels: Contingent - DNA, Needs Lead Approval, Contingent -- Do Not Approve. You'll need to remove it before this PR can be merged. |
This PR fixes the run in debug mode. It is 0-diff. It it goes with this PR in NCEP_Shared, GEOSldas can run on debug mode with land assimilation
Related PRs:
GEOS-ESM/NCEP_Shared#37