Skip to content

output at flexible time levels#217

Draft
guoqing-noaa wants to merge 1 commit intoufs-community:gsl/developfrom
guoqing-noaa:flexible_timelevels4gsl
Draft

output at flexible time levels#217
guoqing-noaa wants to merge 1 commit intoufs-community:gsl/developfrom
guoqing-noaa:flexible_timelevels4gsl

Conversation

@guoqing-noaa
Copy link
Collaborator

@guoqing-noaa guoqing-noaa commented Feb 24, 2026

Introduce a new output_timelevels attribute for MPAS streams that enables variable output intervals.

With this capability, we may outoput every 15 minutes in the first hour, every hour in the first 3 days, every 3 hours for the next 4 days, and every 6 hours in the last 3 days.

We can also use this to only write out forecast files during a given period, such as: output_timelevels="6-12h"

Check the PR description below for details on how to specify the time levles.
Here is a quick example: output_timelevels="0-3-15m 4-72 75-168-3 174-240-6"

FYI, with this PR, we may write out mpasout files at "0 1 2 3 6 9 12", while history files at "4 5 7 8 11 13-99999" to avoid duplicate output of both mpasout and history files at the same time levels, which are almost the same.

This PR solves issue #214

Priority Reviewers

…ttribute

Introduce a new `output_timelevels` attribute for MPAS streams that enables variable output intervals.
With this capability, we may outoput every 15 minutes in the first hour, every hour in the first 3 days, every 3 hours for the next 4 days, and every 6 hours in the last 3 days.
We can also use this to only write out forecast files during a given period, such as: output_timelevels="6-12h"

Check the PR description for details on how to specify the time levles.
Here is a quick example: output_timelevels="0-3-15m 4-72 75-168-3 174-240-6"
@guoqing-noaa
Copy link
Collaborator Author

output_timelevels Specification

The forecast output times are defined by a space-separated list of sections:

<section> [section] [section] ...

Each section expands into one or more forecast times.
The final output is the union of all expanded times.
Users need to make sure the times are in a ascending order.

1. Time String Format

A time_string represents a forecast time (offset from the initialization time).
It may be written in one of the following forms:

1.1 Integer Forms (Hours)

An integer with no suffix represents hours:

6     → 6 hours
12    → 12 hours
0     → 0 hours

1.2. Integer With Unit Descriptors

A duration may be written using unit suffixes:

h  → hours
m  → minutes
s  → seconds
D  → days

Examples:

1h30m   → 1 hour 30 minutes
45m     → 45 minutes
90s     → 90 seconds
6h15m   → 6 hours 15 minutes

2. Sequence Expansion (Range Expression)

A section may define a regularly spaced sequence using:
start-stop-step
This generates values beginning at start, incremented by step, and ending at stop (inclusive if exactly reached).
Examples:

0-1h-15m

expands to:

0 15m 30m 45m 1h

If -step is omitted, a default step of 1 hour is used:
7-12 means 7-12-1 and expands to 7 8 9 10 11 12

Note: Integer and unit-based forms may be freely mixed

3. The EBNF (Extended Backus-Naur Form) style grammar:

specification   = section , { SP , section } ;

section         = range
                | time_string ;

range           = time_string , "-" , time_string , [ "-" , time_string ] ;
                  (* start-stop[-step] *)

time_string     = integer
                | duration ;

duration        = duration_part , { duration_part } ;

duration_part   = integer , unit ;

unit            = "h" | "m" | "s" | "D" ;

integer         = digit , { digit } ;

digit           = "0" | "1" | "2" | "3" | "4"
                | "5" | "6" | "7" | "8" | "9" ;

SP              = " " , { " " } ;

@guoqing-noaa
Copy link
Collaborator Author

guoqing-noaa commented Feb 24, 2026

The following situations have been tested:
Note: extra spaces were intentionally added to test whether this PR can handle them corrently.

output_timelevels="0  1     "
output_timelevels="0h  1h   "

output_timelevels="0-1-15m 2-6-2"
output_timelevels="2-6-2"
output_timelevels="2-6-2   8-999"
output_timelevels="2-6h-2   8-999h"

output_timelevels="0m 15m  1h    1h3m"
output_timelevels="0s 30s 15m  1h    1h3m"
output_timelevels="0m 30s 15m  1h    1h3m"

@guoqing-noaa guoqing-noaa changed the title Enable variable output intervals via the output_timelevels stream attrbute output at flexible time levels Feb 24, 2026
@guoqing-noaa
Copy link
Collaborator Author

guoqing-noaa commented Feb 24, 2026

For a conus12km 12 hours forecasts, I tried different settings for the da_state stream (no changes on the history and diag outputs)

the output_interval="1:00:00" run took 606s
the output_timelevels="0-999" run took 609s
This 3s cost saves more time due to the skipping of mpasout output at hours 4-12:
  the output_timelevels="0-3" run took 549s

Copy link
Collaborator

@SamuelTrahanNOAA SamuelTrahanNOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These code changes are overcomplicated and will be difficult to extend or maintain. Any significant changes to the functionality will require a complicated rewrite.

More specific overview comments:

  1. Most of the parser code is a reimplementation of the Fortran standard split function. It would be better to call split.
  2. Much of the time-related code reimplements the MPAS and ESMF alarm functionality. It would be better to extend the alarm functionality.
  3. Parser errors aren't reported. If you type something wrong, you don't know what it was.
  4. There are out-of-bounds array reads due to off-by-one errors.
  5. The new syntax deviates from the MPAS days_HH:MM:SS time specification.
  6. Unlike the rest of the MPAS I/O interface, this uses a cryptic string instead of XML tags. If you used XML tags, the parser code wouldn't be necessary at all.

Comment on lines +6012 to +6223
!|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
!
! parse_time_string
!
!> \brief Parse a time string into total minutes ( allow 0.5 minutes, i.e. 30 seconds)
!> \author Guoqing Ge
!> \date February 2026
!> \details
!> Parses time strings like "1h30m", "45m", "6", "90s", "1d" into total minutes
!> Supported units: d/D (days), h (hours), m (minutes), s (seconds)
!> Plain integers without units are interpreted as hours
!
!-----------------------------------------------------------------------
subroutine parse_time_string(time_str, total_minutes, ierr)!{{{

implicit none

character(len=*), intent(in) :: time_str
real(kind=RKIND), intent(out) :: total_minutes
integer, intent(out) :: ierr

character(len=StrKIND) :: str_local
integer :: i, len_str, num_start, read_err
real(kind=RKIND) :: value
character :: ch
logical :: found_unit

ierr = 0
total_minutes = 0.0_RKIND
str_local = trim(adjustl(time_str))
len_str = len_trim(str_local)

if (len_str == 0) then
ierr = 1
return
end if

! Check if last character is a digit (plain integer = hours)
ch = str_local(len_str:len_str)
if (ch >= '0' .and. ch <= '9') then
read(str_local, *, iostat=read_err) value
if (read_err /= 0) then
ierr = 1
return
end if
total_minutes = value * 60.0_RKIND
return
end if

! Parse duration with units (e.g., "1h30m", "45m", "90s", "1d" or "1D")
num_start = 1
i = 1
do while (i <= len_str)
ch = str_local(i:i)
! Check for unit characters: d/D (days), h (hours), m (minutes), s (seconds)
if (ch == 'h' .or. ch == 'm' .or. ch == 's' .or. ch == 'd' .or. ch == 'D') then

if (i == num_start) then
ierr = 1 ! No number before unit
return
end if

read(str_local(num_start:i-1), *, iostat=read_err) value
if (read_err /= 0) then
ierr = 1
return
end if

select case (ch)
case ('h')
total_minutes = total_minutes + value * 60.0_RKIND
case ('m')
total_minutes = total_minutes + value
case ('s')
total_minutes = total_minutes + value / 60.0_RKIND
case ('d', 'D')
total_minutes = total_minutes + value * 24.0_RKIND * 60.0_RKIND
end select

num_start = i + 1
end if
i = i + 1
end do

! Check for trailing digits without unit (error)
if (num_start <= len_str) then
ierr = 1
return
end if

end subroutine parse_time_string!}}}


!|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
!
! parse_output_timelevel_spec
!
!> \brief Parse one time level specification segment
!> \author Guoqing Ge
!> \date February 2026
!> \details
!> Parses a single segment using format: start, start-stop, or start-stop-step
!> Time strings can be integers (hours) or duration format (e.g., "1h30m", "45m")
!> Examples: "6", "0-3", "0-1h-15m", "1h30m-2h-15m"
!
!-----------------------------------------------------------------------
subroutine parse_output_timelevel_spec(spec, start_hour, end_hour, interval_minutes, ierr)!{{{

implicit none

character(len=*), intent(in) :: spec
real(kind=RKIND), intent(out) :: start_hour, end_hour
real(kind=RKIND), intent(out) :: interval_minutes
integer, intent(out) :: ierr

character(len=StrKIND) :: spec_local
integer :: i, dash_count, dash_pos(2), len_spec
real(kind=RKIND) :: start_minutes, end_minutes, step_minutes
integer :: local_ierr

ierr = 0
dash_pos(1) = 0
dash_pos(2) = 0
spec_local = trim(adjustl(spec))
len_spec = len_trim(spec_local)

! Guard against empty string
if (len_spec == 0) then
ierr = 1
return
end if

! Count dashes to determine format
dash_count = 0
do i = 1, len_spec
if (spec_local(i:i) == '-') then
dash_count = dash_count + 1
if (dash_count <= 2) dash_pos(dash_count) = i
end if
end do

! Parse based on number of dashes
if (dash_count == 0) then
! Format: single time string (output at that time only)
call parse_time_string(spec_local, start_minutes, local_ierr)
if (local_ierr /= 0) then
ierr = 1
return
end if
start_hour = start_minutes / 60.0_RKIND
end_hour = start_hour
interval_minutes = 60.0_RKIND ! Default, but won't matter for single time

else if (dash_count == 1) then
! Format: start-stop (interval defaults to 1 hour)
if (dash_pos(1) <= 1 .or. dash_pos(1) >= len_spec) then
ierr = 1
return
end if

call parse_time_string(spec_local(1:dash_pos(1)-1), start_minutes, local_ierr)
if (local_ierr /= 0) then
ierr = 1
return
end if

call parse_time_string(spec_local(dash_pos(1)+1:len_spec), end_minutes, local_ierr)
if (local_ierr /= 0) then
ierr = 1
return
end if

start_hour = start_minutes / 60.0_RKIND
end_hour = end_minutes / 60.0_RKIND
interval_minutes = 60.0_RKIND

else if (dash_count == 2) then
! Format: start-stop-step
if (dash_pos(1) <= 1 .or. dash_pos(2) <= dash_pos(1) + 1 .or. dash_pos(2) >= len_spec) then
ierr = 1
return
end if

call parse_time_string(spec_local(1:dash_pos(1)-1), start_minutes, local_ierr)
if (local_ierr /= 0) then
ierr = 1
return
end if

call parse_time_string(spec_local(dash_pos(1)+1:dash_pos(2)-1), end_minutes, local_ierr)
if (local_ierr /= 0) then
ierr = 1
return
end if

call parse_time_string(spec_local(dash_pos(2)+1:len_spec), step_minutes, local_ierr)
if (local_ierr /= 0) then
ierr = 1
return
end if

start_hour = start_minutes / 60.0_RKIND
end_hour = end_minutes / 60.0_RKIND
interval_minutes = step_minutes

else
! More than 2 dashes - invalid format
ierr = 1
return
end if

end subroutine parse_output_timelevel_spec!}}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an extraordinary amount of code to parse a simple string. The vast majority of it could be replaced with a call to split

return
end if

! First pass: find if current forecast hour matches any segment
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not need to reparse the output_timelevel every time you reach a new forecast time. The output_timelevel doesn't change. Instead, parse it once and reuse the information.

seg_end = seg_end + 1
end do
seg_end = seg_end - 1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use split instead of reimplementing it.

end do
seg_end = seg_end - 1

if (seg_end >= seg_start) then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a check for an invalid condition which could be detected at the beginning of the model when the XML is parsed. Users shouldn't have to wait for the first forecast output time to see that their XML is invalid.

if (forecast_hour >= start_hour .and. forecast_hour <= end_hour) then
found = .true.
! Check if this is a single time point (not a range)
if (abs(start_hour - end_hour) < 1.0e-6_RKIND) then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parse_output_timelevel_spec already knows whether this is a single time. Why do you need to check it again? Can't you return that information?

! More than 2 dashes - invalid format
ierr = 1
return
end if
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should print an error message so the user knows what they did wrong.

end if

seg_start = seg_end + 1
do while (seg_start <= work_len .and. work_str(seg_start:seg_start) == ' ')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out-of-bounds array read due to an off-by-one bug. This will access the character at work_len+1 because Fortran doesn't have a short-circuit .and. operator.

Comment on lines +6308 to +6317
! If we matched a single time, find the next output time
if (found .and. is_single_time) then
! Second pass: find the minimum time > forecast_hour
seg_start = 1
do while (seg_start <= work_len)
seg_end = seg_start
do while (seg_end <= work_len .and. work_str(seg_end:seg_end) /= ' ')
seg_end = seg_end + 1
end do
seg_end = seg_end - 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're parsing the next output time twice: once for checking the current output time, and again when checking the next output time.

If you had put the information in a derived type data structure, you wouldn't need this complexity.

end subroutine stream_mgr_add_pkg_c !}}}


subroutine stream_mgr_set_property_c(manager_c, streamID_c, propertyName_c, propertyValue_c, ierr_c) bind(c) !{{{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This subroutine is unnecessary. The output_interval should be sent to stream_mgr_create_stream_c

Comment on lines +6262 to +6266
! Guard against empty string
if (work_len == 0) then
ierr = 1
return
end if
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should print an error so the user knows what they did wrong.

@SamuelTrahanNOAA
Copy link
Collaborator

The EBNF (Extended Backus-Naur Form) style grammar

If you want to use a parser generator, the original grammar specification should be in the repository, not the automatically-generated code.

@guoqing-noaa
Copy link
Collaborator Author

The EBNF (Extended Backus-Naur Form) style grammar

If you want to use a parser generator, the original grammar specification should be in the repository, not the automatically-generated code.

No, I don't intend to use a parser generator. I put EBNF there to make sure the overall timelevel specification logic is consistent and complete, and there are no hidden surprises.

@guoqing-noaa
Copy link
Collaborator Author

guoqing-noaa commented Feb 24, 2026

These code changes are overcomplicated and will be difficult to extend or maintain. Any significant changes to the functionality will require a complicated rewrite.

More specific overview comments:

  1. Most of the parser code is a reimplementation of the Fortran standard split function. It would be better to call split.
  2. Much of the time-related code reimplements the MPAS and ESMF alarm functionality. It would be better to extend the alarm functionality.
  3. Parser errors aren't reported. If you type something wrong, you don't know what it was.
  4. There are out-of-bounds array reads due to off-by-one errors.
  5. The new syntax deviates from the MPAS days_HH:MM:SS time specification.
  6. Unlike the rest of the MPAS I/O interface, this uses a cryptic string instead of XML tags. If you used XML tags, the parser code wouldn't be necessary at all.

@SamuelTrahanNOAA Thanks for the comments and discussions!

This is the first version to get things work. I also feel the implementation is kind of complicated and I would like to descope to meet our current needs only. My thought is to limit the timelevel specifications to only support minutes and hours (may also limit the output to have to start at 0h).

I intentionally try to avoid using the same time format HH:MM:SS used in output_interval.
start-stop[-step] is the most appropriate format after lots of considerations. It is more user friendly. start-stop is used in lots of industry applications (I just added the [-step] part)

@SamuelTrahanNOAA
Copy link
Collaborator

I put EBNF there to make sure the overall timelevel specification logic is consistent and complete, and there are no hidden surprises.

It is, indeed, a finely-polished EBNF with perfect indentation. However, it doesn't describe the code. Your read statement reads a real, not an integer.

@SamuelTrahanNOAA
Copy link
Collaborator

start-stop[-step] is the most appropriate format after lots of considerations. It is more user friendly. start-stop is used in lots of industry applications (I just added the [-step] part)

What is more user-friendly is splitting them into individual XML tags so it is clear what is going on. That would use the existing MPAS parsing code (eliminating the parser). Also, it'll be less confusing to experienced MPAS users.

<output_interval start="01:15:00" stop="02:30:00" step="00:15:00"/> <!-- every 15 minutes from 1:15 to 2:30 -->
<output_interval start="06:00:00"/> <!-- Only at hour 6 -->

I can see how your string would be easier to pass through the rrfs-workflow bash scripts, but bash limitations shouldn't take precedence over MPAS code consistency and quality.

@SamuelTrahanNOAA
Copy link
Collaborator

Can you please provide:

  1. Documentation of what part of the code was AI-generated. (Via code comments, perhaps.)
  2. Documentation of what part of the PR description and comments were AI-generated.
  3. Explanation of how it was generated.
  4. License restrictions the AI company placed on its generated code, description, and comments.

@guoqing-noaa
Copy link
Collaborator Author

@clark-evans I am sorry, this may not be ready for an official review yet. I should have put this in the draft mode since the beginning.

@guoqing-noaa guoqing-noaa marked this pull request as draft February 24, 2026 21:05
@guoqing-noaa
Copy link
Collaborator Author

<output_interval start="01:15:00" stop="02:30:00" step="00:15:00"/> <!-- every 15 minutes from 1:15 to 2:30 -->
<output_interval start="06:00:00"/> <!-- Only at hour 6 -->

This looks good. Does current MPAS code support this?

@SamuelTrahanNOAA
Copy link
Collaborator

This looks good. Does current MPAS code support this [xml alternative in Sam's prior comment]?

No. I'm suggesting it as an alternative implementation to your recursive descent parser.

@guoqing-noaa
Copy link
Collaborator Author

@SamuelTrahanNOAA Thanks for lots of great discussions! For the moment, this PR is mainly for a test/demo purpose.
Appreciate your specific comments so far. But we may pause further reviewing. This will NOT be the final implementation if moving forward. General discussions may continue if your are interested and have time to think about more on this.

I think my initial need is much simpler that the goal in this PR. We want to output mpasout files only at limited time levels (such as 01, 02 h (or including 0h if needed)).
My thought went much further than needed, mainly driven by my latest relevant work on the rrfs-workflow side.

For our current needs, I think NCAR Andy Stokely's changes may work. I will test that first. I was not aware of Andy's changes until 2 days ago.

@SamuelTrahanNOAA
Copy link
Collaborator

WRF was limited to start...step...end for each stream. We were able to work around that by having multiple output streams. (Recall that the operational models HRRR, HWRF, NAM, and RAP were all WRF-based, along with other quasi-operational models.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants