CDA-74 Created ADR for timeseries csv formatting #1634
base: develop
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,123 @@ | ||||||
| ##### | ||||||
| CSV Format for TimeSeries | ||||||
| ##### | ||||||
|
|
||||||
|
|
||||||
| Summary | ||||||
| ======= | ||||||
|
|
||||||
| This ADR defines a standardized CSV representation for TimeSeries. It specifies a row-per-record CSV format that preserves essential metadata and ensures consistent ingestion by analytics, automation, and warehousing systems. | ||||||
|
|
||||||
|
|
||||||
| Opinions | ||||||
| ======== | ||||||
|
|
||||||
| Opinion 1 | ||||||
| --------- | ||||||
|
|
||||||
| @brysonspilman | ||||||
|
|
||||||
| Summary | ||||||
| ~~~~~~~ | ||||||
| Since the CSV format is intended for retrieval only, a customized format that follows standard CSV practices is appropriate. | ||||||
|
|
||||||
| Key points | ||||||
| ~~~~~~~~~~ | ||||||
|
|
||||||
| .. list-table:: | ||||||
| :header-rows: 1 | ||||||
| :widths: 20 25 55 | ||||||
|
|
||||||
| * - Topic | ||||||
| - Decision | ||||||
| - Justification | ||||||
| * - Required columns | ||||||
| - Always include ``date-time`` and ``value``; include units in parentheses in the value column header (e.g., ``value (ft)``) | ||||||
| - Units should exist in exactly one canonical location in all modes. Emitting them as metadata comments only in some modes would be inconsistent and confusing. | ||||||
| * - Optional columns | ||||||
| - Optional (off by default): ``time-series-id``, ``office-id``, ``version-date``, ``data-entry-date``, ``quality`` | ||||||
|
Collaborator
I don't think We had decided for JSON/XML timeseries, that there has not been a use-case for Should be added somewhere that a future update could include an optional
Collaborator
Author
This makes sense to me so as to avoid repeated data. The one benefit of using columns is that it follows standards (using '#' comments isn't standard). The idea here was to allow for either comments or columns. Will let @MikeNeilson weigh in
Collaborator
Author
No need to include those as columns given they can be included in metadata rows.
Contributor
might be useful to set the content-disposition to download and set the filename to the timeseries name.
Contributor
I just thought of a valid reason to include a row-level version date: data transfer (e.g. test to dev) and inspection. That said it's a rather uncommon use-case so we don't need to worry about it yet in this PR... plus it doesn't really need to be "easy" anyways given it would be a rare use-case.
Collaborator
Office in column seems more useful to me if you were to return multiple timeseries, one TS per row and the data be in the columns? (Not that we are, just trying to think when Office might go in a column) |
||||||
| - Everything except ``date-time`` and ``value`` (with units in the header) is optional. Because headers are always included, optional columns can be toggled without breaking parsing. Clients should rely on column names, not indices. Because units are embedded in the ``value`` header, clients must match that header by its ``value`` prefix rather than by exact name to find the correct column. | ||||||
| * - Metadata fields | ||||||
| - May be emitted as top-of-payload comments (``metadata-format=comments``) or as actual columns (``metadata-format=columns``) | ||||||
| - The following fields can be treated as metadata comments at top-of-payload rather than columns: ``time-series-id``, ``office-id``, ``version-date``. These are optional (off by default). It is assumed that the only comments in the payload will be metadata comments, and as such, clients can parse out metadata by reading comment lines until the first non-comment line is reached. | ||||||
| * - Units location | ||||||
| - Express units only in the value column header via parentheses (e.g., ``value (cfs)``) | ||||||
| - Do not include units as a separate column or in metadata comments. This avoids the anti-pattern of dual representation; units live in exactly one canonical location. Custom deserialization may be required to extract units from the header, which is preferable to duplicate representations. | ||||||
| * - Version-date encoding | ||||||
| - Use ``base`` for 1111-11-11T11:11, ``aggregate`` for aggregate versions, ISO-8601 timestamp for actual version dates, and omit the field if unversioned | ||||||
| - Matches CWMS-VUE behavior. A separate CSV column per case was rejected due to lack of use-cases and schema bloat. Note this requires custom serialization handling. | ||||||
| * - Column headers | ||||||
| - Always include headers | ||||||
| - RFC 4180 allows headers; including them keeps the format scalable if optional columns are introduced later and prevents reliance on fixed column indices. We will include a header param of ``header=present`` in the Accept header to explicitly indicate that headers are included, even though they will always be present. This allows for future flexibility if we ever need to emit headerless CSV for some reason. | ||||||
| * - Comments | ||||||
| - Treat lines beginning with ``#`` as comments | ||||||
| - While not part of RFC 4180, this convention is already used by CWMS endpoints (e.g., office and location-group) that return CSV, and is human-readable. | ||||||
| * - Column naming | ||||||
| - Kebab-case names | ||||||
| - Keeps naming consistent with JSON and XML. | ||||||
| * - Accept header for format and columns | ||||||
| - Use HTTP Accept header parameters to select date format and optional columns | ||||||
| - Default CSV serialization uses ISO-8601 strings. Examples: ``text/csv;date-format=ISO8601-Instant`` (default), ``text/csv;date-format=epoch-millis``. Use Accept header parameters to enable optional columns (e.g., ``quality=present``, ``data-entry-date=present``). If these were query params instead, toggling would be easier in a browser, but Accept keeps content negotiation consistent. | ||||||
| * - Quality representation | ||||||
| - ``quality`` (aka quality-code) is an optional integer bitmask | ||||||
| - A bitmask (integer) compactly represents multiple boolean flags with fast native bitwise operations; a ``byte[]`` adds overhead without improving expressiveness for fixed flag sets. | ||||||
| * - Nulls and missing values | ||||||
| - Missing values will be represented with an empty value field (null) and will have ``quality-code = 5``. Constants will not be used to represent missing values. | ||||||
| - Keeps behavior consistent with JSON and XML. | ||||||
| * - Encoding and delimiters | ||||||
| - UTF-8, comma delimiter, LF line endings | ||||||
| - Comma-only delimiting follows RFC 4180. Tab, pipe, and semicolon delimiters will not be supported. | ||||||
| * - Record structure | ||||||
| - One row per record | ||||||
| - A record is a single date-time and value pair; ``quality-code`` and ``data-entry-date`` may be included as optional columns. ``version-date`` is also an attribute of the record but is covered under the optional metadata comments. | ||||||
| * - Single TS per payload | ||||||
| - Do not mix multiple time-series IDs in one payload | ||||||
| - Ensures a payload represents exactly one time-series. | ||||||
|
|
||||||
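The Accept header parameters listed in the table can be assembled roughly as in the Python sketch below. This is only an illustration: the function name `build_accept` is hypothetical, and passing `metadata-format` as an Accept parameter is an assumption, since the ADR names the setting without saying where it is supplied.

```python
def build_accept(date_format="ISO8601-Instant", optional_columns=(), metadata_format=None):
    """Build a text/csv Accept header value from the parameters named in the ADR.

    optional_columns: names such as "quality" or "data-entry-date".
    metadata_format: "comments", "columns", or None to leave metadata off
                     (assumed here to travel as an Accept parameter).
    """
    # header=present is always sent, matching the "Column headers" decision.
    params = ["header=present", f"date-format={date_format}"]
    if metadata_format is not None:
        params.append(f"metadata-format={metadata_format}")
    # Each optional column is switched on with "<name>=present".
    params.extend(f"{name}=present" for name in optional_columns)
    return ";".join(["text/csv", *params])


print(build_accept("epoch-millis", ("quality", "data-entry-date")))
# text/csv;header=present;date-format=epoch-millis;quality=present;data-entry-date=present
```

Keeping every toggle in one header value means a client changes a single string to request a different shape of payload, which is the consistency argument the table makes for Accept over query parameters.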
| Example CSVs | ||||||
| ~~~~~~~~~~~~ | ||||||
|
|
||||||
| 1. All optionals turned off, and no metadata comments: | ||||||
|
|
||||||
| .. code-block:: text | ||||||
|
|
||||||
| date-time, value (cfs) | ||||||
| 2021-06-21T00:00:00Z, 0.0 | ||||||
| 2021-06-22T00:00:00Z, 1.0 | ||||||
| 2021-06-23T00:00:00Z, 2.0 | ||||||
| 2021-06-24T00:00:00Z, 3.0 | ||||||
|
|
||||||
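A client reading the minimal payload above might locate columns by name and pull the units out of the value header along these lines. This is a sketch using only the Python standard library; `parse_timeseries_csv` and `UNITS_HEADER` are hypothetical names, and tolerating a space after each comma is an assumption based on how the examples are written.

```python
import csv
import io
import re

# Matches headers such as "value (cfs)" and captures the units.
UNITS_HEADER = re.compile(r"^value\s*\((?P<units>[^)]*)\)$")


def parse_timeseries_csv(text):
    """Return (units, records) for a payload without metadata comments."""
    # skipinitialspace drops the blank that follows each comma in the examples.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]

    # Find columns by name, never by fixed position.
    time_idx = header.index("date-time")
    value_idx, units = None, None
    for i, name in enumerate(header):
        match = UNITS_HEADER.match(name)
        if match:
            value_idx, units = i, match.group("units")
            break
    if value_idx is None:
        raise ValueError("no 'value (<units>)' column found")

    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        raw = row[value_idx].strip()
        # An empty value field is a missing value, represented here as None.
        records.append((row[time_idx].strip(), float(raw) if raw else None))
    return units, records


sample = """date-time, value (cfs)
2021-06-21T00:00:00Z, 0.0
2021-06-22T00:00:00Z, 1.0
"""
units, records = parse_timeseries_csv(sample)
print(units, records)
```

Because the lookup is by header name, the same function keeps working when optional columns such as `quality` are switched on.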
| 2. Metadata fields turned on and emitted as comments (``metadata-format=comments``), with the ``data-entry-date`` and ``quality`` columns left off: | ||||||
|
|
||||||
| .. code-block:: text | ||||||
|
|
||||||
| # time-series-id: ALAT2.Flow-Out.Inst.1Hour.0.Rev-SWF-REGI | ||||||
| # office-id: SWT | ||||||
| # version-date: aggregate | ||||||
| date-time, value (cfs) | ||||||
| 2021-06-21T00:00:00Z, 0.0 | ||||||
| 2021-06-22T00:00:00Z, 1.0 | ||||||
| 2021-06-23T00:00:00Z, 2.0 | ||||||
| 2021-06-24T00:00:00Z, 3.0 | ||||||
|
|
||||||
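The rule that metadata can be read by consuming comment lines until the first non-comment line, together with the version-date encoding, could be handled roughly as follows. This is a hedged sketch: `split_metadata` and `decode_version_date` are hypothetical helper names, and the `key: value` layout of each comment is taken from the example above rather than from a formal grammar.

```python
from datetime import datetime


def split_metadata(text):
    """Split a payload into (metadata dict, remaining CSV body)."""
    lines = text.splitlines()
    metadata = {}
    body_start = len(lines)  # used when every line is a comment
    for i, line in enumerate(lines):
        if not line.startswith("#"):
            body_start = i  # first non-comment line ends the metadata
            break
        # Split on the first colon only, so ISO-8601 values stay intact.
        key, _, value = line[1:].partition(":")
        metadata[key.strip()] = value.strip()
    return metadata, "\n".join(lines[body_start:])


def decode_version_date(metadata):
    """Map the version-date field to a (kind, datetime-or-None) pair."""
    raw = metadata.get("version-date")
    if raw is None:
        return ("unversioned", None)  # field omitted
    if raw in ("base", "aggregate"):
        return (raw, None)  # keyword encodings from the ADR
    # Anything else is assumed to be an ISO-8601 timestamp.
    return ("dated", datetime.fromisoformat(raw.replace("Z", "+00:00")))


payload = """# time-series-id: ALAT2.Flow-Out.Inst.1Hour.0.Rev-SWF-REGI
# office-id: SWT
# version-date: aggregate
date-time, value (cfs)
2021-06-21T00:00:00Z, 0.0
"""
metadata, body = split_metadata(payload)
print(metadata["office-id"], decode_version_date(metadata))
```

The returned body starts at the header row, so it can be handed to any ordinary CSV reader without further comment handling.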
| 3. All optionals turned on, with metadata emitted as columns (``metadata-format=columns``): | ||||||
|
|
||||||
| .. code-block:: text | ||||||
|
|
||||||
| time-series-id, office-id, date-time, value (cfs), version-date, data-entry-date, quality-code | ||||||
| ALAT2.Flow-Out.Inst.1Hour.0.Rev-SWF-REGI, SWT, 2021-06-21T00:00:00Z, 0.0, aggregate, 2021-06-21T00:05:00Z, 5 | ||||||
|
Collaborator
If you're proposing to add version-date to a column, the representation is then no longer
Collaborator
Author
good point - this makes me more inclined to not have metadata columns at all. comments only. |
||||||
| ALAT2.Flow-Out.Inst.1Hour.0.Rev-SWF-REGI, SWT, 2021-06-22T00:00:00Z, 1.0, aggregate, 2021-06-22T00:05:00Z, 5 | ||||||
| ALAT2.Flow-Out.Inst.1Hour.0.Rev-SWF-REGI, SWT, 2021-06-23T00:00:00Z, 2.0, aggregate, 2021-06-23T00:05:00Z, 5 | ||||||
| ALAT2.Flow-Out.Inst.1Hour.0.Rev-SWF-REGI, SWT, 2021-06-24T00:00:00Z, 3.0, aggregate, 2021-06-24T00:05:00Z, 5 | ||||||
|
|
||||||
| Decision Status | ||||||
| =============== | ||||||
|
|
||||||
| (Status: proposed) | ||||||
|
|
||||||
|
|
||||||
| References | ||||||
| ========== | ||||||
|
|
||||||
| Related Types: cwms.cda.data.dto.TimeSeries, TimeSeries.Record | ||||||
| Issue/Discussion: https://github.com/USACE/cwms-data-api/issues/1525 | ||||||