
BDMS 412/529/530: water level csv implementation#466

Open
jacob-a-brown wants to merge 36 commits into water-level-csv from water-level-csv-refactor

Conversation

@jacob-a-brown
Contributor

@jacob-a-brown jacob-a-brown commented Feb 5, 2026

Why

This PR addresses the following problem / context:

  • The water level csv upload needs to be implemented correctly per the feature tests and DB design
  • Common steps between water-level-csv.feature and well-inventory-csv.feature need to be resolved so that the same step functions can be reused

How

Implementation summary - the following was changed / added / removed:

Implementation

  • The water level csv upload implementation was refactored to leverage Pydantic for data validations and code reuse
  • The different fields in the water level csv file were mapped to the correct tables and fields in Ocotillo

Tests

  • The common steps between water-level-csv.feature and well-inventory-csv.feature were moved to tests/features/common.py so that they can be reused for both feature files.
    • This also meant renaming context.csv_rows to context.rows in water-level-csv.feature for the style and implementation to be the same between the two feature files
  • A unit test was added to tests/services/water_level_csv.py to ensure that the function used to add bulk data to the DB works and persists data as expected
  • The cli tests were updated for the updated implementation

Notes

Any special considerations, workarounds, or follow-up work to note?

  • Once this PR has been accepted, the final thing to do is implement optional water levels for the well inventory csv upload. This will be achievable by reusing the function services/water_level_csv.py::bulk_upload_water_levels in the well inventory csv upload. That function requires the wells to already exist in the DB, so it should be called after the wells have been persisted. Some things to consider as this final functionality is implemented/developed:
    • Even if the bulk_upload_water_levels function cannot be used for the well inventory csv, the Pydantic schemas can and should be used.
    • If a well is uploaded correctly, but there is an error in its associated (optional) water level, should that well still be uploaded? What will the error message(s) look like?
    • If everything is successfully uploaded, what will the stdout look like?
    • The step testing that the optional "water_level_date_time" values are valid ISO 8601 timezone-naive datetime strings (e.g. "2025-02-15T10:30:00") when provided, in tests/features/well-inventory-csv.py, needs to be uncommented when this is implemented.
    • The well inventory unit tests need to be updated to ensure that the optional water level upload is working as expected. Correct data validation, correct data persistence, etc...
  • This uses some functionality from well-inventory-csv that has been vetted and approved. PR 🚨 ⚠️ 💀 [NO TICKET]: Well inventory csv Feature Tests #247 should get merged into staging before water-level-csv gets merged into staging (note: this is to merge into water-level-csv, which also has the latest work from well-inventory-csv)
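The planned call order for the well inventory follow-up can be sketched as below. Only the name bulk_upload_water_levels comes from this PR; every other function and field here is a hypothetical placeholder meant to show the sequencing constraint (wells persisted first, then water levels):

```python
# Sketch of the planned call order for the well inventory upload.
# Function bodies are placeholders; only the ordering is the point.

def persist_wells(rows: list[dict]) -> dict:
    # Hypothetical: insert well records and return them keyed by name.
    return {r["well_name_point_id"]: r for r in rows}


def bulk_upload_water_levels(rows: list[dict], wells: dict) -> int:
    # Stand-in for services/water_level_csv.py::bulk_upload_water_levels,
    # which requires the referenced wells to already exist in the DB.
    missing = [r for r in rows if r["well_name_point_id"] not in wells]
    if missing:
        raise ValueError("water level references a well that was not persisted")
    return len(rows)


def upload_well_inventory(rows: list[dict]) -> int:
    wells = persist_wells(rows)  # wells must be persisted first...
    levels = [r for r in rows if r.get("water_level_date_time")]
    return bulk_upload_water_levels(levels, wells)  # ...then water levels
```

This ordering also frames the open question above: if a well persists but its optional water level fails validation, the upload has to decide whether to roll back the well or keep it and report the error.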

jacob-a-brown and others added 30 commits December 22, 2025 11:38
This was deprecated a while ago but still existed in the lexicon and parameter files.
The only valid sample matrix for water level observations is "water"
The bulk upload is used for both the API and CLI, so it should
be tested by itself. Then the API and CLI tests can just call the bulk upload
function and verify that they run as expected
All water levels must be associated with an existing contact in the DB,
so the contact fixture is used instead of hardcoding a name.
Hanging data prevents tests from running cleanly.
This large refactor updates the water level CSV bulk upload process in
the following ways:

- It uses the updated schema/headers for the CSV
- It uses Pydantic to do all of the data validations that don't pertain to the DB
- It updates tests to reflect these changes
- It tests the services function since that logic is used by both the API and CLI
The logic to test if the bulk upload function works correctly is already covered in its own unit tests. The CLI test was redundant and has been removed to streamline the test suite.
This commit adds a unit test to verify that the water level csv upload
service correctly handles the scenario where the specified source file
does not exist. The test checks that the appropriate error message is
returned in the payload when a FileNotFoundError is raised.
instead of just the first one. This allows users to see all issues with a row at once instead of having to fix them one by one.
the model needs to be validated and created before any comparisons
can be made against the database
…n value

This reflects the updates to the feature test
It is now called water_level_date_time instead of measurement_date_time
The required/optional givens are common between well inventory
and water level CSVs, so this commit moves them to common.py to avoid
duplication and make them reusable across different feature files.
The previous implementation was for an outdated version of the water level
csv handling. This commit updates the test to align with the new
implementation, ensuring that the CLI commands are tested correctly with
the current water level csv processing logic.
Both water-level-csv and well-inventory-csv features have the same step
to convert datetime fields from naive to aware and confirm the correct
timezone offset. This commit moves that step implementation to common.py
to avoid code duplication and ensure consistent testing of datetime
handling across both features.
This commit adds a feature test for invalid "measuring_person" values in
the water level csv upload. It also uses standard verbiage for the feature
test so that the same step can be reused.
Contributor

Copilot AI left a comment


Pull request overview

This PR implements the water level CSV upload functionality with proper Pydantic-based validation and database persistence, while refactoring common test steps to enable code reuse between water-level-csv and well-inventory-csv feature files.

Changes:

  • Refactored water level CSV upload to use Pydantic schemas for validation and structured data handling
  • Moved common test steps to tests/features/common.py for reuse across feature files
  • Updated field mappings to align with database schema (e.g., measurement_date_time renamed to water_level_date_time, field staff handling)

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
services/water_level_csv.py Core implementation refactored to use Pydantic schemas and proper database field mappings
schemas/water_level_csv.py New Pydantic schema definitions for CSV rows and bulk upload responses
tests/services/test_water_level_csv.py New comprehensive unit tests for bulk upload function
tests/test_observation.py Updated API test to use fixture-based test data and verify all created database entities
tests/test_cli_commands.py Updated CLI tests with detailed verification of database insertions
tests/features/steps/common.py New common test steps for CSV field validation and datetime handling
tests/features/steps/water-levels-csv.py Updated to use Pydantic schema and shared common steps
tests/features/steps/well-inventory-csv.py Removed duplicate steps now available in common.py
tests/conftest.py New fixture providing test data for water level bulk uploads
tests/features/environment.py Updated contact creation to support multiple named contacts
api/observation.py Updated to return proper Pydantic payload and 201 status code
core/parameter.json Corrected matrix value from "groundwater" to "water"

Comment on lines +1183 to +1201
@pytest.fixture()
def water_level_bulk_upload_data(water_well_thing, contact, second_contact):
data = {
"well_name_point_id": water_well_thing.name,
"field_event_date_time": "2025-02-15T08:00:00",
"field_staff": contact.name,
"field_staff_2": second_contact.name,
"field_staff_3": "",
"water_level_date_time": "2025-02-15T10:30:00",
"measuring_person": contact.name,
"sample_method": "Electric tape measurement (E-probe)",
"mp_height": "1.5",
"level_status": "Water level not affected",
"depth_to_water_ft": "7",
"data_quality": "Water level accurate to within two hundreths of a foot",
"water_level_notes": "Initial measurement",
}
yield data
del data

Copilot AI Feb 5, 2026


Corrected spelling of 'hundreths' to 'hundredths' in data_quality field.

Contributor Author


@jirhiker and @ksmuczynski

This typo exists in LU_DataQuality. Should it be fixed here or at a later time?

@jacob-a-brown
Contributor Author

@codex review

@chatgpt-codex-connector

To use Codex here, create a Codex account and connect to github.

@jacob-a-brown
Contributor Author

@codex review

@chatgpt-codex-connector

To use Codex here, create a Codex account and connect to github.

response_model=WaterLevelBulkUploadResponse,
status_code=HTTP_200_OK,
response_model=WaterLevelBulkUploadPayload,
status_code=HTTP_201_CREATED,
Contributor


😄

Contributor Author


The naming can change as we see fit 😅. I tried to mimic the nomenclature from when I refactored/updated the water level csv implementation. I think originally we were planning to have it work as an API endpoint, but now it's being implemented by the CLI, whose "response" we need to define manually.

context.datetime_fields.append("date_time")


# TODO: implement when optional water levels are added to the well inventory csv testing data
Contributor


Why is this test here before it's implemented? It's kinda silly to have a test with just pass.

Contributor Author


Good question! Since it is in the feature file, if I don't implement the step, a warning/error will be flagged and the tests will not pass.

As I wrote at the end of the PR description, this is the next and final step in getting the csv uploads fully implemented.

field_staff,well_name_point_id,field_event_date_time,measurement_date_time,sampler,sample_method,mp_height,level_status,depth_to_water_ft,data_quality,water_level_notes
Alice Lopez,AR0001,2025-02-15T08:00:00-07:00,2025-02-15T10:30:00-07:00,Groundwater Team,electric tape,1.5,stable,45.2,approved,Initial measurement after irrigation shutdown
Bernardo Chen,AR0002,2025-03-05T09:15:00-07:00,2025-03-05T11:10:00-07:00,Consultant,steel tape,1.8,rising,47.0,provisional,Follow-up visit; pump was off for 24h
Alice Lopez,AR0001,2025-02-15T08:00:00,2025-02-15T10:30:00,Groundwater Team,electric tape,1.5,stable,45.2,approved,Initial measurement after irrigation shutdown
Contributor


So, the CSVs that we use cannot have the UTC offset, or this feature will not work?

Contributor Author


The intention is for users to submit timezone-naive datetimes, and then we add the offset manually, either MST or MDT (assuming the data is submitted in Mountain Time):

  1. And "water_level_date_time" values are valid ISO 8601 timezone-naive datetime strings (e.g. "2025-02-15T10:30:00")
  2. Then all datetime objects are assigned the correct Mountain Time timezone offset based on the date value.
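A minimal sketch of that offset assignment, assuming the zoneinfo America/Denver zone (which picks MST or MDT from the date automatically); the helper name is hypothetical:

```python
# Hypothetical helper: attach the Mountain Time offset (MST or MDT,
# depending on the date) to a timezone-naive datetime.
from datetime import datetime
from zoneinfo import ZoneInfo

MOUNTAIN = ZoneInfo("America/Denver")


def localize(naive: datetime) -> datetime:
    if naive.tzinfo is not None:
        raise ValueError("expected a timezone-naive datetime")
    return naive.replace(tzinfo=MOUNTAIN)
```

So "2025-02-15T10:30:00" becomes -07:00 (MST), while a July date would get -06:00 (MDT).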

