NASA_VIIRSActiveFiresEvents_Fixes_corrected config path as well as added skip_validation_config#2013
Conversation
There was a problem hiding this comment.
Code Review
This pull request modifies the GCS output directory path in the fire events pipeline configuration and updates the manifest to include a configuration override that disables several tools, including the import tool, while also removing the user script timeout. Feedback suggests that disabling the import tool prevents data from being imported and that removing the script timeout could lead to premature job termination. It is recommended to restore the timeout and enable the import tool to ensure the pipeline functions as intended.
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
This pull request updates the fire events pipeline configuration by modifying the GCS output directory and adds a config_override section to the manifest while removing a timeout. Feedback indicates that the config_override field is not yet supported by the ImportExecutor and that enabling skip_gcs_upload would disrupt version tracking. Furthermore, the change to the output directory path may incorrectly remove the 'events' subfolder, potentially violating directory structure guidelines.
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
This pull request modifies the GCS output directory path and updates the manifest configuration by removing the script timeout and adding several tool overrides. Feedback suggests that removing the user_script_timeout could cause long-running jobs to fail and that disabling the import tool and GCS upload might be unintended. Additionally, the change to the output directory path removes the stage subfolder entirely, which may conflict with the goal of maintaining a specific folder structure.
| "cron_schedule": "0 5 1 * *", | ||
| "config_override": { | ||
| "skip_gcs_upload": true, | ||
| "invoke_differ_tool": false, | ||
| "invoke_import_validation": false, | ||
| "invoke_import_tool": false | ||
| } |
There was a problem hiding this comment.
The changes in the manifest appear to disable the automated import and remove safety timeouts:
- Timeout Removal:
user_script_timeoutwas removed. Given that this pipeline processes global fire events for a full year, it likely requires a significant amount of time. Removing this may cause the automation to kill the job prematurely. - Import Disabled: Setting
"invoke_import_tool": falseand"skip_gcs_upload": trueprevents the generated data from being loaded into the database. The PR description only mentions skipping validation and the differ.
If the goal is to fix the production import, these settings should likely be adjusted to only skip the intended steps.
"user_script_timeout": 86400,
"cron_schedule": "0 5 1 * *",
"config_override": {
"invoke_differ_tool": false,
"invoke_import_validation": false
}| # as the import automation can copy all files in a single folder. | ||
| "output_dir": | ||
| "gs://{gcs_bucket}/{gcs_folder}/{stage}/{import_name}-{stage}-{year}-without-usa-", | ||
| "gs://{gcs_bucket}/{gcs_folder}/{import_name}-{stage}-{year}-without-usa-", |
There was a problem hiding this comment.
The PR description mentions a goal to "avoid 'events/events' folder and have just 'events' folder". However, the current change removes the {stage}/ directory (which resolves to events/) from the path entirely, while keeping the {stage} suffix in the filename prefix. This results in the output being placed in the parent directory (scripts/fires/firms/) with no events/ subfolder.
If the intention was to have a single events/ folder and remove the redundancy from the filename, consider keeping {stage}/ in the path but removing {stage}- from the filename prefix.
| "gs://{gcs_bucket}/{gcs_folder}/{import_name}-{stage}-{year}-without-usa-", | |
| "gs://{gcs_bucket}/{gcs_folder}/{stage}/{import_name}-{year}-without-usa-", |
As discussed in the https://b.corp.google.com/issues/498452602, following changes have been done:
The changes have been validated in the test environment
https://pantheon.corp.google.com/batch/jobsDetail/regions/us-central1/jobs/nasa-viirsactivefiresevents-balit-20260515-131022/logs?e=13803378&mods=-monitoring_api_staging&project=datcom-infosys-dev