Add preprocessing pipeline integration for complex dataset #72
Signed-off-by: minhnhu123 <minhnhu171202@gmail.com>
senuradp left a comment:
Good work on the preprocessing implementation. I reviewed the code locally, and the preprocessing logic is relevant for the correlation alert pipeline. It correctly selects required columns, converts values to numeric, sorts by timestamp, removes duplicate timestamps, sets the timestamp as the index, and handles missing values.
Before merging, please keep this PR focused on preprocessing only. The changes in main.py currently implement the full pipeline, which overlaps with other team members’ assigned components and may create integration conflicts.
For final integration, please ensure this contribution plugs into the wrapper function:
preprocess_timeseries(df, timestamp_col, selected_streams)
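As a rough illustration of the preprocessing steps described in this review, arranged behind the wrapper signature above, a minimal pandas sketch might look like the following. It is not the PR's code. Keeping the first reading for a duplicated timestamp and filling missing values by time-based interpolation plus forward/backward fill are assumptions, since the PR's actual missing-value strategy is not shown here.

```python
import pandas as pd


def preprocess_timeseries(df: pd.DataFrame, timestamp_col: str, selected_streams: list) -> pd.DataFrame:
    """Sketch: turn raw sensor rows into a clean, time-indexed numeric frame."""
    # 1. Keep only the timestamp column and the requested sensor streams.
    out = df[[timestamp_col, *selected_streams]].copy()

    # 2. Parse timestamps and coerce sensor values to numbers;
    #    unparseable entries become NaT / NaN instead of raising.
    out[timestamp_col] = pd.to_datetime(out[timestamp_col], errors="coerce")
    for col in selected_streams:
        out[col] = pd.to_numeric(out[col], errors="coerce")

    # Rows without a valid timestamp cannot be placed on the time axis.
    out = out.dropna(subset=[timestamp_col])

    # 3. Sort chronologically (stable, so "first" below is well defined),
    # 4. then drop repeated timestamps, keeping the first reading (assumption).
    out = out.sort_values(timestamp_col, kind="mergesort")
    out = out.drop_duplicates(subset=timestamp_col, keep="first")

    # 5. Use the timestamp as the index.
    out = out.set_index(timestamp_col)

    # 6. Handle missing values (assumed strategy): time-weighted interpolation
    #    for interior gaps, then forward/backward fill for leading/trailing gaps.
    out = out.interpolate(method="time").ffill().bfill()
    return out
```

Interpolating by time rather than by row position keeps filled values proportional to the actual gap length when sensor readings are irregularly spaced.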
Also, any hardcoded dataset paths such as datasets/complex.csv should be limited to testing/demo files and not required for the main reusable pipeline.
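One way to keep datasets/complex.csv out of the reusable pipeline is to accept the dataset path only at the demo entry point. The sketch below assumes an argparse-based CLI; `parse_args`, `--timestamp-col`, `--streams` and the `timestamp` default are hypothetical names, not part of the PR.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse options for a hypothetical demo runner; no dataset path is hardcoded."""
    parser = argparse.ArgumentParser(
        description="Demo runner for the correlation alert pipeline (hypothetical CLI)."
    )
    # The caller supplies the CSV location, e.g. datasets/complex.csv for a demo run.
    parser.add_argument("csv_path", help="CSV file with raw sensor readings")
    # Assumed default column name; override it for datasets that differ.
    parser.add_argument("--timestamp-col", default="timestamp",
                        help="name of the timestamp column")
    # Forwarded to the preprocessing wrapper as selected_streams.
    parser.add_argument("--streams", nargs="+", required=True,
                        help="sensor columns to analyse")
    return parser.parse_args(argv)
```

A demo run would then look something like `python correlation_alert/main.py datasets/complex.csv --streams <columns>`, with main.py reading the CSV and passing `args.timestamp_col` and `args.streams` to `preprocess_timeseries`. The pipeline functions themselves only ever receive a DataFrame.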
Thank you for your feedback. I have updated main.py as requested. Additional changes made:
Thanks for updating the PR based on the earlier feedback. I reviewed the updated implementation, and this version is much better aligned with the intended modular pipeline structure. The … There is a minor import/path issue related to how … Overall, this is a solid preprocessing implementation and is ready to proceed.
Overview
This pull request implements the data preprocessing module and refactors correlation_alert/main.py to integrate it with the preprocessing pipeline.
The work focuses on preparing clean, consistent time-series sensor data and on ensuring that the correlation alert pipeline uses modular, reusable preprocessing logic.
What was implemented
Preprocessing pipeline in preprocessing.py
Implemented a reusable preprocessing pipeline with the following steps:
Testing completed
Function-level testing
Verified preprocessing steps:
Verified integration with correlation pipeline:
Dataset-level validation
Tested using:
complex.csv
Confirmed:
The preprocessing module is complete and successfully integrated with the correlation alert pipeline.
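The function-level checks listed under Testing completed could also be automated. The sketch below collects invariant violations in a preprocessed frame, assuming the output is a pandas DataFrame indexed by timestamp. `check_preprocessed` is a hypothetical helper, and its invariants are inferred from the preprocessing steps described in the review rather than taken from the PR's own tests.

```python
import pandas as pd


def check_preprocessed(out: pd.DataFrame, selected_streams: list) -> list:
    """Hypothetical helper: return violated invariants (empty list means clean)."""
    problems = []
    # The timestamp should have become a datetime index.
    if not isinstance(out.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    # Sorting and de-duplication should leave a strictly ordered time axis.
    if not out.index.is_monotonic_increasing:
        problems.append("timestamps are not sorted")
    if not out.index.is_unique:
        problems.append("duplicate timestamps remain")
    # Only the selected streams should survive column selection.
    if list(out.columns) != list(selected_streams):
        problems.append("unexpected columns")
    # Every stream should be numeric after conversion.
    non_numeric = [c for c in out.columns if not pd.api.types.is_numeric_dtype(out[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    # Missing-value handling should leave no NaNs behind.
    if out.isna().any().any():
        problems.append("missing values remain")
    return problems
```

In a pytest test this could be used as `assert check_preprocessed(preprocess_timeseries(df, timestamp_col, selected_streams), selected_streams) == []`, so a failure reports every broken invariant at once.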