Skip to content

Implement core correlation change alert pipeline and validation#79

Open
VenuraPallawela wants to merge 8 commits into
DataBytes-Organisation:feature/correlation-alert/venura-corefrom
VenuraPallawela:feature/correlation-alert/venura-core
Open

Implement core correlation change alert pipeline and validation#79
VenuraPallawela wants to merge 8 commits into
DataBytes-Organisation:feature/correlation-alert/venura-corefrom
VenuraPallawela:feature/correlation-alert/venura-core

Conversation

@VenuraPallawela
Copy link
Copy Markdown

Overview

This pull request adds the initial implementation of the correlation change alert pipeline for the correlation_alert module. The work focuses on the core data pipeline for handling time-series sensor data, computing rolling correlations, comparing correlation changes between consecutive windows, and generating threshold-based alerts.

What was implemented

Core pipeline in correlation_alert/main.py

Implemented the following functions as part of the end-to-end correlation change alert workflow:

  • preprocess_timeseries(...)

    • validates required columns
    • selects the timestamp column and chosen streams
    • converts timestamps into datetime format
    • sorts by time and sets timestamp as the index
    • converts selected streams to numeric
    • handles missing values using interpolation and fill methods
  • create_rolling_windows(...)

    • creates rolling windows from the cleaned time-series data
    • supports configurable window_size and step_size
  • compute_window_correlations(...)

    • computes Pearson correlation matrices for each window
    • stores window metadata such as index, start time, and end time
  • compare_correlation_changes(...)

    • compares correlations between consecutive windows
    • calculates correlation change using Δr = |r_current - r_previous|
    • stores structured comparison results for each unique stream pair
  • generate_alerts(...)

    • generates alerts when delta_r is above the configured threshold
    • assigns severity levels (LOW, MEDIUM, HIGH) based on the change magnitude
  • detect_correlation_change_alert(...)

    • wrapper function that runs the full pipeline end to end
    • returns processed data, windows, correlation results, change results, and alerts

Testing completed

Function-level testing

Each stage of the pipeline was tested separately before integrating everything into the final wrapper:

  • test_preprocess.py

    • verified timestamp conversion, sorting, indexing, and missing value handling
  • test_windows.py

    • verified rolling window creation with the expected overlap behaviour
  • test_correlations.py

    • verified Pearson correlation matrix generation for each window
  • test_compare_changes.py

    • verified comparison of consecutive window correlations and correct calculation of delta_r
  • test_alerts.py

    • verified threshold-based alert generation and severity assignment
  • test_full_pipeline.py

    • verified that the full wrapper function correctly connects all stages of the pipeline

Dataset-level validation

After function-level testing, the implementation was validated using three datasets:

  • simple.csv

    • used as the first validation dataset to confirm the pipeline behaves correctly on a simpler time-series structure
  • complex.csv

    • used to confirm that the same workflow handles more variation in correlation behaviour and produces meaningful change results and alerts
  • 2881821.csv

    • used as the real IoT-style dataset for validation
    • confirmed that the pipeline works with real timestamp strings and multiple sensor fields (field1 to field8)
    • produced rolling windows, correlation comparisons, and alert results successfully

Additional improvements made

  • cleaned and rounded output values to improve readability
  • fixed timestamp parsing so the pipeline supports both synthetic and real dataset formats
  • structured testing files under correlation_alert/venura_testing
  • added output generation so validation results can be saved instead of relying only on console output

Validation outputs

Created dataset runner scripts to save outputs into correlation_alert/venura_testing/outputs/ for:

  • processed data
  • change results
  • alerts
  • summary files

This was done for:

  • run_simple_dataset.py
  • run_complex_dataset.py
  • run_real_dataset.py

Current status

The core correlation change alert pipeline is now working end to end and has been validated on both test and dataset-based runs. This provides the foundation for further integration, Jupyter-based validation, and future connection with the alerting and wider project workflow.

Copy link
Copy Markdown
Collaborator

@senuradp senuradp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this PR. The overview is clear and the implementation provides a strong running version of the correlation change alert pipeline.

I reviewed the structure and the pipeline now connects the main stages end to end:
preprocessing → rolling windows → correlation computation → change comparison → alert generation.

A strong point of this PR is that each function has been tested separately before being connected through the wrapper. The separate tests for preprocessing, windowing, correlation computation, change comparison, alert generation, and the full pipeline make the implementation easier to validate and debug. The dataset-level validation using simple.csv, complex.csv, and 2881821.csv also provides useful evidence that the pipeline works across different data structures.

Before final merge, please make a few minor alignment fixes:

  1. Keep the original wrapper signature for compatibility:
    method="pearson", strong_corr_threshold=0.7, weak_corr_threshold=0.4, and delta_threshold=0.3.

  2. In compute_window_correlations(), use the method parameter instead of hardcoding Pearson:
    window_df.corr(method=method)

  3. Standardise naming across the pipeline. The current use of change_results, delta_r, and stream_pair is clear, but other modules and tests should follow the same naming convention.

  4. Update the generate_alerts() docstring so it matches the current function parameters and output.

Overall, this is a strong integration contribution and provides a good base for the final pipeline.

@VenuraPallawela
Copy link
Copy Markdown
Author

Thanks for the review and feedback. I’ve updated the wrapper signature for compatibility, parameterised the correlation method, aligned the threshold handling, and updated the docstrings and naming consistency across the pipeline. The latest commit has now been pushed to the same branch.

@senuradp
Copy link
Copy Markdown
Collaborator

Thanks for updating the full pipeline implementation. I reviewed the updated wrapper flow, and the overall sequence is correct:
preprocessing → rolling windows → correlation computation → change comparison → alert generation.

This is useful as a full-pipeline reference and confirms the direction of the integrated system.

For final integration, I will keep the main wrapper structure aligned with the module-by-module implementation already integrated from the team contributions. One thing to note is that your implementation uses naming such as delta_r, stream_pair, previous_correlation, and severity, while the current integrated wrapper uses delta, stream_1, stream_2, previous_corr, current_corr, and alert_level.

To avoid breaking the integrated pipeline, these naming differences need to be standardised before merging directly. The severity threshold logic also needs to remain aligned with our final agreed ranges:
No Alert: Δr < 0.3
LOW: 0.3 ≤ Δr < 0.5
MEDIUM: 0.5 ≤ Δr < 0.7
HIGH: Δr ≥ 0.7

Overall, this is a strong full-pipeline contribution and will be used as a reference for final integration and testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants