Implement core correlation change alert pipeline and validation by VenuraPallawela · Pull Request #79 · DataBytes-Organisation/Intelligent-IoT-Data-Management

VenuraPallawela · 2026-04-23T07:23:23Z

Overview

This pull request adds the initial implementation of the correlation change alert pipeline for the correlation_alert module. The work focuses on the core data pipeline for handling time-series sensor data, computing rolling correlations, comparing correlation changes between consecutive windows, and generating threshold-based alerts.

What was implemented

Core pipeline in `correlation_alert/main.py`

Implemented the following functions as part of the end-to-end correlation change alert workflow:

preprocess_timeseries(...)
- validates required columns
- selects the timestamp column and chosen streams
- converts timestamps into datetime format
- sorts by time and sets timestamp as the index
- converts selected streams to numeric
- handles missing values using interpolation and fill methods
create_rolling_windows(...)
- creates rolling windows from the cleaned time-series data
- supports configurable window_size and step_size
compute_window_correlations(...)
- computes Pearson correlation matrices for each window
- stores window metadata such as index, start time, and end time
compare_correlation_changes(...)
- compares correlations between consecutive windows
- calculates correlation change using Δr = |r_current - r_previous|
- stores structured comparison results for each unique stream pair
generate_alerts(...)
- generates alerts when delta_r is above the configured threshold
- assigns severity levels (LOW, MEDIUM, HIGH) based on the change magnitude
detect_correlation_change_alert(...)
- wrapper function that runs the full pipeline end to end
- returns processed data, windows, correlation results, change results, and alerts
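The stages listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `correlation_alert/main.py`: the `*_sketch` function names, the row-count-based windowing, the dictionary keys, and the toy data are all assumptions made for the example.

```python
import pandas as pd


def preprocess_sketch(df, timestamp_col, streams):
    """Validate columns, parse and sort timestamps, coerce to numeric, fill gaps."""
    missing = [c for c in [timestamp_col, *streams] if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    out = df[[timestamp_col, *streams]].copy()
    out[timestamp_col] = pd.to_datetime(out[timestamp_col])
    out = out.sort_values(timestamp_col).set_index(timestamp_col)
    out[streams] = out[streams].apply(pd.to_numeric, errors="coerce")
    # Interpolate interior gaps, then forward/backward fill any gaps at the edges.
    return out.interpolate(method="linear").ffill().bfill()


def rolling_windows_sketch(df, window_size, step_size):
    """Row-count windows (assumption); they overlap when step_size < window_size."""
    last_start = len(df) - window_size
    return [df.iloc[s:s + window_size] for s in range(0, last_start + 1, step_size)]


def window_correlations_sketch(windows, method="pearson"):
    """One correlation matrix per window, plus basic window metadata."""
    return [
        {
            "window_index": i,
            "start_time": w.index[0],
            "end_time": w.index[-1],
            "corr": w.corr(method=method),
        }
        for i, w in enumerate(windows)
    ]


def compare_changes_sketch(corr_results):
    """delta_r = |r_current - r_previous| for each unique stream pair."""
    rows = []
    for prev, curr in zip(corr_results, corr_results[1:]):
        cols = list(curr["corr"].columns)
        for i, a in enumerate(cols):
            for b in cols[i + 1:]:
                r_prev = prev["corr"].loc[a, b]
                r_curr = curr["corr"].loc[a, b]
                rows.append({
                    "window_index": curr["window_index"],
                    "stream_pair": (a, b),
                    "previous_correlation": r_prev,
                    "current_correlation": r_curr,
                    "delta_r": abs(r_curr - r_prev),
                })
    return pd.DataFrame(rows)


# Toy data (made up): the two streams move together, then in opposite directions.
raw = pd.DataFrame({
    "timestamp": pd.date_range("2026-01-01", periods=8, freq="min").strftime("%Y-%m-%d %H:%M:%S"),
    "a": [1, 2, 3, 4, 4, 3, 2, 1],
    "b": [1, None, 3, 4, 1, 2, 3, 4],  # one missing value to exercise interpolation
})
clean = preprocess_sketch(raw, "timestamp", ["a", "b"])
windows = rolling_windows_sketch(clean, window_size=4, step_size=4)
changes = compare_changes_sketch(window_correlations_sketch(windows))
print(changes)
```

In this toy case the correlation flips from strongly positive in the first window to strongly negative in the second, so the single resulting `delta_r` is close to 2, the maximum possible change.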

Testing completed

Function-level testing

Each stage of the pipeline was tested separately before integrating everything into the final wrapper:

test_preprocess.py
- verified timestamp conversion, sorting, indexing, and missing value handling
test_windows.py
- verified rolling window creation with the expected overlap behaviour
test_correlations.py
- verified Pearson correlation matrix generation for each window
test_compare_changes.py
- verified comparison of consecutive window correlations and correct calculation of delta_r
test_alerts.py
- verified threshold-based alert generation and severity assignment
test_full_pipeline.py
- verified that the full wrapper function correctly connects all stages of the pipeline
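As an illustration of what a function-level check such as the window overlap test might look like, here is a small self-contained sketch. The helper `make_windows` and the test name are hypothetical stand-ins; the actual contents of `test_windows.py` are not shown in this PR description.

```python
import pandas as pd


def make_windows(df, window_size, step_size):
    """Hypothetical stand-in for the windowing stage (row-count based)."""
    last_start = len(df) - window_size
    return [df.iloc[s:s + window_size] for s in range(0, last_start + 1, step_size)]


def test_windows_overlap_by_window_minus_step():
    df = pd.DataFrame({"field1": range(10)})
    windows = make_windows(df, window_size=4, step_size=2)

    # Window starts are rows 0, 2, 4 and 6, so four full windows fit in ten rows.
    assert len(windows) == 4
    assert all(len(w) == 4 for w in windows)

    # Consecutive windows share window_size - step_size rows.
    shared = windows[0].index.intersection(windows[1].index)
    assert len(shared) == 2


test_windows_overlap_by_window_minus_step()
```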

Dataset-level validation

After function-level testing, the implementation was validated using three datasets:

simple.csv
- used as the first validation dataset to confirm the pipeline behaves correctly on a simpler time-series structure
complex.csv
- used to confirm that the same workflow handles more variation in correlation behaviour and produces meaningful change results and alerts
2881821.csv
- used as the real IoT-style dataset for validation
- confirmed that the pipeline works with real timestamp strings and multiple sensor fields (field1 to field8)
- produced rolling windows, correlation comparisons, and alert results successfully

Additional improvements made

cleaned and rounded output values to improve readability
fixed timestamp parsing so the pipeline supports both synthetic and real dataset formats
structured testing files under correlation_alert/venura_testing
added output generation so validation results can be saved instead of relying only on console output

Validation outputs

Created dataset runner scripts that save the following outputs to correlation_alert/venura_testing/outputs/:

processed data
change results
alerts
summary files

Each of the following runner scripts saves these outputs:

run_simple_dataset.py
run_complex_dataset.py
run_real_dataset.py
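A runner script of this kind could persist its results along the following lines. This is only a sketch under assumptions: the helper name `save_outputs_sketch`, the file naming scheme, and the summary format are invented for illustration and are not taken from the scripts in this PR.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_outputs_sketch(results, output_dir, dataset_name):
    """Write each result frame to CSV and a short text summary of row counts."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for key, frame in results.items():
        path = out / f"{dataset_name}_{key}.csv"
        frame.to_csv(path)
        written.append(path)
    summary_path = out / f"{dataset_name}_summary.txt"
    lines = [f"{key}: {len(frame)} rows" for key, frame in results.items()]
    summary_path.write_text("\n".join(lines) + "\n")
    written.append(summary_path)
    return written


# Example with made-up results, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_results = {
        "change_results": pd.DataFrame({"delta_r": [0.1, 0.6]}),
        "alerts": pd.DataFrame({"delta_r": [0.6], "severity": ["MEDIUM"]}),
    }
    for p in save_outputs_sketch(demo_results, tmp, "simple"):
        print(p.name)
```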

Current status

The core correlation change alert pipeline now works end to end and has been validated through both function-level tests and dataset-level runs. This provides the foundation for further integration, Jupyter-based validation, and future connection with the alerting and wider project workflow.

senuradp

Great work on this PR. The overview is clear and the implementation provides a strong running version of the correlation change alert pipeline.

I reviewed the structure and the pipeline now connects the main stages end to end:
preprocessing → rolling windows → correlation computation → change comparison → alert generation.

A strong point of this PR is that each function has been tested separately before being connected through the wrapper. The separate tests for preprocessing, windowing, correlation computation, change comparison, alert generation, and the full pipeline make the implementation easier to validate and debug. The dataset-level validation using simple.csv, complex.csv, and 2881821.csv also provides useful evidence that the pipeline works across different data structures.

Before final merge, please make a few minor alignment fixes:

- Keep the original wrapper signature for compatibility: method="pearson", strong_corr_threshold=0.7, weak_corr_threshold=0.4, and delta_threshold=0.3.
- In compute_window_correlations(), use the method parameter instead of hardcoding Pearson: window_df.corr(method=method)
- Standardise naming across the pipeline. The current use of change_results, delta_r, and stream_pair is clear, but other modules and tests should follow the same naming convention.
- Update the generate_alerts() docstring so it matches the current function parameters and output.
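To make the second point concrete, here is a minimal sketch of passing `method` through to pandas instead of hardcoding Pearson. The function name `compute_window_correlations_sketch` and the sample window are assumptions; only `DataFrame.corr(method=...)` is the real pandas call.

```python
import pandas as pd


def compute_window_correlations_sketch(windows, method="pearson"):
    """Forward the caller's correlation method to pandas for every window."""
    return [window_df.corr(method=method) for window_df in windows]


# field2 grows monotonically but non-linearly with field1 (made-up values).
window_df = pd.DataFrame({
    "field1": [1, 2, 3, 4, 5],
    "field2": [1, 4, 9, 16, 25],
})

pearson = compute_window_correlations_sketch([window_df])[0].loc["field1", "field2"]
spearman = compute_window_correlations_sketch([window_df], method="spearman")[0].loc["field1", "field2"]
print(f"pearson={pearson:.3f} spearman={spearman:.3f}")
```

Because the relationship here is monotonic but not linear, the Spearman value is 1 while the Pearson value is slightly lower, which shows that the parameter actually changes the result.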

Overall, this is a strong integration contribution and provides a good base for the final pipeline.

VenuraPallawela · 2026-05-07T15:11:53Z

Thanks for the review and feedback. I’ve updated the wrapper signature for compatibility, parameterised the correlation method, aligned the threshold handling, and updated the docstrings and naming consistency across the pipeline. The latest commit has now been pushed to the same branch.

senuradp · 2026-05-13T11:41:29Z

Thanks for updating the full pipeline implementation. I reviewed the updated wrapper flow, and the overall sequence is correct:
preprocessing → rolling windows → correlation computation → change comparison → alert generation.

This is useful as a full-pipeline reference and confirms the direction of the integrated system.

For final integration, I will keep the main wrapper structure aligned with the module-by-module implementation already integrated from the team contributions. One thing to note is that your implementation uses naming such as delta_r, stream_pair, previous_correlation, and severity, while the current integrated wrapper uses delta, stream_1, stream_2, previous_corr, current_corr, and alert_level.
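One possible way to bridge the two naming schemes is a small adapter applied before merging results into the integrated wrapper. The sketch below rests on assumptions: that change results are held in a pandas DataFrame, that `stream_pair` stores a two-element tuple, and that the PR side has a `current_correlation` column (only `previous_correlation` is named above). The helper name `to_integrated_schema` is hypothetical.

```python
import pandas as pd

# PR-side name -> integrated-wrapper name (current_correlation is an assumed name).
COLUMN_MAP = {
    "delta_r": "delta",
    "previous_correlation": "previous_corr",
    "current_correlation": "current_corr",
    "severity": "alert_level",
}


def to_integrated_schema(changes):
    """Rename columns and split stream_pair into stream_1 and stream_2."""
    out = changes.rename(columns=COLUMN_MAP)
    if "stream_pair" in out.columns:
        # Assumes every stream_pair value is a 2-tuple such as ("field1", "field2").
        pairs = pd.DataFrame(
            out["stream_pair"].tolist(),
            columns=["stream_1", "stream_2"],
            index=out.index,
        )
        out = pd.concat([pairs, out.drop(columns="stream_pair")], axis=1)
    return out


example = pd.DataFrame({
    "stream_pair": [("field1", "field2")],
    "previous_correlation": [0.8],
    "current_correlation": [0.2],
    "delta_r": [0.6],
    "severity": ["MEDIUM"],
})
converted = to_integrated_schema(example)
print(list(converted.columns))
```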

To avoid breaking the integrated pipeline, these naming differences need to be standardised before merging directly. The severity threshold logic also needs to remain aligned with our final agreed ranges:
No Alert: Δr < 0.3
LOW: 0.3 ≤ Δr < 0.5
MEDIUM: 0.5 ≤ Δr < 0.7
HIGH: Δr ≥ 0.7
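The agreed ranges translate directly into a small classifier. This is a sketch only: the function name `classify_alert_level` and the choice to return `None` for the no-alert case are assumptions, while the thresholds are the ones listed above.

```python
def classify_alert_level(delta):
    """Map a correlation change (delta) to an alert level using the agreed ranges."""
    if delta >= 0.7:
        return "HIGH"
    if delta >= 0.5:
        return "MEDIUM"
    if delta >= 0.3:
        return "LOW"
    return None  # No Alert: delta < 0.3


for value in (0.29, 0.3, 0.49, 0.5, 0.7, 1.4):
    print(value, classify_alert_level(value))
```

Each lower bound is inclusive, so a change of exactly 0.3, 0.5, or 0.7 falls into the higher of the two adjacent levels.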

Overall, this is a strong full-pipeline contribution and will be used as a reference for final integration and testing.

VenuraPallawela added 7 commits April 21, 2026 10:34

Implement preprocess_timeseries, rolling window segmentation for corr…

f32f17c

…elation pipeline

Add per-window Pearson correlation computation

9da7d4c

Add correlation change comparison across consecutive windows

d7f2a63

Add threshold-based alert generation for correlation changes

83f0713

Wire full correlation change alert pipeline

cec483a

Add structured dataset testing with saved outputs

31f44c0

Implement correlation change alert pipeline and dataset validation wi…

bbea6a2

…th outputs

senuradp requested changes Apr 25, 2026

Refactor pipeline parameters for compatibility and consistency

29d15f1

VenuraPallawela wants to merge 8 commits into DataBytes-Organisation:feature/correlation-alert/venura-core from VenuraPallawela:feature/correlation-alert/venura-core
