ZScoreDetector: fit_score returns None values #17

@stefanDeveloper

Describe the bug
Fitting the example data [0. 0. 0. 0. 0. 1. 0. 1. 4. 0.] with the ZScoreDetector returns None values (here, for the first sample). I would expect -1 or some other numeric placeholder while the window length has not yet been reached. Is there a reason to return None?

To Reproduce
Steps to reproduce the behavior:

  1. Run the example code below
  2. See score results:
    [None, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
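
Until the intended behavior is clarified, a caller could normalize the output on their side. The following is only a minimal sketch of such a workaround, not part of the library: the helper name `fill_missing_scores` and the placeholder value `-1.0` are assumptions chosen for illustration.

```python
def fill_missing_scores(scores, fill_value=-1.0):
    """Replace None entries (hypothetical workaround) with a numeric placeholder."""
    # fill_value=-1.0 is an assumed sentinel, taken from the expectation stated above.
    return [fill_value if s is None else float(s) for s in scores]


# Scores as reported in this issue.
raw_scores = [None, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(fill_missing_scores(raw_scores))
```

This keeps the list purely numeric, so it can be passed to NumPy or plotted without special-casing None.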

Example

from streamad.model import ZScoreDetector
from streamad.util import StreamGenerator, CustomDS
import numpy as np

data = {
    "start": "2024-07-02T12:52:45.000Z",
    "end": "2024-07-02T12:52:55.000Z",
    "data": [
        {
            "timestamp": "2024-07-02T12:52:50.988Z",
        },
        {
            "timestamp": "2024-07-02T12:52:52.092Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
    ],
}

# Convert timestamps to numpy datetime64
timestamps = np.array([
    np.datetime64(item["timestamp"])
    for item in data["data"]
])

# Sort timestamps chronologically
sorted_indices = np.argsort(timestamps)
timestamps = timestamps[sorted_indices]

# Set min_date and max_date
min_date = np.datetime64(data["start"])
max_date = np.datetime64(data["end"])

# Generate the time range from min_date to max_date with a 1 s interval
time_range = np.arange(min_date, max_date, np.timedelta64(1, 's'))

# Initialize an array to hold counts for each timestamp in the range
counts = np.zeros(time_range.shape, dtype=np.float64)

# Count occurrences of timestamps and fill the corresponding index in the counts array
unique_times, unique_indices, unique_counts = np.unique(timestamps, return_index=True, return_counts=True)
time_indices = ((unique_times - min_date)//1).astype('timedelta64[s]').astype(int)
counts[time_indices] = unique_counts

# Reshape into the required shape (n, 1)
X = counts.reshape(-1, 1).astype(np.float64)

ds = CustomDS(X, X)
stream = StreamGenerator(ds.data)
model = ZScoreDetector(window_len=1)

scores = []

for x in stream.iter_item():
    score = model.fit_score(x)
    scores.append(score)
    
print(scores)
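
For readers who want to check the input without installing anything, the per-second binning done above can be sketched with the standard library only. This is an illustrative equivalent, not the code from the report: the function name `per_second_counts` is hypothetical, and it assumes timestamps in the `YYYY-MM-DDTHH:MM:SS.fffZ` form used in the example.

```python
from collections import Counter
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed timestamp layout, as in the example


def per_second_counts(timestamps, start, end):
    """Count events per one-second bin over the half-open range [start, end)."""
    t0 = datetime.strptime(start, TS_FORMAT)
    n_bins = (datetime.strptime(end, TS_FORMAT) - t0) // timedelta(seconds=1)
    # Floor each offset to a whole second; events outside the range are ignored.
    hits = Counter(
        (datetime.strptime(ts, TS_FORMAT) - t0) // timedelta(seconds=1)
        for ts in timestamps
    )
    return [float(hits.get(i, 0)) for i in range(n_bins)]


events = [
    "2024-07-02T12:52:50.988Z",
    "2024-07-02T12:52:52.092Z",
    "2024-07-02T12:52:53.095Z",
    "2024-07-02T12:52:53.095Z",
    "2024-07-02T12:52:53.095Z",
    "2024-07-02T12:52:53.095Z",
]
counts = per_second_counts(events, "2024-07-02T12:52:45.000Z", "2024-07-02T12:52:55.000Z")
print(counts)
```

Unlike the index assignment in the example, this version also accumulates distinct timestamps that fall into the same second.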

Desktop (please complete the following information):

  • OS: Linux 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Package version (please complete the following information):

  • Version 0.3.1

Labels: bug, enhancement