proj-adl-classification

statistical_feature.py

Computes statistical features on the GPU using PyTorch.

It also compares the results with the same features computed on the CPU (NumPy).

Returns a pandas DataFrame with the columns: 'feature name', 'feature value gpu', 'feature value cpu', and 'time consumption'.

X - Time series

"*": No reference
"**": More than one reference and one is questionable
"~": Further research required on feature

Statistical Features

| Number | Feature | Description | Info |
|---|---|---|---|
| 1 | calculate_harmonic_mean_abs(X) | Calculates the harmonic mean of the absolute values of X | * |
| 2 | calculate_trimmed_mean_abs(X) | Calculates the trimmed mean of the absolute values of X | * |
| 3 | calculate_std_abs(X) | Calculates the standard deviation of the absolute values of X | * |
| 4 | calculate_skewness_abs(X) | Calculates the skewness of the absolute values of X | * |
| 5 | calculate_kurtosis_abs(X) | Calculates the kurtosis of the absolute values of X | * |
| 6 | calculate_median_abs(X) | Calculates the median of the absolute values of X | * |
| 7 | calculate_min_abs(X) | Calculates the minimum of the absolute values of X | * |
| 8 | calculate_range_abs(X) | Calculates the range of the absolute values of X | * |
| 9 | calculate_variance_abs(X) | Calculates the variance of the absolute values of X | * |
| 10 | calculate_mean_absolute_deviation(X) | Calculates the mean of the absolute deviation of X | ~ |
| 11 | calculate_signal_magnitude_area(X) | Calculates the magnitude area of X, i.e. the sum of the absolute values of X | ~ |
| 12 | calculate_cardinality(X) | | ~ |
| 13 | calculate_rms_to_mean_abs(X) | Computes the ratio of the RMS value to the mean absolute value of X | * |
| 14 | calculate_area_under_squared_curve(X) | Computes the area under the curve of X squared | * |
| 15 | calculate_exponential_moving_average(X, param) | Calculates the exponential moving average of X | * |
| 16 | calculate_fisher_information(X) | Computes the Fisher information of X | ~ |
| 17 | calculate_local_maxima_and_minima(X) | Calculates the local maxima and minima of X | * |
| 18 | calculate_log_return(X) | Returns the logarithm of the ratio between the last and first values of X, which is a measure of the percentage change in X | ~ |
| 19 | calculate_lower_complete_moment(X) | | * |
| 20 | calculate_mean_second_derivative_central(X) | Returns the mean of the second derivative of X | |
| 21 | calculate_median_second_derivative_central(X) | Calculates the median of the second derivative of X | * |
| 23 | calculate_ratio_of_fluctuations(X) | Computes the ratio of positive and negative fluctuations in X | * |
| 24 | calculate_ratio_value_number_to_sequence_length(X) | Returns the ratio of the number of unique values in X to the length of X | * |
| 25 | calculate_second_order_difference(X) | Returns the second-order difference of X | ** |
| 26 | calculate_signal_resultant(X) | | * |
| 27 | calculate_sum_of_negative_values(X) | Calculates the sum of negative values in X | * |
| 28 | calculate_sum_of_positive_values(X) | Returns the sum of positive values in X | * |
| 29 | calculate_variance_of_absolute_differences(X) | Returns the variance of the absolute first-order differences of X | |
| 30 | calculate_weighted_moving_average(X) | Returns the weighted moving average of X | * |
| 31 | calculate_covariance | | ~ |
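
A few of the descriptions above translate directly into code. The following is a minimal pure-Python sketch written from the table descriptions only; it is not the toolbox's NumPy or PyTorch implementation, and the function names (without the `calculate_` prefix) and the example signal are illustrative assumptions.

```python
import math


def harmonic_mean_abs(x):
    """Harmonic mean of |x|: n / sum(1 / |x_i|). Undefined if any value is 0."""
    return len(x) / sum(1.0 / abs(v) for v in x)


def rms_to_mean_abs(x):
    """Ratio of the root mean square of x to the mean of |x|."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    mean_abs = sum(abs(v) for v in x) / len(x)
    return rms / mean_abs


def signal_magnitude_area(x):
    """Sum of the absolute values of x, as described in the table."""
    return sum(abs(v) for v in x)


def log_return(x):
    """log(last / first); only defined when last / first is positive."""
    return math.log(x[-1] / x[0])


def ratio_value_number_to_sequence_length(x):
    """Number of distinct values divided by the number of samples."""
    return len(set(x)) / len(x)


# Hypothetical example signal, chosen only for illustration.
signal = [1.0, -2.0, 4.0, 4.0]
features = {
    "harmonic_mean_abs": harmonic_mean_abs(signal),
    "rms_to_mean_abs": rms_to_mean_abs(signal),
    "signal_magnitude_area": signal_magnitude_area(signal),
    "log_return": log_return(signal),
    "ratio_value_number_to_sequence_length":
        ratio_value_number_to_sequence_length(signal),
}
```

Edge cases such as zeros in X (harmonic mean) or a sign change between the first and last sample (log return) are left unhandled here; how the toolbox treats them is not stated in this README.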


Statistical Features - NEW!!

| Number | Feature | Reference |
|---|---|---|
| 1 | calculate_mean_to_variance | |


Time-Frequency Features

| Number | Feature | Reference |
|---|---|---|
| 1 | extract_wavelet_features(params) | |
| 2 | extract_spectrogram_features(params) | |
| 3 | extract_stft_features(params) | |
| 4 | teager_kaiser_energy_operator(X) | |
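
For orientation, the discrete Teager-Kaiser energy operator is commonly defined as psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below implements that textbook definition in pure Python. Returning only the interior samples is an assumption; the toolbox's `teager_kaiser_energy_operator` may pad the edges or summarise the result differently.

```python
import math


def teager_kaiser_energy_operator(x):
    """Discrete TKEO for interior samples: x[n]**2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone A*cos(omega*n) the operator is constant and equal to
# A**2 * sin(omega)**2, which makes a convenient sanity check.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]
energy = teager_kaiser_energy_operator(tone)
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Because the operator depends on both amplitude and frequency, it responds to changes in either one, which is why it is listed with the time-frequency features.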


Spectral Features

| Number | Feature | Reference |
|---|---|---|
| 1 | calculate_spectral_subdominant_valley | * |
| 2 | | |

NOT in tsfresh


Spectral Features

  1. Median frequency

  2. Spectral bandwidth

  3. Spectral absolute deviation

  4. Spectral slope linear

  5. Spectral slope logarithmic

  6. Spectral flatness

  7. Peak frequencies

  8. Spectral edge frequency

  9. Band power

  10. Spectral entropy

  11. Spectral contrast

  12. Spectral coefficient variation

  13. Spectral flux

  14. Spectral rolloff

  15. Harmonic ratio

  16. Fundamental frequency

  17. Spectral crest factor

  18. Spectral decrease

  19. Spectral irregularity

  20. Mean frequency

  21. Frequency winsorized mean

  22. Total harmonic distortion

  23. Inharmonicity

  24. Tristimulus

  25. Spectral rollon

  26. Spectral hole count

  27. Spectral autocorrelation

  28. Spectral variability

  29. Spectral spread ratio

  30. Spectral skewness ratio

  31. Spectral kurtosis ratio

  32. Spectral tonal power ratio

  33. Spectral noise to harmonics ratio

  34. Spectral even to odd harmonic energy ratio

  35. Spectral strongest frequency phase

  36. Spectral frequency below peak

  37. Spectral frequency above peak

  38. Spectral cumulative frequency

  39. Spectral cumulative frequency

  40. Spectral cumulative frequency above

  41. Spectral spread shift

  42. Spectral entropy shift

  43. Spectral change vector magnitude

  44. Spectral low frequency content

  45. Spectral mid frequency content

  46. Spectral peak-to-valley ratio

  47. Spectral valley depth mean

  48. Spectral valley depth std

  49. Spectral valley depth variance

  50. Spectral valley width mode

  51. Spectral valley width standard deviation

  52. Spectral subdominant valley

  53. Spectral valley count

  54. Spectral peak broadness

  55. Spectral valley broadness

  56. Frequency variance

  57. Frequency standard deviation

  58. Frequency range

  59. Frequency trimmed mean

  60. Harmonic product spectrum

  61. Smoothness

  62. Roughness


Time-Frequency Features

Statistical features computed from wavelets, the spectrogram, and the short-time Fourier transform

Statistical Features

  1. Hurst exponent from detrended fluctuation analysis
  2. Winsorized mean
  3. Weighted moving average
  4. Sum of positive values
  5. Sum of negative values
  6. Stochastic oscillator value
  7. Smoothing by binomial filter
  8. Signal-to-noise ratio
  9. Signal resultant
  10. Second order difference
  11. Ratio value number to sequence length
  12. Ratio beyond r signal
  13. Petrosian fractal dimension
  14. Percentage of positive values
  15. Percentage of negative values
  16. Pearson correlation coefficient
  17. Peak-to-peak distance
  18. Number of inflection points
  19. Moving average
  20. Mode
  21. Median second derivative central
  22. Mean relative change
  23. Mean crossings
  24. Lower complete moment
  25. Log return
  26. Katz fractal dimension
  27. Histogram bin frequencies
  28. Fisher information
  29. First quartile
  30. First order difference
  31. Exponential moving average
  32. Energy ratio by chunks
  33. Differential entropy
  34. Cumulative sum
  35. Covariance
  36. Count
  37. Area under curve
  38. Area under squared curve
  39. Renyi entropy
  40. Tsallis entropy
  41. Root mean squared to mean absolute
  42. Cardinality
  43. Hjorth mobility and complexity
  44. Singular value decomposition (SVD) entropy
  45. Higuchi fractal dimensions
  46. Slope sign change
  47. Average amplitude change
  48. Signal magnitude area*
  49. Median absolute deviation
  50. Coefficient of variation
  51. Higher order moments
  52. Mean auto correlation
  53. Impulse factor
  54. Shape factor
  55. Clearance factor
  56. Crest factor
  57. Zero crossings
  58. Entropy
  59. Log energy
  60. Mean absolute deviation
  61. Interquartile range
  62. Variance absolute
  63. Maximum absolute
  64. Minimum absolute
  65. Range absolute
  66. Range
  67. Median absolute
  68. Kurtosis absolute
  69. Skewness absolute
  70. Standard deviation absolute
  71. Trimmed mean absolute
  72. Trimmed mean
  73. Harmonic mean
  74. Harmonic mean absolute
  75. Geometric mean
  76. Geometric mean absolute
  77. Mean absolute

Added features

| Number | Feature | Description |
|---|---|---|
| 1 | Augmented Dickey-Fuller test | Performs the Augmented Dickey-Fuller (ADF) test to check for stationarity in a given time series signal. |
| 2 | Hurst exponent | Calculates the Hurst exponent of a given time series using Detrended Fluctuation Analysis (DFA). |

Deleted features

| Number | Feature | Reason |
|---|---|---|
| 1 | calculate_roll_mean | Same implementation as calculate_moving_average |
| 2 | calculate_absolute_energy | Same implementation as signal energy |
| 3 | calculate_cumulative_energy | Produces the same result as the absolute energy and signal energy. These three will always be the same for a given signal. |
| 4 | calculate_intercept_of_linear_fit | This feature is already returned by the calculate_linear_trend_with_full_linear_regression_results function |
| 5 | calculate_pearson_correlation_coefficient | This function calculates the Pearson correlation coefficient between the signal and its one-step lagged version, so it is fundamentally calculating the autocorrelation of the signal. The autocorrelation is already present (calculate_mean_auto_correlation). Having both is redundant. |
| 6 | calculate_slope_of_linear_fit | This is already calculated in calculate_linear_trend_with_full_linear_regression_results |
| 7 | calculate_frequency_std | Same implementation as calculate_spectral_bandwidth with order set to 2 |
| 8 | calculate_frequency_variance | Same implementation as calculate_spectral_variance |
| 9 | calculate_mean_frequency(freqs, magnitudes) | Same as calculate_spectral_centroid with order set to 1 |
| 10 | calculate_first_quartile | calculate_percentile(signal, percentiles=[25, 50, 75]) returns the first, second, and third quartiles |
| 11 | calculate_third_quartile | calculate_percentile(signal, percentiles=[25, 50, 75]) returns the first, second, and third quartiles |
| 12 | calculate_spectral_entropy_shift | Same implementation as calculate_spectral_entropy but with spectrum_magnitudes as argument and not psd |
| 13 | calculate_spectral_spread_shift | Same as spectral standard deviation |
| 14 | calculate_spectral_autocorrelatiion | Autocorrelation of magnitudes is backed by literature |
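
Two of the redundancy arguments above can be checked with a short pure-Python sketch. The three energy definitions below are assumptions based on the usual meaning of the names (sum of squares, sum of squared magnitudes, and the final value of the running sum of squares), not the toolbox's code. `quartiles` is a hypothetical stand-in for `calculate_percentile(signal, percentiles=[25, 50, 75])` that uses linear interpolation between order statistics.

```python
from itertools import accumulate
from statistics import quantiles


def absolute_energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def signal_energy(x):
    """Sum of squared magnitudes; identical to the above for real signals."""
    return sum(abs(v) ** 2 for v in x)


def cumulative_energy(x):
    """Final value of the running sum of squared samples (assumed definition)."""
    return list(accumulate(v * v for v in x))[-1]


def quartiles(x):
    """First, second, and third quartiles from a single call."""
    return quantiles(x, n=4, method="inclusive")


signal = [3.0, -4.0, 1.0]
energies = (absolute_energy(signal),
            signal_energy(signal),
            cumulative_energy(signal))   # all three are 26.0

q1, q2, q3 = quartiles([1.0, 2.0, 3.0, 4.0, 5.0])
```

Since one percentile call already yields all three quartiles, separate first-quartile and third-quartile functions add no information, which matches the reasoning given for rows 10 and 11.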

Features that should be deleted

| Number | Feature | Type | Reason |
|---|---|---|---|
| 1 | calculate_histogram_bins | statistical | |
| 2 | calculate_signal_magnitude_area | statistical | |
| 3 | calculate_spectral_hole_count | spectral | Spectral holes are typically of use in radio signals. Although the aim is to make this a very comprehensive toolbox, this feature is slightly out of scope. |

Features in Tsfresh but not in SCAI toolbox

| Number | Feature | Description | Added yet? |
|---|---|---|---|
| 1 | absolute sum of changes | | ✔️ |
| 2 | ar_coefficient(x, param) | This feature calculator fits the unconditional maximum likelihood of an autoregressive AR(k) process | |
| 3 | benford correlation | | ✔️ |
| 4 | c3 | Uses c3 statistics to measure non-linearity in the time series | |
| 5 | count_above(x, t) | Returns the percentage of values in x that are higher than t | ✔️ |
| 6 | count_below(x, t) | Returns the percentage of values in x that are lower than t | ✔️ |
| 7 | cid_ce(x, normalize) | This feature calculator is an estimate of time series complexity [1] (a more complex time series has more peaks, valleys, etc.). | ✔️ |
| 8 | friedrich_coefficients(x, param) | Coefficients of the polynomial h(x), which has been fitted to the deterministic dynamics of the Langevin model | |
| 9 | has_duplicate(x) | Checks if any value in x occurs more than once | ✔️ |
| 10 | has_duplicate_max(x) | Checks if the maximum value of x is observed more than once | ✔️ |
| 11 | has_duplicate_min(x) | Checks if the minimal value of x is observed more than once | ✔️ |
| 12 | index_mass_quantile(x, param) | Calculates the relative index i of time series x where q% of the mass of x lies left of i. | |
| 13 | mean_n_absolute_max(x, number_of_maxima) | Calculates the arithmetic mean of the n absolute maximum values of the time series. | |
| 14 | large_standard_deviation(x, r) | Checks whether the time series has a large standard deviation | ✔️ |
| 15 | lempel_ziv_complexity(x, bins) | Calculates a complexity estimate based on the Lempel-Ziv compression algorithm. | ✔️ |
| 16 | matrix_profile(x, param) | Calculates the 1-D Matrix Profile [1] and returns Tukey's Five Number Set plus the mean of that Matrix Profile. | |
| 17 | max_langevin_fixed_point(x, r, m) | Largest fixed point of the dynamics, argmax_x {h(x) = 0}, estimated from the polynomial h(x), which has been fitted to the deterministic dynamics of the Langevin model | |
| 18 | binned entropy | | |
| 19 | symmetry looking | Boolean variable denoting if the distribution of x looks symmetric. | |
| 20 | change_quantiles | First fixes a corridor given by the quantiles ql and qh of the distribution of x. | |
| 21 | fft_coefficient | Calculates the Fourier coefficients of the one-dimensional discrete Fourier transform for real input using the fast Fourier transform algorithm | |
| 22 | matrix_profile | Calculates the 1-D Matrix Profile [1] and returns Tukey's Five Number Set plus the mean of that Matrix Profile. | |
| 23 | mean_n_absolute_max | Calculates the arithmetic mean of the n absolute maximum values of the time series. | |
| 24 | number_crossing_m | Calculates the number of crossings of x on m. | |
| 25 | number_cwt_peaks | Number of different peaks in x. | |
| 26 | number_peaks | Calculates the number of peaks of at least support n in the time series x. | |
| 27 | partial_autocorrelation | Calculates the value of the partial autocorrelation function at the given lag. | |
| 28 | query_similarity_count | This feature calculator accepts an input query subsequence parameter, compares the query (under z-normalized Euclidean distance) to all subsequences within the time series, and returns a count of the number of times the query was found in the time series (within some predefined maximum distance threshold). | |
| 29 | ratio_value_number_to_time_series_length | Returns a factor which is 1 if all values in the time series occur only once, and below one if this is not the case. | |
| 30 | value_count | Counts occurrences of value in time series x. | ✔️ |
| 31 | variance_larger_than_standard_deviation | Is variance higher than the standard deviation? | ✔️ |
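
Several of the simpler entries marked as added can be sketched directly from their descriptions. The pure-Python versions below are illustrative re-implementations, not the tsfresh or toolbox source. The strict comparisons in `count_above` and `count_below` follow the wording "higher than" and "lower than" in the table and are an assumption about boundary handling, as is the use of the population variance.

```python
import math
from statistics import fmean, pstdev, pvariance


def absolute_sum_of_changes(x):
    """Sum of |x[i+1] - x[i]| over consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def count_above(x, t):
    """Fraction of values strictly greater than t."""
    return sum(v > t for v in x) / len(x)


def count_below(x, t):
    """Fraction of values strictly less than t."""
    return sum(v < t for v in x) / len(x)


def has_duplicate(x):
    """True if any value occurs more than once."""
    return len(set(x)) < len(x)


def has_duplicate_max(x):
    """True if the maximum value occurs more than once."""
    return x.count(max(x)) > 1


def has_duplicate_min(x):
    """True if the minimum value occurs more than once."""
    return x.count(min(x)) > 1


def variance_larger_than_standard_deviation(x):
    """True if the population variance exceeds its square root."""
    var = pvariance(x)
    return var > math.sqrt(var)


def cid_ce(x, normalize):
    """Complexity estimate: sqrt of the sum of squared successive differences."""
    if normalize:
        s = pstdev(x)
        if s == 0:
            return 0.0          # a constant series has no complexity
        m = fmean(x)
        x = [(v - m) / s for v in x]
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:])))


x = [1.0, 3.0, 3.0, 0.0]        # hypothetical example series
```

Note that `variance_larger_than_standard_deviation` is equivalent to checking whether the variance is greater than 1, so it depends on the scale of the signal.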

Observations

  1. calculate_higher_order_moments does not always produce the same result as mean, variance, skew and kurtosis when moment order is set to [1,2,3,4]
  2. calculate_rms_to_mean_abs has no direct reference yet
  3. calculate_exponential_moving_average returns the last value in the array. Is there a reason?
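
Observations 1 and 3 can be probed with a small pure-Python sketch. One possible explanation for observation 1, not verified against the toolbox code, is a difference in moment conventions: the first central moment is always 0 rather than the mean, and skewness and kurtosis are standardized moments rather than plain central moments. For observation 3, the recursion below (with a hypothetical smoothing factor `alpha` and the first sample as the seed, both assumptions) shows the difference between returning the full smoothed series and returning only its last value.

```python
def central_moment(x, order):
    """Mean of (x - mean(x)) ** order."""
    m = sum(x) / len(x)
    return sum((v - m) ** order for v in x) / len(x)


def standardized_moment(x, order):
    """Central moment divided by the standard deviation ** order."""
    return central_moment(x, order) / central_moment(x, 2) ** (order / 2)


def exponential_moving_average(x, alpha):
    """Full EMA series: s[0] = x[0], s[t] = alpha*x[t] + (1 - alpha)*s[t-1]."""
    ema = [x[0]]
    for v in x[1:]:
        ema.append(alpha * v + (1 - alpha) * ema[-1])
    return ema


x = [1.0, 2.0, 3.0, 4.0, 10.0]   # hypothetical example series

first_central = central_moment(x, 1)      # 0, while the mean is 4
third_central = central_moment(x, 3)      # 36
skewness = standardized_moment(x, 3)      # 36 / 10 ** 1.5

ema_series = exponential_moving_average(x, alpha=0.5)
ema_last = ema_series[-1]                 # the single value a scalar feature would report
```

Returning only the last EMA value gives a single scalar per signal, which fits a feature table, but it discards the smoothed trajectory; whether that was the intent is the open question in observation 3.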

Corrections

  1. calculate_katz_fractal_dimensions
  2. calculate_sum_of_reoccurring_values
  3. calculate_sum_of_reoccurring_data_points
  4. calculate_petrosian_fractal_dimension
  5. calculate_sample_entropy
  6. calculate_approximate_entropy