proj-adl-classification

statistical_feature.py

Computes statistical features on the GPU using PyTorch and compares the results with the same features computed on the CPU using NumPy.

The function returns a pandas DataFrame with the columns 'feature name', 'feature value gpu', 'feature value cpu', and 'time consumption'.
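
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in statistical_feature.py: the function name `compare_features`, the three example features, and the reading of 'time consumption' as the wall-clock time of the PyTorch computation are all assumptions. The sketch falls back to the CPU device when no GPU is available.

```python
import time

import numpy as np
import pandas as pd
import torch


def compare_features(x):
    """Hypothetical helper: compute a few features with PyTorch and NumPy."""
    # Use the GPU when present; otherwise PyTorch runs on the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x_t = torch.as_tensor(x, dtype=torch.float64, device=device)

    # feature name -> (PyTorch implementation, NumPy implementation)
    features = {
        "mean_abs": (lambda t: t.abs().mean(),
                     lambda a: np.abs(a).mean()),
        "std_abs": (lambda t: t.abs().std(unbiased=False),
                    lambda a: np.abs(a).std()),
        "range_abs": (lambda t: t.abs().max() - t.abs().min(),
                      lambda a: np.ptp(np.abs(a))),
    }

    rows = []
    for name, (f_torch, f_numpy) in features.items():
        start = time.perf_counter()
        # .item() copies the scalar to the host, which waits for the device.
        value_gpu = f_torch(x_t).item()
        elapsed = time.perf_counter() - start
        rows.append({
            "feature name": name,
            "feature value gpu": value_gpu,
            "feature value cpu": float(f_numpy(x)),
            "time consumption": elapsed,  # assumed: seconds for the PyTorch path
        })
    return pd.DataFrame(rows)


df = compare_features(np.array([1.0, -2.0, 3.0, -4.0]))
print(df)
```

Both paths use the population standard deviation so that the two value columns are directly comparable.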

X - Time series

"*": No reference
"**": More than one reference and one is questionable
"~": Further research required on feature

Statistical Features

Number	Feature	Description	Info
1	calculate_harmonic_mean_abs(X)	Calculates the harmonic mean of the absolute values of X	*
2	calculate_trimmed_mean_abs(X)	Calculates the trimmed mean of absolute values of X	*
3	calculate_std_abs(X)	Calculates the standard deviation of the absolute values of X	*
4	calculate_skewness_abs(X)	Calculates the skewness of the absolute values of X	*
5	calculate_kurtosis_abs(X)	Calculates the kurtosis of the absolute values of X	*
6	calculate_median_abs(X)	Calculates the median of the absolute values of X	*
7	calculate_min_abs(X)	Calculates the minimum value of the absolute values of X	*
8	calculate_range_abs(X)	Calculates the range of the absolute values of X	*
9	calculate_variance_abs(X)	Calculates the variance of the absolute values of X	*
10	calculate_mean_absolute_deviation(X)	Calculates the mean of the absolute deviation of X	~
11	calculate_signal_magnitude_area(X)	Calculates the magnitude area of X. The sum of the absolute values of X	~
12	calculate_cardinality(X)		~
13	calculate_rms_to_mean_abs(X)	Computes the ratio of the RMS value to mean absolute value of X	*
14	calculate_area_under_squared_curve(X)	Computes the area under the curve of X squared	*
15	calculate_exponential_moving_average(X, param)	Calculates the exponential moving average of X	*
16	calculate_fisher_information(X)	Computes the Fisher information of X	~
17	calculate_local_maxima_and_minima(X)	Calculates the local maxima and minima of X	*
18	calculate_log_return(X)	Returns the logarithm of the ratio between the last and first values of X, which is a measure of the percentage change in X	~
19	calculate_lower_complete_moment(X)		*
20	calculate_mean_second_derivative_central(X)	Returns the mean of the second derivative of X
21	calculate_median_second_derivative_central(X)	Calculates the median of the second derivative of X	*
23	calculate_ratio_of_fluctuations(X)	Computes the ratio of positive and negative fluctuations in X	*
24	calculate_ratio_value_number_to_sequence_length(X)	Returns the ratio of the number of unique values in X to the length of X	*
25	calculate_second_order_difference(X)	Returns the second-order difference of X	**
26	calculate_signal_resultant(X)		*
27	calculate_sum_of_negative_values(X)	Calculates the sum of negative values in X	*
28	calculate_sum_of_positive_values(X)	Returns the sum of positive values in X	*
29	calculate_variance_of_absolute_differences(X)	Returns variance of the absolute of the first order difference of X
30	calculate_weighted_moving_average(X)	Returns the weighted moving average of X	*
31	calculate_covariance		~
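
A few of the features in the table above can be sketched in plain NumPy. The function names below drop the `calculate_` prefix on purpose: they are hypothetical illustrations of the descriptions, not the toolbox implementations. The factor 1/2 in the central second derivative is a convention assumed here.

```python
import numpy as np


def harmonic_mean_abs(x):
    """Harmonic mean of |x|; assumes x contains no zeros."""
    a = np.abs(np.asarray(x, dtype=float))
    return a.size / np.sum(1.0 / a)


def rms_to_mean_abs(x):
    """Ratio of the RMS value to the mean absolute value."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2)) / np.mean(np.abs(x))


def log_return(x):
    """Logarithm of last/first; assumes both have the same sign and are non-zero."""
    x = np.asarray(x, dtype=float)
    return np.log(x[-1] / x[0])


def ratio_value_number_to_sequence_length(x):
    """Number of unique values divided by the sequence length."""
    x = np.asarray(x)
    return np.unique(x).size / x.size


def mean_second_derivative_central(x):
    """Mean of the central second difference, scaled by 1/2 (assumed convention)."""
    x = np.asarray(x, dtype=float)
    return np.mean((x[2:] - 2.0 * x[1:-1] + x[:-2]) / 2.0)


X = np.array([1.0, -2.0, 4.0, -2.0, 8.0])
print(harmonic_mean_abs(X))
print(rms_to_mean_abs(X))
print(log_return(X))
print(ratio_value_number_to_sequence_length(X))  # 4 unique values / 5 samples
print(mean_second_derivative_central(X))
```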

Statistical Features - NEW!!

Number	Feature	Reference
1	calculate_mean_to_variance

Time-Frequency Features

Number	Feature	Reference
1	extract_wavelet_features(params)
2	extract_spectrogram_features(params)
3	extract_stft_features(params)
4	teager_kaiser_energy_operator(X)
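
For orientation, the discrete Teager-Kaiser energy operator is commonly defined as psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below implements that textbook definition; the function name and the choice to drop the two boundary samples are assumptions and may differ from teager_kaiser_energy_operator(X) in the toolbox.

```python
import numpy as np


def tkeo(x):
    """Textbook Teager-Kaiser energy operator (hypothetical helper).

    Returns an array of length len(x) - 2 because the first and last
    samples have no two-sided neighbours.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# For a pure sinusoid A*cos(w*n) the operator is constant: A^2 * sin(w)^2.
n = np.arange(100)
amplitude, omega = 2.0, 0.3
psi = tkeo(amplitude * np.cos(omega * n))
print(psi[:3], amplitude ** 2 * np.sin(omega) ** 2)
```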

Spectral Features

Number	Feature	Reference
1	calculate_spectral_subdominant_valley	*
2

NOT in tsfresh

Spectral Features

Median frequency
Spectral bandwidth
Spectral absolute deviation
Spectral slope linear
Spectral slope logarithmic
Spectral flatness
Peak frequencies
Spectral edge frequency
Band power
Spectral entropy
Spectral contrast
Spectral coefficient variation
Spectral flux
Spectral rolloff
Harmonic ratio
Fundamental frequency
Spectral crest factor
Spectral decrease
Spectral irregularity
Mean frequency
Frequency winsorized mean
Total harmonic distortion
Inharmonicity
Tristimulus
Spectral rollon
Spectral hole count
Spectral autocorrelation
Spectral variability
Spectral spread ratio
Spectral skewness ratio
Spectral kurtosis ratio
Spectral tonal power ratio
Spectral noise to harmonics ratio
Spectral even to odd harmonic energy ratio
Spectral strongest frequency phase
Spectral frequency below peak
Spectral frequency above peak
Spectral cumulative frequency
Spectral cumulative frequency
Spectral cumulative frequency above
Spectral spread shift
Spectral entropy shift
Spectral change vector magnitude
Spectral low frequency content
Spectral mid frequency content
Spectral peak-to-valley ratio
Spectral valley depth mean
Spectral valley depth std
Spectral valley depth variance
Spectral valley width mode
Spectral valley width standard deviation
Spectral subdominant valley
Spectral valley count
Spectral peak broadness
Spectral valley broadness
Frequency variance
Frequency standard deviation
Frequency Range
Frequency Trimmed mean
Harmonic product spectrum
Smoothness
Roughness

Time-Frequency Features

Statistical features from wavelets, spectrograms, and the short-time Fourier transform

Statistical Features

Hurst exponent from detrended fluctuation analysis
Winsorized mean
Weighted moving average
Sum of positive values
Sum of negative values
Stochastic oscillator value
Smoothing by binomial filter
Signal-to-noise ratio
Signal resultant
Second order difference
Ratio value number to sequence length
Ratio beyond r signal
Petrosian fractal dimension
Percentage of positive values
Percentage of negative values
Pearson correlation coefficient
Peak-to-peak distance
Number of inflection points
Moving average
Mode
Median second derivative central
Mean relative change
Mean crossings
Lower complete moment
Log return
Katz fractal dimension
Histogram bin frequencies
Fisher information
First quartile
First order difference
Exponential moving average
Energy ratio by chunks
Differential entropy
Cumulative sum
Covariance
Count
Area under curve
Area under squared curve
Renyi entropy
Tsallis entropy
Root mean squared to mean absolute
Cardinality
Hjorth mobility and complexity
Singular value decomposition (SVD) entropy
Higuchi fractal dimensions
Slope sign change
Average amplitude change
Signal magnitude area*
Median absolute deviation
Coefficient of variation
Higher order moments
Mean auto correlation
Impulse factor
Shape factor
Clearance factor
Crest factor
Zero crossings
Entropy
Log energy
Mean absolute deviation
Interquartile range
Variance absolute
Maximum absolute
Minimum absolute
Range absolute
Range
Median absolute
Kurtosis absolute
Skewness absolute
Standard deviation absolute
Trimmed mean absolute
Trimmed mean
Harmonic Mean
Harmonic mean absolute
Geometric mean
Geometric mean absolute
Mean absolute

Added features

Number	Feature	Description
1	Augmented Dickey-Fuller test	Performs the Augmented Dickey-Fuller (ADF) test to check for stationarity in a given time series signal.
2	Hurst exponent	Calculates the Hurst exponent of a given time series using Detrended Fluctuation Analysis (DFA).

Deleted features

Number	Feature	Reason
1	calculate_roll_mean	Same implementation as calculate_moving_average
2	calculate_absolute_energy	Same implementation as signal energy
3	calculate_cumulative_energy	Produces the same result as the absolute energy and the signal energy. These three are always the same for a given signal.
4	calculate_intercept_of_linear_fit	This feature is returned again in the calculate_linear_trend_with_full_linear_regression_results function
5	calculate_pearson_correlation_coefficient	This function calculates the Pearson correlation coefficient between the signal and its one-step lagged version, so it is fundamentally calculating the autocorrelation of the signal. The autocorrelation is already present (calculate_mean_auto_correlation), so having both is redundant.
6	calculate_slope_of_linear_fit	This is already calculated in calculate_linear_trend_with_full_linear_regression_results
7	calculate_frequency_std	Same implementation as calculate_spectral_bandwidth with order set to 2
8	calculate_frequency_variance	Same implementation as calculate_spectral_variance
9	calculate_mean_frequency(freqs, magnitudes)	Same as calculate_spectral_centroid with order set to 1
10	calculate_first_quartile	calculate_percentile(signal, percentiles=[25, 50, 75]) returns the first, second, and third quartiles
11	calculate_third_quartile	calculate_percentile(signal, percentiles=[25, 50, 75]) returns the first, second, and third quartiles
12	calculate_spectral_entropy_shift	Same implementation as calculate_spectral_entropy, but with spectrum_magnitudes as the argument instead of psd
13	calculate_spectral_spread_shift	Same as the spectral standard deviation
14	calculate_spectral_autocorrelatiion	Autocorrelation of magnitudes is backed by literature
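
Two of the redundancy arguments in the table above can be checked numerically. The sketch assumes that absolute energy (and signal energy) is the sum of squared samples and that cumulative energy is the final value of the running sum of squares; these definitions are assumptions, not taken from the toolbox code. It uses `np.percentile` directly rather than the toolbox's calculate_percentile.

```python
import numpy as np

signal = np.array([3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0])

# A single percentile call yields the first, second, and third quartiles,
# so separate first/third quartile functions add nothing new.
q1, q2, q3 = np.percentile(signal, [25, 50, 75])

# Assumed definitions: both energies reduce to the sum of squares.
absolute_energy = np.sum(signal ** 2)
cumulative_energy = np.cumsum(signal ** 2)[-1]

print(q1, q2, q3)
print(absolute_energy, cumulative_energy)
```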

Features that should be deleted

Number	Feature	Type	Reason
1	calculate_histogram_bins	statistical
2	calculate_signal_magnitude_area	statistical
3	calculate_spectral_hole_count	spectral	Spectral holes are typically of use in radio signals. Although the aim is to make this a very comprehensive toolbox, this feature is a little bit out of scope.

Features in Tsfresh but not in SCAI toolbox

Number	Feature	Description	Added yet?
1	absolute sum of changes		✔️
2	ar_coefficient(x, param)	This feature calculator fits the unconditional maximum likelihood of an autoregressive AR(k) process
3	benford correlation		✔️
4	c3	uses c3 statistics to measure non-linearity in the time series
5	count_above(x, t)	Returns the percentage of values in x that are higher than t	✔️
6	count_below(x, t)	Returns the percentage of values in x that are lower than t	✔️
7	cid_ce(x, normalize)	Estimates the complexity of a time series [1] (a more complex time series has more peaks, valleys, etc.).	✔️
8	friedrich_coefficients(x, param)	Coefficients of polynomial h(x), which has been fitted to the deterministic dynamics of Langevin model
9	has_duplicate(x)	Checks if any value in x occurs more than once	✔️
10	has_duplicate_max(x)	Checks if the maximum value of x is observed more than once	✔️
11	has_duplicate_min(x)	Checks if the minimal value of x is observed more than once	✔️
12	index_mass_quantile(x, param)	Calculates the relative index i of time series x where q% of the mass of x lies left of i.
13	mean_n_absolute_max(x, number_of_maxima)	Calculates the arithmetic mean of the n absolute maximum values of the time series.
14	large_standard_deviation(x, r)	Checks whether the time series has a large standard deviation	✔️
15	lempel_ziv_complexity(x, bins)	Calculate a complexity estimate based on the Lempel-Ziv compression algorithm.	✔️
16	matrix_profile(x, param)	Calculates the 1-D Matrix Profile[1] and returns Tukey's Five Number Set plus the mean of that Matrix Profile.
17	max_langevin_fixed_point(x, r, m)	Largest fixed point of the dynamics, argmax_x {h(x)=0}, estimated from the polynomial h(x), which has been fitted to the deterministic dynamics of the Langevin model
18	binned entropy
19	symmetry looking	Boolean variable denoting if the distribution of x looks symmetric.
20	change_quantiles	First fixes a corridor given by the quantiles ql and qh of the distribution of x.
21	fft_coefficient	Calculates the Fourier coefficients of the one-dimensional discrete Fourier transform for real input using the fast Fourier transform algorithm
22	matrix_profile	Calculates the 1-D Matrix Profile[1] and returns Tukey's Five Number Set plus the mean of that Matrix Profile.
23	mean_n_absolute_max	Calculates the arithmetic mean of the n absolute maximum values of the time series.
24	number_crossing_m	Calculates the number of crossings of x on m.
25	number_cwt_peaks	Number of different peaks in x.
26	number_peaks	Calculates the number of peaks of at least support n in the time series x.
27	partial_autocorrelation	Calculates the value of the partial autocorrelation function at the given lag.
28	query_similarity_count	This feature calculator accepts an input query subsequence parameter, compares the query (under z-normalized Euclidean distance) to all subsequences within the time series, and returns a count of the number of times the query was found in the time series (within some predefined maximum distance threshold).
29	ratio_value_number_to_time_series_length	Returns a factor which is 1 if all values in the time series occur only once, and below one if this is not the case.
30	value_count	Count occurrences of value in time series x.	✔️
31	variance_larger_than_standard_deviation	Is variance higher than the standard deviation?	✔️
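
Several of the simpler entries in the table above follow directly from their descriptions. The sketch below is written from those descriptions in NumPy and is not copied from tsfresh or the SCAI toolbox. Details such as the strict comparison in `count_above` and returning a fraction rather than a percentage are assumptions.

```python
import numpy as np


def absolute_sum_of_changes(x):
    """Sum of the absolute differences between consecutive values."""
    return np.sum(np.abs(np.diff(np.asarray(x, dtype=float))))


def count_above(x, t):
    """Fraction of values strictly greater than t (strictness is assumed)."""
    return np.mean(np.asarray(x) > t)


def cid_ce(x, normalize=False):
    """Complexity estimate: Euclidean norm of the first differences."""
    x = np.asarray(x, dtype=float)
    if normalize:
        s = np.std(x)
        if s == 0:
            return 0.0  # a constant series has no complexity
        x = (x - np.mean(x)) / s
    d = np.diff(x)
    return np.sqrt(np.dot(d, d))


def has_duplicate_max(x):
    """True if the maximum value occurs more than once."""
    x = np.asarray(x)
    return bool(np.sum(x == np.max(x)) >= 2)


X = np.array([1.0, 3.0, 3.0, 0.0, 2.0])
print(absolute_sum_of_changes(X))
print(count_above(X, 1.0))
print(cid_ce(X))
print(has_duplicate_max(X))
```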

Observations

calculate_higher_order_moments does not always produce the same results as the mean, variance, skewness, and kurtosis when the moment orders are set to [1, 2, 3, 4]
calculate_rms_to_mean_abs has no direct reference yet
calculate_exponential_moving_average returns the last value in the array. Is there a reason?
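
Two of the observations above can be explored with a small sketch. It is only one possible explanation, and it has not been checked against the toolbox code. Raw, central, and standardized moments are different quantities, so a moment function only matches mean, variance, skewness, and kurtosis if it uses the matching definition for each order. For the exponential moving average, the last element of the series already weights all earlier samples with exponentially decaying weights, which is one plausible reason to return it as a scalar feature. All function names here are hypothetical.

```python
import numpy as np


def raw_moment(x, k):
    """k-th raw moment: mean of x**k."""
    return np.mean(np.asarray(x, dtype=float) ** k)


def central_moment(x, k):
    """k-th central moment: mean of (x - mean)**k."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - np.mean(x)) ** k)


def ema_series(x, alpha):
    """Full exponential moving average series, seeded with x[0]."""
    x = np.asarray(x, dtype=float)
    ema = np.empty_like(x)
    ema[0] = x[0]
    for i in range(1, x.size):
        ema[i] = alpha * x[i] + (1.0 - alpha) * ema[i - 1]
    return ema


X = np.array([1.0, 2.0, 3.0, 6.0])

# Order 1: the raw moment is the mean, the central moment is zero.
print(raw_moment(X, 1), central_moment(X, 1), np.mean(X))
# Order 2: only the central moment equals the (population) variance.
print(raw_moment(X, 2), central_moment(X, 2), np.var(X))
# Skewness is the standardized third central moment.
skewness = central_moment(X, 3) / central_moment(X, 2) ** 1.5
print(skewness)

ema = ema_series([1.0, 2.0, 3.0], alpha=0.5)
print(ema, ema[-1])  # the scalar feature would be ema[-1]
```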

Corrections

calculate_katz_fractal_dimensions
calculate_sum_of_reoccurring_values
calculate_sum_of_reoccurring_data_points
calculate_petrosian_fractal_dimension
calculate_sample_entropy
calculate_approximate_entropy
