This project explores different methods to reduce features, evaluate predictive power, and interpret model outputs on CNC machine time-series data.
The file config.py controls key experiment settings:
- data_path → Path to the dataset file (edit this to switch between datasets).
- split → Train/validation/test split ratio (default: 0.6, 0.2, 0.2).
- window_size → Sequence length for time-series models.
Update these values in config.py to adapt the experiments to different datasets or configurations.
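As a rough illustration of those settings, a config.py along the following lines would expose all three. The variable names mirror the setting names listed in this README, but the actual layout of the file is an assumption, and the `split_sizes` helper is hypothetical and only shows how the ratio could translate into row counts.

```python
# Hypothetical sketch of config.py; the real file may be organised differently.

# Path to the dataset file (example value; point it at your own CSV).
data_path = "datasets/DMC2_S_CP2.csv"

# Train/validation/test split ratio.
split = (0.6, 0.2, 0.2)

# Sequence length for time-series models.
window_size = 60


def split_sizes(n_rows, ratios=split):
    """Return (train, val, test) row counts for n_rows samples.

    Illustrative helper, not part of the project: train and validation
    are rounded down and the test set takes whatever remains.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")
    n_train = int(n_rows * ratios[0])
    n_val = int(n_rows * ratios[1])
    return n_train, n_val, n_rows - n_train - n_val
```

Checking that the ratios sum to 1 up front catches a common editing mistake before any model is trained.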
Before running the project, make sure to install all dependencies.
You can install them directly from the requirements.txt file:
pip install -r requirements.txt
- PCA: applies linear dimensionality reduction.
- Retains 98% of the variance (configurable).
- Produces:
  - pca_latent.csv containing reduced features (PC1..PCk) + target
  - pca_scree.png showing explained variance across components
Default (uses paths & dataset split ratio from config.py)
python main.py --model pca_model
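The variance-retention step can be sketched with plain NumPy. This is a minimal illustration, not the project's implementation: the function name `pca_reduce` is hypothetical, the data is only centered (any scaling the project applies is not shown), and the SVD route is just one way to obtain principal components.

```python
import numpy as np


def pca_reduce(X, variance_to_keep=0.98):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches variance_to_keep."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)
    cumulative = np.cumsum(explained_ratio)
    # Smallest k whose cumulative explained variance meets the threshold.
    k = min(int(np.searchsorted(cumulative, variance_to_keep)) + 1, len(S))
    latent = X_centered @ Vt[:k].T  # shape: (n_samples, k)
    return latent, explained_ratio[:k]
```

The returned columns correspond to PC1..PCk, and the per-component ratios are the values a scree plot would display.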
- Dense autoencoder: non-linear compression using a fully connected autoencoder.
- Encoder bottleneck (latent layer) provides the reduced features.
- Produces:
  - dense_latent.csv containing latent features (latent_0..latent_n) + target
  - Saved models: dense_autoencoder.h5, dense_encoder.h5
python main.py --model dense_ae
- LSTM-CNN autoencoder: temporal autoencoder combining convolutional and recurrent layers.
- Learns latent representations of sequential windows.
- Produces:
  - lstm_cnn_latent.csv with latent features aligned to the target
python main.py --model lstm_cnn
For each reduced representation, an LSTM Regressor is trained to predict the target variable.
- Input: pca_latent.csv
- Uses a context window (window_size = 60, which gave the best results in our experiments) to build sequences.
- Produces:
  - Metrics (R², MAE, RMSE)
  - pca_true_vs_pred.png (true vs. predicted plot)
Default (uses paths & dataset split ratio from config.py)
python -m evaluation.lstm_regressor --latent_csv output/pca_latent.csv --model_type pca
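Building sequences from a context window can be sketched in a few lines of plain Python. The function name `make_windows` is hypothetical, and the alignment used here (each window is paired with the target of its last row) is an assumption; the project may instead predict the step after the window.

```python
def make_windows(features, target, window_size=60):
    """Turn row-wise features into overlapping sequences.

    Each sample holds window_size consecutive rows and is paired with
    the target value of the final row in that window (assumed alignment).
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    X, y = [], []
    for end in range(window_size, len(features) + 1):
        X.append(features[end - window_size:end])
        y.append(target[end - 1])
    return X, y
```

A series of n rows yields n - window_size + 1 samples, so larger windows leave fewer training sequences.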
- Input: dense_latent.csv
- Uses a context window (window_size = 60).
- Produces:
  - Metrics (R², MAE, RMSE)
  - denseae_true_vs_pred.png
python -m evaluation.lstm_regressor --latent_csv output/dense_latent.csv --model_type dense_ae
- Input: lstm_cnn_latent.csv
- Uses a shorter context window (window_size = 10), which produced the best results.
- Produces:
  - Metrics (R², MAE, RMSE)
  - lstm_cnn_true_vs_pred.png
python -m evaluation.lstm_regressor --latent_csv output/lstm_cnn_latent.csv --model_type lstm_cnn
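For reference, the three reported metrics follow their standard definitions and can be computed as in the sketch below. The function name `regression_metrics` is hypothetical, and the project may well rely on a library for this instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE and RMSE for two equal-length sequences.

    R² is undefined when y_true is constant (zero total variance),
    in which case this sketch raises ZeroDivisionError.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}
```

An R² of 0 means the model does no better than always predicting the mean of the target, which is a useful baseline when comparing the three latent representations.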
- Calculates loading scores to quantify how much each original feature contributes to each principal component.
- Aggregated importance provides a ranking of features.
- Produces:
  - pca_top30_loadings.png: top 30 most important features
  - pca_bottom30_loadings.png: bottom 30 (least important) features
python -m interpretability.loadings --data_path datasets/DMC2_S_CP2.csv
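One way to turn loadings into a single ranking is sketched below. Several choices here are assumptions rather than the project's method: the features are standardised first, loadings are taken as principal axes scaled by the square root of their eigenvalues, and the aggregation weights absolute loadings by each component's explained-variance ratio. The function name is hypothetical.

```python
import numpy as np


def rank_features_by_loadings(X, feature_names, variance_to_keep=0.98):
    """Rank features by aggregated absolute PCA loadings (illustrative)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by 0
    X_scaled = (X - X.mean(axis=0)) / std
    _, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
    eigenvalues = S**2 / (X.shape[0] - 1)
    ratio = eigenvalues / eigenvalues.sum()
    # Keep the fewest components that reach the variance threshold.
    k = min(int(np.searchsorted(np.cumsum(ratio), variance_to_keep)) + 1, len(S))
    # Loadings: one row per feature, one column per retained component.
    loadings = Vt[:k].T * np.sqrt(eigenvalues[:k])
    # Assumed aggregation: absolute loadings weighted by explained variance.
    importance = np.abs(loadings) @ ratio[:k]
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]
```

The first and last 30 entries of the returned list would correspond to the kind of top and bottom plots described above.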
- Uses SHAP GradientExplainer on the encoder’s latent space.
- Attributes contributions of original features to learned latent dimensions.
- Produces:
  - denseae_top30.png: top 30 features shaping the latent space
  - denseae_bottom30.png: bottom 30 (least important) features
python -m interpretability.shap_ae --model_path output/dense_autoencoder.h5
- Latent CSVs: pca_latent.csv, dense_latent.csv, lstm_cnn_latent.csv
- Evaluation Plots: pca_true_vs_pred.png, denseae_true_vs_pred.png, lstm_cnn_true_vs_pred.png
- Interpretability Plots: pca_top30_loadings.png, pca_bottom30_loadings.png, denseae_top30.png, denseae_bottom30.png
- Reduced the original set of 52 features to roughly 25 latent features using PCA and autoencoder architectures.
- Using the reduced features, the supervised LSTM-CNN achieved R² ≈ 0.8, exceeding the original full-feature performance (≈ 0.7).
- Identified the most important sensor features, enabling prioritized monitoring and maintenance in CNC manufacturing.
- Highlighted redundant and less useful sensors, allowing data collection and storage costs to be reduced.
- Improved model efficiency: less input data → faster training/inference, lower memory and computational demands.
- Enhanced interpretability and trust in model decisions, which is critical for industrial applications and deployment in sensor systems.
Shweta Bambal
Research Assistant (DS/ML) at OVGU, Magdeburg
📧 Email: shwetabambal18@gmail.com
This project is licensed under the MIT License — see the LICENSE file for details.