raquelanamb/static-malware-feature-importance

Feature Importance in Machine Learning Models for Static Malware Detection

This repository contains the code and experiments for the paper "Feature Importance in Machine Learning Models for Static Malware Detection".

Overview

This project analyzes which static Portable Executable (PE) file features drive malware detection decisions across different machine learning architectures. Using the EMBER 2018 dataset, we compare tree-based and neural network models, focusing on feature importance, interpretability, and robustness rather than on performance alone.

Models Evaluated

  • LightGBM
  • Random Forest
  • Feedforward Neural Network (FFNN)
  • Convolutional Neural Network (CNN)

Key Findings

  • Tree-based models achieve the highest accuracy on clean data.
  • Neural networks are less accurate but degrade more gracefully under feature perturbation.
  • Imports, string-based metadata, and entropy-related features consistently signal malware across models.
  • Different architectures rely on distinct subsets of the feature space.
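
One way to quantify how much two models rely on the same features is to compare their top-k feature sets. The sketch below is illustrative only: the helper names (`top_k`, `jaccard`) and the toy importance scores are assumptions, not code or results from this repository.

```python
def top_k(importances, k):
    """Return the indices of the k largest importance scores as a set."""
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two index sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy importance vectors for two hypothetical models (made-up numbers).
tree_scores = [0.40, 0.30, 0.05, 0.20, 0.05, 0.00]
net_scores = [0.10, 0.35, 0.30, 0.05, 0.15, 0.05]

# Share of features common to both top-3 sets, relative to their union.
overlap = jaccard(top_k(tree_scores, 3), top_k(net_scores, 3))
print(f"top-3 overlap: {overlap:.2f}")
```

A value near 0 means the two models lean on largely disjoint features; a value near 1 means they agree on which features matter most.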

Dataset

  • EMBER 2018 v2 feature dataset
  • ~800k labeled samples, 2,381 features per file
  • Static PE features only (no execution or dynamic analysis)
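
To interpret an importance score for a single column, it helps to know which feature group that column belongs to. The sketch below assumes the group order and widths of the public EMBER v2 feature extractor, which sum to 2,381. This layout is not taken from this repository, and `group_of` is a hypothetical helper.

```python
from bisect import bisect_right
from itertools import accumulate

# Assumed EMBER v2 layout as (group name, width); verify against the
# EMBER feature extractor before relying on it.
EMBER_V2_GROUPS = [
    ("ByteHistogram", 256),
    ("ByteEntropyHistogram", 256),
    ("StringExtractor", 104),
    ("GeneralFileInfo", 10),
    ("HeaderFileInfo", 62),
    ("SectionInfo", 255),
    ("ImportsInfo", 1280),
    ("ExportsInfo", 128),
    ("DataDirectories", 30),
]

# Exclusive end offset of each group within the flat feature vector.
GROUP_ENDS = list(accumulate(width for _, width in EMBER_V2_GROUPS))
N_FEATURES = GROUP_ENDS[-1]


def group_of(index):
    """Return the name of the feature group containing column `index`."""
    if not 0 <= index < N_FEATURES:
        raise IndexError(f"feature index {index} out of range")
    return EMBER_V2_GROUPS[bisect_right(GROUP_ENDS, index)][0]


print(N_FEATURES, group_of(0), group_of(1000))
```

Summing per-column importance by `group_of` gives a group-level view (imports, strings, entropy, and so on) that is easier to compare across models than 2,381 individual scores.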

Methods

  • Model-specific feature importance extraction
  • Correlation analysis of high-importance features
  • Robustness testing via Gaussian noise perturbation
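
The Gaussian noise test can be sketched as follows: add zero-mean noise of increasing standard deviation to every feature and record accuracy at each level. This is a minimal standard-library illustration, not the repository's implementation. The function names, the toy classifier, and the choice to apply noise in raw feature units (rather than after per-feature scaling) are all assumptions.

```python
import random


def perturb(rows, sigma, rng):
    """Return a copy of `rows` with independent N(0, sigma^2) noise on every feature."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in rows]


def accuracy(predict, rows, labels):
    """Fraction of rows for which `predict` matches the label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(labels)


def robustness_curve(predict, rows, labels, sigmas, seed=0):
    """Map each noise level to the accuracy measured on perturbed inputs."""
    rng = random.Random(seed)
    return {s: accuracy(predict, perturb(rows, s, rng), labels) for s in sigmas}


# Toy stand-in for a trained model: a fixed linear decision rule.
def toy_predict(row):
    return int(row[0] + row[1] > 1.0)


data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(500)]
labels = [toy_predict(r) for r in rows]  # labels are clean by construction

curve = robustness_curve(toy_predict, rows, labels, sigmas=[0.0, 0.1, 0.5, 2.0])
for sigma, acc in curve.items():
    print(f"sigma={sigma:<4} accuracy={acc:.3f}")
```

Running the same sweep for each model and comparing how quickly accuracy falls as sigma grows is one way to compare how gracefully the models degrade.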

Disclaimer

This work evaluates static, feature-based malware detection. Gaussian perturbations are used as a stress test and do not represent realistic adversarial attacks.

Authors

  • Raquel Ana Magalhães Bush
  • Brian Kade Betterton
