01Qin/CVE-Vulnerability-Analytics

CVE Vulnerability Analytics and Forecasting

Overview & Motivation

This project analyzes the Common Vulnerabilities and Exposures (CVE) dataset to uncover historical trends, assess severity levels, and forecast future risks in cybersecurity.
It demonstrates skills in data preprocessing, time-series analysis, statistical modeling, and visualization using Python.

Key Questions:

  • Which vulnerability types are becoming more frequent?
  • Can we predict the likelihood of high-risk vulnerabilities appearing over time?

Folder


Dataset

  • Source: CVE Dataset (Kaggle)
  • Includes:
    • cvss: Severity score (CVSS)
    • pub_date / mod_date: Publication & modification dates
    • cwe_code / cwe_name: Weakness type (e.g., CWE-79 for XSS)
    • access_* / impact_*: Access-related fields (such as complexity and vector) and impact on the CIA triad (confidentiality, integrity, availability)
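
As a rough illustration of how these columns might be loaded and prepared, the sketch below parses the dates, coerces the score to a number, and ordinal-encodes one access field. The inline rows are made up, and the `access_complexity` column name, its LOW/MEDIUM/HIGH levels, and the helper `load_and_clean` are assumptions for this example rather than details confirmed by the project.

```python
import io

import pandas as pd

# Made-up rows standing in for the Kaggle CSV. The `access_complexity`
# column and its LOW/MEDIUM/HIGH levels are assumed for illustration.
SAMPLE_CSV = """pub_date,mod_date,cvss,cwe_code,cwe_name,access_complexity
2018-03-01,2018-04-01,4.3,79,Cross-site Scripting,MEDIUM
2019-06-15,2019-07-01,7.5,89,SQL Injection,LOW
2019-09-30,2019-10-02,,79,Cross-site Scripting,HIGH
"""

# Assumed ordering used for ordinal encoding.
COMPLEXITY_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def load_and_clean(source):
    """Read a CVE CSV, parse dates, drop incomplete rows, add encoded columns."""
    df = pd.read_csv(source, parse_dates=["pub_date", "mod_date"])
    # Malformed or empty scores become NaN instead of raising.
    df["cvss"] = pd.to_numeric(df["cvss"], errors="coerce")
    # Keep only rows that have both a score and a publication date.
    df = df.dropna(subset=["cvss", "pub_date"]).copy()
    # Map the categorical levels onto integers that preserve their order.
    df["access_complexity_ord"] = df["access_complexity"].map(COMPLEXITY_ORDER)
    # Publication year is handy for year-by-year grouping later on.
    df["year"] = df["pub_date"].dt.year
    return df


cve = load_and_clean(io.StringIO(SAMPLE_CSV))
```

With a real download, a file path could be passed to `load_and_clean` in place of the in-memory sample.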

Key Objectives

  • Clean and preprocess CVE data (missing values, ordinal encoding).
  • Identify year-by-year trends and the top software weakness categories (CWEs).
  • Detect outliers and correlations among vulnerability features.
  • Forecast future occurrences of CWE-79 using Exponential Smoothing.
  • Build a logistic regression model to predict the likelihood of CWE-79 vulnerabilities.

Highlights

  • Temporal Analysis: Visualized logarithmic growth of CVEs over 20+ years.
  • Forecasting: Predicted CWE-79 vulnerability counts with a 5-year horizon.
  • Statistical Rigor: Applied Pearson, Spearman, and covariance analysis.
  • Classification: Trained a logistic regression model to flag CWE-79 vulnerabilities and evaluated it with accuracy and an ROC curve.
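
To make the statistical measures above concrete, here is a minimal sketch that computes Pearson, Spearman, and covariance matrices with pandas. The column `impact_score`, the values, and the helper `association_matrices` are hypothetical; the data is chosen so that the relationship is monotonic but not linear, which is the case where Spearman and Pearson disagree.

```python
import pandas as pd

# Hypothetical numeric features; values are chosen only to illustrate
# a monotonic, non-linear relationship.
features = pd.DataFrame(
    {
        "cvss": [1.0, 2.0, 3.0, 4.0, 5.0],
        "impact_score": [1.0, 4.0, 9.0, 16.0, 25.0],
    }
)


def association_matrices(df, columns):
    """Return Pearson, Spearman, and covariance matrices for `columns`."""
    numeric = df[columns]
    return {
        # Linear association.
        "pearson": numeric.corr(method="pearson"),
        # Rank-based association, robust to monotonic non-linearity.
        "spearman": numeric.corr(method="spearman"),
        # Unnormalized joint variability (sample covariance).
        "covariance": numeric.cov(),
    }


stats = association_matrices(features, ["cvss", "impact_score"])
```

The covariance matrix in `stats` is the kind of table that could feed a heatmap such as `seaborn.heatmap`.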

Sample Outputs

  • Line charts of CVE growth by year.
  • Top 5 CWE category trends.
  • Covariance heatmap of severity metrics.
  • Time-series forecast and ROC curve.
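
The ROC curve listed above could be produced along these lines. This is a sketch on synthetic data: the two features, the labeling rule, and all variable names are invented so that the example runs on its own, and they say nothing about real CVE relationships or about the project's actual feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 400

# Synthetic features: a CVSS-like score and an ordinal complexity code.
cvss = rng.uniform(0, 10, n_samples)
complexity = rng.integers(0, 3, n_samples)
X = np.column_stack([cvss, complexity])

# Arbitrary rule that generates a learnable binary label ("is CWE-79").
logit = 1.5 - 0.6 * cvss + 0.8 * complexity
probability = 1.0 / (1.0 + np.exp(-logit))
y = (rng.random(n_samples) < probability).astype(int)

# Stratify so both classes appear in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Predicted probability of the positive class drives the ROC curve.
scores = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
accuracy = clf.score(X_test, y_test)
```

Plotting `fpr` against `tpr` with matplotlib gives the ROC curve, and `auc` summarizes it as a single number.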

Tech Stack

  • Language: Python
  • Libraries:
    pandas, numpy, scikit-learn, statsmodels,
    matplotlib, seaborn, plotly, scipy, kagglehub

Future Enhancements

  • Add NLP features (TF-IDF, BERT)
  • Implement ARIMA/Prophet for advanced forecasting
  • Predict CVSS severity score using regression

License

MIT License – see LICENSE for details.
