Development of different machine learning models for network traffic classification and their subsequent application to real network traffic, combining machine learning with cybersecurity.
The program uses tools like tshark and Python libraries to intercept incoming network traffic to the machine.
For each identified network flow, specific metrics are generated (source and destination IPs, ports, packet size, etc.). This data is stored in a CSV file.
The generated CSV file is analyzed by an AI model trained on the CIC-DDoS2019 dataset. The model determines whether the traffic corresponds to legitimate behavior or a DDoS attack.
git clone https://github.com/ctfhacks/Network-traffic-classifier.git
cd Network-traffic-classifier
Make sure you have Python 3.9 or higher installed.
pip install -r requirements.txt --break-system-packages
tshark is required to capture and analyze network traffic.
sudo apt update
sudo apt install tshark
Once all dependencies are installed, you can run the capture script and begin analyzing traffic.
To start capturing incoming network traffic, use the following command:
python3 ddos_flow_capture.py -i eth1
Replace `eth1` with the name of the network interface you want to monitor (you can list available interfaces using `ip a`).
The script will begin capturing incoming traffic to the machine, and once it reaches 10,000 packets, it will automatically process the data and export it to a CSV file. Each row in the CSV represents a complete network flow with extracted features suitable for analysis by the AI model.
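As a rough illustration of the capture-then-aggregate step described above, the sketch below builds a tshark command line and groups per-packet records into flows keyed by the usual 5-tuple. It is not the code in `ddos_flow_capture.py`: the helper names, the field list, and the CSV columns are assumptions chosen for the example, and the real script extracts many more features per flow.

```python
import csv
import io

PACKET_LIMIT = 10_000  # packet threshold mentioned above

# Per-packet tshark fields (illustrative; UDP traffic would need
# udp.srcport / udp.dstport as well).
FIELDS = ["frame.time_epoch", "ip.src", "ip.dst",
          "tcp.srcport", "tcp.dstport", "ip.proto", "frame.len"]


def build_tshark_command(interface, packet_count=PACKET_LIMIT):
    """Return a tshark argv that prints one comma-separated line per packet."""
    cmd = ["tshark", "-i", interface, "-c", str(packet_count),
           "-T", "fields", "-E", "separator=,"]
    for field in FIELDS:
        cmd += ["-e", field]
    return cmd


def aggregate_flows(packets):
    """Group packet dicts by (src IP, src port, dst IP, dst port, protocol)."""
    flows = {}
    for pkt in packets:
        key = (pkt["src_ip"], pkt["src_port"],
               pkt["dst_ip"], pkt["dst_port"], pkt["protocol"])
        flow = flows.setdefault(
            key, {"packets": 0, "bytes": 0,
                  "start": pkt["time"], "end": pkt["time"]})
        flow["packets"] += 1
        flow["bytes"] += pkt["length"]
        flow["start"] = min(flow["start"], pkt["time"])
        flow["end"] = max(flow["end"], pkt["time"])
    return flows


def flows_to_csv(flows):
    """Serialize flows to CSV text, one row per flow."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source IP", "Source Port", "Destination IP",
                     "Destination Port", "Protocol",
                     "Packets", "Bytes", "Duration"])
    for (src_ip, src_port, dst_ip, dst_port, proto), f in flows.items():
        writer.writerow([src_ip, src_port, dst_ip, dst_port, proto,
                         f["packets"], f["bytes"], f["end"] - f["start"]])
    return buffer.getvalue()


sample = [
    {"time": 0.0, "src_ip": "10.0.0.1", "src_port": 1234,
     "dst_ip": "10.0.0.2", "dst_port": 80, "protocol": 6, "length": 60},
    {"time": 0.5, "src_ip": "10.0.0.1", "src_port": 1234,
     "dst_ip": "10.0.0.2", "dst_port": 80, "protocol": 6, "length": 100},
]
print(" ".join(build_tshark_command("eth1")))
print(flows_to_csv(aggregate_flows(sample)))
```

Keying on the 5-tuple means packets in opposite directions land in separate flows; a bidirectional variant would normalize the key first.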
Once a CSV file is generated (after capturing 10,000 packets), it is stored under the following directory:
/opt/DDOS/
Inside this main folder, there are three subdirectories used to organize the CSV files depending on their processing status:
- `/opt/DDOS/scanning/`: Contains CSV files currently being created. These are flows that have not yet reached the 10,000-packet threshold.
- `/opt/DDOS/generated/`: Stores completed CSV files that have reached 10,000 packets and are ready to be analyzed by the AI model.
- `/opt/DDOS/read/`: Contains CSV files that have already been processed and classified by the AI as either legitimate or DDoS traffic.
This structure helps ensure a clean pipeline between the traffic capture component and the detection component based on the AI model.
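The hand-off between the three directories can be sketched as a simple move between stage folders. This is only an illustration of the idea: the function names are hypothetical, and the project's scripts may track file status differently.

```python
import shutil
from pathlib import Path

STAGES = ("scanning", "generated", "read")


def ensure_stage_dirs(base):
    """Create the three stage directories under base if they are missing."""
    base = Path(base)
    for stage in STAGES:
        (base / stage).mkdir(parents=True, exist_ok=True)
    return base


def promote(csv_path, base, next_stage):
    """Move a CSV file into the given stage directory and return its new path."""
    if next_stage not in STAGES:
        raise ValueError(f"unknown stage: {next_stage}")
    target = Path(base) / next_stage / Path(csv_path).name
    shutil.move(str(csv_path), str(target))
    return target


def pending_csvs(base):
    """List completed CSV files waiting to be classified."""
    return sorted((Path(base) / "generated").glob("*.csv"))
```

With this layout, the capture side would call `promote(path, "/opt/DDOS", "generated")` once a file is complete, and the detection side would call `promote(path, "/opt/DDOS", "read")` after classifying it, so neither component ever reads a half-written file.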
6. Training and evaluating different machine learning models (models_dev_eng.ipynb or models_dev_esp.ipynb)
The file data.csv contains over 225,000 examples of network traffic. Below are some relevant features:
| Feature | Type |
|---|---|
| Source IP | object |
| Source Port | int |
| Destination IP | object |
| Destination Port | int |
| Protocol | int |
| Timestamp | object |
The dataset contains 85 columns.
Before training the models, the data is preprocessed using the pandas and ipaddress libraries, which involves deleting and modifying certain columns. The data is then split using sklearn and scaled using RobustScaler().
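To make the preprocessing step more concrete, here is a small standard-library sketch of two of the ideas involved. It assumes that IP address columns are turned into integers with `ipaddress` (the notebook may handle them differently), and it hand-computes what robust scaling does with its default settings: subtract the median and divide by the interquartile range. The notebook itself relies on sklearn's `RobustScaler`; the helper names below are hypothetical.

```python
import ipaddress
import statistics


def ip_to_int(address):
    """Convert a dotted IPv4 (or IPv6) string to its integer value."""
    return int(ipaddress.ip_address(address))


def robust_scale(values):
    """Center on the median and scale by the interquartile range (Q3 - Q1)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Constant-like column: only center it to avoid dividing by zero.
        return [v - median for v in values]
    return [(v - median) / iqr for v in values]


print(ip_to_int("192.168.1.10"))
print(robust_scale([1, 2, 3, 4, 100]))
```

Because the median and interquartile range barely move when a few extreme values appear, the bulk of the data stays in a small range even with outliers such as the `100` above, which is a common reason to prefer robust scaling for bursty traffic features.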
- Logistic Regression
- KMEANS
- Gaussian Naive Bayes
- Artificial Neural Network
For each model except KMEANS, precision, recall, and F1-score (the harmonic mean of precision and recall) are evaluated. In addition, the ANN is trained on both scaled and unscaled data to compare the results, demonstrating the importance of scaling.
The classifier.py script loads trained_traffic_classifier.keras, scaler.pkl and imputer.pkl and applies them to the CSV datasets produced by the capture step.
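For reference, the three metrics can be computed from true and predicted labels as in the sketch below. The function name and the convention that label `1` marks the positive (DDoS) class are assumptions for the example; the notebooks may use sklearn's metric functions instead.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))
```

In a DDoS setting, recall reflects how many attack flows were caught, while precision reflects how many alerts were real attacks, so reporting both (and their harmonic mean) gives a fuller picture than accuracy alone.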
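A minimal sketch of how the three artifacts might fit together at inference time is shown below. It is not the contents of `classifier.py`. It assumes the usual order (impute missing values, then scale, then predict), a model that outputs one probability per flow, and that a probability of 0.5 or higher means DDoS. The stand-in classes only mimic the `transform` / `predict` methods of the real objects so the example runs without the saved files.

```python
def classify_flows(rows, imputer, scaler, model, threshold=0.5):
    """Impute, scale, and classify feature rows; return one label per row."""
    filled = imputer.transform(rows)
    scaled = scaler.transform(filled)
    probabilities = model.predict(scaled)
    # Assumption: each prediction is a one-element sequence holding P(DDoS).
    return ["DDoS" if float(p[0]) >= threshold else "Legitimate"
            for p in probabilities]


# Stand-ins for the loaded imputer, scaler, and model (hypothetical).
class FakeImputer:
    def transform(self, rows):
        # Replace missing values with 0.0.
        return [[0.0 if v is None else v for v in row] for row in rows]


class FakeScaler:
    def transform(self, rows):
        return [[v / 1000.0 for v in row] for row in rows]


class FakeModel:
    def predict(self, rows):
        # Toy rule: a larger average feature value gives a higher probability.
        return [[min(1.0, sum(row) / len(row))] for row in rows]


flows = [[900.0, 1100.0], [10.0, None]]
print(classify_flows(flows, FakeImputer(), FakeScaler(), FakeModel()))
```

Applying the same imputer and scaler that were fitted during training matters here: fitting new ones on captured traffic would shift the feature distribution the model was trained on.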
python3 classifier.py
This project is a collaboration between two developers combining machine learning with real-time network traffic analysis:
- @mrcsgh: Responsible for developing, training, and validating the machine learning model used to classify network traffic based on the CIC-DDoS2019 dataset, and integrating AI with network traffic capture.
- @ctfhacks: Developed the Python-based system for real-time network traffic capture, data extraction, CSV generation, and integration with the AI model for detection.
Together, these contributions enable automated detection of DDoS attacks by bridging cybersecurity and AI.