This repository contains the implementation and research associated with a pipeline for detecting vulnerabilities in smart contracts. The pipeline integrates function-level analysis using CodeBERT embeddings and graph-based classification using Graph Neural Networks (GNNs).
The aim of this project is to develop an effective and scalable system for detecting vulnerabilities in smart contracts. The process involves:
**Function-Level Vulnerability Classification:**
- Functions from smart contracts are analyzed for specific vulnerability types (e.g., Re-Entrancy, Timestamp Dependency, Unhandled Exceptions).
- A dataset is created with functions labeled as vulnerable or non-vulnerable.
- CodeBERT is fine-tuned for each vulnerability type to generate embeddings representing the functional semantics.
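The per-vulnerability labeling step above can be sketched in plain Python. This is an illustrative sketch only; the helper name and data layout are not the repository's actual API:

```python
def make_function_datasets(functions):
    """Split multi-label function data into one binary dataset per
    vulnerability type, as used to fine-tune one CodeBERT model each.

    functions: list of (source_code, {vulnerability_type: bool}) pairs.
    """
    datasets = {}
    for code, labels in functions:
        for vuln, is_vulnerable in labels.items():
            datasets.setdefault(vuln, []).append(
                {"code": code, "label": int(is_vulnerable)}
            )
    return datasets
```

Each resulting per-type dataset can then be fed to a separate binary fine-tuning run.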
**Graph-Based Classification:**
- Source code is analyzed to construct Function Call Graphs (FCGs), where:
- Nodes represent functions.
- Edges represent function call relationships.
- Node embeddings are generated using the fine-tuned CodeBERT model.
- These graphs are used as input to GNN models (e.g., GCN, GraphSAGE, GAT) for classification of smart contracts as vulnerable or non-vulnerable.
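The FCG construction and a single graph-convolution step can be sketched with NumPy. This is a simplified stand-in for the actual DGL-based GNN models; the function names, shapes, and the stand-in embeddings are illustrative assumptions:

```python
import numpy as np

def build_fcg_adjacency(functions, calls):
    """Adjacency matrix of a Function Call Graph:
    nodes are functions, edges are caller -> callee relationships."""
    idx = {name: i for i, name in enumerate(functions)}
    A = np.zeros((len(functions), len(functions)))
    for caller, callee in calls:
        A[idx[caller], idx[callee]] = 1.0
    return A

def gcn_layer(A, H, W):
    """One GCN propagation step, ReLU(D^-1/2 (A + I) D^-1/2 H W).
    H holds per-function node embeddings (from CodeBERT in the pipeline)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)  # ReLU
```

A graph-level readout (e.g., mean-pooling the node outputs) then yields the contract-level vulnerable/non-vulnerable prediction.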
**Baseline Comparisons:**
- Results are compared against existing methods such as CBGRU, Peculiar, VulBERTa, TMP, AME, MANDO, and MANDO-HGT, among others, to demonstrate the performance improvements of the proposed approach.
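Comparisons of this kind typically report precision, recall, and F1 per vulnerability type; a minimal, self-contained sketch of those metrics (pure Python, illustrative only):

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision/recall/F1 for vulnerable (1) vs. non-vulnerable (0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```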
**Data Collection:**
- Utilizes datasets such as `smartbugs/smartbugs-wild` and `mwritescode/slither-audited-smart-contracts`.
- Functions and source code are pre-processed using tools like Slither, SmartCheck, and Mythril.
**Modeling Techniques:**
- Fine-tuned CodeBERT for generating embeddings.
- GNNs for graph-based classification.
- Comparison with alternative architectures, including hybrid models (e.g., CBGRU).
**Optimization:**
- Handles limitations of input token lengths (e.g., CodeBERT's 512-token limit).
- Investigates strategies such as selecting key tokens (first 128 + last 382 tokens).
- Significant improvements were observed in detecting vulnerabilities, especially for Re-Entrancy, compared to traditional models.
- Experimentation with different graph construction techniques and feature enhancements.
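The head+tail token selection above (128 + 382 = 510 tokens, leaving room for CodeBERT's two special tokens) can be sketched as follows; the function name is illustrative:

```python
def truncate_head_tail(token_ids, head=128, tail=382, max_len=512):
    """Keep the first `head` and last `tail` tokens when a function
    exceeds the model's budget (512 for CodeBERT, minus 2 special tokens).

    The intuition: a function's signature/declarations (head) and its
    final statements (tail) often carry the most vulnerability signal.
    """
    if len(token_ids) <= max_len:
        return token_ids
    return token_ids[:head] + token_ids[-tail:]
```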
**Repository Structure:**
- `datasets/`: Datasets for functions and graphs.
- `core/`: Core implementation of the pipeline components, including:
  - `finetune_llm/`: CodeBERT fine-tuning, dataset, pipeline, and components.
  - `GFD/`: GFD model pipeline, data processing, and modules.
  - `preprocessing/`: Scripts for dataset preprocessing and analysis.
  - `utils/`: Utility functions for the pipeline.
- `notebooks/`: Jupyter notebooks for data processing, model training, and evaluation.
- `docs/`: Additional project documentation and baseline comparisons.
**Prerequisites:**
- Python 3.8+
- PyTorch
- Hugging Face Transformers
- DGL (Deep Graph Library)
- Pandas, NumPy, Scikit-learn
**Setup:** Clone the repository and install dependencies:

```shell
git clone https://github.com/QuangNguyen2910/GraphFusionVulDetect.git
cd GraphFusionVulDetect
pip install -r requirements.txt
```
**Data Preparation:**
- Follow the instructions in `docs/data_preparation.md` to preprocess datasets and create FCGs.
**Training & Evaluation (using Makefile):** Use the provided Makefile to run training and evaluation commands easily:
- Fine-tune CodeBERT: `make finetune-train`
- Evaluate CodeBERT: `make finetune-eval`
- Train GFD model: `make gfd-train`
- Evaluate GFD model: `make gfd-eval`

These commands execute the corresponding Python modules for training and evaluation.
**Citation:** If you use this project in your research, please cite the accompanying paper:
```bibtex
@article{GraphFusionVulDetect,
  title={GraphFusionVulDetect: Smart Contract Vulnerability Detection Using CodeBERT and GNNs},
  author={Quang Nguyen and Tuyen Vu and Minh Pham and Kien Nguyen and Cong Tran},
  journal={None},
  year={2024}
}
```
For questions or collaborations, contact Quang Nguyen.