
QuangNguyen711/GraphFusionVulDetect


GraphFusionVulDetect: A Smart Contract Vulnerability Detection Pipeline

This repository contains the implementation and research associated with a pipeline for detecting vulnerabilities in smart contracts. The pipeline integrates function-level analysis using CodeBERT embeddings and graph-based classification using Graph Neural Networks (GNNs).

Project Description

The aim of this project is to develop an effective and scalable system for detecting vulnerabilities in smart contracts. The process involves:

  1. Function-Level Vulnerability Classification:

    • Functions from smart contracts are analyzed for specific vulnerability types (e.g., Re-Entrancy, Timestamp Dependency, Unhandled Exceptions).
    • A dataset is created with functions labeled as vulnerable or non-vulnerable.
    • CodeBERT is fine-tuned for each vulnerability type to produce embeddings that capture each function's semantics.
  2. Graph-Based Classification:

    • Contract source code is analyzed to construct Function Call Graphs (FCGs), where:
      • Nodes represent functions.
      • Edges represent function call relationships.
      • Node embeddings are generated using the fine-tuned CodeBERT model.
    • These graphs are used as input to GNN models (e.g., GCN, GraphSAGE, GAT) for classification of smart contracts as vulnerable or non-vulnerable.
  3. Baseline Comparisons:

    • Results are compared against existing methods, such as CBGRU, Peculiar, VulBERTa, TMP, AME, MANDO, and MANDO-HGT, to demonstrate the performance improvements of the proposed approach.
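The FCG construction and graph-level classification in step 2 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the call map `calls`, the function names, and the 2-dimensional vectors are hypothetical stand-ins for call relations extracted by a tool such as Slither and for 768-dimensional CodeBERT embeddings. Calls are treated as undirected edges, and `gcn_propagate` applies only GCN's normalized neighbor aggregation, omitting the learned weight matrix and nonlinearity. A trained classifier head on the pooled vector would produce the vulnerable/non-vulnerable prediction.

```python
import math
from typing import Dict, List, Tuple


def build_fcg(calls: Dict[str, List[str]]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return node names (functions) and directed (caller, callee) edges as index pairs."""
    names = sorted(set(calls) | {c for callees in calls.values() for c in callees})
    index = {name: i for i, name in enumerate(names)}
    edges = [(index[src], index[dst]) for src, callees in calls.items() for dst in callees]
    return names, edges


def gcn_propagate(num_nodes: int, edges: List[Tuple[int, int]],
                  features: List[List[float]]) -> List[List[float]]:
    """One GCN-style step: H' = D^-1/2 (A + I) D^-1/2 H, with calls made undirected.

    The learned weight matrix and activation of a real GCN layer are omitted.
    """
    neighbors = [{i} for i in range(num_nodes)]  # self-loops provide the "+ I"
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    degree = [len(n) for n in neighbors]
    dim = len(features[0])
    out = []
    for i in range(num_nodes):
        acc = [0.0] * dim
        for j in neighbors[i]:
            weight = 1.0 / math.sqrt(degree[i] * degree[j])  # symmetric normalization
            for k in range(dim):
                acc[k] += weight * features[j][k]
        out.append(acc)
    return out


def mean_readout(features: List[List[float]]) -> List[float]:
    """Average node vectors into a single graph-level vector."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


# Hypothetical contract: withdraw() and deposit() call internal helpers.
calls = {"withdraw": ["_transfer", "balanceOf"], "deposit": ["balanceOf"]}
names, edges = build_fcg(calls)
print(names)  # ['_transfer', 'balanceOf', 'deposit', 'withdraw']
print(edges)  # [(3, 0), (3, 1), (2, 1)]

# Toy per-function embeddings (stand-ins for fine-tuned CodeBERT vectors).
embeddings = {
    "_transfer": [1.0, 0.0],
    "balanceOf": [0.0, 1.0],
    "deposit": [1.0, 1.0],
    "withdraw": [0.0, 0.0],
}
features = [embeddings[n] for n in names]
hidden = gcn_propagate(len(names), edges, features)
graph_vec = mean_readout(hidden)  # input to a (not shown) trained classifier head
```

A real model would stack learned layers instead, for example DGL's `GraphConv`, which applies the same symmetric degree normalization plus a learnable weight; self-loops are not added automatically there and can be inserted with `dgl.add_self_loop`.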

Key Features

  • Data Collection:

    • Utilizes datasets such as smartbugs/smartbugs-wild and mwritescode/slither-audited-smart-contracts.
    • Functions and source code are pre-processed using tools like Slither, SmartCheck, and Mythril.
  • Modeling Techniques:

    • Fine-tuned CodeBERT for generating embeddings.
    • GNNs for graph-based classification.
    • Comparison with alternative architectures, including hybrid models (e.g., CBGRU).
  • Optimization:

    • Handles CodeBERT's input-length limit (512 tokens).
    • Investigates truncation strategies such as keeping only key tokens (the first 128 and last 382 tokens, which together with the two special tokens fill the 512-token window).
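One reading of the 128 + 382 split is head-and-tail truncation: keep the first 128 and the last 382 content tokens of a tokenized function, so that the two special tokens added by CodeBERT's RoBERTa-style tokenizer bring the input to exactly 512 tokens. The sketch below assumes that reading and works on already-tokenized IDs. `head_tail_truncate` is a hypothetical helper, and its default special-token IDs (0 for `<s>`, 2 for `</s>`) are an assumption; in practice they should be taken from `tokenizer.cls_token_id` and `tokenizer.sep_token_id`.

```python
from typing import List


def head_tail_truncate(
    token_ids: List[int],
    head: int = 128,
    tail: int = 382,
    cls_id: int = 0,  # assumed RoBERTa-style <s> ID
    sep_id: int = 2,  # assumed RoBERTa-style </s> ID
) -> List[int]:
    """Keep the first `head` and last `tail` content tokens, then add special tokens.

    With the defaults, the output is at most 1 + 128 + 382 + 1 = 512 tokens.
    """
    if len(token_ids) > head + tail:
        # Drop the middle of the function; keep its beginning and its end.
        token_ids = token_ids[:head] + token_ids[len(token_ids) - tail:]
    return [cls_id] + token_ids + [sep_id]


# Toy input: 1,000 fake content-token IDs for one long function.
ids = list(range(1000, 2000))
truncated = head_tail_truncate(ids)
print(len(truncated))  # 512
```

Keeping the head preserves the function signature and modifiers, while the tail keeps the end of the body; whether this beats plain head-only truncation for a given vulnerability type is an empirical question.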

Results

  • Significant improvements were observed in detecting vulnerabilities, especially for Re-Entrancy, compared to traditional models.
  • Different graph construction techniques and feature enhancements were also explored.

Folder Structure

  • datasets/: Contains datasets for functions and graphs.
  • core/: Core implementation of the pipeline components, including:
    • finetune_llm/: CodeBERT fine-tuning, dataset, pipeline, and components.
    • GFD/: GFD model pipeline, data processing, and modules.
    • preprocessing/: Scripts for dataset preprocessing and analysis.
    • utils/: Utility functions for the pipeline.
  • notebooks/: Jupyter notebooks for data processing, model training, and evaluation.
  • docs/: Additional project documentation and baseline comparisons.

Getting Started

  1. Prerequisites:

    • Python 3.8+
    • PyTorch
    • Hugging Face Transformers
    • DGL (Deep Graph Library)
    • Pandas, NumPy, Scikit-learn
  2. Setup: Clone the repository and install dependencies:

    git clone https://github.com/QuangNguyen2910/GraphFusionVulDetect.git
    cd GraphFusionVulDetect
    pip install -r requirements.txt
  3. Data Preparation:

    • Follow instructions in docs/data_preparation.md to preprocess datasets and create FCGs.
  4. Training & Evaluation (using the Makefile): The provided Makefile wraps the training and evaluation commands:

    • Fine-tune CodeBERT:
      make finetune-train
    • Evaluate CodeBERT:
      make finetune-eval
    • Train GFD model:
      make gfd-train
    • Evaluate GFD model:
      make gfd-eval

    These commands will execute the corresponding Python modules for training and evaluation.

Citation

If you use this project in your research, please cite the accompanying paper:

@article{GraphFusionVulDetect,
  title={GraphFusionVulDetect: A Smart Contract Vulnerability Detection Using CodeBERT and GNNs},
  author={Quang Nguyen and Tuyen Vu and Minh Pham and Kien Nguyen and Cong Tran},
  journal={None},
  year={2024}
}

Contact

For questions or collaborations, contact Quang Nguyen.
