Vamsi-Dath/QLoRA_for_Software_Bug_Detection

Fine-Tuning LLM using QLoRA for Software Bug Detection and Measuring the Degree of Improvement

Cloning the Project with Submodules

This repository uses Git submodules. To clone the project properly, follow these steps:

Clone the Repository (with Submodules)

git clone --recurse-submodules https://github.com/Vamsi-Dath/CS540_QLORA.git
cd CS540_QLORA

Data after post processing

Drive: https://drive.google.com/drive/folders/158sunCAaWvGKcySnYbA80t5-Ve147AAs?usp=drive_link

Postprocessed data info

Each pickle file contains a pandas DataFrame (<class 'pandas.core.frame.DataFrame'>) with 7 columns. Row counts per split:

train_data_postprocessed.pkl
RangeIndex: 14118 entries, 0 to 14117

valid_data_postprocessed.pkl
RangeIndex: 9457 entries, 0 to 9456

test_data_postprocessed.pkl
RangeIndex: 161 entries, 0 to 160
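The splits listed above can be loaded and sanity-checked with pandas. The following is a minimal sketch, not the project's own loader: it assumes the three files have already been downloaded from the Drive folder to a local path of your choosing, and the helper names `load_split` and `check_split` are hypothetical. Since pickle files can execute code on load, only open copies obtained from a source you trust.

```python
from pathlib import Path

import pandas as pd

# Row counts reported in this README for each split.
EXPECTED_ROWS = {
    "train_data_postprocessed.pkl": 14118,
    "valid_data_postprocessed.pkl": 9457,
    "test_data_postprocessed.pkl": 161,
}


def load_split(path):
    """Load one post-processed split and make sure it is a DataFrame."""
    df = pd.read_pickle(path)
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"{path} does not contain a pandas DataFrame")
    return df


def check_split(path):
    """Return (df, ok), where ok is True if the row count matches the README.

    Files whose names are not listed in EXPECTED_ROWS are not checked
    and are reported as ok.
    """
    df = load_split(path)
    expected = EXPECTED_ROWS.get(Path(path).name)
    ok = expected is None or len(df) == expected
    return df, ok
```

For example, `df, ok = check_split("data/test_data_postprocessed.pkl")` followed by `df.info()` should print a summary like the one above, where `data/` is a placeholder for wherever the files were saved.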

Example columns (test.pickle)

before_merge

0 def remove_lb_backend_address_pool_address(cmd...
1 def split_action(arguments):\n clas...
2 def parse_series(self, data, **kwargs):\n ...
3 def __init__(self, **kwargs):\n # S...
4 def dump_checkpoint(self, weights_only: bo...

after_merge

0 def remove_lb_backend_address_pool_address(cmd...
1 def split_action(arguments):\n clas...
2 def parse_series(self, data, **kwargs):\n ...
3 def __init__(self, **kwargs):\n # S...
4 def dump_checkpoint(self, weights_only: bo...

traceback_type

0 AttributeError
1 AttributeError
2 AttributeError
3 pygmt.exceptions.GMTCLibError
4 AttributeError

full_traceback

0 john@Azure:~$ az network lb address-pool addre...
1 'str' object has no attribute 'append'\nTraceb...
2 2018-12-10 19:39 DEBUG parser_guessit movin...
3 pygmt-session [ERROR]: Syntax error: Unrecogni...
4 Running command:\npython pipe/train_cnn.py\n/h...

ast_diff

0 [--- , +++ , @@ -1,8 +1,9 @@, -Module(body=[Fu...
1 [--- , +++ , @@ -1,17 +1,17 @@, -Module(body=[...
2 [--- , +++ , @@ -1,5 +1,5 @@, -Module(body=[Fu...
3 [--- , +++ , @@ -1,27 +1,48 @@, -Module(body=[...
4 [--- , +++ , @@ -1,9 +1,9 @@, -Module(body=[Fu...

old_ast_json

0 [\n "Module(body=[FunctionDef(name='remove_...
1 [\n "Module(body=[FunctionDef(name='split_a...
2 [\n "Module(body=[FunctionDef(name='parse_s...
3 [\n "Module(body=[FunctionDef(name='__init_...
4 [\n "Module(body=[FunctionDef(name='dump_ch...

new_ast_json  

0 [\n "Module(body=[FunctionDef(name='remove_...
1 [\n "Module(body=[FunctionDef(name='split_a...
2 [\n "Module(body=[FunctionDef(name='parse_s...
3 [\n "Module(body=[FunctionDef(name='__init_...
4 [\n "Module(body=[FunctionDef(name='dump_ch...
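The ast_diff entries above have the shape of a unified diff over AST dumps (a `--- ` line, a `+++ ` line, `@@` hunk headers, then `-`/`+` lines). As an illustration of how such a column could be derived from before_merge and after_merge, here is a minimal sketch using only the standard library. It is an assumption about the approach, not the repository's actual preprocessing code: the function names are hypothetical, and the exact dump format used in the dataset may differ. The `indent` argument of `ast.dump` requires Python 3.9 or later.

```python
import ast
import difflib


def ast_lines(source):
    """Parse Python source and return its AST dump, one node field per line."""
    tree = ast.parse(source)
    return ast.dump(tree, indent=2).splitlines()


def ast_diff(before, after):
    """Return a unified diff (list of lines) between the AST dumps of two snippets."""
    return list(
        difflib.unified_diff(ast_lines(before), ast_lines(after), lineterm="")
    )


# Toy buggy/fixed pair: the fix replaces addition with multiplication.
before = "def area(w, h):\n    return w + h\n"
after = "def area(w, h):\n    return w * h\n"

diff = ast_diff(before, after)
for line in diff:
    print(line)
```

Because the diff is computed on the tree rather than on raw text, changes to formatting or comments produce no diff lines, while a changed operator shows up as a removed `Add()` node and an added `Mult()` node.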

Fine-Tuned CodeLlama Model:

Huggingface-codellama-qlora-finetuned

About

Fine-tuning for bug detection and bug fixing using low-rank adapters. Buggy–fixed code pairs are preprocessed into structured AST representations that highlight semantic differences, to support fine-grained model training. CodeLlama-7B is fine-tuned with QLoRA (4-bit); the trainable parameters amount to only 0.1% of the model weights (LoRA adapters on q_proj and v_proj).
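To see where a figure on the order of 0.1% can come from, the trainable parameter count of LoRA adapters on q_proj and v_proj can be estimated by hand. The sketch below is a back-of-the-envelope calculation under stated assumptions: a hidden size of 4096 and 32 decoder layers (the published CodeLlama-7B configuration), a base model of roughly 6.74 billion parameters, and example LoRA ranks of 8 and 16. The rank actually used in this project is not stated in this README.

```python
def lora_param_count(hidden_size, num_layers, rank, modules_per_layer):
    """Trainable parameters added by LoRA to square attention projections.

    Each adapted hidden_size x hidden_size projection gets two low-rank
    factors: A (rank x hidden_size) and B (hidden_size x rank).
    """
    per_module = rank * (hidden_size + hidden_size)
    return per_module * modules_per_layer * num_layers


HIDDEN_SIZE = 4096    # assumed: CodeLlama-7B hidden size
NUM_LAYERS = 32       # assumed: CodeLlama-7B decoder layers
BASE_PARAMS = 6.74e9  # assumed: approximate base model size

for rank in (8, 16):  # example ranks, not taken from the project
    trainable = lora_param_count(
        HIDDEN_SIZE, NUM_LAYERS, rank, modules_per_layer=2  # q_proj and v_proj
    )
    share = 100 * trainable / BASE_PARAMS
    print(f"r={rank}: {trainable:,} trainable parameters ({share:.3f}% of base)")
```

Under these assumptions, ranks in the range of 8 to 16 give between roughly 0.06% and 0.12% of the base parameter count, which is consistent with the 0.1% stated above.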
