ZanderZhan/ML-Project

GitHub Issue Report Classification

Comparison

| Model | Max sequence length | Epochs (max:stopped) | Early stopping (patience) | Batch size | Learning rate | Weight decay | Optimizer | Accuracy | Precision | Recall | F1 | Training time (h:mm:ss) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT | 128 | 4 | none (fixed epochs) | 4 | 1e-5 | 0.01 | AdamW | 0.857484 | 0.855020 | 0.857484 | 0.855786 | 7:04:08 |
| FLAN-T5 | 128 | 4 | none (fixed epochs) | 4 | 1e-5 | 0.01 | AdamW | 0.850928 | 0.846126 | 0.850928 | 0.846314 | 12:15:38 |
| GPT2 | 128 | 50:7 | 3 | 4 | 1e-5 | 0.01 | AdamW | 0.858421 | 0.854322 | 0.858421 | 0.854667 | 14:23:23 |
| FUNNEL | 128 | 50:6 | 3 | 32 | 1e-5 | 0.01 | AdamW | 0.859218 | 0.856484 | 0.859218 | 0.857384 | 3:16:23 |
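
In every row above, Recall equals Accuracy. That is what support-weighted averaging over the classes produces, so the sketch below assumes weighted averaging; the README does not state which averaging mode was used. The function name `weighted_scores` and the sample labels are hypothetical and only illustrate the computation.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / support[label]
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = support[label] / n  # class weight = share of true examples
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, not taken from the dataset.
y_true = ["bug", "bug", "enhancement", "question", "bug", "enhancement"]
y_pred = ["bug", "enhancement", "enhancement", "bug", "bug", "enhancement"]
scores = weighted_scores(y_true, y_pred)
```

With weighted averaging, each class recall is multiplied by that class's share of the examples, so the sum reduces to the fraction of correct predictions, which is the accuracy.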

Dataset

  1. dataset/raw: contains the original dataset; do not edit these files.
  2. dataset/preprocess: contains the dataset after processing by scripts/preprocessing.ipynb.

To get the training data, do one of the following:

  1. Unzip dataset/preprocess/github-labels-top3-803k-train.csv.zip.
  2. Run the notebook scripts/preprocessing.ipynb.
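
The first option can also be done from Python with the standard library. This is a minimal sketch; the helper name `extract_archive` and the choice of output directory are assumptions, and only the archive path comes from the list above.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, out_dir):
    """Extract every member of zip_path into out_dir; return the extracted paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return [out_dir / name for name in archive.namelist()]


# Path taken from the list above; extraction only runs if the archive is present.
archive_path = Path("dataset/preprocess/github-labels-top3-803k-train.csv.zip")
if archive_path.exists():
    extracted = extract_archive(archive_path, archive_path.parent)
    print("Extracted:", [str(p) for p in extracted])
```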

Models

  1. BERT
  2. FLAN-T5
  3. GPT2
  4. FUNNEL
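
GPT2 and FUNNEL were trained with a 50-epoch cap and early stopping with patience 3, while BERT and FLAN-T5 used a fixed 4 epochs. The sketch below illustrates patience-based early stopping in plain Python. It assumes validation loss is the monitored metric, which the README does not state; the class `EarlyStopping`, the function `run_training`, and the loss values are hypothetical and do not reproduce the repository's training code.

```python
class EarlyStopping:
    """Signal a stop once the monitored loss fails to improve `patience` times in a row."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_training(val_losses, max_epochs=50, patience=3):
    """Return the epoch at which training ends, given per-epoch validation losses."""
    stopper = EarlyStopping(patience)
    last_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if stopper.step(loss):
            break
    return last_epoch


# Hypothetical validation losses: best value at epoch 4, then three epochs without improvement.
losses = [0.50, 0.42, 0.40, 0.39, 0.41, 0.40, 0.43]
stopped_epoch = run_training(losses)
```

With patience 3, training ends three epochs after the best epoch, so a run capped at 50 epochs can finish much earlier.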

Dataset columns

  • id
  • issue_url
  • issue_label
  • issue_created_at
  • issue_author_association
  • repository_url
  • issue_title
  • issue_body

Preprocessing

data cleaning

  • drop rows where issue_body or issue_title is empty or NaN
  • drop rows whose label is not one of [bug, enhancement, question]
  • concatenate issue_title and issue_body into a single field, issue_data
  • replace tabs and line breaks in issue_data with spaces, then collapse repeated whitespace
  • tokenize issue_data using BertTokenizer
  • split the data
    • 85% training data
    • 15% testing data
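
The cleaning and splitting steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in scripts/preprocessing.ipynb: the helper names, the single-space separator between title and body, the random shuffle, and the seed are all hypothetical. Tokenization with BertTokenizer is left out because it requires the transformers library and a downloaded vocabulary.

```python
import random
import re

KEPT_LABELS = {"bug", "enhancement", "question"}


def is_missing(value):
    """True for None, NaN floats, and empty or whitespace-only strings."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN never equals itself
        return True
    return str(value).strip() == ""


def clean_rows(rows):
    """Filter rows and add a whitespace-normalized issue_data field."""
    cleaned = []
    for row in rows:
        title, body = row.get("issue_title"), row.get("issue_body")
        if is_missing(title) or is_missing(body):
            continue  # drop rows with an empty/NaN title or body
        if row.get("issue_label") not in KEPT_LABELS:
            continue  # keep only bug / enhancement / question
        text = f"{title} {body}"
        # One pass turns tabs, line breaks, and whitespace runs into single spaces.
        text = re.sub(r"\s+", " ", text).strip()
        cleaned.append({**row, "issue_data": text})
    return cleaned


def split_rows(rows, train_fraction=0.85, seed=0):
    """Shuffle and split into (train, test); the seed is an arbitrary choice."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows, not taken from the dataset.
rows = [
    {"issue_title": "Crash on start", "issue_body": "Steps:\n\t1. run\n\n2. crash",
     "issue_label": "bug"},
    {"issue_title": "Add dark mode", "issue_body": None, "issue_label": "enhancement"},
    {"issue_title": "Docs typo", "issue_body": "fix   spelling",
     "issue_label": "documentation"},
]
cleaned = clean_rows(rows)
train_rows, test_rows = split_rows(cleaned)
```

Only the first sample row survives: the second has no body and the third has a label outside the three kept classes.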

AI-Assistance log

  1. How does DistilBERT/BERT work?
  2. What methods are there for handling class imbalance?
  3. How do I choose a split that is stratified by label?
  4. What other NLP models are there besides BERT?
  5. Tell me more about ELECTRA
  6. How do I decide the number of epochs?
  7. I am training a flan-t5 model, please tell me what's wrong?
  8. In the Hugging Face Trainer, do I need to explicitly set fp16=True?
