frandoLin/legal_assistant

Legal assistant

This repo shows how to fine-tune a LLaMA 3.2 model with LoRA on legal contract clauses.

Project Structure

project/
├── data/                         # Raw data and soft label outputs
├── training/                     # Format dataset & LoRA fine-tune model
├── inference/                    # Generate explanations using tuned model
├── models/                       # LoRA-tuned adapter (PEFT format)
├── requirements.txt
└── README.md

Setup

Install dependencies:

pip install -r requirements.txt

Pipeline Overview

Step 1: Format the dataset into instruction–input–output format

Script: training/format_dataset.py
What it does: Extracts sentence + label pairs and structures them like:

{
  "instruction": "Explain the legal concept in this clause.",
  "input": "The contract may be terminated at any time...",
  "output": "Termination"
}

Run it:

python training/format_dataset.py

Output → data/xx_train.jsonl
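
The conversion itself needs nothing beyond the stdlib. A minimal sketch of what `format_dataset.py` does (the field names match the example above; the function names and source of the sentence/label pairs are assumptions):

```python
import json

def to_instruction_record(sentence: str, label: str) -> dict:
    """Wrap one sentence + label pair in the instruction-input-output schema."""
    return {
        "instruction": "Explain the legal concept in this clause.",
        "input": sentence,
        "output": label,
    }

def write_jsonl(pairs, path: str) -> None:
    """Write one JSON object per line, as expected by the LoRA trainer."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence, label in pairs:
            f.write(json.dumps(to_instruction_record(sentence, label)) + "\n")

pairs = [("The contract may be terminated at any time...", "Termination")]
write_jsonl(pairs, "xx_train.jsonl")
```

Keeping each record on its own line (JSONL rather than a single JSON array) lets the training script stream the dataset without loading it all into memory.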


Step 2: Fine-tune the LLaMA 3.2 model with LoRA

Script: training/train_lora_tuning.py
What it does: Loads xx_train.jsonl and applies LoRA fine-tuning using Hugging Face Transformers + PEFT.

Run it:

python training/train_lora_tuning.py

Output → models/llama3_lora_tuned/
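
The idea behind LoRA: the base weight matrix W stays frozen, and training learns a low-rank update ΔW = (α/r)·B·A, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. A dependency-free numeric sketch of the merged weight (shapes, α, and r here are illustrative, not the repo's actual config):

```python
def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Merge a low-rank LoRA adapter into the frozen base weight:
    W_eff = W + (alpha / r) * B @ A, with B (d_out x r) and A (r x d_in)."""
    scale = alpha / r
    delta = matmul(B, A)  # d_out x d_in matrix of rank <= r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# 2x2 frozen base weight, rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d_out x r
A = [[0.5, 0.5]]     # r x d_in
W_eff = lora_effective_weight(W, A, B, alpha=2, r=1)  # [[2.0, 1.0], [2.0, 3.0]]
```

In the real pipeline PEFT handles this bookkeeping: it injects the A/B pairs into the model's attention projections and saves only the adapter weights, which is why `models/llama3_lora_tuned/` is small compared to the base model.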


Step 3: Evaluate explanation quality with GPT-4

Script: inference/evaluate_llm_as_judge.py
What it does: Uses GPT-4 as a judge to score the generated explanations against the references in the test set.

Run it:

python inference/evaluate_llm_as_judge.py

Input → data/eval_set.jsonl with:

{ "prediction": "...", "reference": "..." }
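
An LLM-as-judge loop reduces to building a grading prompt for each (prediction, reference) pair and parsing the judge's reply. A sketch of those two pieces (the prompt wording and 1–5 scale are assumptions; the actual GPT-4 API call is elided):

```python
import json
import re

def build_judge_prompt(prediction: str, reference: str) -> str:
    """Grading prompt sent to the judge model (wording is illustrative)."""
    return (
        "Rate how well the prediction matches the reference explanation "
        "on a 1-5 scale. Reply in the form 'Score: N'.\n"
        f"Reference: {reference}\n"
        f"Prediction: {prediction}"
    )

def parse_score(reply: str) -> int:
    """Extract the integer score from a 'Score: N' style reply."""
    m = re.search(r"Score:\s*([1-5])", reply)
    if m is None:
        raise ValueError(f"no score found in: {reply!r}")
    return int(m.group(1))

def load_eval_set(path: str) -> list:
    """Read eval_set.jsonl: one {'prediction', 'reference'} object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Forcing a fixed reply format like `Score: N` and parsing it with a strict regex makes the evaluation loop fail loudly on malformed judge output instead of silently recording garbage.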
