```
project/
├── data/              # Raw data and soft label outputs
├── training/          # Format data & LoRA-tune the model
├── inference/         # Generate explanations using the tuned model
├── models/            # LoRA-tuned model (PEFT format)
├── requirements.txt
└── README.md
```
## Installation

```bash
pip install -r requirements.txt
```

## Script: `training/format_dataset.py`
What it does: Extracts sentence + label pairs and structures them like:

```json
{
  "instruction": "Explain the legal concept in this clause.",
  "input": "The contract may be terminated at any time...",
  "output": "Termination"
}
```

Run it:

```bash
python training/format_dataset.py
```

Output → `data/xx_train.jsonl`
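As a self-contained illustration of the formatting step, here is a minimal sketch that wraps (sentence, label) pairs in the instruction format shown above and writes them as JSONL. The example pairs and helper names are assumptions, not the actual script:

```python
import json

# Hypothetical example records: (clause_text, label) pairs as produced by
# the raw extraction step (the sample values here are assumptions).
PAIRS = [
    ("The contract may be terminated at any time...", "Termination"),
    ("Neither party may assign this Agreement...", "Anti-Assignment"),
]

def to_instruction_record(sentence: str, label: str) -> dict:
    """Wrap a (sentence, label) pair in the instruction format shown above."""
    return {
        "instruction": "Explain the legal concept in this clause.",
        "input": sentence,
        "output": label,
    }

def write_jsonl(pairs, path: str) -> int:
    """Write one JSON object per line; return the number of records written."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence, label in pairs:
            f.write(json.dumps(to_instruction_record(sentence, label)) + "\n")
    return len(pairs)

if __name__ == "__main__":
    n = write_jsonl(PAIRS, "xx_train.jsonl")
    print(f"wrote {n} records")
```

One object per line (JSONL, not a JSON array) is what Hugging Face's dataset loaders expect for this kind of file.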
## Script: `training/train_lora_tuning.py`
What it does: Loads `xx_train.jsonl` and applies LoRA tuning using Hugging Face + PEFT.
Run it:

```bash
python training/train_lora_tuning.py
```

Output → `models/llama3_lora_tuned/`
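The script itself relies on Hugging Face + PEFT, but the core idea LoRA adds is small: the frozen weight matrix W is left untouched, and a scaled low-rank correction B·A is learned instead, so only r·(d_in + d_out) parameters are trained per layer. A conceptual sketch in plain Python (toy shapes and values are assumptions, not the PEFT implementation):

```python
# LoRA forward pass: h = W x + (alpha / r) * B (A x)
# W is frozen; only A (r x d_in) and B (d_out x r) are trained.

def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """Frozen base projection plus the scaled low-rank correction."""
    base = matvec(W, x)                 # W x  (frozen pretrained weights)
    low_rank = matvec(B, matvec(A, x))  # B (A x), the trainable path
    scale = alpha / r
    return [b + scale * lr for b, lr in zip(base, low_rank)]

# Toy shapes: d_in = 3, d_out = 2, rank r = 2.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[0.1, 0.0, 0.0],
     [0.0, 0.1, 0.0]]
B = [[0.5, 0.0],
     [0.0, 0.5]]
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B, x))
```

When B is initialized to zero (as PEFT does), the correction vanishes and the model starts out identical to the frozen base, which is why LoRA training is stable from step one.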
## Script: `inference/evaluate_llm_as_judge.py`
What it does: Uses GPT-4 as a judge to compare the generated soft labels against the references in the test set.
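A minimal sketch of the judge step, assuming a caller-supplied `call_judge` function in place of the real GPT-4 API call; the prompt wording and the 1–5 scoring scale are assumptions for illustration:

```python
import re

# Hypothetical judge template; the real script's prompt may differ.
JUDGE_PROMPT = """You are grading a model-generated legal explanation.
Reference answer: {reference}
Model prediction: {prediction}
Rate the prediction's agreement with the reference from 1 (wrong) to 5
(equivalent). Reply with the number only."""

def build_judge_prompt(example: dict) -> str:
    """Fill the judge template from one eval_set.jsonl record."""
    return JUDGE_PROMPT.format(**example)

def parse_score(reply: str) -> int:
    """Extract the 1-5 score from the judge's reply; raise if absent."""
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"no score in judge reply: {reply!r}")
    return int(match.group())

def evaluate(records, call_judge) -> float:
    """Mean judge score over the eval set; call_judge maps prompt -> reply."""
    scores = [parse_score(call_judge(build_judge_prompt(r))) for r in records]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    records = [{"prediction": "Termination clause.", "reference": "Termination"}]
    # Stand-in judge for demonstration; the real script would query GPT-4.
    print(evaluate(records, call_judge=lambda prompt: "5"))
```

Injecting `call_judge` as a parameter keeps the scoring logic testable without an API key.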
Run it:

```bash
python inference/evaluate_llm_as_judge.py
```

Input → `data/eval_set.jsonl` with:

```json
{ "prediction": "...", "reference": "..." }
```

## Reference

- CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review. https://arxiv.org/abs/2103.06268