```
project/
├── data/              # Raw data and soft label outputs
├── training/          # Format data & LoRA-tune the model
├── inference/         # Generate explanations using the tuned model
├── models/            # LoRA-tuned model (PEFT format)
├── requirements.txt
└── README.md
```
## Installation

```bash
pip install -r requirements.txt
```

## Script: `training/format_dataset.py`
What it does: Extracts sentence + label pairs and structures them like:

```json
{
  "instruction": "Explain the legal concept in this clause.",
  "input": "The contract may be terminated at any time...",
  "output": "Termination"
}
```

Run it:

```bash
python training/format_dataset.py
```

Output → `data/xx_train.jsonl`
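As a self-contained illustration of the formatting step, here is a minimal sketch that wraps (sentence, label) pairs in the instruction format shown above and writes them as JSONL. The example pairs and helper names are assumptions, not the actual script:

```python
import json

# Hypothetical example records: (clause_text, label) pairs as produced by
# the raw extraction step (the sample values here are assumptions).
PAIRS = [
    ("The contract may be terminated at any time...", "Termination"),
    ("Neither party may assign this Agreement...", "Anti-Assignment"),
]

def to_instruction_record(sentence: str, label: str) -> dict:
    """Wrap a (sentence, label) pair in the instruction format shown above."""
    return {
        "instruction": "Explain the legal concept in this clause.",
        "input": sentence,
        "output": label,
    }

def write_jsonl(pairs, path: str) -> int:
    """Write one JSON object per line; return the number of records written."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence, label in pairs:
            f.write(json.dumps(to_instruction_record(sentence, label)) + "\n")
    return len(pairs)

if __name__ == "__main__":
    n = write_jsonl(PAIRS, "xx_train.jsonl")
    print(f"wrote {n} records")
```

One object per line (JSONL, not a JSON array) is what Hugging Face's dataset loaders expect for this kind of file.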
## Script: `training/train_lora_tuning.py`
What it does: Loads `xx_train.jsonl` and applies LoRA tuning using Hugging Face + PEFT.
Run it:

```bash
python training/train_lora_tuning.py
```

Output → `models/llama3_lora_tuned/`
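The script itself relies on Hugging Face + PEFT, but the core idea LoRA adds is small: the frozen weight matrix W is left untouched, and a scaled low-rank correction B·A is learned instead, so only r·(d_in + d_out) parameters are trained per layer. A conceptual sketch in plain Python (toy shapes and values are assumptions, not the PEFT implementation):

```python
# LoRA forward pass: h = W x + (alpha / r) * B (A x)
# W is frozen; only A (r x d_in) and B (d_out x r) are trained.

def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """Frozen base projection plus the scaled low-rank correction."""
    base = matvec(W, x)                 # W x  (frozen pretrained weights)
    low_rank = matvec(B, matvec(A, x))  # B (A x), the trainable path
    scale = alpha / r
    return [b + scale * lr for b, lr in zip(base, low_rank)]

# Toy shapes: d_in = 3, d_out = 2, rank r = 2.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[0.1, 0.0, 0.0],
     [0.0, 0.1, 0.0]]
B = [[0.5, 0.0],
     [0.0, 0.5]]
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B, x))
```

When B is initialized to zero (as PEFT does), the correction vanishes and the model starts out identical to the frozen base, which is why LoRA training is stable from step one.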
## Script: `inference/evaluate_llm_as_judge.py`
What it does: Uses GPT-4 as a judge to compare the generated soft labels against the references in the test set.
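A minimal sketch of the judge step, assuming a caller-supplied `call_judge` function in place of the real GPT-4 API call; the prompt wording and the 1–5 scoring scale are assumptions for illustration:

```python
import re

# Hypothetical judge template; the real script's prompt may differ.
JUDGE_PROMPT = """You are grading a model-generated legal explanation.
Reference answer: {reference}
Model prediction: {prediction}
Rate the prediction's agreement with the reference from 1 (wrong) to 5
(equivalent). Reply with the number only."""

def build_judge_prompt(example: dict) -> str:
    """Fill the judge template from one eval_set.jsonl record."""
    return JUDGE_PROMPT.format(**example)

def parse_score(reply: str) -> int:
    """Extract the 1-5 score from the judge's reply; raise if absent."""
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"no score in judge reply: {reply!r}")
    return int(match.group())

def evaluate(records, call_judge) -> float:
    """Mean judge score over the eval set; call_judge maps prompt -> reply."""
    scores = [parse_score(call_judge(build_judge_prompt(r))) for r in records]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    records = [{"prediction": "Termination clause.", "reference": "Termination"}]
    # Stand-in judge for demonstration; the real script would query GPT-4.
    print(evaluate(records, call_judge=lambda prompt: "5"))
```

Injecting `call_judge` as a parameter keeps the scoring logic testable without an API key.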
Run it:

```bash
python inference/evaluate_llm_as_judge.py
```

Input → `data/eval_set.jsonl` with:

```json
{ "prediction": "...", "reference": "..." }
```

## Reference

- CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review. https://arxiv.org/abs/2103.06268