A system developed as part of the Natural Language Processing (NLP) course at the University of Salerno, focused on translating natural language into SQL queries in multi-turn conversational settings.
The solution is based on the CoSQL dataset and employs fine-tuning of the open-source LLM deepseek-coder-1.3B-instruct using Low-Rank Adaptation (LoRA).
The model is trained with Parameter-Efficient Fine-Tuning (PEFT) and evaluated through standard metrics such as Question Match and Interaction Match, measuring the Exact Match between predicted and gold queries.
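To illustrate why LoRA makes this fine-tuning parameter-efficient, the sketch below shows the core idea on a single weight matrix: the pretrained weight `W` stays frozen, and only a low-rank update `B @ A` (scaled by `alpha / r`) is trained. The dimensions, rank `r = 8`, and `alpha = 16` are illustrative assumptions, not the project's actual hyperparameters.

```python
import numpy as np

# Hypothetical dimensions for one projection matrix; the real shapes
# depend on the deepseek-coder-1.3B-instruct architecture.
d_out, d_in, r = 2048, 2048, 8
alpha = 16  # LoRA scaling factor (assumed value)

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # zero-initialized, so W' = W at the start

# Effective weight after adaptation: only A and B receive gradients.
W_adapted = W + (alpha / r) * (B @ A)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.4%}")  # → trainable fraction: 0.7813%
```

In practice the PEFT library applies this decomposition automatically to the selected target modules; the point of the sketch is that the trainable parameters shrink from `d_out * d_in` to `r * (d_out + d_in)`.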
- Python ≥ 3.13
- Conda (recommended for environment management)
- RAM ≥ 16 GB
- VRAM ≥ 6 GB
```shell
git clone https://github.com/cirovitale/text2sql
cd text2sql
```

- Download the CoSQL dataset from: https://yale-lily.github.io/cosql
- Extract the downloaded files
- Place the dataset folder in the `dataset/cosql_dataset/` directory
```shell
# Create the environment from the provided YAML file
conda env create -f environment.yml

# Activate the environment
conda activate unisa-nlp
```

To fine-tune the base model on the CoSQL dataset:

```shell
python training.py
```

To generate SQL queries from natural language prompts:

```shell
python inference.py
```

To compute evaluation metrics on the test set:

```shell
python testing.py
```

The evaluation includes:
- Question Match: Accuracy per individual question (exact SQL match)
- Interaction Match: Accuracy on the entire multi-turn interaction
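The two metrics can be sketched as follows. Note that `normalize` here is a naive lowercase/whitespace normalization introduced for illustration; the official CoSQL evaluation compares parsed SQL components rather than raw strings.

```python
def normalize(sql: str) -> str:
    # Naive normalization: lowercase and collapse whitespace.
    # The official evaluator performs component-level exact-set matching instead.
    return " ".join(sql.lower().split())

def question_and_interaction_match(interactions):
    """interactions: list of interactions, each a list of (predicted, gold) SQL pairs."""
    q_correct = q_total = i_correct = 0
    for turns in interactions:
        results = [normalize(pred) == normalize(gold) for pred, gold in turns]
        q_correct += sum(results)          # per-question exact matches
        q_total += len(results)
        i_correct += all(results)          # interaction counts only if every turn matches
    return q_correct / q_total, i_correct / len(interactions)

# Toy example: two interactions of two turns each.
interactions = [
    [("SELECT name FROM singer", "select name from singer"),
     ("SELECT count(*) FROM singer", "SELECT COUNT(*) FROM singer")],
    [("SELECT * FROM concert", "SELECT * FROM concert"),
     ("SELECT city FROM stadium", "SELECT name FROM stadium")],
]
qm, im = question_and_interaction_match(interactions)
print(qm, im)  # → 0.75 0.5
```

Interaction Match is strictly harder than Question Match: a single wrong turn makes the whole interaction count as incorrect, which is why the two scores diverge in the toy example above.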
```
text2sql/
├── training.py           # Training pipeline
├── inference.py          # Inference pipeline
├── testing.py            # Testing pipeline
├── environment.yml       # Conda environment specification
├── dataset/
│   └── cosql_dataset/    # Directory for CoSQL dataset
├── model-007/            # Selected fine-tuned model
│   └── checkpoint-1000/  # Selected fine-tuned checkpoint
└── Documentazione.pdf    # Documentation (Italian)
```
The complete project documentation, including the literature review, methodology, dataset description, training pipeline, and experimental results, is available in Italian in `Documentazione.pdf`.