Synthesis Project II

Our solution for the iDISC challenge presented in the Synthesis Project II subject of our degree.

1. Project structure

Stars (⭐) mark the most important files in the implementation of our machine translation tool.

.
├── config/                  # Configuration files (needed to train and evaluate models)
│   ├── Marian-eval.json     # For the Marian model evaluation on cluster
│   ├── Marian-train-local.json # For the Marian model training locally
│   ├── Marian-train.json    # For the Marian model training on cluster
│   ├── T5-eval.json         # For the T5 model evaluation on cluster
│   ├── T5-train-local.json  # For the T5 model training locally
│   └── T5-train.json        # For the T5 model training on cluster
├── notebooks/               # Jupyter notebooks for experimentation, data analysis and data preprocessing
├── src/                     # Source code for core functionalities
│   ├── csv_to_mqxliff.py    # Script to convert CSV files to MQXLIFF format
│   ├── decompress.sh        # Shell script for decompressing the given mqxliffs
│   ├── generate_final_dataset.py ⭐ # Script to generate the final dataset (used by the web app)
│   ├── mqxliff_to_csv.py    # Script to convert MQXLIFF files to CSV format
│   ├── QAmodule.py ⭐       # Quality Assurance Module
│   ├── translation.py ⭐    # Script or module for translation functionalities
│   ├── txm_to_csv.py        # Script to convert TXM files to CSV format
│   └── utils.py             # Utility functions and helper scripts
├── web/                     # Web application files 
│   ├── backend/             # Backend server code for the web application
│   │   ├── model/           # Folder containing pretrained models (Not trained with iDISC data)
│   │   ├── app_cloud.py     # Main backend application entry point (for cloud)
│   │   ├── app.py ⭐        # Main backend application entry point (for local)
│   │   ├── package-lock.json # Dependency lock file for backend (related Node.js tools used)
│   │   └── package.json     # Package metadata and dependencies for backend (related Node.js tools used)
│   └── frontend/            # Frontend client-side code for the web application
│       ├── public/          # Static assets served directly (favicons)
│       ├── src/             # Frontend source code 
│       │   ├── assets/      # Static assets for the frontend (images, icons)
│       │   ├── pages/       # Contains the different pages or views of the frontend application
│       │   ├── App.css      # Main CSS file for the primary application component
│       │   ├── App.jsx      # Main React/JSX component for the application layout/structure
│       │   ├── index.css    # Global CSS styles or base styles for the application
│       │   └── main.jsx     # Main entry point for the frontend JavaScript/React application
│       ├── eslint.config.js # ESLint configuration for frontend code linting
│       ├── index.html       # Main HTML file for the frontend application
│       ├── package-lock.json # Dependency lock file for frontend
│       ├── package.json     # Package metadata and dependencies for frontend
│       ├── README.md        # README file specific to the frontend (Generated automatically by Vite tool)
│       └── vite.config.js   # Vite configuration file for frontend development server/build
├── .gitattributes           # Git attributes for path-specific options
├── .gitignore               # Specifies intentionally untracked files to ignore
├── cloud_inference.py       # Script for cloud-based inference deployment/testing
├── database.sql             # SQL script for database schema definition and initial data
├── demo.py                  # Script used for the demo in the class presentation
├── demo.slm                 # SLM (Slurm) script for demo, to run on a cluster
├── eval.py ⭐               # Script for model evaluation
├── eval.slm                 # SLM (Slurm) script for evaluation, to run on a cluster
├── flask_train.py           # Flask-based script for model training interface/API on the cloud
├── inference.py             # Script for model inference 
├── README.md                # This README file
├── requirements.txt         # Python dependencies
├── train.py ⭐              # Main script for model training
└── train.slm                # SLM (Slurm) script for training, to run on a cluster

2. Getting Started

Follow these instructions to get a copy of the project up and running on your local machine.

If you don't have time to set up everything needed to run the application, we have recorded a video demonstrating the web application: https://drive.google.com/file/d/1ZzifQ5UNzKPBNY4kIoEeqkw3_-sT9s_2/view?usp=drive_link

Prerequisites

Before you begin, ensure you have the following installed:

  • Git: For cloning the repository.
  • Python 3.8+: For the backend.
  • Node.js LTS & npm: For the frontend.
    • Download the official installer from Node.js website (LTS version). npm is included with Node.js.
  • XAMPP (includes MySQL/MariaDB): For local database management.

Backend Setup

  1. Clone the repository:

    git clone https://github.com/Pikurrot/synthesis-project-II.git
    cd synthesis-project-II
  2. Navigate to the backend directory:

    cd web/backend
  3. Create and activate a virtual environment: It's highly recommended to use a virtual environment to manage dependencies.

    • For Windows:
    python -m venv venv
    .\venv\Scripts\activate
    • For Linux:
    python3 -m venv venv
    source venv/bin/activate
  4. Install backend dependencies:

    pip install -r ../../requirements.txt
  5. Download pretrained models: To enable model training via the web interface, you must manually download the required pretrained models. These models are not included in the repository due to their large file size.

    T5 (flan-t5-base)

    • Download: Pretrained T5 Model
    • Destination:
      • Windows:
       move %USERPROFILE%\Downloads\model.safetensors model\google\flan-t5-base\
      • Linux:
       mv ~/Downloads/model.safetensors model/google/flan-t5-base/

    Marian (opus-mt-en-es)

    • Download: Pretrained Marian Model
    • Destination:
      • Windows:
       move %USERPROFILE%\Downloads\model.safetensors model\Helsinki-NLP\opus-mt-en-es\
      • Linux:
       mv ~/Downloads/model.safetensors model/Helsinki-NLP/opus-mt-en-es/
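
Both downloads share the file name model.safetensors, so it is easy to move one into the wrong folder. The following is a minimal sketch of a check, assuming it is run from web/backend and that the two destination folders listed above are the only ones that matter. The function name missing_model_files is hypothetical and not part of the repository.

```python
from pathlib import Path

# Destination folders taken from the steps above, relative to web/backend.
EXPECTED_MODEL_DIRS = (
    Path("model") / "google" / "flan-t5-base",
    Path("model") / "Helsinki-NLP" / "opus-mt-en-es",
)


def missing_model_files(backend_dir="."):
    """Return the expected model.safetensors paths that do not exist yet.

    Hypothetical helper: it only checks that the files are present,
    not that they contain valid weights.
    """
    base = Path(backend_dir)
    expected = [base / d / "model.safetensors" for d in EXPECTED_MODEL_DIRS]
    return [p for p in expected if not p.is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("both pretrained model files are in place")
```

An empty result means both files were found where the steps above place them.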

Frontend Setup

  1. Navigate to the frontend directory: From the backend directory:

    cd ../frontend
  2. Install frontend dependencies: This command reads the package.json file and installs all the necessary JavaScript libraries.

    npm install

MySQL Setup

  1. Start MySQL with XAMPP

    1. Open the XAMPP Control Panel.
    2. Click Start next to MySQL.
    3. Wait until the MySQL module turns green — this means the database server is running.
  2. Open a Terminal and Log In to MySQL: Open your command line interface and log in to MySQL using the mysql client.

    mysql -u root -p

    · If you have not set a root password (the XAMPP default), just press Enter at the password prompt.
    · If you have set a root password manually, enter it when prompted.

    · If you get "mysql is not recognized", make sure the XAMPP mysql binary is in your system's PATH. Alternatively, use the full path:

    # On Windows:
    "C:\xampp\mysql\bin\mysql.exe" -u root -p
    # On Linux or MacOS:
    /opt/lampp/bin/mysql -u root -p
  3. Create the Database: Once logged in to the MySQL shell, run:

    CREATE DATABASE Translation;
    EXIT;
  4. Import the Provided SQL File: From your project root directory, load database.sql into the Translation database created in the previous step:

    mysql -u root -p Translation < database.sql

    If mysql is not in your PATH, use the full path again:

    "C:\xampp\mysql\bin\mysql.exe" -u root -p Translation < database.sql

    This command will load all tables and data from the database.sql file into the database.

  5. Stop MySQL: If you are not going to run the application right now:

    1. Reopen the XAMPP Control Panel.
    2. Click Stop next to MySQL.
    3. Close XAMPP Control Panel.

3. Running the Application

The full application needs the backend, the frontend, and MySQL running at the same time.

Starting the Backend

  1. Ensure your virtual environment is active. If not, navigate to your backend directory and activate it:

    cd web/backend
    # On Windows:
    .\venv\Scripts\activate
    # On macOS/Linux:
    source venv/bin/activate
  2. Run the Flask application:

    python app.py

    The backend should now be running, typically on http://127.0.0.1:5000/.
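
To confirm that something is actually listening on that address before starting the frontend, a small check like the one below can help. It is a sketch that assumes the default host and port mentioned above (127.0.0.1:5000). The function name is_listening is hypothetical, and the check only tests that a TCP connection can be opened, not that the Flask routes respond correctly.

```python
import socket


def is_listening(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError if nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "reachable" if is_listening() else "not reachable"
    print(f"backend on 127.0.0.1:5000 is {state}")
```

If the backend was started on a different port, pass that port explicitly, for example is_listening(port=8000).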

Starting the Frontend

  1. Navigate to the frontend directory:

    cd web/frontend
  2. Start the Vite development server:

    npm run dev

    The frontend application will usually open in your web browser automatically (e.g., at http://localhost:5173/ or a similar port).

Starting MySQL

Start MySQL from the XAMPP Control Panel, as described in step 1 of MySQL Setup.

Default user

To log in to the application, use the email test@idisc.com and the password 1234.

Test the application with an already trained model

Training a model from scratch can take several hours. To allow you to test the application without waiting for training to complete, we provide access to a trained model already integrated into the system.

Upon logging into the web application, you'll see a company named Mitsubishi, which already has a model called Car Manuals configured and ready for use.

Note: Due to confidentiality constraints, the trained model is not publicly available. The download link below is restricted to authorized personnel only (iDISC employees and professors involved in the course).

Download

Setup Instructions

After downloading the model ZIP file:

  1. Open your project folder in Visual Studio Code (or your preferred IDE).
  2. Navigate to the path: web/backend
  3. Create a folder named as follows: uploads/Mitsubishi/Car manuals/training/checkpoints/marian/checkpoint-20250418_181332
  4. Unzip the downloaded model files into this newly created directory.

Once this is done, the application will automatically detect and load the model when using the Mitsubishi > Car Manuals translation functionality.
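
Steps 3 and 4 above can also be done in one go from a script. This is a minimal sketch, assuming it is run from web/backend and that the ZIP archive contains the checkpoint files at its top level. The function name install_checkpoint and the archive path passed to it are hypothetical.

```python
import zipfile
from pathlib import Path

# Folder path copied from step 3 above, relative to web/backend.
CHECKPOINT_DIR = Path(
    "uploads/Mitsubishi/Car manuals/training/checkpoints/marian/"
    "checkpoint-20250418_181332"
)


def install_checkpoint(zip_path, backend_dir="."):
    """Create the checkpoint folder and extract the ZIP archive into it."""
    target = Path(backend_dir) / CHECKPOINT_DIR
    # parents=True builds every intermediate folder; exist_ok allows re-runs.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, install_checkpoint("path/to/downloaded-model.zip") returns the folder it extracted into. If the archive wraps its files in a top-level folder, they end up one level deeper than the application expects, so check the result after extracting.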

4. Training

To fine-tune the models, run the training script either on a cluster with Slurm or locally, passing the training configuration file of the model you want to train as an argument. Here is an example for the Marian model:

  • Slurm:

     sbatch train.slm Marian-train.json

    Update the SBATCH parameters in train.slm to suit your cluster.

  • Local:

     python train.py Marian-train.json

    The config/ directory also contains Marian-train-local.json, which is intended for local training.

5. Evaluation

Similarly, to evaluate the model, you can follow these example commands:

  • Slurm:

     sbatch eval.slm Marian-eval.json

    Update the SBATCH parameters in eval.slm to suit your cluster.

  • Local:

     python eval.py Marian-eval.json
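
A syntax error in a configuration file is cheaper to catch before a job is queued than after it starts. The sketch below only checks that a file parses as JSON and that its top level is an object. It does not know which keys train.py or eval.py expect, since the configuration schema is not described here. The function name load_config and the top-level-object requirement are assumptions.

```python
import json


def load_config(path):
    """Parse a JSON configuration file and return its top-level object.

    Assumption: the configuration is a JSON object (key/value pairs).
    """
    with open(path, encoding="utf-8") as handle:
        # json.load raises json.JSONDecodeError on malformed JSON.
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return config
```

Calling load_config("config/Marian-train.json") from the project root either returns the parsed settings or raises an error that names the problem.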
