Our solution for the iDISC challenge presented in the Synthesis Project II subject of our degree.
Stars (⭐) highlight the most important files for the implementation of our machine translation tool.
.
├── config/ # Configuration files (needed to train and evaluate models)
│ ├── Marian-eval.json # For the Marian model evaluation on cluster
│ ├── Marian-train-local.json # For the Marian model training locally
│ ├── Marian-train.json # For the Marian model training on cluster
│ ├── T5-eval.json # For the T5 model evaluation on cluster
│ ├── T5-train-local.json # For the T5 model training locally
│ └── T5-train.json # For the T5 model training on cluster
├── notebooks/ # Jupyter notebooks for experimentation, data analysis and data preprocessing
├── src/ # Source code for core functionalities
│ ├── csv_to_mqxliff.py # Script to convert CSV files to MQXLIFF format
│ ├── decompress.sh # Shell script for decompressing the given mqxliffs
│ ├── generate_final_dataset.py ⭐ # Script to generate the final dataset (used by the web app)
│ ├── mqxliff_to_csv.py # Script to convert MQXLIFF files to CSV format
│ ├── QAmodule.py ⭐ # Quality Assurance Module
│ ├── translation.py ⭐ # Script or module for translation functionalities
│ ├── txm_to_csv.py # Script to convert TXM files to CSV format
│ └── utils.py # Utility functions and helper scripts
├── web/ # Web application files
│ ├── backend/ # Backend server code for the web application
│ │ ├── model/ # Folder containing pretrained models (Not trained with iDISC data)
│ │ ├── app_cloud.py # Main backend application entry point (for cloud)
│ │ ├── app.py ⭐ # Main backend application entry point (for local)
│ │ ├── package-lock.json # Dependency lock file for backend (related Node.js tools used)
│ │ └── package.json # Package metadata and dependencies for backend (related Node.js tools used)
│ └── frontend/ # Frontend client-side code for the web application
│ ├── public/ # Static assets served directly (favicons)
│ ├── src/ # Frontend source code
│ │ ├── assets/ # Static assets for the frontend (images, icons)
│ │ ├── pages/ # Contains the different pages or views of the frontend application
│ │ ├── App.css # Main CSS file for the primary application component
│ │ ├── App.jsx # Main React/JSX component for the application layout/structure
│ │ ├── index.css # Global CSS styles or base styles for the application
│ │ └── main.jsx # Main entry point for the frontend JavaScript/React application
│ ├── eslint.config.js # ESLint configuration for frontend code linting
│ ├── index.html # Main HTML file for the frontend application
│ ├── package-lock.json # Dependency lock file for frontend
│ ├── package.json # Package metadata and dependencies for frontend
│ ├── README.md # README file specific to the frontend (Generated automatically by Vite tool)
│ └── vite.config.js # Vite configuration file for frontend development server/build
├── .gitattributes # Git attributes for path-specific options
├── .gitignore # Specifies intentionally untracked files to ignore
├── cloud_inference.py # Script for cloud-based inference deployment/testing
├── database.sql # SQL script for database schema definition and initial data
├── demo.py # Script used for the demo in the class presentation
├── demo.slm # SLM (Slurm) script for demo, to run on a cluster
├── eval.py ⭐ # Script for model evaluation
├── eval.slm # SLM (Slurm) script for evaluation, to run on a cluster
├── flask_train.py # Flask-based script for model training interface/API on the cloud
├── inference.py # Script for model inference
├── README.md # This README file
├── requirements.txt # Python dependencies
├── train.py ⭐ # Main script for model training
└── train.slm # SLM (Slurm) script for training, to run on a cluster
Follow these instructions to get a copy of the project up and running on your local machine.
If you don't have time to set up everything needed to run the application, we have recorded a video demonstrating the web application: https://drive.google.com/file/d/1ZzifQ5UNzKPBNY4kIoEeqkw3_-sT9s_2/view?usp=drive_link
Before you begin, ensure you have the following installed:
- Git: For cloning the repository.
- Python 3.8+: For the backend.
- Node.js LTS & npm: For the frontend.
  - Download the official installer from the Node.js website (LTS version).
  - npm is included with Node.js.
- XAMPP (includes MySQL/MariaDB): For local database management.
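As a quick sanity check before continuing, the prerequisites above can be verified from Python. This is only a sketch: the function names are hypothetical helpers, not part of the repository, and the tool list assumes the executables are called `git`, `node` and `npm` on your system.

```python
import shutil
import sys

# Command-line tools named in the prerequisites list above (assumed executable names).
REQUIRED_TOOLS = ["git", "node", "npm"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_recent_enough(minimum=(3, 8)):
    """Return True if the running interpreter is at least version `minimum`."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    if not python_is_recent_enough():
        print("Python 3.8 or newer is required.")
    if not missing and python_is_recent_enough():
        print("All checked prerequisites look fine.")
```

XAMPP is not checked here because its install location varies between systems.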
- Clone the repository:
    git clone https://github.com/Pikurrot/synthesis-project-II.git
    cd synthesis-project-II
- Navigate to the backend directory:
    cd web/backend
- Create and activate a virtual environment. It's highly recommended to use a virtual environment to manage dependencies.
  - For Windows:
      python -m venv venv
      .\venv\Scripts\activate
  - For Linux:
      python3 -m venv venv
      source venv/bin/activate
- Install backend dependencies:
    pip install -r ../../requirements.txt
- Download pretrained models: To enable model training via the web interface, you must manually download the required pretrained models. These models are not included in the repository due to their large file size.
  - Download: Pretrained T5 Model
  - Destination:
    - Windows:
        move %USERPROFILE%\Downloads\model.safetensors model\google\flan-t5-base\
    - Linux:
        mv ~/Downloads/model.safetensors model/google/flan-t5-base/
  - Download: Pretrained Marian Model
  - Destination:
    - Windows:
        move %USERPROFILE%\Downloads\model.safetensors model\Helsinki-NLP\opus-mt-en-es\
    - Linux:
        mv ~/Downloads/model.safetensors model/Helsinki-NLP/opus-mt-en-es/
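Because both downloads share the file name model.safetensors, it is easy to end up with one of them in the wrong place. The sketch below checks that each destination folder from the commands above contains a weights file. `missing_weights` is a hypothetical helper, and it assumes it is run from web/backend.

```python
from pathlib import Path

# Destination folders taken from the move/mv commands above, relative to web/backend.
MODEL_DIRS = [
    Path("model/google/flan-t5-base"),
    Path("model/Helsinki-NLP/opus-mt-en-es"),
]


def missing_weights(backend_dir=".", filename="model.safetensors"):
    """Return the expected weight files that are not present on disk."""
    base = Path(backend_dir)
    expected = [base / model_dir / filename for model_dir in MODEL_DIRS]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    for path in missing_weights():
        print("Missing:", path)
```

The check only confirms that a file exists in each folder; it cannot tell whether the T5 and Marian weights were swapped.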
- Navigate to the frontend directory. From the backend directory:
    cd ../frontend
- Install frontend dependencies. This command reads the package.json file and installs all the necessary JavaScript libraries:
    npm install
- Start MySQL with XAMPP:
  - Open the XAMPP Control Panel.
  - Click Start next to MySQL.
  - Wait until the MySQL module turns green, which means the database server is running.
- Open a terminal and log in to MySQL using the mysql client:
    mysql -u root -p
  - If you have not set a root password, just press Enter at the password prompt.
  - If you have set a root password manually, enter it when prompted.
  - If you get "mysql is not recognized", make sure the XAMPP mysql binary is in your system's PATH. Alternatively, use the full path:
      # On Windows:
      "C:\xampp\mysql\bin\mysql.exe" -u root -p
      # On Linux or macOS:
      /opt/lampp/bin/mysql -u root -p
- Create the database. Once logged into the MySQL shell:
    CREATE DATABASE Translation;
    EXIT;
- Import the provided SQL file. From your project root directory, run the following, replacing your_database_name with the database created in the previous step (Translation):
    mysql -u root -p your_database_name < database.sql
  If mysql is not in your PATH, use the full path again:
    "C:\xampp\mysql\bin\mysql.exe" -u root -p your_database_name < database.sql
  This command loads all tables and data from the database.sql file into the database.
- Stop MySQL if you are not going to run the application right now:
  - Reopen the XAMPP Control Panel.
  - Click Stop next to MySQL.
  - Close the XAMPP Control Panel.
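The import step above can also be driven from Python, which avoids shell redirection differences between Windows and Linux. This is a minimal sketch: `build_import_command` and `import_sql` are hypothetical helpers, and the database is assumed to exist already and MySQL to be running.

```python
import subprocess
from pathlib import Path


def build_import_command(database, user="root", mysql_bin="mysql"):
    """Build the mysql client argument list; -p makes the client prompt for a password."""
    return [mysql_bin, "-u", user, "-p", database]


def import_sql(database, sql_file="database.sql", **kwargs):
    """Feed `sql_file` to the mysql client on standard input, like `< database.sql`."""
    with Path(sql_file).open("rb") as handle:
        subprocess.run(build_import_command(database, **kwargs), stdin=handle, check=True)
```

For example, `import_sql("Translation")` run from the project root mirrors the command shown above. If mysql is not on your PATH, pass the full path through `mysql_bin`.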
Ensure both your backend and frontend are running simultaneously for the full application experience.
- Ensure your virtual environment is active. If not, navigate to your backend directory and activate it:
    cd web/backend
    # On Windows:
    .\venv\Scripts\activate
    # On macOS/Linux:
    source venv/bin/activate
- Run the Flask application:
    python app.py
  The backend should now be running, typically on http://127.0.0.1:5000/.
- Navigate to the frontend directory:
    cd web/frontend
- Start the Vite development server:
    npm run dev
  The frontend application will usually open in your web browser automatically (e.g., at http://localhost:5173/ or a similar port).
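To confirm that both servers are actually up, you can probe the ports mentioned above. The sketch below is not part of the repository. It assumes the default ports 5000 and 5173, and Vite may pick a different port if 5173 is busy.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default addresses from the steps above (assumed, adjust if yours differ).
    services = [
        ("backend (Flask)", "127.0.0.1", 5000),
        ("frontend (Vite)", "localhost", 5173),
    ]
    for name, host, port in services:
        state = "up" if is_listening(host, port) else "not reachable"
        print(f"{name} on {host}:{port}: {state}")
```

A successful connection only shows that something is listening on the port, not that it is this application.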
Start MySQL from the XAMPP Control Panel, as in the first step of the MySQL setup.
To log in to the application, use the email test@idisc.com and the password 1234.
Training a model from scratch can take several hours. To allow you to test the application without waiting for training to complete, we provide access to a trained model already integrated into the system.
Upon logging into the web application, you'll see a company named Mitsubishi, which already has a model called Car Manuals configured and ready for use.
Note: Due to confidentiality constraints, the trained model is not publicly available. The download link below is restricted to authorized personnel only (iDISC employees and professors involved in the course).
- Download Trained Marian Model (access restricted)
After downloading the model ZIP file:
- Open your project folder in Visual Studio Code (or your preferred IDE).
- Navigate to the path: web/backend
- Create a folder named as follows: uploads/Mitsubishi/Car manuals/training/checkpoints/marian/checkpoint-20250418_181332
- Unzip the downloaded model files into this newly created directory.
Once this is done, the application will automatically detect and load the model when using the Mitsubishi > Car Manuals translation functionality.
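If you prefer not to create the nested folders by hand, the folder creation and unzip steps above can be sketched in Python. `install_checkpoint` is a hypothetical helper, the folder path is copied verbatim from the steps above, and the ZIP file name is whatever your download is called.

```python
import zipfile
from pathlib import Path

# Folder path copied from the steps above, relative to web/backend.
CHECKPOINT_DIR = Path(
    "uploads/Mitsubishi/Car manuals/training/checkpoints/marian/checkpoint-20250418_181332"
)


def install_checkpoint(zip_path, backend_dir="."):
    """Create the checkpoint folder under `backend_dir` and extract the archive into it."""
    target = Path(backend_dir) / CHECKPOINT_DIR
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

Called from web/backend as `install_checkpoint("path/to/downloaded.zip")`, it returns the folder the files were extracted into. If the archive already contains a top-level folder, the files will end up one level deeper than the application expects, so check the result.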
To fine-tune the models, you can run the training on a cluster with Slurm or locally. In both cases, run the training script and pass the training configuration file of the model you want to train as its argument. Here is an example for the Marian model:
- Slurm:
    sbatch train.slm Marian-train.json
  Update the SBATCH parameters in train.slm as needed.
- Local:
    python train.py Marian-train.json
Similarly, to evaluate the model, you can follow these example commands:
- Slurm:
    sbatch eval.slm Marian-eval.json
  Update the SBATCH parameters in eval.slm as needed.
- Local:
    python eval.py Marian-eval.json
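The four commands above follow one pattern: a task name (train or eval), a launcher (sbatch or python) and a configuration file. A small wrapper can make that explicit. This is a sketch with hypothetical helper names, not a script shipped in the repository, and it assumes it is run from the project root.

```python
import subprocess


def build_command(task, config, use_slurm=False):
    """Return the command for `task` ("train" or "eval") as an argument list."""
    if task not in ("train", "eval"):
        raise ValueError(f"unknown task: {task!r}")
    if use_slurm:
        return ["sbatch", f"{task}.slm", config]
    return ["python", f"{task}.py", config]


def launch(task, config, use_slurm=False):
    """Run the command and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(task, config, use_slurm), check=True)


if __name__ == "__main__":
    # Print the local Marian training command without launching it.
    print(" ".join(build_command("train", "Marian-train.json")))
```

Swapping in T5-train.json or T5-eval.json selects the T5 configuration files listed in the config/ folder.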