Pre-requisites:
- PyCharm or any other Python IDE, and Python 3.9
- Clone the repository (git clone)
- Install the dependencies: pip install -r requirements.txt
Before running main_page.py, open /create/create_main.py:
- At line 32, change the path to the location of your java.exe (preferably, download the Protege ontology editor and use the java.exe that ships with it).
- At line 555, change the path again so that it matches your project path.
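Before editing the path at line 32, it can help to confirm that the location you plan to enter really points at an existing file. The following is a minimal sketch using only the standard library. `resolve_java` is a hypothetical helper and not part of this project, and the fallback to the `java` found on PATH is an assumption about what may be acceptable on your machine.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_java(configured_path: str) -> Optional[str]:
    """Return a usable Java executable path, or None if none is found.

    configured_path is the value you intend to write into create_main.py.
    """
    candidate = Path(configured_path).expanduser()
    if candidate.is_file():
        return str(candidate)
    # Assumption: fall back to whatever `java` is on PATH, if any.
    return shutil.which("java")
```

Call it with your own location, for example `resolve_java(r"C:\path\to\java.exe")` (a placeholder path). A return value of `None` means that neither the given file nor a `java` on PATH was found.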
Run main_page.py through the IDE.
- Select Create pipelines
- Upload the dataset (CSV files only)
- Press Analyse Dataset
- Select whether to save the new folder to the Desktop or to another path
- View Dataset Profile (optional)
- Select the model type and the hyperparameter tuning method
- Select the target column (y)
- Select features that you would like to remove from the training session (optional)
- Press Create Machine learning pipeline
- The new Python-based machine learning pipelines are written to the folder you chose during the dataset analysis
- Run Modelname_pipeline.py manually
- The model and the evaluation metrics are saved in the same folder as Modelname_pipeline.py
Run main_page.py through the IDE. (This workflow is not fully tested and may produce errors on some experiments.)
- Press Make Predictions
- Upload the whole experiment folder with the trained pipelines.
- Upload the test dataset (it must have the same number of CSV columns as the training set, excluding the target column).
- Show best model evaluation metrics (this shows the best ML pipeline metrics)
- Show best pipeline explanation (this shows the best ML pipeline explanations)
- To make explainable predictions, choose a row from the test dataset you uploaded by entering its row number (e.g. 2)
- Then select the three explainability methods to make explainable predictions.
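The column requirement on the test dataset can be checked before uploading. The sketch below uses only the standard library and compares the two CSV headers. `read_header` and `check_test_columns` are hypothetical helpers, not part of the project. The name comparison is stricter than the column-count rule stated above and is an added assumption.

```python
import csv
from typing import List


def read_header(csv_path: str) -> List[str]:
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def check_test_columns(train_header: List[str],
                       test_header: List[str],
                       target: str) -> List[str]:
    """Return a list of problems; an empty list means the headers agree."""
    expected = [name for name in train_header if name != target]
    problems = []
    # Rule from the instructions: same number of columns, minus the target.
    if len(test_header) != len(expected):
        problems.append(
            f"expected {len(expected)} columns, found {len(test_header)}"
        )
    # Assumption: column names should also match the training set.
    missing = [name for name in expected if name not in test_header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    return problems
```

For example, `check_test_columns(read_header("train.csv"), read_header("test.csv"), "y")` (placeholder file and column names) returns an empty list when the test file is ready to upload.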
If the explainable predictions do not work, you can write a separate script that loads the trained model and makes the predictions manually.
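A manual prediction script could look roughly like the sketch below. It rests on several assumptions that you should check against your generated pipeline: that the model was saved with `pickle`, that it exposes a scikit-learn style `predict` method, that all feature columns are numeric, and that row numbers are counted from 0 with the header excluded. The helper names are hypothetical.

```python
import csv
import pickle


def load_model(model_path):
    """Load a pickled model. Only unpickle files you created yourself."""
    with open(model_path, "rb") as handle:
        return pickle.load(handle)


def read_row(csv_path, row_number):
    """Return one data row (0-based, header excluded) as a list of strings."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        for index, row in enumerate(reader):
            if index == row_number:
                return row
    raise IndexError(f"row {row_number} not found in {csv_path}")


def predict_row(model, row):
    """Convert the row to floats and ask the model for a single prediction."""
    features = [float(value) for value in row]
    return model.predict([features])[0]
```

Typical use would be `predict_row(load_model("model.pkl"), read_row("test.csv", 2))`, where both file names are placeholders. If the pipeline saved the model in another format, or expects a DataFrame rather than a list of numbers, replace `load_model` and the conversion in `predict_row` accordingly.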
Open utils/hp_tuning/
Replace the grid search hyperparameter tuning method (inside the generated Python-based pipelines) with the corresponding method from this folder. Set your preferred time budget in seconds (for example, time_budget = 60 gives 1 minute of training).
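To illustrate what a time budget means in practice, here is a generic sketch of a time-limited random search written with the standard library only. It is not the code in utils/hp_tuning/, which may work differently. `budgeted_search`, `score_fn`, and `param_space` are hypothetical names, and `score_fn` stands in for "train and evaluate one configuration".

```python
import random
import time


def budgeted_search(score_fn, param_space, time_budget, seed=0):
    """Sample random configurations until time_budget seconds have elapsed.

    Returns (best_params, best_score, n_trials). Higher scores are better.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget
    best_params, best_score, trials = None, float("-inf"), 0
    # Always run at least one trial, then stop once the deadline has passed.
    while trials == 0 or time.monotonic() < deadline:
        params = {name: rng.choice(values)
                  for name, values in param_space.items()}
        score = score_fn(params)
        trials += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, trials
```

The deadline is checked only between trials, so a single slow training run can exceed the budget. With `time_budget=60` the loop keeps sampling configurations for about one minute.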