Releases: mbaityje/plankifier
Final classifier used for publication
Script used for training the models used in publication: https://www.frontiersin.org/articles/10.3389/fmicb.2021.746297/full
Added Ciliates classifier
This version adds a new classifier: an EfficientNet-B0 network trained on Ciliates and Non-ciliates data.
Note: the classifier labels Ciliates as Non-negatives and Non-ciliates as Negatives.
To predict new data, use the following files in the directory ./trained_model/Final_ciliates_eff0_classifier/:
a) model file --> keras_model_finetune.h5
b) weight file --> bestweights_finetune.hdf5
Please note that the same commands from the earlier version v1.2.1 work here as well.
Binary, multi and versusall classifiers with option for hypertuning
The syntax for predicting images is the same as in previous releases, but there are some changes to the training script.
During training, the user can now choose between binary, multi and versus-all classifiers. When hp_tuning is set to yes (to perform hyperparameter tuning), more than one model can be selected for training. Two image pre-processing options are available: resize the image to the desired size while keeping the original proportions, or resize without keeping the proportions. The user can also save the data and the filenames. Finally, the user can choose models for either average ensembling or stacking ensembling. The classification report, confusion matrix and plots are saved in the output directory.
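The two pre-processing options can be sketched as follows. This is a minimal illustration using Pillow, with hypothetical function names of my choosing; it is not the repository's actual implementation.

```python
from PIL import Image

def resize_keep_proportions(img, size, fill=(255, 255, 255)):
    """Shrink so the longer side equals `size`, then pad to size x size."""
    img = img.copy()
    img.thumbnail((size, size))           # keeps the aspect ratio
    canvas = Image.new("RGB", (size, size), fill)
    # paste the shrunk image centered on the padded canvas
    canvas.paste(img, ((size - img.width) // 2, (size - img.height) // 2))
    return canvas

def resize_ignore_proportions(img, size):
    """Stretch to size x size, distorting the aspect ratio."""
    return img.resize((size, size))
```

With the first option an elongated plankton image keeps its shape and gains padding; with the second it is distorted but no pixels are padding.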
For the end user who wants to use the trained models for predictions,
Please install these important libraries (the list is not exhaustive):
pip install git+https://github.com/qubvel/efficientnet
pip install -U scikit-learn
pip install tensorflow==2.3.0
pip install keras==2.4.3
pip install imutils==0.5.3
Then to predict your images, use:
python predict.py -modelfullnames './trained-models/eff0_for_DiatomCentrics/keras_model_finetune.h5' -weightnames './trained-models/eff0_for_DiatomCentrics/bestweights_finetune.hdf5' -ensMethods 'leader' -testdirs './data/Centric_Diatoms_latest/Validation_from_experts/Centric_diatoms/' './data/Centric_Diatoms_latest/Validation_from_experts/Centric_negatives/' -predname './out/predictions/Centric_diatoms' -thresholds 0.99 0.8 2>/dev/null
If you have images for validation, i.e. images separated by label into specified folders, and want to see how the model performs, use the command below. You can also copy the misclassified images into a separate directory for further investigation by setting the -save_misclassified parameter.
python validation.py -modelnames ./trained-models/eff0_for_DiatomCentrics/keras_model_finetune.h5 -weightnames ./trained-models/eff0_for_DiatomCentrics/bestweights_finetune.hdf5 -datapaths ./data/Centric_Diatoms_latest/Validation_from_experts/ -classifier binary -outpath ./out/misclassified/ -save_misclassified yes
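Conceptually, the -save_misclassified option does something like the sketch below. The function name, signature and the shape of `results` are illustrative assumptions of mine, not the script's actual internals.

```python
import shutil
from pathlib import Path

def save_misclassified(results, outpath):
    """Copy misclassified images to `outpath` for inspection.

    results: list of (image_path, true_label, predicted_label) tuples.
    Returns the list of copied destination paths.
    """
    out = Path(outpath)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for img, true_label, pred_label in results:
        if true_label != pred_label:            # a misclassification
            dest = out / Path(img).name
            shutil.copy(img, dest)
            copied.append(dest)
    return copied
```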
More classes
The current release includes models trained on more classes. The syntax is the same as previous releases, but the models in ./trained-models are now trained on 34 classes.
Multiple arguments on predictions
Changes to predict.py
This patch improves usability. A single run can now make predictions on several directories. The ensembling method and the abstention threshold can also be given as multiple arguments, and the program will loop through them.
Note that the keyword for the ensembling method was changed from em to ensMethod, absthres was changed to thresholds, and most of the other keywords were changed to the plural. For a summary, run python predict.py -h.
Example of the new usage:
python predict.py -ensMethods 'leader' -testdirs 'data/1_zooplankton_0p5x/validation/tommy_validation/images/asterionella/' 'data/1_zooplankton_0p5x/validation/tommy_validation/images/uroglena/' 'data/1_zooplankton_0p5x/validation/tommy_validation/images/asplanchna/' -thresholds 0.99 0.8 2>/dev/null
In addition to the screen output, this example produces two files, predict_leaderabs0.8.txt and predict_leaderabs0.99.txt, with the prediction from each set of options.
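The looping over the argument combinations, and the naming pattern of the output files, can be sketched like this (a minimal illustration; the helper name is mine, not the script's):

```python
from itertools import product

def output_names(methods, thresholds, stem="predict"):
    """One output file per (ensembling method, abstention threshold) pair,
    following the predict_<method>abs<threshold>.txt naming pattern."""
    return [f"{stem}_{m}abs{t}.txt" for m, t in product(methods, thresholds)]
```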
Changed core and introduced ensembling
What's new
I completely modified the core of the program, but you won't notice this. What you will notice are the improvements to the classifier. The main user-level feature is that I implemented some simple forms of ensembling and open-set recognition. In this release, you have 4 classifiers, which you can use by themselves or together. Also, when hesitating, the classifier will refrain from making a prediction (which is what the taxonomist does with the unknown classes).
Usage
The only file that practitioners should look at is predict.py.
To display help with the command line arguments, launch as
python predict.py -h
Here is an example of usage that implements a weighted-majority rule with a 1.0 abstention threshold (see explanations at the bottom of the page):
python predict.py -testdir ./data/1_zooplankton_0p5x/validation/tommy_validation/images/keratella_quadrata/ -absthres=1.0 -em 'weighted-majority' 2>/dev/null
Here is a unanimity rule with a 0.8 abstention threshold:
python predict.py -testdir ./data/1_zooplankton_0p5x/validation/tommy_validation/images/keratella_quadrata/ -absthres=0.8 -em 'unanimity' 2>/dev/null
The screen output should look like this
./data/1_zooplankton_0p5x/validation/tommy_validation/images/keratella_quadrata/SPC-EAWAG-0P5X-1563026664165210-9932037279651-002549-028-1074-1754-64-36.jpeg keratella_quadrata
./data/1_zooplankton_0p5x/validation/tommy_validation/images/keratella_quadrata/SPC-EAWAG-0P5X-1575360491907312-8542395764139-004819-014-870-1900-56-52.jpeg keratella_quadrata
./data/1_zooplankton_0p5x/validation/tommy_validation/images/keratella_quadrata/SPC-EAWAG-0P5X-1575360195859503-8542099739532-001859-020-2384-1318-64-48.jpeg keratella_quadrata
./data/1_zooplankton_0p5x/validation/tommy_validation/images/keratella_quadrata/SPC-EAWAG-0P5X-1530835631817821-2549602643688-004219-000-1812-2554-40-45.jpeg keratella_quadrata
./data/1_zooplankton_0p5x/validation/tommy_validation/images/keratella_quadrata/SPC-EAWAG-0P5X-1572948291990397-6130232224429-002829-077-2144-442-72-32.jpeg keratella_quadrata
and it should also be printed in the file ./predict/predict.txt. If the folder ./predict/ does not exist, it is automatically created.
If you suspect something is going wrong, remove 2>/dev/null from the command (it hides the annoying messages that Keras prints, but it also hides warnings and errors!).
Some more explanation
Ensemble rules
I implemented 4 rules to obtain a collaboration between the classifiers:
- 'unanimity': We only accept guesses where all the classifiers agree. Disagreements result in 'Unclassified'.
- 'majority': Takes the choice of the majority of classifiers. I didn't focus on dealing with ties, because I don't like this method.
- 'leader': Takes the choice of the most confident model.
- 'weighted-majority': Sums the confidences of all the classifiers, and chooses the class with the highest summed confidence.
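The four rules above can be sketched in a few lines. This is an illustrative re-implementation under the assumption that each model returns a dict mapping class name to confidence; it is not the repository's actual code.

```python
from collections import Counter

def unanimity(preds):
    """preds: list of per-model {class: confidence} dicts."""
    tops = [max(p, key=p.get) for p in preds]
    return tops[0] if len(set(tops)) == 1 else "Unclassified"

def majority(preds):
    tops = [max(p, key=p.get) for p in preds]
    return Counter(tops).most_common(1)[0][0]   # ties broken arbitrarily

def leader(preds):
    # the model whose top confidence is highest decides
    best = max(preds, key=lambda p: max(p.values()))
    return max(best, key=best.get)

def weighted_majority(preds):
    summed = {}
    for p in preds:
        for cls, conf in p.items():
            summed[cls] = summed.get(cls, 0.0) + conf
    return max(summed, key=summed.get)
```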
Abstention
Abstention is a way of having the classifiers state that they don't know enough. The normal behavior is that if a classifier's confidence is smaller than absthres, then the classifier will abstain from making predictions.
With 'weighted-majority' the behavior is different: instead of filtering on the single models, we filter on the sum of the confidences, so the confidence can be larger than one.
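The two abstention behaviors can be sketched like this (illustrative names and signatures, not the script's code):

```python
def predict_or_abstain(confidences, absthres):
    """Single model: abstain if its top confidence is below absthres."""
    top = max(confidences, key=confidences.get)
    return top if confidences[top] >= absthres else "Unclassified"

def weighted_majority_or_abstain(per_model, absthres):
    """Weighted majority: the threshold applies to the *summed* confidence,
    which can exceed one when several models are summed."""
    summed = {}
    for p in per_model:
        for cls, conf in p.items():
            summed[cls] = summed.get(cls, 0.0) + conf
    top = max(summed, key=summed.get)
    return top if summed[top] >= absthres else "Unclassified"
```

This is why a threshold like 1.0 makes sense for 'weighted-majority' even though a single model's confidence never exceeds 1.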
Output
You can specify the output directory with the -outpath option. You can also specify the output filename with -predname option.
Classifier loading
The classifiers must be loaded. By default, they are assumed to be in the directory ./out/trained-models/, and all four of them are loaded. If the paths change, or if you do not want to use all four, specify their paths through the option -modelfullname, separated by spaces. The model is the keras_model.h5 file.
Modifications on output
A pre-release for playing around with zooplankton data and feedback on API
The only file that practitioners should look at is predict.py.
Main changes
- Undesired prints were removed
- Added the -preddir and -predname options, which allow specifying the path and name of the output file
Minor improvements in input/output
A pre-release for playing around with zooplankton data and feedback on API
The only file that practitioners should look at is predict.py.
Main changes
- The script predict.py now produces a summary classification file, called predict.txt.
- The target argument was removed, and testdir is now directly the folder containing the images (instead of containing folders with images).
- Error messages slightly improved.
Launch as:
python predict.py -testdir='path-to-images'
To query all options:
python predict.py -h
Patch: handle exception when training dataset unavailable
A consistency check was performed comparing the read data with the original dataset.
Users do not have access to this dataset, so the exception raised when the dataset is unavailable must be handled. This is done in the current patch.
(This same release was done earlier, but was targeted on the wrong branch).
Patch: added loadable model
Previous release did not include the path to a loadable model.
Here, we provide the model in the folder util-files/trained-conv2/