Issues training custom model

Hi! 

I’m experiencing some issues training a custom SquiggleNet model. Would it be possible to check whether I’m doing something wrong? 


**Classification using your pretrained models, on human and E. coli R9 reads** 

First I tried running SquiggleNet inference using your pretrained models, on human and E. coli nanopore R9 data. This gave very nice results: an accuracy of 83-86% (depending on the model). 

I did assume that human = 1 and bacterial = 0 in these models. Is this correct? 


**Classification using a custom model, on human and SARS-CoV2 R9 reads** 

Then I tried training a model on human and SARS-CoV2 data. Classifying the test data with this custom model resulted in an accuracy of only 3%. 
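Since this is a two-class task, an accuracy far below 50% could mean the 0/1 labels are being read the other way around, rather than that the model learned nothing. If every read is assigned to one of the two classes, flipping the convention turns 3% into 97%. Below is a minimal sketch for checking both conventions. It assumes the true and predicted classes are already available as lists of 0/1 values; the function names are hypothetical and nothing here reflects the actual output format of `inference.py`.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def accuracy_both_conventions(true_labels, predicted_labels):
    """Return (accuracy as given, accuracy with predicted 0/1 swapped)."""
    as_given = accuracy(true_labels, predicted_labels)
    flipped = accuracy(true_labels, [1 - p for p in predicted_labels])
    return as_given, flipped


if __name__ == "__main__":
    # Toy labels for illustration only (not real results).
    truth = [1, 1, 0, 0]
    predicted = [0, 0, 1, 0]
    print(accuracy_both_conventions(truth, predicted))
```

If the flipped accuracy is high, the problem is more likely in how target/non-target map to 1/0 than in the training itself.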

I am not sure whether I executed your scripts correctly. I would very much appreciate it if you could check whether I ran them as intended. 

1. Splitting up the data into training, validation and test datasets 
    - I used an equal number of target (= SARS-CoV2) and non-target (= human) reads (269507 reads) 
    - 80% of these reads were randomly selected and allocated to the training dataset, another 10% to the validation dataset, and the remaining 10% to the test dataset 
    - all other reads (not included in the 269507 reads I started with) were also allocated to the test dataset 
       
2. Preprocessing the training data 
    - `python ./SquiggleNet/preprocess.py -gp sarscov2_train_readids.txt -gn human_train_readids.txt -i fast5_human_and_sarscov2 -o outfolder_train`
    - this resulted in 21 `neg_*.pt` and 15 `pos_*.pt` data batches 
3. Preprocessing the validation data 
    - `python ./SquiggleNet/preprocess.py -gp sarscov2_val_readids.txt -gn human_val_readids.txt -i fast5_human_and_sarscov2 -o outfolder_val`
    - this resulted in 2 `neg_*.pt` and 1 `pos_*.pt` data batches 

4. Training a custom model 
    - I was unsure how the `trainer.py` script should be executed. The preprocessing script produced multiple PyTorch tensor files, but I think only one target and one non-target file can be specified per run of the trainer script? 
    - I therefore trained the model in 15 iterations, each using one target and one non-target batch. After the first iteration I used the `--intermediate` option to fine-tune the model from the previous iteration. Is this how the trainer script is intended to be used? 
    - `python ./SquiggleNet/trainer.py -tt outfolder_train/pos_10000.pt -nt outfolder_train/neg_10000.pt -tv outfolder_val/pos_10000.pt -nv outfolder_val/neg_10000.pt -o trainedModel_b1.ckpt -e 3`
    - `python ./SquiggleNet/trainer.py -tt outfolder_train/pos_20000.pt -nt outfolder_train/neg_20000.pt -tv outfolder_val/pos_10000.pt -nv outfolder_val/neg_10000.pt -i ./trainedModel_b1.ckpt -e 3 -o trainedModel_b2.ckpt`
    - `[…]`
    - `python ./SquiggleNet/trainer.py -tt outfolder_train/pos_150000.pt -nt outfolder_train/neg_150000.pt -tv outfolder_val/pos_10000.pt -nv outfolder_val/neg_10000.pt -i ./trainedModel_b14.ckpt -e 3 -o trainedModel_b15.ckpt`

5. Classifying the test data using the custom model 
    - `python ./SquiggleNet/inference.py -m trainedModel_b16.ckpt -i fast5_human_and_sarscov2_testdata/ -o classification_results_trainedModel_b16/`
    - only 3% of the reads were assigned to the correct class 
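For reference, the 80/10/10 split described in step 1 can be sketched as follows. This is a minimal illustration and not the exact code that was used: the function name, the fixed seed, and the placeholder read IDs are all assumptions.

```python
import random


def split_read_ids(read_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle read IDs and split them into train, validation and test lists.

    Whatever is left after the train and validation fractions goes to test.
    """
    ids = list(read_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Placeholder IDs; real read IDs would come from the fast5 files.
    example_ids = [f"read_{i}" for i in range(1000)]
    train, val, test = split_read_ids(example_ids)
    print(len(train), len(val), len(test))
```

Applying the function to the target and non-target read IDs separately keeps the two classes balanced within each subset.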

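The chain of `trainer.py` calls in step 4 can also be written as a loop. The sketch below only builds and prints the command lines, using the flags that appear in the commands above. It assumes the batch files follow the `pos_<k*10000>.pt` / `neg_<k*10000>.pt` pattern seen in the first, second and last commands; the function name and its parameters are hypothetical.

```python
def build_trainer_commands(n_batches, epochs=3, batch_step=10000,
                           train_dir="outfolder_train", val_dir="outfolder_val"):
    """Build one trainer.py command per training batch.

    Every iteration after the first passes the previous checkpoint via -i.
    """
    commands = []
    for k in range(1, n_batches + 1):
        size = k * batch_step  # assumed file naming: pos_10000.pt, pos_20000.pt, ...
        cmd = [
            "python", "./SquiggleNet/trainer.py",
            "-tt", f"{train_dir}/pos_{size}.pt",
            "-nt", f"{train_dir}/neg_{size}.pt",
            "-tv", f"{val_dir}/pos_{batch_step}.pt",
            "-nv", f"{val_dir}/neg_{batch_step}.pt",
        ]
        if k > 1:
            # Continue from the checkpoint written by the previous iteration.
            cmd += ["-i", f"./trainedModel_b{k - 1}.ckpt"]
        cmd += ["-e", str(epochs), "-o", f"trainedModel_b{k}.ckpt"]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_trainer_commands(15):
        print(" ".join(cmd))
```

With 15 iterations, the last checkpoint written by this chain is `trainedModel_b15.ckpt`. Each command could be run with `subprocess.run(cmd, check=True)` instead of being printed.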


Any help would be greatly appreciated! 
