Beta barrel (4 state) individual assignment
-
clean_CV includes functions for:
- parser
- window list
- binary list of aa
- numerical list of topology
- numerical topology back to characters
- save into .npz file
- SVM
- cross validation
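
The parsing, windowing and encoding steps listed above can be sketched roughly as follows. This is not the actual clean_CV code: the function names are made up for illustration, the input is assumed to be three lines per protein (an ID line starting with ">", the sequence, the topology), and the toy record and its topology characters are placeholders rather than real data.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_three_line(text):
    """Parse records laid out as ID line, sequence, topology (assumed layout)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = []
    for i in range(0, len(lines) - 2, 3):
        records.append((lines[i].lstrip(">"), lines[i + 1], lines[i + 2]))
    return records


def one_hot(residue):
    """20-element binary vector; unknown residues and padding stay all zero."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue)
    if idx >= 0:
        vec[idx] = 1
    return vec


def windows(sequence, size):
    """One feature row per residue: the one-hot codes of the `size` residues
    centred on it, with zero padding past either end of the sequence."""
    half = size // 2
    padded = "-" * half + sequence + "-" * half
    rows = []
    for i in range(len(sequence)):
        row = []
        for residue in padded[i:i + size]:
            row.extend(one_hot(residue))
        rows.append(row)
    return rows


def encode_topology(topology, states):
    """Topology characters -> integer labels."""
    return [states.index(ch) for ch in topology]


def decode_topology(labels, states):
    """Integer labels -> topology characters."""
    return "".join(states[n] for n in labels)


# Toy record (placeholder data, not taken from the real dataset).
toy = ">toy_protein\nACDKLM\niiMMoo\n"
(pid, seq, top), = parse_three_line(toy)
states = sorted(set(top))      # label alphabet taken from the data itself
X = windows(seq, 3)            # 6 rows of 3 * 20 features
y = encode_topology(top, states)
```

Building the label alphabet from the data avoids hard-coding the four state characters; a real script would probably fix the mapping once so that it stays the same between training and prediction.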
-
cv
- tested on a range of window sizes
- window size 17 (the size used for the final model)
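
A rough sketch of how a comparison across window sizes could look with scikit-learn. It assumes scikit-learn and NumPy are installed; the helper names, the fold count and the random toy sequence are illustrative only, and the scores it produces say nothing about the real results.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(sequence, size):
    """One-hot encode a zero-padded window of `size` residues per position."""
    half = size // 2
    padded = "-" * half + sequence + "-" * half
    X = np.zeros((len(sequence), size * 20), dtype=np.int8)
    for i in range(len(sequence)):
        for j, residue in enumerate(padded[i:i + size]):
            k = AMINO_ACIDS.find(residue)
            if k >= 0:
                X[i, j * 20 + k] = 1
    return X


def score_window_sizes(sequence, labels, sizes, folds=3):
    """Mean cross-validated accuracy of a linear SVM for each window size."""
    results = {}
    for size in sizes:
        X = window_features(sequence, size)
        model = SVC(kernel="linear")
        results[size] = cross_val_score(model, X, labels, cv=folds).mean()
    return results


# Toy input: a random sequence with two alternating placeholder labels.
rng = np.random.default_rng(0)
toy_seq = "".join(rng.choice(list(AMINO_ACIDS), size=120))
toy_labels = np.array([i // 10 % 2 for i in range(120)])
scores = score_window_sizes(toy_seq, toy_labels, sizes=[3, 9, 17])
```

Note that this sketch splits folds at the residue level, so overlapping windows from the same protein can land in both training and test folds. Splitting by protein would give a more honest estimate.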
-
trained_model
- saves trained model
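
For illustration, saving and reloading a fitted scikit-learn model might look like the sketch below. It assumes the model is stored with pickle (joblib would work similarly); the toy training data and the file name "toy_model.sav" are placeholders.

```python
import os
import pickle
import tempfile

from sklearn.svm import SVC

# Tiny placeholder training set, just enough to fit a model.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
model = SVC(kernel="linear").fit(X, y)

# Save the fitted model to disk.
path = os.path.join(tempfile.mkdtemp(), "toy_model.sav")
with open(path, "wb") as fh:
    pickle.dump(model, fh)

# Load it again, as a separate predictor script would.
with open(path, "rb") as fh:
    restored = pickle.load(fh)
```

A pickled model should be loaded with the same scikit-learn version that saved it.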
-
final_predictor
- predicts topology using model
- output looks like this for each protein:
  - ID
  - amino acid sequence
  - predicted topology
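
A small sketch of writing one record in the three-line output layout described above. The function name is hypothetical and the example values are placeholders.

```python
def format_prediction(protein_id, sequence, topology):
    """Return one record: ID, sequence and predicted topology on three lines."""
    if len(sequence) != len(topology):
        raise ValueError("sequence and topology must have the same length")
    return f"{protein_id}\n{sequence}\n{topology}\n"


# Placeholder record, not real output.
record = format_prediction("toy_protein", "ACDKLM", "iiMMoo")
```

Checking the lengths up front catches a mismatch between the windowed input and the decoded prediction before it reaches the output file.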
I trained the model on all 42 proteins I was given, using a window size of 17 and a linear kernel. I used "testingdata_pred.txt" as the test file for the predictor.
For PSSM portion:
-
all_pssm includes functions to:
- parse and prepare input for predictor
- save the optimized model (PSSM_model.sav)
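
As a hedged sketch of the parsing step: PSI-BLAST's ASCII PSSM output has one row per residue, starting with the position and the residue letter, followed by 20 log-odds scores and then further columns. The function below is not the all_pssm code; it assumes whitespace-separated columns and keeps only the 20 scores, and the toy text is made up.

```python
def parse_pssm(text):
    """Return the 20 log-odds scores of each residue row in an ASCII PSSM."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with the position number, then the residue letter.
        if len(fields) >= 22 and fields[0].isdigit():
            rows.append([int(v) for v in fields[2:22]])
    return rows


def toy_row(pos, residue, scores):
    """Build one made-up PSSM row: scores, percentages, two trailing floats."""
    percents = ["0"] * 20
    return " ".join([str(pos), residue] + [str(s) for s in scores]
                    + percents + ["0.50", "0.10"])


# Made-up file content that mimics the layout (header, rows, footer).
toy_text = "\n".join([
    "",
    "Last position-specific scoring matrix computed",
    "A R N D C Q E G H I L K M F P S T W Y V",
    toy_row(1, "M", [-1] * 19 + [5]),
    toy_row(2, "K", [2] + [-3] * 19),
    "",
    "K Lambda",
])
matrix = parse_pssm(toy_text)
```

The resulting rows can replace the one-hot vectors in the sliding window, so each position contributes 20 scores. Rescaling the scores before training is a common choice but is not shown here.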
-
PSSM_predictor
- predicts topology for 50 proteins using the model; currently configured to split the original dataset (membrane-beta_4state.3line1.txt) into 80% training and 20% test
- "PSSM_prediction.txt" holds the predicted topology for the 50 proteins; it currently contains the results of the 80% train / 20% test split
- "PSSM_prediction_scores.txt" contains all of my scores
I trained the model on the PSSMs in the "PSSM" folder and tested it on the PSSMs in the "PSSM_50test" folder; both folders are inside "PSI-BLAST". However, the predictor is currently set up to train on 80% of the dataset and test on the remaining 20%.
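
One way an 80/20 split could be done at the protein level is sketched below. The function name, the seed and the placeholder IDs are assumptions for illustration; the actual predictor may split differently.

```python
import random


def split_proteins(protein_ids, train_fraction=0.8, seed=0):
    """Shuffle protein IDs reproducibly and split them into train and test."""
    ids = list(protein_ids)
    random.Random(seed).shuffle(ids)
    cut = int(round(len(ids) * train_fraction))
    return ids[:cut], ids[cut:]


# Placeholder IDs standing in for the proteins of the original dataset.
all_ids = [f"protein_{i}" for i in range(10)]
train_ids, test_ids = split_proteins(all_ids)
```

Splitting whole proteins rather than individual windows keeps every residue of a test protein out of the training set.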