
rodriluc/SVM_BB4


SU-project

Beta barrel (4 state) individual assignment

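  1. clean_CV includes functions for:

    • parsing the input file
    • building the list of sliding windows
    • binary encoding of the amino acids
    • numerical encoding of the topology
    • converting the numerical topology back to characters
    • saving the encoded data to an .npz file
    • training the SVM
    • cross-validation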
  2. cv

    • tests a range of window sizes
    • selected window size: 17
  3. trained_model

    • saves the trained model
  4. final_predictor

    • predicts the topology using the saved model
    • for each protein, the output consists of:
      • ID
      • amino acid sequence
      • predicted topology
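The clean_CV preprocessing steps listed above can be sketched as follows. This is a minimal, stdlib-only sketch under stated assumptions, not the repository's code: each record is assumed to be three lines (">ID", sequence, topology), windows are assumed to be zero-padded past the termini, and all function names (parse_three_line, window_features, topology_maps, format_prediction) are hypothetical.

```python
# Hypothetical sketch of the clean_CV preprocessing steps (not the repo's code).
# Assumes 3-line records: ">ID", amino acid sequence, topology string.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def parse_three_line(text):
    """Return (ID, sequence, topology) tuples from consecutive 3-line records."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return [
        (lines[i].lstrip(">"), lines[i + 1], lines[i + 2])
        for i in range(0, len(lines) - 2, 3)
    ]


def one_hot(aa):
    """20-element binary vector; non-standard residues (e.g. X) stay all zero."""
    vec = [0] * len(AMINO_ACIDS)
    if aa in AA_INDEX:
        vec[AA_INDEX[aa]] = 1
    return vec


def window_features(sequence, window=17):
    """One flat vector per residue: one-hot codes of the window centred on it,
    with all-zero padding where the window runs past either terminus."""
    half = window // 2
    features = []
    for center in range(len(sequence)):
        row = []
        for pos in range(center - half, center + half + 1):
            if 0 <= pos < len(sequence):
                row.extend(one_hot(sequence[pos]))
            else:
                row.extend([0] * len(AMINO_ACIDS))
        features.append(row)
    return features


def topology_maps(topologies):
    """Map every topology character seen in the data to an integer, and back."""
    states = sorted(set("".join(topologies)))
    to_num = {ch: i for i, ch in enumerate(states)}
    to_ch = {i: ch for ch, i in to_num.items()}
    return to_num, to_ch


def format_prediction(seq_id, sequence, labels, to_ch):
    """Three output lines per protein: ID, sequence, predicted topology."""
    topology = "".join(to_ch[label] for label in labels)
    return f">{seq_id}\n{sequence}\n{topology}"
```

With a window of 17, each residue becomes a 17 × 20 = 340-element vector. Deriving the state mapping from the data avoids hard-coding the four topology letters, which this sketch does not assume.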

I trained the model on all 42 proteins provided, using a window size of 17 and a linear kernel. The predictor was tested on "testingdata_pred.txt".
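Window-size selection and final training with a linear kernel could look roughly like this. It is a sketch assuming scikit-learn (SVC, GroupKFold, cross_val_score) and pickle for the .sav file; the helper names and the default path "model.sav" are hypothetical. Folds are grouped by protein, which is a design choice of this sketch: it keeps overlapping windows from the same protein out of both the training and validation folds.

```python
# Hypothetical sketch: protein-grouped cross-validation over window sizes,
# then a linear-kernel SVM trained on every protein and pickled to a .sav file.
# Assumes scikit-learn is installed; names are illustrative, not the repo's.
import pickle

from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_windows(sequence, window):
    """One-hot sliding windows, zero-padded past the termini."""
    half = window // 2
    rows = []
    for center in range(len(sequence)):
        row = []
        for pos in range(center - half, center + half + 1):
            vec = [0] * len(AMINO_ACIDS)
            if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
                vec[AMINO_ACIDS.index(sequence[pos])] = 1
            row.extend(vec)
        rows.append(row)
    return rows


def label_map(records):
    """Integer label for each topology character present in the records."""
    states = sorted(set("".join(topology for _, _, topology in records)))
    return {ch: i for i, ch in enumerate(states)}


def build_dataset(records, window, to_num):
    """Stack per-residue features (X), labels (y) and protein indices (groups)."""
    X, y, groups = [], [], []
    for index, (_, sequence, topology) in enumerate(records):
        X.extend(encode_windows(sequence, window))
        y.extend(to_num[ch] for ch in topology)
        groups.extend([index] * len(sequence))
    return X, y, groups


def score_window_sizes(records, sizes, folds=5):
    """Mean accuracy per window size; folds are split by protein, not residue."""
    to_num = label_map(records)
    scores = {}
    for window in sizes:
        X, y, groups = build_dataset(records, window, to_num)
        scores[window] = cross_val_score(
            SVC(kernel="linear"), X, y, groups=groups, cv=GroupKFold(n_splits=folds)
        ).mean()
    return scores


def train_and_save(records, window=17, path="model.sav"):
    """Fit a linear-kernel SVM on all records and pickle it to `path`."""
    to_num = label_map(records)
    X, y, _ = build_dataset(records, window, to_num)
    model = SVC(kernel="linear").fit(X, y)
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return model, to_num
```

In use, one would call score_window_sizes on the training records with a range of sizes, then train_and_save with the chosen window (17 here).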

PSSM portion:

  1. all_pssm includes functions to:

    • parse and prepare the input for the predictor
    • build an optimized model, saved as PSSM_model.sav
  2. PSSM_predictor

    • predicts the topology of 50 proteins using the model; it is currently set up to train on 80% and test on 20% of the original dataset (membrane-beta_4state.3line1.txt)
    • "PSSM_prediction.txt" contains the predicted topology of the 50 proteins; with the current setup, it holds the results of the 80/20 split
    • "PSSM_prediction_scores.txt" contains all of my scores
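Turning PSSMs into window features might look like the sketch below. It assumes the PSSMs are PSI-BLAST ASCII matrices (as written by psiblast with -out_ascii_pssm), keeps only the 20 log-odds columns, and squashes them with a logistic function; the scaling, the zero padding, and the function names are assumptions, and all_pssm may prepare its input differently.

```python
# Hypothetical sketch: per-residue PSSM features from a PSI-BLAST ASCII PSSM
# (as written by psiblast -out_ascii_pssm), turned into sliding windows.
# The logistic scaling and zero padding are assumptions, not the repo's choice.
import math


def parse_ascii_pssm(text):
    """Return one 20-value row per residue from the log-odds columns.

    Data rows look like "  1 M  -1 -2 ...": position, residue, 20 log-odds
    scores (A R N D C Q E G H I L K M F P S T W Y V), then percentages.
    """
    profile = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 22 and parts[0].isdigit() and parts[1].isalpha():
            # Squash each integer log-odds score into (0, 1).
            profile.append([1.0 / (1.0 + math.exp(-int(v))) for v in parts[2:22]])
    return profile


def pssm_windows(profile, window=17):
    """Concatenate the PSSM rows around each residue; pad termini with zeros."""
    half = window // 2
    width = len(profile[0]) if profile else 20
    rows = []
    for center in range(len(profile)):
        row = []
        for pos in range(center - half, center + half + 1):
            row.extend(profile[pos] if 0 <= pos < len(profile) else [0.0] * width)
        rows.append(row)
    return rows
```

Header, column-label, and footer lines (e.g. the Lambda table) are skipped because they do not start with a position number followed by a residue letter.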

The model was trained on the PSSMs in the "PSSM" folder and tested on the PSSMs in the "PSSM_50test" folder, both located in "PSI-BLAST". However, the predictor is currently set up to train on 80% of the dataset and test on the remaining 20%.
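An 80/20 split can be sketched at the protein level, so that no protein contributes residues to both the training and the test set. This is an assumption about how such a split could be done, not a description of PSSM_predictor; the function name and the fixed seed are hypothetical.

```python
# Hypothetical sketch: an 80/20 train/test split made at the protein level,
# so that no protein contributes residues to both sets. The seed and the
# function name are assumptions for illustration.
import random


def split_proteins(protein_ids, train_fraction=0.8, seed=42):
    """Shuffle protein IDs reproducibly and cut them into train and test lists."""
    shuffled = list(protein_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Using a private random.Random(seed) keeps the split reproducible between runs without touching the global random state. Features and labels would then be built separately for each list of IDs.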

About

Project assignment: Beta Barrel (4 state)
