4. Developer documentation

Classes organization

Support tools : package `wisp_tools`

The five classes inside this folder contain support functions that are not specific to WISP.

my_checker checks argument types and compares them to the function signature. Usage of its two decorators is described inside the class.
my_coros contains the coroutine method; this class is meant to gather multi-threading tools.
my_fasta loads and processes .fna/.fastq/.fasta files. It relies on the Biopython package, but is poorly optimized as of now.
my_logs contains the logging facilities for this project.
my_maths contains mathematical tools. It is currently unused, and is kept as a placeholder for future expansions.

To make their origin easy to identify, all functions inside this folder are prefixed with my_.
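
As an illustration of the kind of check my_checker performs, here is a minimal sketch of a decorator that compares call arguments to the annotations in a function signature. It is not the code shipped in wisp_tools: the names `check_types` and `count_kmers` are hypothetical, and only plain class annotations are verified.

```python
import inspect
from functools import wraps


def check_types(func):
    """Hypothetical decorator: raise TypeError when an argument
    does not match the class given in its annotation."""
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            parameter = signature.parameters[name]
            expected = parameter.annotation
            # Skip unannotated parameters and *args / **kwargs collectors.
            if expected is inspect.Parameter.empty:
                continue
            if parameter.kind in (parameter.VAR_POSITIONAL, parameter.VAR_KEYWORD):
                continue
            # Only plain classes are checked; generic hints are ignored.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{func.__name__}: '{name}' expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def count_kmers(sequence: str, ksize: int) -> int:
    """Toy function: number of k-mers of length ksize in sequence."""
    return max(len(sequence) - ksize + 1, 0)
```

With this sketch, `count_kmers("ACGTAC", 3)` returns 4, while `count_kmers("ACGTAC", "3")` raises a `TypeError` before the function body runs.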

Plotting facilities : package `wisp_view`

All plotting and reporting facilities are gathered inside this package.

mass_analysis contains functions to compare multiple outputs.
plotters plots models and databases (model interpretation).
tex_report contains tools to create a LaTeX report.
tree_rendering creates trees with the Graphviz engine; it is disabled as of now, as Graphviz does not run on the cluster I had access to.
visualisation_tool plots the results of predictions and of model validation (output interpretation).

Encoding kmers and manipulating data : package `wisp_lib`

Unlike those of wisp_tools, these support functions are specific to WISP.

data_manipulation contains all functions to create, edit, and check the existence of files used by WISP.
kmers_coders contains all functions that encode and decode kmers, and split subreads.
parameters_init creates a parameter file.
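
To give an idea of what encoding, decoding and subread splitting can look like, here is a minimal sketch using a generic 2-bit-per-base scheme. This is an assumption chosen for illustration, not necessarily the encoding used by kmers_coders, and the function names `encode_kmer`, `decode_kmer` and `split_subreads` are hypothetical.

```python
# Generic 2-bit encoding (assumption): A=0, C=1, G=2, T=3.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODING = "ACGT"


def encode_kmer(kmer: str) -> int:
    """Pack a kmer into an integer, two bits per base.
    Bases other than A, C, G, T (e.g. N) raise a KeyError."""
    value = 0
    for base in kmer.upper():
        value = (value << 2) | ENCODING[base]
    return value


def decode_kmer(value: int, ksize: int) -> str:
    """Unpack an integer produced by encode_kmer back into a kmer."""
    bases = []
    for _ in range(ksize):
        bases.append(DECODING[value & 3])
        value >>= 2
    return "".join(reversed(bases))


def split_subreads(sequence: str, size: int, step: int) -> list:
    """Cut a read into windows of `size` bases, starting every `step` bases."""
    return [
        sequence[start:start + size]
        for start in range(0, len(sequence) - size + 1, step)
    ]
```

For example, `encode_kmer("ACGT")` gives 27, `decode_kmer(27, 4)` gives `"ACGT"`, and `split_subreads("ACGTACGT", 4, 2)` gives `["ACGT", "GTAC", "ACGT"]`.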

Other stuff : main `wisp` package

WISP core utilities

build_softprob contains methods to build models.
predictors runs predictions and the computations applied to them.
sample_class contains database creation utilities.
utilities is a call script for most of the support functions.
wisp_build is the loop that builds databases and models.
wisp_predict is the loop that predicts samples.
wisp is the thread manager (the interface we call) for prediction and build.

Other files

_version and versioneer handle version management: they track the program's current version and display it in reports and similar outputs.
setup and MANIFEST are used to create the package (a future PyPI distribution is intended).
Dockerfile is an attempt to create a Docker environment.
processing tests leave-one-out (see article).
wisp_pipeline executes the full pipeline over a set of files.
env creates the conda environment and directories.

Development roadmap

This section outlines some thoughts my director and I had concerning the future of the software:

Accepting raw .fast5 files as input to reduce the impact of sequencing errors. On the input side, the MinION input function will need to be reworked in the near future; as of now, all basecalling information is discarded.
Shifting from hierarchical classification to a graph-like one, by treating classification as a 3-dimensional proximity graph in which nodes lying in the same z-axis plane belong to the same taxonomic level. This way, we could build on composition proximity at each level to derive distances, and use custom weights to help with interpretation.
Incorporating an ORI-like ASP approach to help with read binning. ORI seeks the smallest number of strains that accounts for the largest number of reads. The implementation here would follow a similar approach, but focused on the family level.
Reworking the parameter file system, which is quite clunky. The call functions are also tedious to use for non-programmers, and would benefit from shorter commands.
Weighting reads by their size when aggregating results, so that small parts shared between genomes count less than large, well-assigned parts.
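
The last idea could be sketched as follows. This is only an illustration of length-weighted aggregation under stated assumptions, not existing WISP code: the function name `aggregate_weighted`, the `(label, read_length)` input format and the choice of a weight directly proportional to read length are all hypothetical.

```python
from collections import defaultdict


def aggregate_weighted(predictions):
    """Aggregate per-read predictions, weighting each read by its length.

    predictions: iterable of (label, read_length) pairs (assumed format).
    Returns a dict mapping each label to its share of the total weight.
    """
    totals = defaultdict(float)
    for label, read_length in predictions:
        totals[label] += read_length

    total_weight = sum(totals.values())
    if total_weight == 0:
        return {}
    return {label: weight / total_weight for label, weight in totals.items()}
```

With one 9000-base read assigned to family A and two 500-base reads assigned to family B, an unweighted vote would favour B (two reads against one), whereas this sketch gives A a share of 0.9 and B a share of 0.1.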

WISP : Bacterial families identification from long reads, machine learning with XGBoost
