Taxonomically informed parsing of kraken2 report output.
I find simple parsing of kraken2 output report can be a bit annoying. If you
parse only using the percentage/number of reads assigned, then you end up
not only with Salmonella enterica but also everything between Salmonella enterica
and root. If you filter to only include species or genus, then you could end up
missing something significant at a higher level.
This script will read in a kraken2 report, establish the taxonomic relationships between the results, and then print out as many or as few taxonomic levels as you like, starting from the "tips".
2020-10-01 This software is fresh out of the box. Please sanity check results and report any bugs.
required named arguments:
-i INHANDLE Path to kraken report file (default: None)
optional arguments:
-h, --help show this help message and exit
-n SAMPLE_NAME Sample name to be included in output. If not give,
will take everything before the first period of the
file name. (default: None)
-l NUMBER_OF_LEVELS How far up from each tip do you want to check? If not
working as expected, you may ned to alter -r as well.
(default: 2)
-p PERCENT_READS_ASSIGNED_THRESHOLD
Minimum threshold of percent_reads_assigned for
reporting (default: 0.05)
-r TAXONOMIC_RANKS Taxonomic ranks which you want to report given in
comma-separated, upper-case format, no spaces. Rank
codes should reflect Kraken2 output documented here ht
tps://github.com/DerrickWood/kraken2/wiki/Manual#sampl
e-report-output-format. Don't include number
indicating sub-ranks. If not working as expected, you
may need to alter -l as well. (default: S,G)
-t Include this option if you want to print the tree
(default: False)
Defaults:
python kraken2_scripts/parse_kraken2_report.py -i example.kraken_report.txt -n sample1
Report species, genus and family (note the -l is set to 3 as well).
python kraken2_scripts/parse_kraken2_report.py -i example.kraken_report.txt -l 3 -r S,G,F
This is quite a good option (considering switching the default), print all the "tips" which meet the percentage of reads mapped criteria.
parse_kraken2_report.py -l 1 -r S,G,F,O,C,P,K,D -i example.kraken_report.txt
Uses the python package anytree which can be installed from pip. Developed using anytree v2.8.0.