PALSS (Pangenome Graph Augmentation from Long-reads Specific Strings) is an assembly- and mapping-free approach for updating (or augmenting) a pangenome graph directly from unassembled long reads sequenced from a new individual not already in the pangenome.
PALSS has been tested only on 64-bit Linux systems.
git clone https://github.com/ldenti/palss
cd palss ; mkdir build ; cd build
cmake ..
make -j4
cd ..
./palss -h
PALSS starts from a pangenome graph in GBZ format (.gbz) and a read sample (.fa/.fq, can be gzipped) and produces the corresponding augmented pangenome graph in GFA format.
We explain how to run PALSS using the example data available in the example subdirectory.
Note: we suggest running PALSS on error-corrected reads and on small- to medium-sized pangenome graphs.
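Before running the steps on your own data, a quick preflight check can catch the most common setup problems. The sketch below is not part of PALSS: `is_gbz`, `is_reads`, `have_tool`, and `preflight` are hypothetical helper names, and the extension checks only mirror the input types listed above (.gbz for the graph, .fa/.fq, optionally gzipped, for the reads). It assumes nothing about how PALSS itself validates its inputs.

```shell
# Hypothetical preflight helpers (illustrative only, not shipped with PALSS).

# Succeed if the file name ends in .gbz.
is_gbz() {
    case "$1" in
        *.gbz) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if the file name ends in .fa or .fq, optionally gzipped.
is_reads() {
    case "$1" in
        *.fa|*.fq|*.fa.gz|*.fq.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if the named tool can be found in $PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Check the graph, the reads, and the presence of vg (needed by the augment step).
preflight() {
    graph=$1
    reads=$2
    is_gbz "$graph" || { echo "expected a .gbz graph: $graph" >&2; return 1; }
    is_reads "$reads" || { echo "expected .fa/.fq reads (optionally .gz): $reads" >&2; return 1; }
    [ -r "$graph" ] || { echo "cannot read $graph" >&2; return 1; }
    [ -r "$reads" ] || { echo "cannot read $reads" >&2; return 1; }
    have_tool vg || { echo "vg not found in \$PATH" >&2; return 1; }
}
```

With the example data, this could be called as `preflight ./example/graph.gbz ./example/reads.fq` from the repository root; a non-zero exit status means one of the checks failed.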
# get paths from graph and build FMD-index
LD_LIBRARY_PATH="$PWD/lib" ./build/gbwtgraph-prefix/src/gbwtgraph/bin/gbz_extract ./example/graph.gbz | ./build/rb3-prefix/src/rb3/ropebwt3 build -Ld - > ./example/paths.fa.fmd
# sketch the graph (using 4 threads and 31-mers)
./palss sketch -@4 -k31 ./example/graph.gbz > ./example/graph.gbz.skt
# search for specific strings in the haplotypes and anchor them to the graph
./palss sfs -@4 ./example/graph.gbz ./example/graph.gbz.skt ./example/paths.fa.fmd ./example/reads.fq > ./example/reads.sfs
# cluster specific strings and analyze clusters
./palss align ./example/graph.gbz ./example/reads.sfs > ./example/consensus.gaf
# augment the graph (in GBZ format, so -z) and keep novel vertices/edges supported by at least 2 reads
# (this requires vg to be in your $PATH)
./palss augment -z -s 2 ./example/graph.gbz ./example/consensus.gaf > ./example/graph.augmented.gfa
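Once the augmented graph has been written, a quick sanity check is to count its records. The following is a minimal sketch, not a PALSS command: `gfa_counts` is a hypothetical helper that relies only on the GFA convention that segment (vertex) records start with `S` and link (edge) records start with `L` in the first tab-separated column.

```shell
# Hypothetical helper: print the number of segment (S) and link (L) records in a GFA file.
gfa_counts() {
    awk -F'\t' '
        $1 == "S" { s++ }   # one segment record per vertex
        $1 == "L" { l++ }   # one link record per edge
        END { printf "segments=%d links=%d\n", s, l }
    ' "$1"
}
```

For the example run, `gfa_counts ./example/graph.augmented.gfa` prints one line with both counts; an empty or truncated file shows up as `segments=0 links=0`.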
Instructions and code to reproduce the experiments described in the preprint can be found here (v0.1 tag).
For any questions or doubts, please open an issue.