GenomeAnnotation

This project aims to create an easy way to annotate genomes.

Personally suggested that input transcripts from Hiast2+TransDecoder(cufflinks_gtf_to_alignment_gff3.pl) to PASA

###########update#################

V2.sh: More simplicity, faster, and no frameshift

RNA-based: Hisat2+TransDdecoder

Ab-initio: braker3

Homology: Complete structure from Miniprot

########### Post PASA ############

This (https://github.com/hrluo93/python4bio/blob/main/false-gene-model.py) script can check if the annotation contains false gene models.

###Soft-masked genome would result in TE contained in annotation. We used OrthoFinder to filter annotation results to retain orthologous genes and remove non-orthologous with 1 or 2 exons.

##Target species (Gene ID like NNYC0000010.1 )in Orthogroups.GeneCount.tsv $3 with reference species in $2 and $4

orthofinder -f orthof -og -M msa -t 12 -S blast_gz

cd orthof/*/Orthogroups/

cat Orthogroups.GeneCount.tsv | awk '{if ($2 > 0 || $4 >0) print}' | awk '{if ($3 > 0) print}' > nny.allortho.count.tsv

awk 'FNR==NR {a[$1]=$0;next} $1 in a {print a[$1],$0}' nny.allortho.count.tsv Orthogroups.tsv > nny.merge.tsv

grep -o nny.merge.tsv | cut -f1 -d "." > nny.orthogene.list

#nny.orthogene.list contained all orthologous genes that should kept. Non-orthologous with 1 or 2 exons can be found via TBTools GXF STAT or any other method you prefer.

######################################################

RNA-seq+homology+ab-initio based annotation.sh used in the great bustard genome.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GenomeAnnotation

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

GenomeAnnotation