GenomeAnnotation

This project aims to create an easy way to annotate genomes.

I personally suggest using transcripts from Hisat2+TransDecoder (cufflinks_gtf_to_alignment_gff3.pl) as the input to PASA.

###########update#################

V2.sh: simpler, faster, and free of frameshifts

RNA-based: Hisat2+TransDecoder

Ab-initio: braker3

Homology: Complete structure from Miniprot

########### Post PASA ############

This script (https://github.com/hrluo93/python4bio/blob/main/false-gene-model.py) checks whether the annotation contains false gene models.

###A soft-masked genome can leave TE-derived genes in the annotation. We used OrthoFinder to filter the annotation results: orthologous genes are retained, and non-orthologous genes with only 1 or 2 exons are removed.

##The target species (gene IDs like NNYC0000010.1) is column $3 of Orthogroups.GeneCount.tsv; the reference species are columns $2 and $4.

orthofinder -f orthof -og -M msa -t 12 -S blast_gz

cd orthof/OrthoFinder/Results_*/Orthogroups/

awk '($2 > 0 || $4 > 0) && $3 > 0' Orthogroups.GeneCount.tsv > nny.allortho.count.tsv

awk 'FNR==NR {a[$1]=$0;next} $1 in a {print a[$1],$0}' nny.allortho.count.tsv Orthogroups.tsv > nny.merge.tsv

grep -o image nny.merge.tsv | cut -f1 -d "." > nny.orthogene.list
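The filtering steps above can be tried on toy data before running them on real OrthoFinder output. The sketch below is only an illustration: the species names, orthogroups, and gene IDs are made up, the header line is skipped explicitly with `NR > 1`, and the gene-ID pattern `NNYC[0-9]+\.[0-9]+` is an assumption derived from the example ID NNYC0000010.1, so adjust it to your own prefix.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical gene counts: column 3 is the target species, columns 2 and 4 are references.
{
  printf 'Orthogroup\tref1\tnny\tref2\tTotal\n'
  printf 'OG0000000\t1\t2\t0\t3\n'   # target + one reference: keep
  printf 'OG0000001\t0\t1\t0\t1\n'   # target only: drop
  printf 'OG0000002\t1\t0\t1\t2\n'   # no target gene: drop
} > Orthogroups.GeneCount.tsv

# Hypothetical orthogroup membership table with the same column order.
{
  printf 'Orthogroup\tref1\tnny\tref2\n'
  printf 'OG0000000\tR1_g1\tNNYC0000010.1, NNYC0000020.1\t\n'
  printf 'OG0000001\t\tNNYC0000030.1\t\n'
  printf 'OG0000002\tR1_g2\t\tR2_g5\n'
} > Orthogroups.tsv

# Keep orthogroups with at least one reference gene and at least one target gene.
awk 'NR > 1 && ($2 > 0 || $4 > 0) && $3 > 0' Orthogroups.GeneCount.tsv > nny.allortho.count.tsv

# Join the kept orthogroups with their member genes on the orthogroup ID.
awk 'FNR==NR {a[$1]=$0;next} $1 in a {print a[$1],$0}' nny.allortho.count.tsv Orthogroups.tsv > nny.merge.tsv

# Pull out target gene IDs (assumed pattern) and strip the transcript suffix.
grep -oE 'NNYC[0-9]+\.[0-9]+' nny.merge.tsv | cut -f1 -d "." > nny.orthogene.list

cat nny.orthogene.list
```

On this toy input only OG0000000 passes the filter, so nny.orthogene.list contains NNYC0000010 and NNYC0000020.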

#nny.orthogene.list contains all orthologous genes that should be kept. Non-orthologous genes with 1 or 2 exons can be found via TBTools GXF STAT or any other method you prefer.
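As one possible "other method", the removal list can be sketched with awk directly from a GFF3 file. This is a minimal illustration, not the procedure used in this project: it assumes exon records carry a `Parent=` attribute holding a transcript ID of the form GENE.N, and the file names toy.gff3 and nny.remove.list are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical kept-gene list, in the same format as nny.orthogene.list.
printf 'NNYC0000010\n' > nny.orthogene.list

# Hypothetical GFF3 exon records; only columns 3 and 9 matter here.
{
  printf 'chr1\tPASA\texon\t100\t200\t.\t+\t.\tID=e1;Parent=NNYC0000010.1\n'
  printf 'chr1\tPASA\texon\t500\t600\t.\t+\t.\tID=e2;Parent=NNYC0000020.1\n'
  printf 'chr1\tPASA\texon\t700\t800\t.\t+\t.\tID=e3;Parent=NNYC0000020.1\n'
  printf 'chr1\tPASA\texon\t900\t950\t.\t+\t.\tID=e4;Parent=NNYC0000030.1\n'
  printf 'chr1\tPASA\texon\t1000\t1100\t.\t+\t.\tID=e5;Parent=NNYC0000030.1\n'
  printf 'chr1\tPASA\texon\t1200\t1300\t.\t+\t.\tID=e6;Parent=NNYC0000030.1\n'
} > toy.gff3

# Count exons per transcript, then report genes with 1 or 2 exons
# that are absent from the orthologous gene list.
awk -F'\t' '
  FNR == NR { keep[$1] = 1; next }     # first file: genes to keep
  $3 == "exon" {
    parent = $9
    sub(/.*Parent=/, "", parent)       # drop everything up to Parent=
    sub(/[;,].*/, "", parent)          # drop any trailing attributes
    exons[parent]++
  }
  END {
    for (t in exons) {
      gene = t
      sub(/\.[0-9]+$/, "", gene)       # NNYC0000020.1 becomes NNYC0000020
      if (exons[t] <= 2 && !(gene in keep)) print gene
    }
  }
' nny.orthogene.list toy.gff3 | sort -u > nny.remove.list

cat nny.remove.list
```

Here NNYC0000010 has one exon but is orthologous, and NNYC0000030 has three exons, so only NNYC0000020 ends up in nny.remove.list.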

######################################################

annotation.sh is the RNA-seq + homology + ab-initio based pipeline used for the great bustard genome.