This repository contains all the code needed to reproduce my analysis. It is based on the genome sequencing of Arabidopsis thaliana from Qichao Lian et al. (2024), from which only the Kyr-1 genome is used, and on the RNA sequencing of Arabidopsis thaliana from Jiao WB, Schneeberger K. (2020), from which only the Sha transcriptome is used. The reads are downloaded with 01_read_download.sh.
First, the quality of the Kyr-1 and Sha reads is assessed with FastQC: 02_fastqc_read.sh creates the FastQC reports. Then 03_fastp_clean.sh uses fastp to filter and trim the reads to improve their quality. For Kyr-1, k-mers are counted with Jellyfish in 04_kmer_count.sh, which gives a genome size estimate and an additional quality check of the sequencing data.
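To illustrate how a genome size estimate can be derived from k-mer counts, here is a minimal sketch. It assumes a two-column histogram (k-mer multiplicity, number of distinct k-mers), which is the layout written by jellyfish histo. The function name estimate_genome_size is hypothetical and is not taken from 04_kmer_count.sh, and the method is a simple heuristic (total k-mers past the error trough divided by the depth of the main peak), not necessarily what that script does.

```shell
# Rough genome size estimate from a k-mer histogram.
# Input: file with two columns per line: multiplicity, number of distinct k-mers.
estimate_genome_size() {
    awk '
        { mult[NR] = $1; freq[NR] = $2 }
        END {
            # 1. First local minimum: the trough separating low-multiplicity
            #    error k-mers from genomic k-mers.
            trough = 1
            for (i = 2; i <= NR; i++) {
                if (freq[i] > freq[i - 1]) { trough = i - 1; break }
            }
            # 2. Highest peak at or after the trough: the main coverage peak.
            peak = trough
            for (i = trough; i <= NR; i++) {
                if (freq[i] > freq[peak]) peak = i
            }
            # 3. Total k-mers from the trough onwards, divided by the peak depth.
            total = 0
            for (i = trough; i <= NR; i++) total += mult[i] * freq[i]
            printf "peak_coverage=%d genome_size=%.0f\n", mult[peak], total / mult[peak]
        }
    ' "$1"
}
```

A call such as estimate_genome_size kyr1.histo (hypothetical file name) prints the depth of the main peak and the estimated size on one line. Dedicated tools fit a model to the histogram and should be preferred for reported values.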
The Kyr-1 genome is assembled with Flye, hifiasm and LJA in 05_flye_assmbly.sh, 06_hifiasm_assembly.sh and 07_LJA_assembly.sh, respectively. The Sha transcriptome is assembled from the RNA-seq reads with Trinity in 08_trinity_assembly.sh.
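One practical detail when comparing the three genome assemblers: hifiasm writes its contigs as GFA rather than FASTA, so a conversion is usually needed before the evaluation tools can read them. The sketch below shows one common way to do it with awk. The function name gfa_to_fasta and the file names in the usage note are hypothetical, and whether 06_hifiasm_assembly.sh already performs this step is an assumption to check.

```shell
# Convert the segment (S) lines of a GFA file to FASTA.
# A GFA segment line is tab-separated: S <name> <sequence> [optional tags]
gfa_to_fasta() {
    awk -F '\t' '$1 == "S" { print ">" $2; print $3 }' "$1"
}
```

For example, gfa_to_fasta kyr1.bp.p_ctg.gfa > kyr1_hifiasm.fasta, where the input would be the primary contig graph of a hifiasm run started with the hypothetical prefix kyr1.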
The assemblies are evaluated with BUSCO: 09_busco_evaluation for the Flye, hifiasm and LJA assemblies, and 09_1_busco_evaluation for the Trinity assembly. The Flye, hifiasm and LJA assemblies are additionally evaluated with QUAST in 10_quast_evaluation.sh and with Merqury in 11_1_premerqu_secTry.sh and 11_merqury_secTry.sh.
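As a quick sanity check alongside these reports, the N50 of an assembly can be recomputed directly from its FASTA file. The sketch below is only an illustration of the metric (the length of the contig at which the cumulative length, counted from the longest contig down, reaches half of the total assembly length). The function name n50 is hypothetical and the code is not part of the evaluation scripts.

```shell
# Print the N50 of a FASTA file (sequences may span several lines).
n50() {
    # Step 1: print one length per sequence.
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    # Step 2: sort the lengths from longest to shortest.
    sort -rn |
    # Step 3: walk down the sorted lengths until half of the total is covered.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 cum += lens[i]
                 if (cum >= total / 2) { print lens[i]; exit }
             }
         }'
}
```

The value printed for each assembly can be compared with the N50 reported by QUAST for the same file, as long as both use the same minimum contig length.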
Finally, 12_nucmer_comparaison.sh compares the Flye, hifiasm and LJA assemblies against the Arabidopsis thaliana reference genome and against each other.
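The set of comparisons this implies (each assembly against the reference, then each pair of assemblies once) can be sketched as follows. The function names list_comparisons and plan_nucmer, the output prefixes and the file names are hypothetical and are not taken from 12_nucmer_comparaison.sh. The sketch only prints the nucmer command lines, so the plan can be inspected before anything is run, and it assumes file names without spaces.

```shell
# Print "target query" pairs: reference vs. each assembly, then each
# unordered pair of assemblies exactly once.
list_comparisons() {
    ref=$1; shift
    for asm in "$@"; do
        echo "$ref $asm"
    done
    while [ "$#" -gt 1 ]; do
        first=$1; shift
        for other in "$@"; do
            echo "$first $other"
        done
    done
}

# Turn each pair into a nucmer command line (printed, not executed).
plan_nucmer() {
    list_comparisons "$@" |
    while read -r target query; do
        # Output prefix built from the file names without directory or extension.
        t=$(basename "$target"); t=${t%.*}
        q=$(basename "$query");  q=${q%.*}
        echo "nucmer --prefix=${t}_vs_${q} $target $query"
    done
}
```

For example, plan_nucmer reference.fasta flye.fasta hifiasm.fasta lja.fasta prints six command lines, three against the reference and three between assemblies.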