**Project: Toxoplasma gondii infection in the Mus musculus model: immune response and differential gene expression analysis**

This project draws on part of the study by Singhania et al. (2019), "Transcriptional profiling unveils type I and II interferon networks in blood and tissues across diseases".

Toxoplasma gondii is a widespread intracellular protozoan parasite capable of infecting most warm-blooded animals, including humans and rodents. The infection affects various tissues, including the lungs and blood, where it can cause systemic and localized effects. The parasite is notable for its complex life cycle, which is divided into sexual and asexual phases. Birds and mammals act as intermediate hosts, while felines are the definitive hosts. The disease caused by Toxoplasma gondii is called toxoplasmosis. It is typically asymptomatic in healthy individuals but poses significant risks to immunocompromised people and pregnant women. Additionally, previous studies suggest that the parasite may be linked to human neuropsychiatric disorders.

By integrating several bioinformatics tools, this project aims to identify the differential expression of immune-related genes in infected versus control tissues, focusing on tissue-specific immune responses.

The samples are RNA sequencing data from the blood and lungs of mice that were either infected with Toxoplasma gondii or left uninfected, as described in the table below.

| Code | Condition | Origin of Sample | Code | Condition | Origin of Sample |
| --- | --- | --- | --- | --- | --- |
| SRR7821937 | Control | Lung | SRR7821968 | Control | Blood |
| SRR7821938 | Control | Lung | SRR7821969 | Control | Blood |
| SRR7821939 | Control | Lung | SRR7821970 | Control | Blood |
| SRR7821918 | Infected | Lung | SRR7821949 | Infected | Blood |
| SRR7821919 | Infected | Lung | SRR7821950 | Infected | Blood |
| SRR7821920 | Infected | Lung | SRR7821951 | Infected | Blood |
| SRR7821921 | Infected | Lung | SRR7821952 | Infected | Blood |
| SRR7821922 | Infected | Lung | SRR7821953 | Infected | Blood |

First step: Quality control

The quality control script can be found in 01_QC_trim_script, and the MultiQC script in 02_multiQC_script. The tools used during this step are FastQC, Trimmomatic and MultiQC. Trimming is not mandatory, but it helps ensure the quality of the reads. Alternatives such as fastp or Cutadapt can also be used.
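To make the order of the quality control commands concrete, here is a minimal sketch that only builds the command lines and does not run them. It is not the content of 01_QC_trim_script. The directory names, the `trimmomatic` wrapper name, the unpaired file names and the trimming steps (`SLIDINGWINDOW:4:20`, `MINLEN:36`) are assumptions chosen for illustration; the `_1trim` / `_2trim` suffix simply mirrors the FastQC report names in this repository.

```python
import shlex


def qc_commands(sample, raw_dir="raw", trim_dir="trimmed", qc_dir="fastqc"):
    """Build (but do not run) the QC commands for one paired-end sample."""
    # Raw paired-end reads (file naming is an assumption).
    r1 = f"{raw_dir}/{sample}_1.fastq.gz"
    r2 = f"{raw_dir}/{sample}_2.fastq.gz"
    # Trimmomatic writes paired and unpaired reads for each mate.
    p1 = f"{trim_dir}/{sample}_1trim.fastq.gz"
    u1 = f"{trim_dir}/{sample}_1unpaired.fastq.gz"
    p2 = f"{trim_dir}/{sample}_2trim.fastq.gz"
    u2 = f"{trim_dir}/{sample}_2unpaired.fastq.gz"
    return [
        # 1. Quality report on the raw reads.
        ["fastqc", "-o", qc_dir, r1, r2],
        # 2. Paired-end trimming (step values are illustrative only).
        ["trimmomatic", "PE", r1, r2, p1, u1, p2, u2,
         "SLIDINGWINDOW:4:20", "MINLEN:36"],
        # 3. Quality report on the trimmed reads.
        ["fastqc", "-o", qc_dir, p1, p2],
    ]


def multiqc_command(qc_dir="fastqc"):
    """Build the MultiQC command that aggregates all FastQC reports."""
    return ["multiqc", qc_dir, "-o", qc_dir]


if __name__ == "__main__":
    for cmd in qc_commands("SRR7821918") + [multiqc_command()]:
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where the tools are installed. Keeping the commands as lists avoids shell quoting problems.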

Second step: Mapping

The corresponding script can be found in 03_mapping_script. It builds the index and then maps the reads to the reference genome, using the reference genome and the annotation file. The tools used during this step are HISAT2, Samtools and, optionally, Integrative Genomics Viewer (IGV) to visualize the alignments on the genome.
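The mapping steps can be sketched in the same way. This is an illustration under assumptions, not the content of 03_mapping_script: the file names, directory layout and thread count are hypothetical, and because the README does not say how the annotation file enters the mapping, the sketch leaves it out.

```python
import shlex


def index_command(genome_fasta, index_base):
    """Build the HISAT2 index command (run once per reference genome)."""
    return ["hisat2-build", genome_fasta, index_base]


def mapping_commands(sample, index_base, trim_dir="trimmed", bam_dir="bam", threads=4):
    """Build (but do not run) the align, sort and index commands for one sample."""
    r1 = f"{trim_dir}/{sample}_1trim.fastq.gz"
    r2 = f"{trim_dir}/{sample}_2trim.fastq.gz"
    sam = f"{bam_dir}/{sample}.sam"
    bam = f"{bam_dir}/{sample}.sorted.bam"
    return [
        # Align the paired-end reads against the HISAT2 index.
        ["hisat2", "-p", str(threads), "-x", index_base,
         "-1", r1, "-2", r2, "-S", sam],
        # Convert to a coordinate-sorted BAM file.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # Index the BAM file so that viewers such as IGV can load it.
        ["samtools", "index", bam],
    ]


if __name__ == "__main__":
    # "genome.fa" and "mm_index" are placeholder names.
    print(shlex.join(index_command("genome.fa", "mm_index")))
    for cmd in mapping_commands("SRR7821918", "mm_index"):
        print(shlex.join(cmd))
```

Sorting and indexing the BAM file is what makes the alignments usable both for read counting and for visual inspection in IGV.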

Third step: Count the number of reads per gene

The corresponding script can be found in 04_counts_script. It uses the same annotation file as the mapping step. The tool used during this step is featureCounts.

This is a key step because the table produced by featureCounts is the input for the R analysis, e.g., to create the DESeq2 object. You have to download the featureCounts table before moving on to the next steps.
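As an illustration of what that table contains, the sketch below reads a featureCounts table into a gene-by-sample structure. It assumes the default featureCounts layout (a comment line starting with `#`, then the annotation columns Geneid, Chr, Start, End, Strand and Length, followed by one column per BAM file). The reformatted_counts.txt file in this repository may already have some of these columns removed, which the function tolerates. The gene names and counts in the example are made up.

```python
import csv
import io

ANNOTATION_COLUMNS = {"Geneid", "Chr", "Start", "End", "Strand", "Length"}


def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [count per sample]}) from a featureCounts table."""
    # Skip the leading comment line that featureCounts writes ("# Program:...").
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows)
    # Every column that is not an annotation column is treated as a sample.
    sample_idx = [i for i, name in enumerate(header) if name not in ANNOTATION_COLUMNS]
    sample_names = [header[i] for i in sample_idx]
    counts = {}
    for row in rows:
        if not row:
            continue  # ignore blank lines
        counts[row[0]] = [int(row[i]) for i in sample_idx]
    return sample_names, counts


if __name__ == "__main__":
    # Toy table with made-up values, only to show the expected layout.
    toy = (
        "# Program:featureCounts\n"
        "Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl.bam\tinfected.bam\n"
        "geneA\tchr1\t1\t100\t+\t100\t5\t0\n"
        "geneB\tchr2\t1\t50\t-\t50\t12\t7\n"
    )
    names, counts = read_featurecounts(io.StringIO(toy))
    print(names)
    print(counts)
```

In R, the same matrix (genes in rows, samples in columns) is what the DESeq2 object is built from, together with a table describing each sample's condition.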

Final step: in R

Exploratory data analysis

The corresponding script can be found in 05_explanatory_analysis_R. It creates a DESeq2 object and then visualizes gene expression patterns with several tools. The clustering of the groups can be visualized with a Principal Component Analysis (PCA), a Principal Coordinates Analysis (PCoA) and a heat map. This step uses R with the following packages, which have to be installed before running the analysis: library(DESeq2), library(clusterProfiler), library(pheatmap), library(vegan) and library(ggplot2).
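Both the PCoA and a sample heat map start from a sample-to-sample distance matrix. The sketch below shows that idea in plain Python. It is a simplified stand-in, not the method used in the R script: it applies a `log2(count + 1)` transform, whereas DESeq2 offers dedicated variance-stabilizing transformations, and the counts are made up.

```python
import math


def sample_distances(counts):
    """Euclidean distances between samples on log2(count + 1) values.

    counts maps gene IDs to a list of raw counts, one per sample.
    Returns a symmetric matrix (list of lists) with one row per sample.
    """
    logged = [[math.log2(c + 1) for c in row] for row in counts.values()]
    n = len(logged[0]) if logged else 0
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Sum squared differences over all genes for samples i and j.
            d = math.sqrt(sum((row[i] - row[j]) ** 2 for row in logged))
            dist[i][j] = dist[j][i] = d
    return dist


if __name__ == "__main__":
    # Made-up counts for three samples; the third differs in gene g1.
    toy = {"g1": [0, 0, 3], "g2": [1, 1, 1]}
    for row in sample_distances(toy):
        print(row)
```

Samples from the same condition and tissue are expected to sit close together in this matrix, which is what the PCA, PCoA and heat map make visible.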

Differential expression and overrepresentation analysis

The script for the differential expression analysis can be found in 06_differential_expression_R. It identifies the genes that are differentially expressed between the conditions for each tissue. The significance threshold used here is 0.05 (5%), but you can choose a lower, stricter threshold to obtain fewer significant genes. Based on these results, you can calculate the number of up- and down-regulated genes in the samples. The results can be visualized with volcano plots. This step uses R with the packages library(biomaRt), library(org.Mm.eg.db) and library(EnhancedVolcano).
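Counting up- and down-regulated genes comes down to a simple rule on each gene's fold change and significance. The sketch below assumes that the 0.05 threshold is applied to the adjusted p-value (the `padj` column of a DESeq2 results table) and that the sign of `log2FoldChange` decides the direction; both points are assumptions about the R script, and the example values are invented.

```python
import math


def classify_genes(results, alpha=0.05, min_lfc=0.0):
    """Split genes into up-regulated, down-regulated and not significant.

    results maps gene IDs to (log2FoldChange, padj). A padj of None or NaN
    stands for the NA values that DESeq2 can report.
    """
    summary = {"up": [], "down": [], "ns": []}
    for gene, (lfc, padj) in results.items():
        if padj is None or math.isnan(padj) or padj >= alpha:
            summary["ns"].append(gene)      # not significant at this threshold
        elif lfc > min_lfc:
            summary["up"].append(gene)      # higher in infected than in control
        elif lfc < -min_lfc:
            summary["down"].append(gene)    # lower in infected than in control
        else:
            summary["ns"].append(gene)      # significant but below the effect cutoff
    return summary


if __name__ == "__main__":
    # Invented values, only to show the rule.
    toy = {
        "geneA": (2.0, 0.001),
        "geneB": (-1.5, 0.04),
        "geneC": (3.0, 0.2),
        "geneD": (1.0, None),
    }
    summary = classify_genes(toy)
    print({label: len(genes) for label, genes in summary.items()})
```

Lowering `alpha` (for example to 0.01) moves borderline genes such as the hypothetical `geneB` into the not-significant group.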

The outliers observed on the volcano plots were examined with boxplots to show how their expression is distributed within the groups. You can run the same analysis for any gene by providing its Ensembl gene ID, which you can find on the Ensembl website.

The script for the overrepresentation analysis can be found in 07_overrepresentation_R. The overrepresentation analysis provides Gene Ontology (GO) terms that can be visualized using dot plots and bar plots. Again, be careful about which options you specify in R. For the purpose of this project, I selected "BP" (Biological Process) as the subontology, but you can use another one. This step uses R with the packages library(clusterProfiler) and library(org.Mm.eg.db).
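Overrepresentation analysis is usually framed as a one-sided hypergeometric test: given how many background genes carry a GO term, how surprising is the number of differentially expressed genes that carry it? The sketch below computes that p-value for a single term. It illustrates the statistic only, assuming this is the test behind the R analysis; it does no multiple-testing correction, and the numbers in the example are arbitrary.

```python
from math import comb


def overrepresentation_pvalue(background, annotated, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    background: number of genes in the background set
    annotated:  background genes annotated with the GO term
    selected:   number of differentially expressed genes tested
    overlap:    selected genes that are annotated with the GO term
    """
    upper = min(selected, annotated)
    # Count the ways to draw at least `overlap` annotated genes.
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, selected)


if __name__ == "__main__":
    # Arbitrary numbers: 10 background genes, 3 annotated, 4 selected, 2 overlapping.
    print(overrepresentation_pvalue(10, 3, 4, 2))
```

A small p-value means the term appears among the selected genes more often than expected by chance. In practice this test is repeated for every GO term and the p-values are then adjusted for multiple testing.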

In this repository, you will also find:

the FastQC files
the MultiQC report
the featureCounts table

