<div align="center">
<img src="docs/source/inmoose.png" width="600" alt="InMoose Logo">

<h1>InMoose</h1>
<p><strong>In</strong>tegrated <strong>M</strong>ulti-<strong>O</strong>mic <strong>O</strong>pen <strong>S</strong>ource <strong>E</strong>nvironment</p>

<p>
<a href="https://pypi.org/project/inmoose"><img src="https://img.shields.io/pypi/v/inmoose" alt="PyPI version"></a>
<a href="https://pepy.tech/project/inmoose"><img src="https://static.pepy.tech/badge/inmoose" alt="PyPI Downloads"></a>
<a href="https://pepy.tech/projects/inmoose"><img src="https://static.pepy.tech/badge/inmoose/month" alt="Monthly Downloads"></a>
<a href="https://coveralls.io/github/epigenelabs/inmoose"><img src="https://img.shields.io/coverallsCoverage/github/epigenelabs/inmoose.svg" alt="Coverage"></a>
<a href="https://inmoose.readthedocs.io/en/latest/?badge=latest"><img src="https://readthedocs.org/projects/inmoose/badge/?version=latest" alt="Documentation Status"></a>
<a href="LICENSE"><img src="https://img.shields.io/pypi/l/inmoose" alt="License"></a>
</p>

<p>A comprehensive collection of tools for the analysis of omic data</p>

<p>Developed and maintained by <img src="docs/source/epigenelogo.png" width="20"> <a href="https://www.epigenelabs.com/">Epigene Labs</a></p>
</div>

---

## 📋 Table of Contents

- [🚀 Quick Start](#-quick-start)
- [✨ Features](#-features)
- [📖 Documentation](#-documentation)
- [🧬 Batch Effect Correction](#-batch-effect-correction)
- [🔍 Cohort Quality Control](#-cohort-quality-control)
- [📊 Differential Expression Analysis](#-differential-expression-analysis)
- [🎯 Consensus Clustering](#-consensus-clustering)
- [📚 Citation](#-citation)
- [🤝 Contributing](#-contributing)

---

## 🚀 Quick Start

### Installation

Install InMoose directly from PyPI:

```bash
pip install inmoose
```

### Basic Usage

```python
from inmoose.pycombat import pycombat_norm, pycombat_seq

# Batch effect correction
microarray_corrected = pycombat_norm(microarray_data, microarray_batches)
rnaseq_corrected = pycombat_seq(rnaseq_data, rnaseq_batches)
```

---

## ✨ Features

InMoose provides a comprehensive suite of tools for multi-omic data analysis:

| Feature | Description | Supported Data Types |
|---------|-------------|---------------------|
| **Batch Effect Correction** | Remove technical biases from your data | Microarray, RNA-seq |
| **Quality Control** | Comprehensive QC with HTML reports | All expression data |
| **Differential Expression** | Find differentially expressed genes | Microarray, RNA-seq |
| **Consensus Clustering** | Robust clustering with confidence metrics | Any numeric data |

---

## 📖 Documentation

Comprehensive documentation is available on [ReadTheDocs](https://inmoose.readthedocs.io/en/latest/).

---

## 🧬 Batch Effect Correction

InMoose provides widely used batch effect correction methods for transcriptomic data:

### 🔬 Microarray Data
- **ComBat**: Python 3 implementation, available as `pycombat_norm`, superseding [pyCombat](https://github.com/epigenelabs/pycombat/)
- **Method**: Empirical Bayes approach for batch effect removal
- **Reference**: [Johnson et al., 2007](https://doi.org/10.1093/biostatistics/kxj037)
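
To make the idea concrete, the sketch below implements a per-gene location/scale adjustment, which is ComBat with the empirical Bayes shrinkage step left out and no covariates. It is an illustration under those simplifying assumptions, not InMoose's `pycombat_norm` code; the function name `location_scale_adjust` is hypothetical. It expects a genes × samples matrix and at least two samples per batch.

```python
import numpy as np


def location_scale_adjust(x, batch):
    """Hypothetical helper: ComBat-style location/scale adjustment WITHOUT
    empirical Bayes shrinkage and without covariates (illustration only).

    x: genes x samples matrix; batch: one label per sample (column).
    Each batch needs at least two samples and non-constant genes.
    """
    x = np.asarray(x, dtype=float)
    batch = np.asarray(batch)
    if batch.shape[0] != x.shape[1]:
        raise ValueError("expected one batch label per sample (column)")
    labels = np.unique(batch)

    # alpha_g: overall per-gene mean (with no covariates this equals the
    # sample-size-weighted average of the per-batch means).
    alpha = x.mean(axis=1, keepdims=True)

    # Residuals after removing each batch's own per-gene mean.
    resid = np.empty_like(x)
    for b in labels:
        cols = batch == b
        resid[:, cols] = x[:, cols] - x[:, cols].mean(axis=1, keepdims=True)

    # sigma_g: pooled per-gene standard deviation of the residuals.
    sigma = np.sqrt((resid ** 2).mean(axis=1, keepdims=True))

    adjusted = np.empty_like(x)
    for b in labels:
        cols = batch == b
        # delta_bg: batch-specific per-gene standard deviation.
        delta = x[:, cols].std(axis=1, ddof=1, keepdims=True)
        # Rescale each batch to the pooled scale and re-centre it on alpha_g.
        adjusted[:, cols] = sigma * resid[:, cols] / delta + alpha
    return adjusted


rng = np.random.default_rng(0)
data = rng.normal(size=(5, 12))
data[:, 6:] = data[:, 6:] * 3 + 10  # simulate a shifted, rescaled second batch
labels = ["a"] * 6 + ["b"] * 6
corrected = location_scale_adjust(data, labels)
```

After this adjustment every batch shares the same per-gene mean and standard deviation. ComBat additionally shrinks the per-batch mean and variance estimates towards priors pooled across genes before applying the same kind of correction.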

### 🧬 RNA-seq Data
- **ComBat-Seq**: Python 3 port, available as `pycombat_seq`, designed for count data
- **Method**: Negative binomial regression for batch correction
- **Reference**: [Zhang et al., 2020](https://doi.org/10.1093/nargab/lqaa078)
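
ComBat-Seq fits a negative binomial model to each gene and then moves each count from its batch-specific distribution to a batch-free one by matching quantiles. The sketch below shows only that mapping step for a single count, assuming the batch-specific and batch-free means and dispersions are already known; the helper names are hypothetical and this is not InMoose's `pycombat_seq` code. It uses SciPy's `nbinom`, converting a mean `mu` and dispersion `phi` (variance `mu + phi * mu**2`) into SciPy's `(n, p)` parameters. Leaving counts of 0 and 1 unchanged is a choice of this sketch.

```python
from scipy.stats import nbinom


def nb_scipy_params(mu, phi):
    """Convert mean mu and dispersion phi (variance mu + phi * mu**2)
    to SciPy's nbinom (n, p) parameters."""
    n = 1.0 / phi
    return n, n / (n + mu)


def map_count_quantile(y, mu_batch, phi_batch, mu_target, phi_target):
    """Hypothetical helper: move count y from NB(mu_batch, phi_batch) to the
    count at the matching quantile of NB(mu_target, phi_target)."""
    if y <= 1:
        return y  # this sketch leaves counts of 0 and 1 unchanged
    n_b, p_b = nb_scipy_params(mu_batch, phi_batch)
    n_t, p_t = nb_scipy_params(mu_target, phi_target)
    # Probability of a count below y under the batch-specific model.
    q = nbinom.cdf(y - 1, n_b, p_b)
    if q >= 1.0 - 1e-12:
        return y  # extreme tail: keep the count instead of mapping to infinity
    # Target count at the same quantile, shifted back by one.
    return int(nbinom.ppf(q, n_t, p_t)) + 1


# A count of 10 from a batch with mean 10, mapped to a batch-free mean of 20.
adjusted = map_count_quantile(10, mu_batch=10.0, phi_batch=0.1, mu_target=20.0, phi_target=0.1)
```

Here the target parameters are supplied by hand; in ComBat-Seq they come from the fitted model with the batch terms removed.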

### Usage Example

To use these functions, simply import them and call them with default
parameters:
```python
from inmoose.pycombat import pycombat_norm, pycombat_seq

# Correct microarray data
microarray_corrected = pycombat_norm(
    microarray_data,     # Expression matrix (genes × samples)
    microarray_batches,  # Batch labels, one per sample
)

# Correct RNA-seq data
rnaseq_corrected = pycombat_seq(
    rnaseq_data,     # Count matrix (genes × samples)
    rnaseq_batches,  # Batch labels, one per sample
)
```

### Parameters

- **Expression/count matrix** (first argument): genes as rows, samples as columns
- **`batch`** (second argument): list of batch identifiers, one per sample (same length as the number of columns)
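
Before calling either function, it can help to check the input layout. The sketch below builds a toy genes × samples `pandas.DataFrame` and validates it against a batch list. The helper `check_batch_inputs`, the gene and sample names, and the rule that every batch needs at least two samples (so that a per-batch variance can be estimated) are assumptions for illustration, not InMoose behavior.

```python
import numpy as np
import pandas as pd


def check_batch_inputs(matrix: pd.DataFrame, batch) -> None:
    """Hypothetical helper: sanity-check a genes x samples matrix and batch list."""
    if len(batch) != matrix.shape[1]:
        raise ValueError(
            f"{len(batch)} batch labels for {matrix.shape[1]} samples (columns)"
        )
    if matrix.isna().any().any():
        raise ValueError("matrix contains missing values")
    sizes = pd.Series(batch).value_counts()
    if (sizes < 2).any():
        raise ValueError(f"batches with a single sample: {list(sizes[sizes < 2].index)}")


rng = np.random.default_rng(0)
expression = pd.DataFrame(
    rng.poisson(20, size=(4, 6)),
    index=[f"gene_{g}" for g in range(4)],     # genes as rows
    columns=[f"sample_{s}" for s in range(6)],  # samples as columns
)
batches = ["run1", "run1", "run1", "run2", "run2", "run2"]  # one label per column
check_batch_inputs(expression, batches)
```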

---

## 🔍 Cohort Quality Control

Perform comprehensive quality control analysis on your cohort datasets, typically after batch effect correction, with automated reporting.

### CohortMetric Class

The `CohortMetric` class provides methods for quality control analysis:

#### Features
- 📊 **Principal Component Analysis (PCA)** - Assess data variation patterns
- 📈 **Sample Distribution Comparison** - Compare distributions across batches
- 🎯 **Batch Correction Quantification** - Measure correction effectiveness
- 📏 **Silhouette Score Analysis** - Evaluate batch separation
- 🔀 **Entropy Calculation** - Assess sample mixing quality
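
The silhouette and entropy metrics listed above can be approximated with scikit-learn, as in the sketch below: PCA on the samples, a silhouette score computed with batch labels (high values mean batches still separate), and the mean Shannon entropy of batch labels among each sample's nearest neighbors (higher values mean better mixing). The function `batch_mixing_metrics`, the two PCA components and `k = 10` neighbors are assumptions for illustration; `CohortMetric` may compute these quantities differently.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors


def batch_mixing_metrics(expression, batch, n_components=2, k=10):
    """Hypothetical helper. expression: genes x samples; batch: one label per sample.

    Returns (silhouette score of batches in PCA space, mean neighborhood entropy).
    """
    batch = np.asarray(batch)
    # PCA expects samples as rows, so transpose the genes x samples matrix.
    coords = PCA(n_components=n_components).fit_transform(
        np.asarray(expression, dtype=float).T
    )
    silhouette = silhouette_score(coords, batch)

    # k + 1 neighbors because each sample is usually its own nearest neighbor.
    _, neighbors = NearestNeighbors(n_neighbors=k + 1).fit(coords).kneighbors(coords)
    entropies = []
    for idx in neighbors[:, 1:]:
        _, counts = np.unique(batch[idx], return_counts=True)
        p = counts / counts.sum()
        entropies.append(float(-(p * np.log(p)).sum()))
    return float(silhouette), float(np.mean(entropies))


rng = np.random.default_rng(0)
labels = np.array(["a"] * 20 + ["b"] * 20)
mixed = rng.normal(size=(50, 40))   # no batch effect
shifted = mixed.copy()
shifted[:, labels == "b"] += 5.0    # strong additive batch effect
sil_mixed, ent_mixed = batch_mixing_metrics(mixed, labels)
sil_shifted, ent_shifted = batch_mixing_metrics(shifted, labels)
```

A successful correction should move the metrics of the corrected data towards the `mixed` case: a silhouette close to zero and a neighborhood entropy close to `log(number of batches)`.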

#### Usage Example

```python
from inmoose.cohort_qc.cohort_metric import CohortMetric

# Initialize quality control analysis
cohort_qc = CohortMetric(
    clinical_df=clinical_data,
    batch_column="batch",
    data_expression_df=gene_expression_after_correction,
    # ...
)
```

### QCReport Class

Pass a `CohortMetric` instance to `QCReport` to generate a comprehensive HTML report with visualizations:

```python
from inmoose.cohort_qc.qc_report import QCReport

# Generate interactive HTML report
qc_report = QCReport(cohort_qc)
qc_report.save_html_report_local(output_path='reports')
```

The report includes:
- 📊 Interactive PCA plots
- 📈 Before/after batch correction comparisons
- 📋 Summary statistics and metrics
- 🎨 Professional visualizations

---

## 📊 Differential Expression Analysis

Identify differentially expressed genes in bulk transcriptomic data using Python ports of widely used R packages.

### 🔬 Microarray Analysis
- **limma**: Python port of the *de facto* standard tool for microarray differential expression analysis
- **Reference**: [Ritchie et al., 2015](https://doi.org/10.1093/nar/gkv007)

### 🧬 RNA-seq Analysis

#### edgeR
- **Method**: Exact tests for negative binomial models
- **Best for**: Simple experimental designs, small sample sizes
- **Reference**: [Chen et al., 2016](https://doi.org/10.12688/f1000research.8987.2)

#### DESeq2
- **Method**: Wald tests with empirical Bayes shrinkage
- **Best for**: Complex designs, larger sample sizes
- **Reference**: [Love et al., 2014](https://doi.org/10.1186/s13059-014-0550-8)
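
As one concrete piece of the DESeq2 workflow, library sizes are normalized with size factors from the median-of-ratios method: each sample's size factor is the median, over genes with no zero counts, of its count divided by that gene's geometric mean across samples. The sketch below implements that formula in NumPy; the function name is hypothetical and this is not InMoose's DESeq2 code.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Hypothetical helper: DESeq2-style size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample have a positive geometric mean.
    usable = (counts > 0).all(axis=1)
    if not usable.any():
        raise ValueError("every gene has a zero count in at least one sample")
    log_counts = np.log(counts[usable])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Median (on the log scale) of each sample's ratios to the per-gene geometric mean.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


counts = np.array([
    [10, 20, 40],
    [20, 40, 80],
    [30, 60, 120],
    [0, 5, 7],  # has a zero count, so it is ignored when estimating size factors
])
size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors  # divide each column by its size factor
```

In this toy matrix the second and third samples are sequenced twice and four times as deeply as the first, so the size factors come out as 0.5, 1 and 2, and the normalized counts of the first three genes agree across samples.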

### Getting Started

Detailed usage examples and tutorials are available in the [documentation](https://inmoose.readthedocs.io/en/latest/).

---

## 🎯 Consensus Clustering

Determine the optimal number of clusters, with confidence metrics, using resampling-based consensus clustering.

### Features
- 🔄 **Resampling-based approach** for robust clustering
- 📊 **Automatic optimal cluster detection**
- 📈 **Confidence metrics and stability analysis**
- 🔧 **Compatible with any clustering class that accepts `n_clusters` and provides `fit_predict`** (e.g. scikit-learn's `AgglomerativeClustering` or `KMeans`)

### Requirements
Your clustering algorithm class must:
- Accept an `n_clusters` parameter when instantiated
- Provide a `fit_predict` method
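
The two requirements above amount to a small duck-typed interface. The sketch below checks a class against it with `inspect.signature`; the helper name `fits_consensus_interface` is hypothetical and only mirrors the requirements stated here, not any check InMoose performs internally.

```python
import inspect

from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans


def fits_consensus_interface(cluster_cls) -> bool:
    """Hypothetical helper: does cluster_cls accept n_clusters and define fit_predict?"""
    accepts_n_clusters = "n_clusters" in inspect.signature(cluster_cls).parameters
    has_fit_predict = callable(getattr(cluster_cls, "fit_predict", None))
    return accepts_n_clusters and has_fit_predict


results = {
    cls.__name__: fits_consensus_interface(cls)
    for cls in (AgglomerativeClustering, KMeans, DBSCAN)
}
```

`DBSCAN` has a `fit_predict` method but chooses the number of clusters itself, so it does not take `n_clusters` and fails the check.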

### Usage Example

To use consensus clustering, import the `consensusClustering` class and a clustering algorithm class:
```python
from inmoose.consensus_clustering.consensus_clustering import consensusClustering
from sklearn.cluster import AgglomerativeClustering

# Initialize consensus clustering
cc = consensusClustering(
    cluster=AgglomerativeClustering,
    min_k=2,
    max_k=10,
    nb_resampling_iteration=50
)

# Perform analysis
cc.compute_consensus_clustering(data_matrix)

# Get optimal number of clusters
optimal_k = cc.bestK
print(f"Optimal number of clusters: {optimal_k}")
```

### Outputs
- **Consensus matrices** for each cluster number
- **Stability metrics** and confidence scores
- **Visualization plots** for cluster assessment
- **Optimal cluster recommendation**
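
A consensus matrix records, for every pair of samples, how often the two were assigned to the same cluster among the resampling iterations in which both were drawn. The sketch below computes one such matrix for a fixed `k`; the function name, the 80% subsampling fraction and the iteration count are illustrative assumptions, not `consensusClustering` defaults.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def consensus_matrix(data, cluster_cls, k, n_iter=50, frac=0.8, seed=0):
    """Hypothetical helper. data: samples x features; returns an n x n matrix."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    together = np.zeros((n, n))  # iterations in which i and j share a cluster
    drawn = np.zeros((n, n))     # iterations in which i and j were both drawn
    size = int(round(frac * n))
    for _ in range(n_iter):
        idx = rng.choice(n, size=size, replace=False)  # subsample without replacement
        labels = cluster_cls(n_clusters=k).fit_predict(data[idx])
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        drawn[np.ix_(idx, idx)] += 1.0
    # Pairs never drawn together get 0 instead of a division by zero.
    return np.divide(together, drawn, out=np.zeros_like(together), where=drawn > 0)


rng = np.random.default_rng(1)
blobs = np.vstack([
    rng.normal(0.0, 0.5, size=(10, 2)),   # samples 0-9
    rng.normal(10.0, 0.5, size=(10, 2)),  # samples 10-19
])
consensus = consensus_matrix(blobs, AgglomerativeClustering, k=2)
```

Entries near 0 or 1 indicate stable assignments. A `k` whose consensus matrix is crisp (mostly 0s and 1s) is the kind of signal commonly used to recommend a number of clusters.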

---

## 📚 Citation

When using InMoose, please cite the appropriate papers based on the features you use:

### Batch Effect Correction
```bibtex
@article{behdenna2023pycombat,
title={pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods},
author={Behdenna, A and Colange, M and Haziza, J and Gema, A and Appé, G and Azencot, CA and Nordor, A},
journal={BMC Bioinformatics},
volume={24},
number={1},
pages={459},
year={2023},
doi={10.1186/s12859-023-05578-5}
}
```

### Differential Expression Analysis
```bibtex
@article{colange2024differential,
title={Differential Expression Analysis with InMoose, the Integrated Multi-Omic Open-Source Environment in Python},
author={Colange, M and Appé, G and Meunier, L and Weill, S and Nordor, A and Behdenna, A},
journal={BioRxiv},
year={2024},
doi={10.1101/XXX}
}
```

---

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guidelines](https://github.com/epigenelabs/inmoose/blob/master/CONTRIBUTING.md) for details on:

- 🐛 **Bug Reports** - Help us improve by reporting issues
- ✨ **Feature Requests** - Suggest new functionality
- 🔧 **Code Contributions** - Submit pull requests
- 📖 **Documentation** - Improve our docs and examples

### Development Setup

```bash
# Clone the repository
git clone https://github.com/epigenelabs/inmoose.git
cd inmoose

# Install in development mode
pip install -e .
```

---

<div align="center">
<p>Made with ❤️ by <a href="https://www.epigenelabs.com/">Epigene Labs</a></p>
<p>
<a href="https://github.com/epigenelabs/inmoose">🌟 Star us on GitHub</a> |
<a href="https://inmoose.readthedocs.io/en/latest/">📖 Documentation</a> |
<a href="https://pypi.org/project/inmoose/">📦 PyPI Package</a>
</p>
</div>