4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -21,6 +21,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [[PR #493](https://github.com/nf-core/chipseq/pull/493)] - Follow up to #487.
- [[#492](https://github.com/nf-core/chipseq/issues/492), [#417](https://github.com/nf-core/chipseq/issues/417)] - Refactor local modules to nf-core standard.
- [[#416](https://github.com/nf-core/chipseq/issues/416)] - Moved the KHMER_UNIQUEKMERS logic to prepare_genome
- [[#440](https://github.com/nf-core/chipseq/issues/440), [#510](https://github.com/nf-core/chipseq/issues/510)] - Fix
naming collisions when the sample and replicate combination is identical for multiple antibodies.
- [[#467](https://github.com/nf-core/chipseq/issues/467), [#510](https://github.com/nf-core/chipseq/issues/510)] -
Restrict the usage to one IP against one control replicate.

### Parameters

8 changes: 7 additions & 1 deletion README.md
Expand Up @@ -116,7 +116,13 @@ These scripts were originally written by Chuan Wang ([@chuan-wang](https://githu

The pipeline workflow diagram was designed by Sarah Guinchard ([@G-Sarah](https://github.com/G-Sarah)).

Many thanks to others who have helped out and contributed along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@bjlang](https://github.com/bjlang), [@crickbabs](https://github.com/crickbabs), [@drejom](https://github.com/drejom), [@houghtos](https://github.com/houghtos), [@KevinMenden](https://github.com/KevinMenden), [@mashehu](https://github.com/mashehu), [@pditommaso](https://github.com/pditommaso), [@Rotholandus](https://github.com/Rotholandus), [@sofiahaglund](https://github.com/sofiahaglund), [@tiagochst](https://github.com/tiagochst) and [@winni2k](https://github.com/winni2k).
Many thanks to others who have helped out and contributed along the way too, including (but not limited to):
[@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@bjlang](https://github.com/bjlang),
[@crickbabs](https://github.com/crickbabs), [@drejom](https://github.com/drejom),
[@houghtos](https://github.com/houghtos), [@KevinMenden](https://github.com/KevinMenden),
[@mashehu](https://github.com/mashehu), [@pditommaso](https://github.com/pditommaso),
[@Rotholandus](https://github.com/Rotholandus), [@sofiahaglund](https://github.com/sofiahaglund),
[@tiagochst](https://github.com/tiagochst), [@winni2k](https://github.com/winni2k) and [@Kevin-Brockers](https://github.com/Kevin-Brockers).

## Contributions and Support

20 changes: 20 additions & 0 deletions bin/check_samplesheet.py
Expand Up @@ -212,9 +212,14 @@ def check_samplesheet(file_in, file_out):
sample,
)

set_antibodies = set()
set_control_replicates = set()

for idx, val in enumerate(sample_mapping_dict[sample][replicate]):
control = "_REP".join(val[-1].split("_REP")[:-1])
control_replicate = val[-1].split("_REP")[-1]
set_control_replicates.add(control_replicate)

if control and (
control not in sample_mapping_dict.keys()
or int(control_replicate) not in sample_mapping_dict[control].keys()
Expand All @@ -225,6 +230,21 @@ def check_samplesheet(file_in, file_out):
val[-1],
)

for x in sample_mapping_dict[sample][replicate]:
set_antibodies.add(x[4])

# Check that a given sample replicate only uses one antibody
if len(set_antibodies) > 1:
print_error(
f"Sample: {sample}, replicate {replicate} has more than one antibody specified!"
)

# Check that a given sample replicate has only one control replicate
if len(set_control_replicates) > 1:
print_error(
f"Sample: {sample}, replicate {replicate} has more than one control replicate specified! Revise the experimental design, see: 'Note on IP and control replicates'"
)

## Write to file
for idx in range(len(sample_mapping_dict[sample][replicate])):
fastq_files = sample_mapping_dict[sample][replicate][idx]
62 changes: 53 additions & 9 deletions docs/usage.md
Expand Up @@ -47,6 +47,50 @@ WT_INPUT,BLA203A30_S21_L002_R1_001.fastq.gz,,2,,,
WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
```

### Note on IP and control replicates - Comparisons of one IP sample against multiple controls

The pipeline is designed to compare one IP replicate against one matching control replicate, as described in the section
above. However, there can be situations where one wants to compare the same IP sample against several different
controls. In those cases it is advisable to encode the comparisons either in the sample column or as additional
replicates. Since this is rather unusual in ChIP-Seq experiments, the feature is considered experimental. Please open a
GitHub issue if you need further assistance.

- Encoding in sample names:

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP_CONTROL_2,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP_CONTROL_3,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,3
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
```

- Encoding as new biological replicates:

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,2,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,3,BCATENIN,WT_INPUT,3
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
```

- The following design, one IP replicate against more than one control replicate, is not allowed:

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,3
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,
```
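
The rule behind the three designs above can be sketched in a few lines of Python. This is only an illustration and not the code in `bin/check_samplesheet.py`: the function name `find_design_conflicts` is hypothetical, and grouping by the `(control, control_replicate)` pair is an assumption made for this sketch.

```python
import csv
import io
from collections import defaultdict


def find_design_conflicts(samplesheet_text):
    """Return one message per (sample, replicate) that breaks the one-IP/one-control rule.

    Hypothetical helper for illustration; it is not part of the pipeline.
    """
    antibodies = defaultdict(set)
    control_replicates = defaultdict(set)

    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        if not row["control"]:
            continue  # control (input) rows carry no antibody or control to check
        key = (row["sample"], row["replicate"])
        antibodies[key].add(row["antibody"])
        # Assumption: a control is identified by its name plus its replicate number
        control_replicates[key].add((row["control"], row["control_replicate"]))

    errors = []
    for (sample, replicate), seen in sorted(antibodies.items()):
        if len(seen) > 1:
            errors.append(f"Sample: {sample}, replicate {replicate} has more than one antibody specified!")
    for (sample, replicate), seen in sorted(control_replicates.items()):
        if len(seen) > 1:
            errors.append(f"Sample: {sample}, replicate {replicate} has more than one control replicate specified!")
    return errors


# The design that is not allowed: one IP replicate against three control replicates
NOT_ALLOWED_DESIGN = "\n".join(
    [
        "sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate",
        "WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1",
        "WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2",
        "WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,3",
        "WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,",
        "WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,",
        "WT_INPUT,BLA203A31_S21_L003_R1_001.fastq.gz,,3,,,",
    ]
)

for message in find_design_conflicts(NOT_ALLOWED_DESIGN):
    print(message)
```

Encoding the comparisons in the sample name or as separate replicates gives every IP row its own `(sample, replicate)` key, so the same check reports no conflicts for those two designs.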

### Full design

The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 7 columns to match those defined in the table below.
Expand Down Expand Up @@ -77,15 +121,15 @@ NAIVE_INPUT,BLA203A48_S39_L001_R1_001.fastq.gz,,2,,,
NAIVE_INPUT,BLA203A49_S1_L006_R1_001.fastq.gz,,3,,,
```

| Column | Description |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `replicate` | Integer representing replicate number. This will be identical for re-sequenced libraries. Must start from `1..<number of replicates>`. |
| `antibody` | Antibody name. This is required to segregate downstream analysis for different antibodies. Required when `control` is specified. |
| `control` | Sample name for control sample. |
| `control_replicate` | Integer representing replicate number for control sample. |
| Column | Description |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). It should be unique per sample and contain sufficient information, such as the antibody name, e.g. `{Treatment or cell type}_{antibody}_IP` -> `{WT/NAIVE}_{BCATENIN}_IP` |
| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `replicate` | Integer representing replicate number. This will be identical for re-sequenced libraries. Must start from `1..<number of replicates>`. |
| `antibody` | Antibody name. This is required to segregate downstream analysis for different antibodies. Required when `control` is specified. |
| `control` | Sample name for control sample. |
| `control_replicate` | Integer representing replicate number for control sample. |

Example design files have been provided with the pipeline for [paired-end](../assets/samplesheet_pe.csv) and [single-end](../assets/samplesheet_se.csv) data.
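
As a rough pre-flight check before launching the pipeline, the column and file-extension requirements from the table can be tested with a short script. This is a minimal sketch under stated assumptions: the helper names `header_is_valid` and `has_valid_fastq_extension` are hypothetical and do not exist in the pipeline, and the pipeline's own validation remains authoritative.

```python
import csv
import io

# Column names taken from the samplesheet examples in this document
REQUIRED_COLUMNS = [
    "sample",
    "fastq_1",
    "fastq_2",
    "replicate",
    "antibody",
    "control",
    "control_replicate",
]


def header_is_valid(samplesheet_text):
    """Return True if the first 7 header columns match the required names, in order."""
    header = next(csv.reader(io.StringIO(samplesheet_text)), [])
    leading = [column.strip() for column in header[: len(REQUIRED_COLUMNS)]]
    return leading == REQUIRED_COLUMNS


def has_valid_fastq_extension(path):
    """Return True if the path ends in one of the two accepted gzipped FastQ extensions."""
    return path.endswith((".fastq.gz", ".fq.gz"))


if __name__ == "__main__":
    header = "sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate,notes\n"
    print(header_is_valid(header))
    print(has_valid_fastq_extension("BLA203A1_S27_L006_R1_001.fastq.gz"))
```

Extra columns after the first seven are ignored by the header check, which matches the statement that the samplesheet can have as many columns as desired.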
