TACTIC is an incremental, privacy-aware, and user-friendly tool designed for 16S rRNA amplicon sequence analysis, offering a comprehensive and versatile solution to address the challenges of microbial community profiling. It integrates four distinct pipelines—OTU (legacy clustering), zOTU (denoised amplicons), and the novel hybrid approaches, Taxonomy Agnostic Clustering (TAC) and Taxonomy Informed Clustering (TIC)—to provide researchers with flexible options, including conventional methods and advanced strategies that mitigate diversity inflation inherent to denoising. The platform is accessible via a containerized command-line interface, a stand-alone installable software with a graphical user interface, and a browser-based web tool, ensuring accessibility, reproducibility, and scalability for users of all computing environments and skill levels, while its results are compatible with downstream analysis tools like Namco and Rhea.
You can run the TACTIC pipeline either by cloning the repository and running the run_tactic.py script, or by using the Docker image.
You can clone this repository as below:
git clone https://github.com/MPourjam/TACTIC-Pipeline.git
Alternatively, you can download the repository archive and unzip it.
Then you need to create a virtual environment and install the required packages in the environment:
cd TACTIC-Pipeline
python3 -m venv tactic_env
source tactic_env/bin/activate
pip install -r requirements.txt
Since the bundled binaries are built for Linux on x86 CPUs, this installation method only works on Linux systems with an x86 CPU architecture.
There is a docker image for each release available that you can pull and start using. For example:
docker pull ghcr.io/mpourjam/tactic-pipeline:0.7.2
Alternatively, you can download the exported Docker image (image.tar.gz) of each version from the releases page and import it into your local Docker image repository.
docker load -i image.tar.gz
# or
docker import image.tar.gz
The pipeline is provided as a single Python script (run_tactic.py), which you can run with python3 </path/to/run_tactic.py>. To get help text on the arguments, run python3 </path/to/run_tactic.py> --help. Below you find the CLI arguments output from the pipeline's help.
usage: run_tactic.py [-h] [-i INPUT_DIRECTORY] [-d FASTQ_DIRECTORY]
[-am {de-novo,TAC,TIC,ZOTU}] [-y YML_FILE]
[-map MAPPING_FILE] [-stat SPIKE_STAT]
[-ut USEARCH_BIN] [-db DB_DIRECTORY]
[-spk-ref SPIKES_REFERENCES_DIR] [-sp] [-sa] [-fp]
[-tf] [-iz] [-snbf] [-t THREADS]
options:
-h, --help show this help message and exit
-i INPUT_DIRECTORY, --input-directory INPUT_DIRECTORY
Base directory for all pipeline inputs and outputs;
every needed file and directory should be findable
relative to this directory.
-d FASTQ_DIRECTORY, --fastq-directory FASTQ_DIRECTORY
Directory containing FASTQ files to be processed;
should be relative to <--input-directory>. Default
is <--input-directory>.
-am {de-novo,TAC,TIC,ZOTU}, --analysis-mode {de-novo,TAC,TIC,ZOTU}
Choose an analysis mode. Default is 'TIC'.
-y YML_FILE, --yml-file YML_FILE
Path to pipeline configuration file in YAML format.
This file defines the arguments for the pipeline and
should be relative to <--input-directory>.
-map MAPPING_FILE, --mapping-file MAPPING_FILE
-stat SPIKE_STAT, --spike-stat SPIKE_STAT
The path to a mapping file defining spike count,
sample weight and spike amount for each sample.
#SampleID SpikeReads spikes_total_weight_in_g
spike_amount parent_path. Relative to
<--input-directory>
-ut USEARCH_BIN, --usearch-bin USEARCH_BIN
Path to binary of usearch version 11. Default is
usearch11.0.667_i86linux64.
-db DB_DIRECTORY, --db-directory DB_DIRECTORY
Path to directory containing required databases
(e.g., SILVA, SortMeRNA).
-spk-ref SPIKES_REFERENCES_DIR, --spikes-references-dir SPIKES_REFERENCES_DIR
Directory containing FASTA files of spike-in
references (required for spike removal). Should be
relative to <--input-directory>. Default is
{SPIKESIDX}.
-sp, --skip-preprocess
Should skip preprocessing step
-sa, --skip-analysis Should skip analysis step
-fp, --force-preprocess
Force preprocessing samples. It invalidates
--skip-preprocess argument.
-tf, --place-template-files
Writes the default argument yaml template file
(TACTICPipeline_args.yml) and mapping template file
(mapping_file.csv) to <--input-directory>, print
help text and exits.
-iz, --individual-zotus
If set, the preprocessing step will produce
individual ZOTUs tables for each sample (slower,
more detailed output). If not set, only a combined
table is produced, which speeds up the pipeline.
-snbf, --skip-non-bacterial-filter
Skip filtering of non-bacterial 16S sequences.
-t THREADS, --threads THREADS
Number of threads to use for parallel processing
(default: 16). Recommended range: 1 to 24.
Analysis arguments are defined in a YAML file named TACTICPipeline_args.yml. You can modify the arguments to your desired values in the file and pass the file to the pipeline to have them taken into account.
These arguments affect the pre-processing step. Each fastq file (or pair of files) goes through the pre-processing step, which includes:
- Spike removal
- Quality check
- Reads merge (if paired)
- Trimming
- Filtering
- Dereplication
If the -iz / --individual-zotus flag is set, the following additional steps run per sample:
- ZOTUs clustering
- Filtering for bacterial/archaeal 16S rRNA reads (if opted for by giving the `` flag to the pipeline)
- ZOTU table creation
- Taxonomy assignment
- Krona graph generation
The arguments below are set to the most lenient values so that as many reads as possible are merged. Merged reads of poor quality are filtered out later, in the filtering step.
- fastq_maxdiffs: Maximum number of mismatches allowed in the alignment of overlapping region (default: 50)
- fastq_pctid: Minimum percentage identity for merging paired reads (default: 50)
- fastq_minmergelen: Minimum length of merged reads in base pairs (default: 200)
- fastq_maxmergelen: Maximum length of merged reads in base pairs (default: 600)
- stripleft: Number of bases to remove from the left side, corresponding to 341F primer length (default: 17)
- stripright: Number of bases to remove from the right side, corresponding to 785R primer length (default: 21)
Reference: www.probebase.csb.univie.ac.at
- fastq_maxee_rate: Discard reads with more than E expected errors per base, calculated after any truncation options have been applied. For example, with fastq_maxee_rate set to 0.01, a read of length 100 is discarded if its expected errors exceed 1, and a read of length 1,000 is discarded if its expected errors exceed 10 (default: 0.002)
- fastq_truncqual: Quality score threshold for truncating reads; truncation occurs at first position with Q score ≤ N (default: 10)
fastq_truncqual is deprecated in versions >=0.7.3 as it could cause drop of some merged reads due to short length after truncation.
Reference: USEARCH manual for read filtering
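The expected-error filter described above amounts to a simple per-length threshold. The sketch below (a hypothetical helper, not part of TACTIC or USEARCH) illustrates the rule:

```python
# Sketch of the expected-error filter described above (hypothetical helper,
# not part of TACTIC): a read passes if its total expected errors do not
# exceed fastq_maxee_rate * read_length.

def passes_maxee_rate(expected_errors: float, read_length: int,
                      maxee_rate: float = 0.002) -> bool:
    """Return True if the read survives the per-base expected-error filter."""
    return expected_errors <= maxee_rate * read_length

# With maxee_rate = 0.01, a 100 bp read is allowed up to 1 expected error
# and a 1,000 bp read up to 10, matching the example in the text.
```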
- minsize: Minimum abundance threshold for merged sequences clustering; sequences with fewer reads are filtered out and not taken to the clustering step (default: 2)
Reference: USEARCH manual for UNOISE algorithm (ZOTU clustering)
- e: E-value threshold for filtering 16S sequences (default: 0.1)
- num_alignments: Number of alignments to report per query (default: 1)
- match_id: Minimum sequence identity threshold for ZOTU matching; higher values result in more refined ZOTUs and lower total count of ZOTUs (default: 0.99)
- turn: Specifies which sequences to add taxonomy information to (default: all)
- minsize: Minimum abundance threshold; amplicons with fewer reads are filtered out (default: 2)
- abund_limit: Abundance limit threshold for filtering low-abundance sequences. ZOTUs with relative abundance below this threshold in all samples will be discarded (default: 0.000)
- sample_wise_correction: Enable or disable sample-wise abundance correction (default: false)
- match_id: Minimum sequence identity threshold for ZOTU clustering (default: 0.99)
Reference: USEARCH manual for UNOISE algorithm (ZOTU clustering)
- minsize: Minimum abundance threshold for de novo OTU clustering (default: 8)
- abund_limit: Abundance limit threshold for filtering low-abundance sequences. ZOTUs with relative abundance below this threshold in all samples will be discarded (default: 0.0025)
- sample_wise_correction: Enable or disable sample-wise abundance correction (default: false)
- family_sim: Sequence similarity threshold for family-level taxonomic clustering (default: 0.90)
- genus_sim: Sequence similarity threshold for genus-level taxonomic clustering (default: 0.95)
- species_sim: Sequence similarity threshold for species-level taxonomic clustering (default: 0.987)
- abund_limit: Abundance limit threshold for filtering low-abundance sequences. ZOTUs with relative abundance below this threshold in all samples will be discarded (not yet implemented) (default: 0.0025)
- sample_wise_correction: Enable or disable sample-wise abundance correction. Abundance values below this threshold will be rounded to 0 in the S/ZOTU tables (not yet implemented) (default: false)
- abund_limit: Abundance limit threshold (not yet implemented) (default: 0.0025)
- sample_wise_correction: Enable or disable sample-wise abundance correction (not yet implemented) (default: false)
- cluster_thr: Clustering threshold for sequence similarity (default: 0.987)
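For orientation, a minimal arguments file using the parameter names and defaults listed above might look like the excerpt below. The actual grouping and key names are defined by the generated TACTICPipeline_args.yml template, so treat this as illustrative only:

```yaml
# Illustrative excerpt; generate the real template with -tf / --place-template-files
fastq_maxdiffs: 50
fastq_pctid: 50
fastq_minmergelen: 200
fastq_maxmergelen: 600
stripleft: 17
stripright: 21
fastq_maxee_rate: 0.002
minsize: 2
match_id: 0.99
abund_limit: 0.0025
sample_wise_correction: false
cluster_thr: 0.987
```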
You can run the pipeline either by directly running the script (run_tactic.py) or via the Docker image.
You can run the pipeline quickly with fastq file auto-discovery and default pipeline arguments. In this mode the pipeline automatically discovers the fastq files (and pairs them), creates a mapping file, and runs with default arguments.
Running via Script:
cd TACTIC-Pipeline
python run_tactic.py --input-directory /path/to/your/project/directory
Running via Docker:
docker run --rm -v "/path/to/your/project/directory:/base/inputs" ghcr.io/mpourjam/tactic-pipeline:0.7.3
With the -v argument we mount our desired local path to the path /base/inputs (the default --input-directory), in which the pipeline by default looks for fastq files. Any fastq files (including those nested inside sub-directories) are visible to the pipeline when only --input-directory (i.e., /base/inputs/) is set. If we want to mount an upper-level directory as our --input-directory and narrow the pipeline's search and analysis scope, we can set the --fastq-directory argument, which should be given relative to --input-directory. The pipeline then searches for fastq files only within the path given via --fastq-directory. The sample mapping file and the arguments YAML file can still be anywhere relative to --input-directory.
For example we can narrow the pipeline to a sub-directory within the --input-directory like below:
docker run --rm -v "/path/to/your/project/directory:/base/inputs" ghcr.io/mpourjam/tactic-pipeline:0.7.3 --fastq-directory Sequencing_Run_1/
or
python run_tactic.py --input-directory /path/to/your/project/directory --fastq-directory Sequencing_Run_1/
In the example above, the content of /path/to/your/project/directory (which is /base/inputs in the Docker container) looks like:
.
├── Sequencing_Run_1
│ └── fastqs
│ ├── 001-654_Stool_R1_001.fastq.gz
│ ├── 001-654_Stool_R2_001.fastq.gz
│ ├── 002-342_Stool_R1_001.fastq.gz
│ └── 002-342_Stool_R2_001.fastq.gz
└── Sequencing_Run_2
└── fastqs
├── 001-435_Saliva_R1_001.fastq.gz
├── 001-435_Saliva_R2_001.fastq.gz
├── 002-436_Saliva_R1_001.fastq.gz
└── 002-436_Saliva_R2_001.fastq.gz
All fastq files in the directory Sequencing_Run_1 will, in this example, be processed with the default arguments.
The pipeline reads its arguments from a YAML file passed via -y or --yml-file. To obtain a template of this YAML file that you can fill in with your desired arguments, run the pipeline as described in Quickstart with the switch -tf or --place-template-files. This places template files for the argument YAML file and the mapping file (described in the next section) in your project directory.
Running via Script:
cd TACTIC-Pipeline
python run_tactic.py --input-directory /path/to/your/project/directory --place-template-files
# or
python run_tactic.py --input-directory /path/to/your/project/directory -tf
Running via Docker:
docker run --rm -v "/path/to/your/project/directory:/base/inputs" ghcr.io/mpourjam/tactic-pipeline:0.7.3 --place-template-files
# or
docker run --rm -v "/path/to/your/project/directory:/base/inputs" ghcr.io/mpourjam/tactic-pipeline:0.7.3 -tf
These commands will put two files in your project directory:
- mapping_file_TEMPLATE.csv: This is the template file listing samples in a mapping file with their sample weight and spike amount for spike normalization.
- TACTICPipeline_args_TEMPLATE.yml: This is a YAML file containing the arguments to the pipeline.
Now that we want to tweak the analysis arguments, we can open TACTICPipeline_args_TEMPLATE.yml directly and change the desired arguments. After applying the changes, we put the argument file in a path visible to the pipeline (i.e., any sub-path of --input-directory, or of /path/to/your/project/directory when using the Docker image). As an example, the commands below run the pipeline on two different batches of fastq files with two different argument sets.
My project directory now looks like:
├── Sequencing_Run_1
│ └── fastqs
│ ├── 001-654_Stool_R1_001.fastq.gz
│ ├── 001-654_Stool_R2_001.fastq.gz
│ ├── 002-342_Stool_R1_001.fastq.gz
│ └── 002-342_Stool_R2_001.fastq.gz
├── Sequencing_Run_2
│ └── fastqs
│ ├── 001-435_Saliva_R1_001.fastq.gz
│ ├── 001-435_Saliva_R2_001.fastq.gz
│ ├── 002-436_Saliva_R1_001.fastq.gz
│ └── 002-436_Saliva_R2_001.fastq.gz
├── TACTICPipeline_args_set_1.yml
└── TACTICPipeline_args_set_2.yml
Argument set 1:
When using docker image:
docker run --rm -v "/path/to/your/project/directory:/base/inputs" ghcr.io/mpourjam/tactic-pipeline:0.7.3 --fastq-directory Sequencing_Run_1/ --yml-file TACTICPipeline_args_set_1.yml
or when using the script:
python run_tactic.py --input-directory /path/to/your/project/directory --fastq-directory Sequencing_Run_1/ --yml-file TACTICPipeline_args_set_1.yml
Argument set 2:
docker run --rm -v "/path/to/your/project/directory:/base/inputs" ghcr.io/mpourjam/tactic-pipeline:0.7.3 --fastq-directory Sequencing_Run_2/ --yml-file TACTICPipeline_args_set_2.yml
or when using the script:
python run_tactic.py --input-directory /path/to/your/project/directory --fastq-directory Sequencing_Run_2/ --yml-file TACTICPipeline_args_set_2.yml
Note that every path argument to the CLI is given relative to --input-directory!
The pipeline's auto-discovery can also be overridden by giving a mapping file in tab-separated format. In a case like the one below, you can choose a custom set of fastq files from different directories to analyze by giving a unique part of their names in a mapping file.
My project directory looks like below:
├── Sequencing_Run_1
│ └── fastqs
│ ├── 001-654_Stool_R1_001.fastq.gz
│ ├── 001-654_Stool_R2_001.fastq.gz
│ ├── 002-342_Stool_R1_001.fastq.gz
│ └── 002-342_Stool_R2_001.fastq.gz
├── Sequencing_Run_2
│ └── fastqs
│ ├── 001-435_Saliva_R1_001.fastq.gz
│ ├── 001-435_Saliva_R2_001.fastq.gz
│ ├── 002-436_Saliva_R1_001.fastq.gz
│ └── 002-436_Saliva_R2_001.fastq.gz
├── TACTICPipeline_args_set_1.yml
├── TACTICPipeline_args_set_2.yml
├── TACTICPipeline_args_set_combined.yml
└── my_mapping_file.csv
I want to pick one file from each sequencing run and run the analysis. To do so, I follow the steps below:
- I generate a mapping file template by running TACTIC with -tf or --place-template-files.
- I modify the mapping_file_TEMPLATE.csv placed in the working directory and make sure that the #SampleID column contains the unique base names of the desired fastq files, cut before the R1 or R2 suffix.
- I run the pipeline with the -map or --mapping-file argument and pass my modified mapping file to it.
In this example, I have modified the mapping file like below as I only want to run one sample from each sequencing run.
#SampleID	total_weight_in_g	spike_amount	parent_path
001-435_Saliva	1	nan
001-654_Stool	1.2	nan
Columns:
- #SampleID: contains the unique basenames of the samples
- total_weight_in_g: The weight of the sample taken for sequencing. The value matters if your samples are spiked; otherwise any positive value can be given.
- spike_amount: amount of spike (in nanograms) added to your samples if you have asked for spiked sequencing. If your samples are not spiked, the value of this column should be "nan".
- parent_path: This column can contain the path (relative to --fastq-directory) to the parent directory of the given samples. This disambiguates situations in which two samples with exactly the same file names exist in two different directories. In that case, by giving the proper parent directory path in this column, the pipeline will pick the correct (pairs of) files.
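The mapping file above can also be written programmatically. The sketch below uses Python's csv module with a tab delimiter to match the format described; the sample values and output filename are illustrative:

```python
# Sketch: writing a TACTIC mapping file programmatically. The column layout
# follows the description above (tab-separated); sample values are illustrative.
import csv

samples = [
    # (#SampleID, total_weight_in_g, spike_amount, parent_path)
    ("001-435_Saliva", "1", "nan", ""),   # not spiked -> spike_amount is "nan"
    ("001-654_Stool", "1.2", "nan", ""),
]

with open("my_mapping_file.csv", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(["#SampleID", "total_weight_in_g", "spike_amount", "parent_path"])
    writer.writerows(samples)
```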
After I have prepared my_mapping_file.csv, I run the pipeline like below:
docker run --rm -v "/path/to/your/project/directory:/base/inputs" ghcr.io/mpourjam/tactic-pipeline:0.7.3 --yml-file TACTICPipeline_args_set_1.yml --mapping-file my_mapping_file.csv
or when using the script:
python run_tactic.py --input-directory /path/to/your/project/directory --yml-file TACTICPipeline_args_set_1.yml --mapping-file my_mapping_file.csv
Note that the pipeline's search scope is not narrowed down to a specific directory by --fastq-directory, as our samples are in two different directories within our --input-directory.
Every time the pipeline runs, it checks for the existence of the required database files. By default it checks the /databases directory, but this can be set manually to avoid re-downloading and re-indexing the databases, which takes time. To set the databases directory:
docker run --rm -v "/path/to/your/project/directory:/base/inputs" -v "/path/to/desired/database/directory:/databases" ghcr.io/mpourjam/tactic-pipeline:0.7.3 --yml-file TACTICPipeline_args_set_1.yml --mapping-file my_mapping_file.csv
With the second -v we define an additional mount point so that the container sees our fixed database directory at its /databases path,
or when using the script:
python run_tactic.py --input-directory /path/to/your/project/directory --fastq-directory Sequencing_Run_1/ --yml-file TACTICPipeline_args_set_1.yml --mapping-file my_mapping_file.csv --db-directory /path/to/desired/database/directory
--db-directory can be relative or absolute. If given as absolute (with a preceding "/"), the absolute path is taken; if given as a relative path, it is interpreted as a path inside --input-directory.
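The relative-vs-absolute rule just described can be sketched as a small helper (resolve_db_directory is hypothetical, not part of TACTIC itself):

```python
# Sketch of the --db-directory resolution rule described above: absolute paths
# are used as-is, relative paths are interpreted inside --input-directory.
import os

def resolve_db_directory(db_directory: str, input_directory: str) -> str:
    if os.path.isabs(db_directory):   # preceding "/" -> taken as-is
        return db_directory
    return os.path.join(input_directory, db_directory)
```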