TaxTriage (Metagenomics) (Under Construction)

Warning

This module is under construction and in alpha release. The full v1.0 release is scheduled for Oct. 2022.

Standard diagram for deployment and pipeline development

The pipeline consists of a variety of alignment/classification steps as well as QC and pre-filtering processes. It is designed to serve as the initial triage step for identifying unknown organisms present in one or more sample types and supports both Illumina and Oxford Nanopore-generated NGS data.

The pipeline is packaged to cover everything from basic quality control through to a (potential) de novo assembly for each organism detected in the sample, filtering from a hierarchical perspective. That is, the most prevalent taxonomic IDs at various ranks in the hierarchical chain are reported, binned, and run through a variety of alignment and assembly steps (for lower levels like species). Finally, a set of flags is generated for each taxonomic map that is most prevalent per sample.
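The idea of reporting the most prevalent taxonomic IDs per rank can be sketched in a few lines of Python. This is only an illustration, not TaxTriage's own implementation: it assumes a standard six-column, tab-separated Kraken2 report (percentage, clade reads, direct reads, rank code, taxonomic ID, name), and the function name `top_taxa_per_rank` and the read counts in `SAMPLE_ROWS` are made up for the example.

```python
from collections import defaultdict

# Toy report rows (illustrative counts only), in the standard six-column
# Kraken2 report layout: percent, clade reads, direct reads, rank, taxid, name.
SAMPLE_ROWS = [
    ("50.00", "500", "500", "U", "0", "unclassified"),
    ("50.00", "500", "0", "R", "1", "root"),
    ("40.00", "400", "20", "G", "561", "      Escherichia"),
    ("38.00", "380", "380", "S", "562", "        Escherichia coli"),
    ("10.00", "100", "0", "G", "1279", "      Staphylococcus"),
    ("10.00", "100", "100", "S", "1280", "        Staphylococcus aureus"),
]
SAMPLE_REPORT = "\n".join("\t".join(row) for row in SAMPLE_ROWS)


def top_taxa_per_rank(report_text, ranks=("G", "S"), top_n=2):
    """Return the top_n taxa (by clade read count) for each requested rank code."""
    by_rank = defaultdict(list)
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        _percent, clade_reads, _direct, rank, taxid, name = fields[:6]
        if rank in ranks:
            # Names are indented by depth in the report, so strip the padding.
            by_rank[rank].append((int(clade_reads), int(taxid), name.strip()))
    # Sort each rank's taxa from most to least prevalent and keep the top_n.
    return {rank: sorted(by_rank[rank], reverse=True)[:top_n] for rank in ranks}


if __name__ == "__main__":
    for rank, taxa in top_taxa_per_rank(SAMPLE_REPORT).items():
        print(rank, taxa)
```

Each tuple holds the clade read count, the taxonomic ID, and the name, so the taxa kept at a lower rank such as species are the natural candidates for the downstream alignment and assembly steps.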

The tools used at each step are listed below.

Demultiplex and Gather (OPTIONAL, Oxford Nanopore only)

Artic Guppyplex - Aggregate Nanopore reads for downstream analysis

Quality Control (OPTIONAL)

PycoQC - Computes metrics and generates interactive QC plots for Oxford Nanopore technologies sequencing data

Trimming

Illumina: Trimgalore

Oxford: Porechop

Filtering

Kraken2

QC Plotting

Illumina: FastQC

Oxford: Nanoplot

Classification (K-mer approach)

Kraken2

Alignment Stats

Illumina: BWAMEM2

Oxford: Minimap2

Report Generation

MultiQC

Please see the relevant links in the listed modules for more information on the underlying mechanisms and corresponding papers (where available).
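As a compact summary of the steps listed above, the platform-specific tool choices can be written as a lookup table. This is a hedged sketch for orientation only: the dictionary `PIPELINE_TOOLS`, its step keys, and the helper `tools_for` are hypothetical names, not part of TaxTriage, and steps that name a single tool are assumed here to apply to both platforms unless the list marks them as Oxford Nanopore only.

```python
# Step -> {platform: tool}, transcribed from the list of steps above.
# Key names are illustrative; steps listed for Oxford Nanopore only have no
# ILLUMINA entry.
PIPELINE_TOOLS = {
    "demultiplex_and_gather": {"OXFORD": "Artic Guppyplex"},
    "quality_control": {"OXFORD": "PycoQC"},
    "trimming": {"ILLUMINA": "Trimgalore", "OXFORD": "Porechop"},
    "filtering": {"ILLUMINA": "Kraken2", "OXFORD": "Kraken2"},
    "qc_plotting": {"ILLUMINA": "FastQC", "OXFORD": "Nanoplot"},
    "classification": {"ILLUMINA": "Kraken2", "OXFORD": "Kraken2"},
    "alignment_stats": {"ILLUMINA": "BWAMEM2", "OXFORD": "Minimap2"},
    "report_generation": {"ILLUMINA": "MultiQC", "OXFORD": "MultiQC"},
}


def tools_for(platform):
    """Return the ordered step -> tool mapping that applies to one platform."""
    platform = platform.strip().upper()
    return {
        step: tools[platform]
        for step, tools in PIPELINE_TOOLS.items()
        if platform in tools
    }


if __name__ == "__main__":
    for step, tool in tools_for("OXFORD").items():
        print(f"{step}: {tool}")
```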

Parameters

Samplesheet (.csv): file

Contains the metadata for a single sample per row. The possible columns for Basestack are explained below:

Samplesheet Description
Column Name	Description
sample	Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (_).
single_end	TRUE/FALSE, is the data single-end (TRUE) or paired-end (FALSE)?
fastq_1	Full path to FastQ file for Illumina short reads 1 OR OXFORD reads. File has to be gzipped and have the extension “.fastq.gz” or “.fq.
fastq_2	Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension “.fastq.gz” or “.fq.
barcode	TRUE/FALSE, is the row attributed to a demultiplexed barcode folder of 1 or more fastq files (TRUE) or is it a single file (FALSE)?
from	Directory path of the barcode folder; only used when the barcode column is set to TRUE
trim	TRUE/FALSE, do you want to run trimming on the sample?
platform	Platform used, [ILLUMINA, OXFORD]
sequencing_summary	If detected, output plots based on the sequencing summary file for that sample
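The column descriptions above imply a few consistency rules that can be checked before a run. The sketch below is one possible reading of those rules, not the pipeline's own validation: the helper names `is_true`, `is_blank`, and `check_row` are hypothetical, and treating `NULL` and empty strings as "not set" is an assumption based on the example samplesheet.

```python
PLATFORMS = {"ILLUMINA", "OXFORD"}


def is_true(value):
    """Interpret a samplesheet cell as a TRUE/FALSE flag (anything else is False)."""
    return str(value).strip().upper() == "TRUE"


def is_blank(value):
    """Treat None, empty strings, and NULL as an unset cell (assumption)."""
    return value is None or str(value).strip().upper() in {"", "NULL"}


def check_row(row):
    """Return a list of problems found in one samplesheet row (a dict)."""
    problems = []
    if is_blank(row.get("sample")):
        problems.append("sample is required")
    if str(row.get("platform", "")).strip().upper() not in PLATFORMS:
        problems.append("platform must be ILLUMINA or OXFORD")
    barcode = is_true(row.get("barcode"))
    if barcode:
        # Barcode rows point at a folder through the 'from' column.
        if is_blank(row.get("from")):
            problems.append("barcode is TRUE but 'from' is empty")
    elif is_blank(row.get("fastq_1")):
        problems.append("fastq_1 is required when barcode is not TRUE")
    if not barcode and not is_true(row.get("single_end")) and is_blank(row.get("fastq_2")):
        problems.append("paired-end rows need fastq_2")
    return problems


if __name__ == "__main__":
    row = {"sample": "Sample_3", "platform": "OXFORD", "barcode": "TRUE",
           "from": "NULL", "single_end": "TRUE"}
    print(check_row(row))
```

Returning a list of messages rather than raising on the first problem makes it easy to report every issue in a samplesheet at once.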

Example Samplesheet
sample	fastq_1	fastq_2	platform	from	trim	sequencing_summary	single_end	barcode
Sample_1	AEG588A1_S1_L001_R1_001.fastq.gz	AEG588A1_S1_L001_R2_001.fastq.gz	ILLUMINA	NULL (or leave blank)	FALSE	NULL (or leave blank)	FALSE	FALSE
Sample_2	ecoli_reads.fastq	NULL	OXFORD	NULL	FALSE	sequencing_summary.txt	TRUE	FALSE
Sample_3	NULL	NULL	OXFORD	barcode01	TRUE	FALSE	TRUE	TRUE

For the samples shown above:

Sample_1: A paired-end run of Illumina data where we DON’T trim anything (no Trimgalore).
Sample_2: A single-end Oxford Nanopore run where all reads are concatenated into a single fastq file. No barcode. There is a sequencing summary file that we want to use for run statistics/plots.
Sample_3: A single-end Oxford Nanopore run where reads have NOT been demultiplexed and/or aggregated into a single fastq file (as they are in Sample_2). This also runs Artic Guppyplex to concatenate them all into one fastq file.
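Because the samplesheet is a .csv file, the three example rows above would be stored comma-separated on disk. The following sketch renders them with Python's standard `csv` module; the names `COLUMNS`, `ROWS`, and `render_samplesheet` are illustrative, and the cells shown as "NULL (or leave blank)" are written here as empty strings.

```python
import csv
import io

# Column order copied from the example samplesheet above.
COLUMNS = ["sample", "fastq_1", "fastq_2", "platform", "from", "trim",
           "sequencing_summary", "single_end", "barcode"]

# The three example rows; "NULL (or leave blank)" cells are left blank.
ROWS = [
    ["Sample_1", "AEG588A1_S1_L001_R1_001.fastq.gz",
     "AEG588A1_S1_L001_R2_001.fastq.gz", "ILLUMINA", "", "FALSE", "",
     "FALSE", "FALSE"],
    ["Sample_2", "ecoli_reads.fastq", "NULL", "OXFORD", "NULL", "FALSE",
     "sequencing_summary.txt", "TRUE", "FALSE"],
    ["Sample_3", "NULL", "NULL", "OXFORD", "barcode01", "TRUE", "FALSE",
     "TRUE", "TRUE"],
]


def render_samplesheet(rows):
    """Return the samplesheet as comma-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the CSV text; redirect it to a file such as samplesheet.csv to save it.
    print(render_samplesheet(ROWS), end="")
```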

Returns

MultiQC report HTML file
Variety of intermediate and output results files for the MultiQC report

Examples:
- SAM/BAM alignment
- Filtered FASTQ files (for downstream use)
- Assembly (de novo): WIP and not ready just yet
- Kraken2 report(s)