High-throughput sequencing generates massive amounts of data, but raw reads can contain errors, adapter remnants or biases that compromise downstream analyses. Performing quality control (QC) on FASTQ files before and after trimming is essential to catch these issues early. FastQC-RS is a modern, Rust-based QC tool that delivers fast, reliable assessments and easy-to-read HTML reports—perfect for genomics and transcriptomics workflows.
FastQC-RS is a command-line utility, inspired by the original FastQC, that scans FASTQ files and generates detailed QC summaries. Written in Rust, it offers:
FastQC-RS evaluates multiple aspects of your sequencing data and produces intuitive graphs and tables.
Provides an overview of each file:
| Metric | Value |
|---|---|
| Total reads | 8,860,157 |
| Average read length | 100 |
| Average GC content | 44% |
| File name | SRR3317165_1.fastq.gz |
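As a rough cross-check of the numbers in this table, you can recompute the read count and mean read length straight from the FASTQ file. The sketch below assumes a standard four-line-per-record FASTQ (no wrapped sequence lines) and that gzip and awk are available. The name fastq_basic_stats is just an illustrative helper, not something FastQC-RS ships.

```shell
# Hypothetical helper: print read count and mean read length for a gzipped FASTQ.
# Assumes four lines per record, with the sequence on the second line of each record.
fastq_basic_stats() {
  gzip -cd "$1" | awk '
    NR % 4 == 2 { reads++; bases += length($0) }
    END {
      if (reads == 0) { print "reads=0 mean_length=0"; exit }
      printf "reads=%d mean_length=%.1f\n", reads, bases / reads
    }'
}
```

Once the example data is downloaded, it could be called as fastq_basic_stats ./data/SRR3317165_1.fastq.gz.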
Shows quality (Phred) scores at each position in the read.
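For context, a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1 in 100 chance of a wrong base call and Q30 means 1 in 1000. The sketch below decodes a FASTQ quality string into per-position scores, assuming the common Phred+33 ASCII encoding. phred_scores is a hypothetical helper for illustration, not part of FastQC-RS.

```shell
# Hypothetical helper: decode a quality string into per-position Phred scores.
# Assumes Phred+33 encoding (ASCII code minus 33).
phred_scores() {
  printf '%s' "$1" | od -An -tu1 | awk '
    { for (i = 1; i <= NF; i++) printf "%s%d", (n++ ? " " : ""), $i - 33 }
    END { print "" }'
}

phred_scores 'I5!'   # prints: 40 20 0
```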
Detects unusual GC patterns that may indicate contamination or bias.
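The underlying quantity is simple: the GC content of a read is (G + C) divided by the total number of bases. The sketch below computes it for a single sequence. gc_percent is a hypothetical helper, and the exact way FastQC-RS bins or summarizes these values may differ.

```shell
# Hypothetical helper: GC content of one sequence, as a percentage.
gc_percent() {
  printf '%s\n' "$1" | awk '{
    n = length($0)              # total bases, measured before gsub edits the line
    gc = gsub(/[GCgc]/, "")     # gsub returns the number of G/C characters replaced
    if (n > 0) printf "%.1f\n", 100 * gc / n
  }'
}

gc_percent ACGTAATT   # prints: 25.0
```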
Like most other genomics and transcriptomics software, FastQC-RS is straightforward to install using any flavor of conda. My particular favorite, for licensing reasons and improved speed, is mamba, but conda and Anaconda will also work.
Install FastQC-RS using conda or mamba:
# Using Conda or Anaconda
conda install -c bioconda -c conda-forge fastqc-rs
# Using mamba
mamba install -c bioconda -c conda-forge fastqc-rs
Verify the installation:
fqc --version
This should print the installed version number, e.g., 0.3.4.
Notes: If you are using a conda environment, make sure to activate it first. Also, if you are using a different version of FastQC-RS, adjust the version number accordingly.
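If you want a script to fail early when the environment is not activated, a small guard like the one below can help. It only checks that a command is on your PATH. require_tool is a hypothetical helper name.

```shell
# Hypothetical helper: fail with a hint if a command is not on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH; is the conda environment activated?" >&2
  return 1
}
```

A script could then start with require_tool fqc || exit 1 before doing any work.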
A containerized setup ensures reproducibility and portability:
Create a dockerfile.fastqcrs in your working directory.
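FROM mambaorg/micromamba:2.0-debian11
# -y skips the confirmation prompt; -n base installs into the image's default environment
RUN micromamba install -y -n base \
-c bioconda \
-c conda-forge \
fastqc-rs==0.3.4 \
&& micromamba clean -a -y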
Using the dockerfile.fastqcrs above, run the following command to build the image.
docker build \
-f ./dockerfile.fastqcrs \
-t fastqc-rs:0.3.4 .
First, download example FASTQ files from the European Nucleotide Archive (ENA).
# Make a directory to download your data into
mkdir -p data
# Download FASTQ files for Bacillus subtilis ALBA01
wget -nc -P ./data ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR331/005/SRR3317165/SRR3317165_1.fastq.gz
wget -nc -P ./data ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR331/005/SRR3317165/SRR3317165_2.fastq.gz
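Large FTP downloads occasionally get truncated, so it can be worth confirming that the files are intact gzip archives before running QC. The sketch below relies on gzip -t, which tests archive integrity without extracting anything. check_fastq_gz is a hypothetical helper name.

```shell
# Hypothetical helper: report whether each given file is a valid gzip archive.
# Returns non-zero if any file fails the integrity test.
check_fastq_gz() {
  status=0
  for f in "$@"; do
    if gzip -t "$f" 2>/dev/null; then
      echo "OK      $f"
    else
      echo "BROKEN  $f"
      status=1
    fi
  done
  return "$status"
}
```

For the files above, that would be check_fastq_gz ./data/SRR3317165_1.fastq.gz ./data/SRR3317165_2.fastq.gz.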
Now that we have the data, let's go through two examples: one using the conda-installed FastQC-RS and one using the Docker image.
For the conda environment, it's pretty straightforward. Just run the following command.
fqc -q ./data/SRR3317165_1.fastq.gz > ./data/SRR3317165_1.html
- -q ./data/SRR3317165_1.fastq.gz: The FASTQ file to be assessed.
- > ./data/SRR3317165_1.html: Redirects the HTML report, which fqc writes to standard output, into this file.

For the Docker environment, the command gets a bit more involved, but don't sweat it. It works all the same, and since it's in a Docker environment, it's much easier to plug into a cloud-based pipeline.
docker run --rm -it \
--user 1000:1000 \
-v "$(pwd):/app" \
fastqc-rs:0.3.4 \
bash -c \
"fqc -q /app/data/SRR3317165_1.fastq.gz > /app/data/SRR3317165_1.html"
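FastQC-RS is ideally used in conjunction with preprocessing tools like fastp or Trimmomatic. Typically, you would run QC on the raw reads, trim them, and then run QC again on the trimmed reads to confirm that the problems were fixed.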
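The before-and-after QC pattern mentioned in the introduction could be sketched roughly like this for a paired-end sample, using fastp as the trimmer. The function name qc_trim_qc and the output file names are my own choices for illustration. The fastp call uses only its basic paired-end options (-i/-I for inputs, -o/-O for outputs, -h/-j for its own HTML and JSON reports) and default trimming settings, which you would likely want to tune for real data.

```shell
# Hypothetical helper: QC raw reads, trim with fastp, then QC the trimmed reads.
# Arguments: R1 FASTQ, R2 FASTQ, output directory.
qc_trim_qc() {
  r1=$1; r2=$2; out=$3
  mkdir -p "$out" || return 1

  # 1. QC on the raw reads
  fqc -q "$r1" > "$out/raw_R1.html" || return 1
  fqc -q "$r2" > "$out/raw_R2.html" || return 1

  # 2. Trim with fastp using its default settings
  fastp -i "$r1" -I "$r2" \
    -o "$out/trimmed_R1.fastq.gz" -O "$out/trimmed_R2.fastq.gz" \
    -h "$out/fastp.html" -j "$out/fastp.json" || return 1

  # 3. QC on the trimmed reads
  fqc -q "$out/trimmed_R1.fastq.gz" > "$out/trimmed_R1.html" || return 1
  fqc -q "$out/trimmed_R2.fastq.gz" > "$out/trimmed_R2.html"
}
```

With the example data, a call might look like qc_trim_qc ./data/SRR3317165_1.fastq.gz ./data/SRR3317165_2.fastq.gz ./qc, after which the raw and trimmed reports can be compared side by side.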
FastQC-style quality control remains indispensable in modern bioinformatics, and FastQC-RS provides clear, actionable insights into sequencing data quality. Integrating it into your omics workflows helps ensure robust and reliable data analysis outcomes.
Happy sequencing and quality checking!