Flye is a de novo assembler optimized for long‑read sequencing data (PacBio CLR/HiFi, ONT). It builds polished contigs via repeat‑graph assembly and supports both isolate and metagenome modes.
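This guide covers installing Flye with Conda, running it from a Docker image, assembling ONT and PacBio HiFi reads, and a few practical tips.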
Flye releases are maintained on Bioconda. To install version 2.9.5:
# Ensure channels are set up correctly
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
# Create (or activate) an environment
conda create -n flye_env python=3.9 -y
conda activate flye_env
# Install Flye
conda install flye=2.9.5 -y
To upgrade:
conda update flye -y
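After installing or upgrading, it can help to confirm that the flye executable is actually on PATH in the active environment before starting a long run. The following is a minimal sketch; require_tool is a hypothetical helper name used here for illustration, not something shipped with Flye or Conda.

```shell
# require_tool is a hypothetical helper, not part of Flye or Conda.
# It succeeds if the named executable is on PATH and fails otherwise.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH; is the conda environment active?" >&2
    return 1
}

# Usage: print the installed version only if flye is available
# require_tool flye && flye --version
```

With flye_env active, `require_tool flye && flye --version` should print the installed version string; if the environment is not active, the helper reports that instead of failing with a bare "command not found".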
If you prefer not to install locally, build a simple Docker image:
# Dockerfile
FROM continuumio/miniconda3:latest
# Install Flye
RUN conda config --add channels defaults \
&& conda config --add channels bioconda \
&& conda config --add channels conda-forge \
&& conda install flye=2.9.5 -y \
&& conda clean --all -y
ENTRYPOINT ["flye"]
Build and tag:
docker build -t flye:2.9.5 .
Run an assembly (mounting your data directory):
docker run --rm -v "$(pwd)":/data flye:2.9.5 \
--nano-raw /data/reads.fastq.gz \
--out-dir /data/assembly \
--threads 8
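One practical wrinkle with the run above: the container runs as root by default, so files written to the mounted directory end up root-owned on the host. A common workaround is docker's standard --user option. The sketch below wraps the same command in a function. run_flye_docker and the DOCKER override variable are hypothetical names introduced here, and whether Flye runs cleanly as a non-root user inside this particular image is an assumption that has not been verified.

```shell
# run_flye_docker is a hypothetical wrapper around the docker run command above.
# Arguments: $1 host data directory, $2 reads file name inside it,
#            $3 thread count (defaults to 8).
# Setting DOCKER=echo prints the command instead of running it (dry run).
run_flye_docker() {
    local data_dir=$1
    local reads=$2
    local threads=${3:-8}

    # --user maps the container process to the calling host user, so the
    # files written to /data/assembly are not owned by root.
    ${DOCKER:-docker} run --rm \
        --user "$(id -u):$(id -g)" \
        -v "$data_dir":/data \
        flye:2.9.5 \
        --nano-raw "/data/$reads" \
        --out-dir /data/assembly \
        --threads "$threads"
}
```

For example, `DOCKER=echo run_flye_docker "$(pwd)" reads.fastq.gz 8` shows the full command line without starting a container, which is handy for checking paths before committing to a long assembly.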
# Using Conda-installed Flye
flye \
--nano-raw ecoli_ont.fastq.gz \
--out-dir ecoli_assembly \
--threads 4
Outputs in ecoli_assembly/:
- assembly.fasta: polished contigs
- assembly_graph.gfa: final repeat graph in GFA format

For a human-sized genome sequenced with PacBio HiFi reads:

flye \
--pacbio-hifi sample_hifi.fasta.gz \
--genome-size 3g \
--out-dir human_hifi \
--threads 16
The --pacbio-hifi flag selects parameters suited to low-error HiFi reads, and --threads 16 runs the assembly on 16 CPU threads.
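Once a run finishes, a quick sanity check on assembly.fasta (contig count, total length, longest contig) often catches obvious problems before any downstream analysis. Here is a small awk-based sketch; fasta_stats is a hypothetical helper name, and it assumes an uncompressed FASTA file.

```shell
# fasta_stats is a hypothetical helper for a quick look at an assembly.
# It expects an uncompressed FASTA file and prints the number of contigs,
# the total number of bases, and the length of the longest contig.
fasta_stats() {
    awk '
        /^>/ {                          # header line starts a new contig
            if (len > max) max = len    # close out the previous contig
            n++
            len = 0
            next
        }
        {                               # sequence line: add its bases
            len += length($0)
            total += length($0)
        }
        END {
            if (len > max) max = len    # account for the last contig
            printf "contigs=%d total_bp=%.0f longest_bp=%.0f\n", n, total, max
        }
    ' "$1"
}
```

For the HiFi run above this would be called as `fasta_stats human_hifi/assembly.fasta`. A total far from the expected genome size, or a very large number of short contigs, is usually a hint to revisit coverage or the read-type flag.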
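- Genome size (--genome-size, -g): optional since Flye 2.8; an approximate size is still required when downsampling reads with --asm-coverage.
- Use --meta for metagenomic or uneven-coverage datasets.
- Use --keep-haplotypes to retain alternative haplotype contigs instead of collapsing them.
- Resume an interrupted run with --resume in the same output directory.
- Add --scaffold if you want automatic scaffolding of contigs.

Happy assembling!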