Flye: Assembling Long Reads with Ease
TECHNICAL OVERVIEW

FLYE: ASSEMBLING LONG READS WITH EASE

SYSTEM / DOCKER / BIOINFORMATICS / GENOMICS

TODO: COMPLETE Flye: Assembling Long Reads with Ease

Setting Up Flye with Conda and Docker

Flye is a de novo assembler optimized for long‑read sequencing data (PacBio CLR/HiFi, ONT). It builds polished contigs via repeat‑graph assembly and supports both isolate and metagenome modes.

In this guide we'll cover:

  1. Prerequisites
  2. Installing Flye via Conda
  3. Running Flye in Docker
  4. Example assembly workflows
  5. Tips & Best Practices
  6. Further Resources

1. Prerequisites

  • Conda (Miniconda or Anaconda) for package management
  • Docker (optional) for containerized runs
  • A folder of long‑read FASTQ/FASTA files to assemble
  • Basic familiarity with the command line

2. Installing Flye via Conda

Flye releases are maintained on Bioconda. To install version 2.9.5:

# Ensure channels are set up correctly
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# Create (or activate) an environment
conda create -n flye_env python=3.9 -y
conda activate flye_env

# Install Flye
conda install flye=2.9.5 -y

To upgrade:

conda update flye -y

3. Running Flye in Docker

If you prefer not to install locally, build a simple Docker image:

# Dockerfile
FROM continuumio/miniconda3:latest

# Install Flye
RUN conda config --add channels defaults \
    && conda config --add channels bioconda \
    && conda config --add channels conda-forge \
    && conda install flye=2.9.5 -y \
    && conda clean --all -y

ENTRYPOINT ["flye"]

Build and tag:

docker build -t flye:2.9.5 .

Run an assembly (mounting your data directory):

docker run --rm -v $(pwd):/data flye:2.9.5 \
  --nano-raw /data/reads.fastq.gz \
  --out-dir /data/assembly \
  --threads 8

4. Example Assembly Workflows

4.1 Bacterial ONT Assembly

# Using Conda-installed Flye
flye \
  --nano-raw ecoli_ont.fastq.gz \
  --out-dir ecoli_assembly \
  --threads 4

Outputs in ecoli_assembly/:

  • assembly.fasta: polished contigs
  • graph_repeats.gfa: repeat graph in GFA format
  • logs and assembly stats files

4.2 PacBio HiFi Assembly

flye \
  --pacbio-hifi sample_hifi.fasta.gz \
  --genome-size 3g \
  --out-dir human_hifi \
  --threads 16

This will tune error thresholds for HiFi reads and use 16 CPU threads.


5. Tips & Best Practices

  • Genome size (-g): always specify approximate size for faster, more accurate overlap detection.
  • Meta‑assembly: add --meta for metagenomic/uneven coverage datasets.
  • Haplotype preservation: use --keep-haplotypes to retain alternative contigs.
  • Resume: if interrupted, run with --resume in the same output directory.
  • Scaffolding: enable --scaffold if you want automatic scaffolding of contigs.

6. Further Resources

  • Flye GitHub: https://github.com/fenderglass/Flye
  • Publication: Kolmogorov et al. (2019) Nat. Biotechnol. 37:540–546
  • Bandage for GFA graphs: https://github.com/rrwick/Bandage

Happy assembling!