Funannotate is an automated genome annotation pipeline primarily designed for fungi, but it can also handle higher eukaryotes. It orchestrates repeat masking, gene prediction, and functional annotation into a streamlined workflow.
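This guide covers installation with Conda or Docker, optional GeneMark setup, an RNA-seq-guided annotation workflow, a genome-only workflow, and a few practical tips.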
The recommended install uses Bioconda channels. If dependency resolution is slow, consider replacing conda with mamba.
# Configure channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
# Create and activate environment
conda create -n funannotate_env "python>=3.6,<3.9" funannotate -y
conda activate funannotate_env
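Before starting a long run, it can help to confirm that the executable is actually on the PATH in the activated environment. The following is a minimal sketch, not part of Funannotate itself: `have_cmd` is a hypothetical helper name introduced here, and `funannotate check --show-versions` is run only if the executable is found.

```shell
# have_cmd is a hypothetical helper (not part of Funannotate):
# it succeeds if the named program can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd funannotate; then
    # Report the versions of the external tools Funannotate depends on
    funannotate check --show-versions
else
    echo "funannotate not found; is funannotate_env activated?" >&2
fi
```

If the second branch fires, re-run `conda activate funannotate_env` in the same shell and try again.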
Tip: If solving is slow:
conda install -n base mamba -y
mamba create -n funannotate_env funannotate -y
To update:
conda update funannotate -y
A Docker image with Funannotate and required databases is available.
# Pull the full image (with databases)
docker pull nextgenusfs/funannotate
# (Optional) Download the provided bash wrapper
wget -O funannotate-docker \
https://raw.githubusercontent.com/nextgenusfs/funannotate/master/funannotate-docker
chmod +x funannotate-docker
# Run a command, e.g. the built-in predict test with 12 CPUs
./funannotate-docker test -t predict --cpus 12
If you prefer a slim image (no databases):
docker pull nextgenusfs/funannotate-slim
You can use docker run directly or via the funannotate-docker wrapper to automatically bind volumes and retain your user permissions.
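As a rough illustration of what calling docker run directly involves, the sketch below prints (without executing) a command that mounts the current directory and runs as the calling user. It rests on assumptions: `fun_docker_cmd` is a hypothetical helper, the options shown are one plausible choice rather than a copy of the official wrapper, and it assumes the image lets you invoke `funannotate` as the container command.

```shell
# fun_docker_cmd is a hypothetical helper: it prints a docker command
# rather than running it, so the options can be inspected first.
# Assumption: the image accepts "funannotate ..." as its command.
fun_docker_cmd() {
    # -u keeps output files owned by the calling user
    # -v/-w mount the current directory and make it the working directory
    printf 'docker run --rm -u %s:%s -v %s:%s -w %s nextgenusfs/funannotate funannotate %s\n' \
        "$(id -u)" "$(id -g)" "$PWD" "$PWD" "$PWD" "$*"
}

# Show the command for a predict run
fun_docker_cmd predict -i assembly.masked.fa --species "MySpecies" --cpus 12 -o fun_run
```

Because the arguments are joined into one printed string, values containing spaces would need quoting before the output is pasted into a shell. The funannotate-docker wrapper handles the equivalent setup for you.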
GeneMark is not distributable via Bioconda. To enable GeneMark support:
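Install it separately, make sure its Perl scripts start with the shebang #!/usr/bin/env perl, then add the directory containing gmes_petap.pl to your $PATH, or set: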
export GENEMARK_PATH=/path/to/gmes_petap
Without GeneMark, Funannotate will rely on BUSCO/Augustus for ab initio predictions.
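To see which of the two GeneMark locations your shell would expose, a quick check along these lines may be useful. This is a sketch under assumptions: `genemark_status` is a hypothetical helper, and it only checks that gmes_petap.pl is present, not that GeneMark is licensed or working.

```shell
# genemark_status is a hypothetical helper: it prints where
# gmes_petap.pl was found, or "missing" if it is in neither location.
genemark_status() {
    if [ -n "${GENEMARK_PATH:-}" ] && [ -f "$GENEMARK_PATH/gmes_petap.pl" ]; then
        echo "GENEMARK_PATH"
    elif command -v gmes_petap.pl >/dev/null 2>&1; then
        echo "PATH"
    else
        echo "missing"
    fi
}

genemark_status
```

A result of "missing" means the predict step would proceed without GeneMark.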
Assume the following files are in the working directory:
assembly.fasta
left_R1.fq.gz right_R1.fq.gz
left_R2.fq.gz right_R2.fq.gz
nanopore_rna.fq.gz
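Since the training step can run for hours, a preflight check that the files listed above exist and are non-empty can save a failed run. The sketch below is illustrative only: `check_inputs` is a hypothetical helper, not a Funannotate command.

```shell
# check_inputs is a hypothetical helper: it reports each file that is
# missing or empty and returns non-zero if any such file was found.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_inputs assembly.fasta \
        left_R1.fq.gz right_R1.fq.gz \
        left_R2.fq.gz right_R2.fq.gz \
        nanopore_rna.fq.gz; then
    echo "all inputs present"
else
    echo "fix the inputs listed above before running funannotate" >&2
fi
```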
# 1) Clean and sort assembly
funannotate clean -i assembly.fasta --minlen 1000 -o assembly.clean.fa
funannotate sort -i assembly.clean.fa -b scaffold -o assembly.sorted.fa
# 2) Mask repeats
funannotate mask -i assembly.sorted.fa --cpus 12 -o assembly.masked.fa
# 3) Train with RNA-seq
funannotate train \
    -i assembly.masked.fa \
    --left left_R1.fq.gz,right_R1.fq.gz \
    --right left_R2.fq.gz,right_R2.fq.gz \
    --nanopore_mrna nanopore_rna.fq.gz \
    --stranded RF --jaccard_clip \
    --species "MySpecies" --strain "StrainA" \
    --cpus 12 -o fun_run
# 4) Predict gene models
funannotate predict -i assembly.masked.fa \
    --species "MySpecies" --strain "StrainA" \
    --cpus 12 -o fun_run
# 5) Update UTRs and refine gene models
funannotate update -i fun_run --cpus 12
# 6) Functional annotation
funannotate iprscan -i fun_run -m docker --cpus 12
funannotate annotate -i fun_run --cpus 12
Results appear under fun_run/predict_results, fun_run/update_results, and fun_run/annotate_results.
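To see at a glance which steps have produced output so far, the three results directories named above can be checked with a short loop. This is a sketch, with `report_results` as a hypothetical helper; it only tests for the directories and says nothing about whether their contents are complete.

```shell
# report_results is a hypothetical helper: for a Funannotate output
# folder, print whether each expected results directory exists.
report_results() {
    for step in predict update annotate; do
        if [ -d "$1/${step}_results" ]; then
            echo "$step: present"
        else
            echo "$step: absent"
        fi
    done
}

report_results fun_run
```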
Without RNA-seq data, skip the train and update steps and pass a BUSCO seed species to predict:
# Mask repeats
funannotate mask -i assembly.fasta --cpus 12 -o assembly.masked.fa
# Predict with BUSCO-based training
funannotate predict \
    -i assembly.masked.fa \
    --species "MySpecies" --strain "StrainA" \
    --busco_seed_species botrytis_cinerea \
    --cpus 12 -o fun_run_genome_only
# Functional annotation
funannotate annotate -i fun_run_genome_only --cpus 12
--dont_overwrite: protect existing outputs.
--resume: use if a run was interrupted.
--meta: use in the predict step for uneven coverage.
--max_intronlen: set for non-fungal genomes.
--repeats2evm: use in predict to reduce false positives in large genomes.
Happy annotating!