CLI reference

Dragon ships a single binary with eleven subcommands. Run dragon <subcommand> --help for the full option list at any time; this page describes each subcommand and its most-used flags.

Command

Purpose

dragon index

Build a Dragon index from a directory of FASTA genomes

dragon search

Align query sequences (single or multi-shard)

dragon info

Print index metadata

dragon download

Download genomes (RefSeq, AllTheBacteria) or pre-built indices

dragon update

Add new genomes as a lightweight overlay

dragon compact

Merge base + overlays back into one optimised index

dragon summarize

Per-species prevalence/identity report from PAF output

dragon export-zarr

Export an index as a Zarr v3 store

dragon search-zarr

Pattern-search a Zarr-backed index (local or s3://)

dragon signal-index

Build a signal-level index from FASTA via a pore model

dragon signal-search

Align raw nanopore current signals (TSV/CSV/SLOW5)


dragon index

Build a Dragon index from a directory of genome FASTA files. Uses GGCAT for the colored compacted de Bruijn graph if available, falling back to an internal builder for small datasets.

dragon index [OPTIONS] --input <DIR> --output <DIR>

Option

Short

Default

Description

--input

-i

required

Directory of genome FASTA files (.fa, .fasta, .fna)

--output

-o

required

Output directory for the index

--kmer-size

-k

31

K-mer size for the de Bruijn graph

--threads

-j

4

Number of threads

--low-memory

off

External-memory SA construction (≤8 GB RAM by default)

--max-ram

8.0

RAM budget in GB for --low-memory mode

--auto

off

Auto-batch large collections into overlays (transparent at query time)

The index directory contains:

  • fm_index.bin — concatenated unitig text + suffix array

  • colors.drgn — Roaring-bitmap colour index per unitig

  • paths.bin — per-genome unitig path (mmap-friendly v2 format on new builds)

  • specificity.drgn — per-genome private-unitig sets

  • metadata.json — version, k-mer size, genome count, total bases

  • unitigs.fa (optional) — keep for resume/auto-batch; safe to delete after successful build

Index examples

# Default: 31-mer, 4 threads, RAM-bounded only by the system
dragon index -i genomes/ -o my_index/

# Low-memory: external-memory SA construction with 8 GB cap
dragon index -i genomes/ -o my_index/ --low-memory --max-ram 8

# Use all cores
dragon index -i genomes/ -o my_index/ -j $(nproc)

# Auto-batch a million-genome collection into overlays
dragon index -i giant_dir/ -o giant_idx/ --auto --max-ram 64

Resume

Index construction is resumable: if fm_index.bin and colors.drgn already exist in the output directory, Dragon skips Steps 1–4 (GGCAT + FM-index + colours) and resumes from Step 5 (path index). Useful when a job is killed during the long path-building step.



dragon info

Display index metadata.

dragon info --index <DIR>

Example output

Dragon Index Information
========================
Version:         0.1.0
K-mer size:      31
Genomes:         32000
Unitigs:         9137000
Total bases:     434072471
Index size:      783.86 GB

dragon download

Download genomes or a pre-built Dragon index.

dragon download [OPTIONS] --database <NAME> --output <DIR>

Supported databases

Name

Behaviour

gtdb-r220

Download a pre-built GTDB r220 representative-genomes index

allthebacteria-v2

Download a pre-built AllTheBacteria v2 index

refseq-bacteria

Download a pre-built RefSeq bacteria index

allthebacteria

Download genomes from EBI AllTheBacteria, then build

refseq

Download all RefSeq bacteria genomes, then build

refseq-representative

Download only RefSeq representatives, then build

http(s)://...

Custom URL to a pre-built index tarball

Example

dragon download -d gtdb-r220 -o gtdb_r220_index/

For the genome-download-and-build modes, the index construction step honours --low-memory, --kmer-size, --threads, and --max-ram.


dragon update

Add new genomes as a lightweight overlay without re-running GGCAT / FM-index construction. Queries automatically search both the base index and all overlays.

dragon update --index <DIR> --genomes <DIR> [--kmer-size 31]

When overlays exceed ~10 % of the base index, dragon update warns and recommends running dragon compact.


dragon compact

Merge the base index and all overlays back into one optimised index. Run after dragon update has accumulated significant overlay growth.

dragon compact --index <DIR> --genomes <DIR> [--kmer-size 31]

--genomes should point to all FASTA files (base + overlays), since compact rebuilds from scratch with the merged genome set.


dragon summarize

Generate a per-species surveillance summary from PAF output produced by dragon search.

dragon summarize --input <PAF> [--output <FILE>] [--format tsv|json]
                 [--index <DIR>] [--total-genomes <N>]

The summary contains, per species:

  • Prevalence (fraction of database genomes carrying the query)

  • Mean / min / max alignment identity

  • Number of unique sequence variants

Designed for AMR-gene surveillance and other epidemiological queries.


dragon export-zarr

Export a Dragon index as a Zarr v3 store (chunked, Zstd-compressed). The original index is not modified.

dragon export-zarr --index <DIR> --output <DIR>

Store layout under <output>/:

zarr.json                   root attrs (kmer_size, num_genomes, ...)
text/                       u8 unitig text, 1 MiB chunks, Zstd-3
suffix_array/               u64 SA, 131 072-entry chunks, Zstd-3
unitig_lengths/             u64 per-unitig lengths
colors/offsets/             u64 byte offsets per unitig
colors/bitmaps/             raw RoaringBitmap bytes, 1 MiB chunks, Zstd-3

Anyone with zarr-python and s3fs can open the store from a public bucket without AWS credentials — the on-disk format is the on-cloud format.


dragon search-zarr

Pattern-search a Zarr-backed index. Reads only the chunks each query touches, making it cheap over remote object stores (S3, GCS) that expose HTTP range requests.

dragon search-zarr --zarr <PATH | s3://...> --query <FILE> [--output <FILE>]

For full alignment use the binary backend (dragon search). search-zarr is intended as the cloud-native pattern-match demo: it returns matching text positions, the underlying unitig IDs, and the genomes carrying each unitig.


dragon signal-index

Build a signal-level index from genome FASTA files by converting expected nanopore current via a pore model and discretising into a finite alphabet.

dragon signal-index --input <DIR> --output <DIR> [OPTIONS]

Option

Default

Description

--num-levels

16

Discretisation alphabet size

--threads

4

Threads

--signal-boundaries

Path to learned discretisation boundaries (JSON)

--pore-model

built-in R10.4.1

Path to a custom pore model JSON



Environment variables

Variable

Description

RUST_LOG

Logging level: error, warn, info, debug, trace

RUST_LOG=debug dragon search -i my_index/ -q query.fa
RUST_LOG=warn  dragon search -i my_index/ -q query.fa