CLI reference

Dragon ships a single binary with eleven subcommands. Run `dragon <subcommand> --help` for the full option list at any time; this page describes each subcommand and its most-used flags.

Command	Purpose
`dragon index`	Build a Dragon index from a directory of FASTA genomes
`dragon search`	Align query sequences (single or multi-shard)
`dragon info`	Print index metadata
`dragon download`	Download genomes (RefSeq, AllTheBacteria) or pre-built indices
`dragon update`	Add new genomes as a lightweight overlay
`dragon compact`	Merge base + overlays back into one optimised index
`dragon summarize`	Per-species prevalence/identity report from PAF output
`dragon export-zarr`	Export an index as a Zarr v3 store
`dragon search-zarr`	Pattern-search a Zarr-backed index (local or `s3://`)
`dragon signal-index`	Build a signal-level index from FASTA via a pore model
`dragon signal-search`	Align raw nanopore current signals (TSV/CSV/SLOW5)

`dragon index`

Build a Dragon index from a directory of genome FASTA files. Dragon uses GGCAT to build the coloured compacted de Bruijn graph when GGCAT is available; otherwise it falls back to an internal builder intended for small datasets.

dragon index [OPTIONS] --input <DIR> --output <DIR>

Option	Short	Default	Description
`--input`	`-i`	required	Directory of genome FASTA files (`.fa`, `.fasta`, `.fna`)
`--output`	`-o`	required	Output directory for the index
`--kmer-size`	`-k`	`31`	K-mer size for the de Bruijn graph
`--threads`	`-j`	`4`	Number of threads
`--low-memory`		off	External-memory SA construction (≤8 GB RAM by default)
`--max-ram`		`8.0`	RAM budget in GB for `--low-memory` mode
`--auto`		off	Auto-batch large collections into overlays (transparent at query time)

The index directory contains:

fm_index.bin — concatenated unitig text + suffix array
colors.drgn — Roaring-bitmap colour index per unitig
paths.bin — per-genome unitig path (mmap-friendly v2 format on new builds)
specificity.drgn — per-genome private-unitig sets
metadata.json — version, k-mer size, genome count, total bases
unitigs.fa (optional) — keep for resume/auto-batch; safe to delete after successful build

Index examples

# Default: 31-mer, 4 threads, RAM-bounded only by the system
dragon index -i genomes/ -o my_index/

# Low-memory: external-memory SA construction with 8 GB cap
dragon index -i genomes/ -o my_index/ --low-memory --max-ram 8

# Use all cores
dragon index -i genomes/ -o my_index/ -j $(nproc)

# Auto-batch a million-genome collection into overlays
dragon index -i giant_dir/ -o giant_idx/ --auto --max-ram 64

Resume

Index construction is resumable: if `fm_index.bin` and `colors.drgn` already exist in the output directory, Dragon skips Steps 1–4 (GGCAT + FM-index + colours) and resumes from Step 5 (path index). This is useful when a job is killed during the long path-building step.
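
The resume rule above can be mirrored in a wrapper script that wants to know, before resubmitting a killed job, whether the rerun will start from scratch. The following is a minimal sketch, not Dragon's own code: the function name `resume_step` is hypothetical, and it only checks that the two files named above exist, not that they are complete.

```python
from pathlib import Path

# File names come from the index layout described on this page.
# The function itself is a hypothetical helper, not part of Dragon.
RESUME_MARKERS = ("fm_index.bin", "colors.drgn")


def resume_step(index_dir):
    """Return the build step a rerun would start from (1 or 5)."""
    index_dir = Path(index_dir)
    if all((index_dir / name).is_file() for name in RESUME_MARKERS):
        return 5  # Steps 1-4 are skipped; the rerun resumes at the path index
    return 1      # at least one marker file is missing; build from the start
```

A scheduler script could call `resume_step("my_index/")` and request a shorter wall-time when it returns 5.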

`dragon search`

Search query sequences against a Dragon index. Supports multi-shard search via repeatable `--shard` arguments: each shard is searched independently and results are merged with per-genome deduplication.

dragon search [OPTIONS] --index <DIR> --query <FILE>
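
To make "merged with per-genome deduplication" concrete, here is a small sketch of one plausible merge rule. It is an assumption for illustration only: hits are modelled as hypothetical `(query, genome, score)` tuples, and when several shards report the same genome for a query the highest score is kept. Dragon's actual tie-breaking may differ.

```python
def merge_shard_hits(per_shard_hits):
    """Merge hit lists from several shards, keeping one hit per (query, genome).

    Each hit is a (query, genome, score) tuple. When the same genome is
    reported for a query by more than one shard, the highest score wins
    (an assumed rule, chosen for illustration).
    """
    best = {}
    for hits in per_shard_hits:
        for query, genome, score in hits:
            key = (query, genome)
            if key not in best or score > best[key][2]:
                best[key] = (query, genome, score)
    # Order by query, then descending score, then genome name for stable output
    return sorted(best.values(), key=lambda h: (h[0], -h[2], h[1]))
```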

Core options

Option	Short	Default	Description
`--index`	`-i`	required	Path to Dragon index directory
`--shard`		—	Additional shard directory (repeatable)
`--query`	`-q`	required	Query FASTA/FASTQ file
`--output`	`-o`	`-` (stdout)	Output file
`--format`	`-f`	`paf`	`paf`, `blast6`, `summary`, or `gfa`
`--threads`	`-j`	`4`	Number of threads
`--max-ram`		`4.0`	RAM budget in GB
`--profile`		`workstation`	`laptop` (≤8 GB, 4 threads) or `workstation`

Filtering & scoring

Option	Default	Description
`--min-seed-len`	`15`	Minimum seed match length
`--max-seed-freq`	`10000`	Skip seeds occurring more than this many times
`--min-chain-score`	`50`	Minimum chain score to report
`--max-target-seqs`	`10`	Maximum hits per query
`--min-identity`	`0.7`	Minimum alignment identity (0.0–1.0)
`--min-query-coverage`	`0.3`	Minimum query coverage (0.0–1.0)
`--min-score-ratio`	`0.1`	Drop hits scoring below `ratio × best_score`
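
The interaction between `--min-score-ratio` and `--max-target-seqs` can be sketched as follows. This is a hedged illustration rather than Dragon's implementation: hits are hypothetical `(target, score)` pairs for a single query, and applying the ratio filter before the per-query cap is an assumption.

```python
def filter_hits(hits, min_score_ratio=0.1, max_target_seqs=10):
    """Keep hits scoring at least ratio * best_score, capped at max_target_seqs.

    hits is a list of (target, score) pairs for one query. The defaults
    match the option table; the order of the two filters is assumed.
    """
    if not hits:
        return []
    best_score = max(score for _, score in hits)
    threshold = min_score_ratio * best_score
    kept = [(target, score) for target, score in hits if score >= threshold]
    kept.sort(key=lambda h: h[1], reverse=True)
    return kept[:max_target_seqs]
```

With a best score of 1000 and the default ratio of 0.1, any hit scoring below 100 is dropped before the per-query cap is applied.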

ML scoring & training

Option	Default	Description
`--no-ml`	off	Disable learned seed scoring (use raw match length)
`--ml-weights`	built-in	Path to a custom JSON of 7 scorer weights
`--dump-seeds`	—	Dump every seed + features to TSV for ML training
`--ground-truth`	—	Ground-truth genome name (with `--dump-seeds`) for labelled training data
`--gfa-radius`	`5`	Number of unitig steps around each hit (used with `--format gfa`)

Search examples

# Basic search
dragon search -i my_index/ -q query.fa -o results.paf

# Multi-shard against several species-level batches
dragon search -i saureus_b1/ \
    --shard saureus_b2/ --shard saureus_b3/ --shard kpneumo_b1/ \
    -q amr_genes.fa -o hits.paf

# Surveillance summary instead of PAF
dragon search -i my_index/ -q amr_genes.fa --format summary

# Laptop profile (clamps RAM and threads)
dragon search -i my_index/ -q query.fa --profile laptop

# Pipe through standard PAF tooling
dragon search -i my_index/ -q query.fa | awk '$12 >= 30' > filtered.paf
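
For pipelines that post-process results in Python rather than awk, the same filter on PAF column 12 can be written as a small generator. The function name is hypothetical, and the sketch assumes standard tab-separated PAF lines.

```python
def filter_paf_by_mapq(lines, min_mapq=30):
    """Yield PAF lines whose 12th column is at least min_mapq.

    Lines with fewer than 12 columns are skipped rather than raising.
    """
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12 and int(cols[11]) >= min_mapq:
            yield line
```

It can be applied to a file handle opened on `results.paf` or to `sys.stdin` when reading from a pipe.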

`dragon info`

Display index metadata.

dragon info --index <DIR>

Example output

Dragon Index Information
========================
Version:         0.1.0
K-mer size:      31
Genomes:         32000
Unitigs:         9137000
Total bases:     434072471
Index size:      783.86 GB

`dragon download`

Download genomes or a pre-built Dragon index.

dragon download [OPTIONS] --database <NAME> --output <DIR>

Supported databases

Name	Behaviour
`gtdb-r220`	Download a pre-built GTDB r220 representative-genomes index
`allthebacteria-v2`	Download a pre-built AllTheBacteria v2 index
`refseq-bacteria`	Download a pre-built RefSeq bacteria index
`allthebacteria`	Download genomes from EBI AllTheBacteria, then build
`refseq`	Download all RefSeq bacteria genomes, then build
`refseq-representative`	Download only RefSeq representatives, then build
`http(s)://...`	Custom URL to a pre-built index tarball

Example

dragon download -d gtdb-r220 -o gtdb_r220_index/

For the genome-download-and-build modes, the index construction step honours `--low-memory`, `--kmer-size`, `--threads`, and `--max-ram`.

`dragon update`

Add new genomes as a lightweight overlay without re-running GGCAT / FM-index construction. Queries automatically search both the base index and all overlays.

dragon update --index <DIR> --genomes <DIR> [--kmer-size 31]

When overlays exceed ~10 % of the base index, `dragon update` warns and recommends running `dragon compact`.

`dragon compact`

Merge the base index and all overlays back into one optimised index. Run it after `dragon update` has accumulated significant overlay growth.

dragon compact --index <DIR> --genomes <DIR> [--kmer-size 31]

`--genomes` should point to all FASTA files (base + overlays), since `dragon compact` rebuilds from scratch with the merged genome set.

`dragon summarize`

Generate a per-species surveillance summary from PAF output produced by `dragon search`.

dragon summarize --input <PAF> [--output <FILE>] [--format tsv|json]
                 [--index <DIR>] [--total-genomes <N>]

The summary contains, per species:

Prevalence (fraction of database genomes carrying the query)
Mean / min / max alignment identity
Number of unique sequence variants

Designed for AMR-gene surveillance and other epidemiological queries.
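
The prevalence and identity figures listed above can be approximated directly from PAF records, which helps when sanity-checking a report. The sketch below rests on several assumptions: the PAF target name (column 6) is taken to be the genome name, identity is estimated as matches divided by alignment block length (columns 10 and 11), and the `species_of` and `genomes_per_species` mappings are hypothetical inputs supplied by the caller. Unique sequence variants are not computed, since they cannot be derived from PAF columns alone.

```python
from collections import defaultdict


def summarize_paf(paf_lines, species_of, genomes_per_species):
    """Per-species prevalence and identity statistics from PAF records.

    species_of:          genome name -> species name (hypothetical mapping)
    genomes_per_species: species name -> number of genomes in the database
    """
    carriers = defaultdict(set)     # species -> genomes with at least one hit
    identities = defaultdict(list)  # species -> identity of every hit
    for line in paf_lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        genome = cols[5]                                 # column 6: target name
        matches, block_len = int(cols[9]), int(cols[10])  # columns 10 and 11
        if block_len == 0:
            continue
        species = species_of[genome]
        carriers[species].add(genome)
        identities[species].append(matches / block_len)

    report = {}
    for species, genomes in carriers.items():
        ids = identities[species]
        report[species] = {
            "prevalence": len(genomes) / genomes_per_species[species],
            "mean_identity": sum(ids) / len(ids),
            "min_identity": min(ids),
            "max_identity": max(ids),
        }
    return report
```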

`dragon export-zarr`

Export a Dragon index as a Zarr v3 store (chunked, Zstd-compressed). The original index is not modified.

dragon export-zarr --index <DIR> --output <DIR>

Store layout under `<output>/`:

zarr.json                   root attrs (kmer_size, num_genomes, ...)
text/                       u8 unitig text, 1 MiB chunks, Zstd-3
suffix_array/               u64 SA, 131 072-entry chunks, Zstd-3
unitig_lengths/             u64 per-unitig lengths
colors/offsets/             u64 byte offsets per unitig
colors/bitmaps/             raw RoaringBitmap bytes, 1 MiB chunks, Zstd-3

Anyone with zarr-python and s3fs can open the store from a public bucket without AWS credentials — the on-disk format is the on-cloud format.

`dragon search-zarr`

Pattern-search a Zarr-backed index. Reads only the chunks each query touches, making it cheap over remote object stores (S3, GCS) that expose HTTP range requests.

dragon search-zarr --zarr <PATH | s3://...> --query <FILE> [--output <FILE>]

For full alignment use the binary backend (`dragon search`). `dragon search-zarr` is intended as the cloud-native pattern-match demo: it returns matching text positions, the underlying unitig IDs, and the genomes carrying each unitig.
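
To see why chunked access keeps remote queries cheap, the sketch below counts which suffix-array chunks a single binary search would read. It uses the 131 072-entry chunk size from the store layout; the assumption that a lookup is a plain binary search over the suffix array is ours, made for illustration, and the function name is hypothetical.

```python
SA_CHUNK_ENTRIES = 131_072  # suffix-array chunk size from the store layout


def chunks_probed_by_binary_search(num_entries, target_rank):
    """Return the SA chunk indices read while binary-searching to target_rank.

    Models a lower-bound binary search over num_entries suffix-array slots;
    each probe reads the chunk that contains the probed slot.
    """
    lo, hi = 0, num_entries
    touched = []
    while lo < hi:
        mid = (lo + hi) // 2
        chunk = mid // SA_CHUNK_ENTRIES
        if chunk not in touched:
            touched.append(chunk)
        if mid < target_rank:
            lo = mid + 1
        else:
            hi = mid
    return touched
```

For a suffix array of 434 072 471 entries (the total-bases figure from the `dragon info` example, used here only as a stand-in), the store holds 3 312 chunks, yet one search probes at most 29 positions and therefore reads at most 29 of them.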

`dragon signal-index`

Build a signal-level index from genome FASTA files by converting each sequence to its expected nanopore current via a pore model and discretising the result into a finite alphabet.

dragon signal-index --input <DIR> --output <DIR> [OPTIONS]

Option	Default	Description
`--num-levels`	`16`	Discretisation alphabet size
`--threads`	`4`	Threads
`--signal-boundaries`	—	Path to learned discretisation boundaries (JSON)
`--pore-model`	built-in R10.4.1	Path to a custom pore model JSON
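
The discretisation step can be pictured as binning each expected current value into one of `--num-levels` symbols. The sketch below is an illustration under stated assumptions: it uses evenly spaced boundaries between a caller-supplied minimum and maximum, whereas Dragon may use learned boundaries (see `--signal-boundaries`), and both function names are hypothetical. The current values would come from the pore model.

```python
from bisect import bisect_right


def uniform_boundaries(lo, hi, num_levels=16):
    """Return num_levels - 1 evenly spaced cut points between lo and hi."""
    step = (hi - lo) / num_levels
    return [lo + step * i for i in range(1, num_levels)]


def discretise(currents, boundaries):
    """Map each current value to a symbol in the range 0..len(boundaries).

    Values below the first boundary map to 0; values at or above the last
    boundary map to the highest symbol.
    """
    return [bisect_right(boundaries, c) for c in currents]
```

With 16 levels there are 15 cut points, and every symbol fits in 4 bits.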

`dragon signal-search`

Align raw nanopore current signals against a signal-level index. Input may be TSV, CSV, or SLOW5 text; the format is auto-detected.

dragon signal-search --index <DIR> --query <FILE> [OPTIONS]

Option	Default	Description
`--signal-kmer-size`	`10`	Signal k-mer size for backward search
`--min-hits`	`3`	Minimum k-mer hits to report a genome
`--max-seed-freq`	`10000`	Skip signal k-mers above this frequency
`--max-results`	`50`	Maximum results per query read
`--threads`	`4`	Threads
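
The effect of `--min-hits` and `--max-results` on one query read can be sketched as a count-and-rank step. This is an assumption about the reporting logic, written for illustration: the input is a hypothetical list with one genome name per matching signal k-mer, and genomes are ranked by hit count.

```python
from collections import Counter


def rank_genomes(kmer_hits, min_hits=3, max_results=50):
    """Rank genomes by signal k-mer hit count for one query read.

    kmer_hits holds one genome name per matching signal k-mer. Genomes with
    fewer than min_hits hits are dropped; at most max_results are returned.
    """
    counts = Counter(kmer_hits)
    ranked = [(g, n) for g, n in counts.most_common() if n >= min_hits]
    return ranked[:max_results]
```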

Environment variables

Variable	Description
`RUST_LOG`	Logging level: `error`, `warn`, `info`, `debug`, `trace`

RUST_LOG=debug dragon search -i my_index/ -q query.fa
RUST_LOG=warn  dragon search -i my_index/ -q query.fa
