# CLI reference Dragon ships a single binary with eleven subcommands. Run `dragon --help` for the full option list at any time; this page describes each subcommand and its most-used flags. | Command | Purpose | | --- | --- | | `dragon index` | Build a Dragon index from a directory of FASTA genomes | | `dragon search` | Align query sequences (single or multi-shard) | | `dragon info` | Print index metadata | | `dragon download` | Download genomes (RefSeq, AllTheBacteria) or pre-built indices | | `dragon update` | Add new genomes as a lightweight overlay | | `dragon compact` | Merge base + overlays back into one optimised index | | `dragon summarize` | Per-species prevalence/identity report from PAF output | | `dragon export-zarr` | Export an index as a Zarr v3 store | | `dragon search-zarr` | Pattern-search a Zarr-backed index (local or `s3://`) | | `dragon signal-index` | Build a signal-level index from FASTA via a pore model | | `dragon signal-search` | Align raw nanopore current signals (TSV/CSV/SLOW5) | --- ## `dragon index` Build a Dragon index from a directory of genome FASTA files. Uses GGCAT for the colored compacted de Bruijn graph if available, falling back to an internal builder for small datasets. ```text dragon index [OPTIONS] --input --output ``` | Option | Short | Default | Description | | --- | --- | --- | --- | | `--input` | `-i` | *required* | Directory of genome FASTA files (`.fa`, `.fasta`, `.fna`) | | `--output` | `-o` | *required* | Output directory for the index | | `--kmer-size` | `-k` | `31` | K-mer size for the de Bruijn graph | | `--threads` | `-j` | `4` | Number of threads | | `--low-memory` | | off | External-memory SA construction (≤8 GB RAM by default) | | `--max-ram` | | `8.0` | RAM budget in GB for `--low-memory` mode | | `--auto` | | off | Auto-batch large collections into overlays (transparent at query time) | The index directory contains: - `fm_index.bin` — concatenated unitig text + suffix array - `colors.drgn` — Roaring-bitmap colour index per unitig - `paths.bin` — per-genome unitig path (mmap-friendly v2 format on new builds) - `specificity.drgn` — per-genome private-unitig sets - `metadata.json` — version, k-mer size, genome count, total bases - `unitigs.fa` (optional) — keep for resume/auto-batch; safe to delete after successful build ### Index examples ```bash # Default: 31-mer, 4 threads, RAM-bounded only by the system dragon index -i genomes/ -o my_index/ # Low-memory: external-memory SA construction with 8 GB cap dragon index -i genomes/ -o my_index/ --low-memory --max-ram 8 # Use all cores dragon index -i genomes/ -o my_index/ -j $(nproc) # Auto-batch a million-genome collection into overlays dragon index -i giant_dir/ -o giant_idx/ --auto --max-ram 64 ``` ### Resume Index construction is resumable: if `fm_index.bin` and `colors.drgn` already exist in the output directory, Dragon skips Steps 1–4 (GGCAT + FM-index + colours) and resumes from Step 5 (path index). Useful when a job is killed during the long path-building step. --- ## `dragon search` Search query sequences against a Dragon index. Supports multi-shard search via repeatable `--shard` arguments — each shard is searched independently and results are merged with per-genome deduplication. ```text dragon search [OPTIONS] --index --query ``` ### Core options | Option | Short | Default | Description | | --- | --- | --- | --- | | `--index` | `-i` | *required* | Path to Dragon index directory | | `--shard` | | — | Additional shard directory (repeatable) | | `--query` | `-q` | *required* | Query FASTA/FASTQ file | | `--output` | `-o` | `-` (stdout) | Output file | | `--format` | `-f` | `paf` | `paf`, `blast6`, `summary`, or `gfa` | | `--threads` | `-j` | `4` | Number of threads | | `--max-ram` | | `4.0` | RAM budget in GB | | `--profile` | | `workstation` | `laptop` (≤8 GB, 4 threads) or `workstation` | ### Filtering & scoring | Option | Default | Description | | --- | --- | --- | | `--min-seed-len` | `15` | Minimum seed match length | | `--max-seed-freq` | `10000` | Skip seeds occurring more than this many times | | `--min-chain-score` | `50` | Minimum chain score to report | | `--max-target-seqs` | `10` | Maximum hits per query | | `--min-identity` | `0.7` | Minimum alignment identity (0.0–1.0) | | `--min-query-coverage` | `0.3` | Minimum query coverage (0.0–1.0) | | `--min-score-ratio` | `0.1` | Drop hits scoring below `ratio × best_score` | ### ML scoring & training | Option | Default | Description | | --- | --- | --- | | `--no-ml` | off | Disable learned seed scoring (use raw match length) | | `--ml-weights` | built-in | Path to a custom JSON of 7 scorer weights | | `--dump-seeds` | — | Dump every seed + features to TSV for ML training | | `--ground-truth` | — | Ground-truth genome name (with `--dump-seeds`) for labelled training data | | `--gfa-radius` | `5` | Number of unitig steps around each hit (used with `--format gfa`) | ### Search examples ```bash # Basic search dragon search -i my_index/ -q query.fa -o results.paf # Multi-shard against several species-level batches dragon search -i saureus_b1/ \ --shard saureus_b2/ --shard saureus_b3/ --shard kpneumo_b1/ \ -q amr_genes.fa -o hits.paf # Surveillance summary instead of PAF dragon search -i my_index/ -q amr_genes.fa --format summary # Laptop profile (clamps RAM and threads) dragon search -i my_index/ -q query.fa --profile laptop # Pipe through standard PAF tooling dragon search -i my_index/ -q query.fa | awk '$12 >= 30' > filtered.paf ``` --- ## `dragon info` Display index metadata. ```text dragon info --index ``` ### Example output ```text Dragon Index Information ======================== Version: 0.1.0 K-mer size: 31 Genomes: 32000 Unitigs: 9137000 Total bases: 434072471 Index size: 783.86 GB ``` --- ## `dragon download` Download genomes or a pre-built Dragon index. ```text dragon download [OPTIONS] --database --output ``` ### Supported databases | Name | Behaviour | | --- | --- | | `gtdb-r220` | Download a pre-built GTDB r220 representative-genomes index | | `allthebacteria-v2` | Download a pre-built AllTheBacteria v2 index | | `refseq-bacteria` | Download a pre-built RefSeq bacteria index | | `allthebacteria` | Download genomes from EBI AllTheBacteria, then build | | `refseq` | Download all RefSeq bacteria genomes, then build | | `refseq-representative` | Download only RefSeq representatives, then build | | `http(s)://...` | Custom URL to a pre-built index tarball | ### Example ```bash dragon download -d gtdb-r220 -o gtdb_r220_index/ ``` For the genome-download-and-build modes, the index construction step honours `--low-memory`, `--kmer-size`, `--threads`, and `--max-ram`. --- ## `dragon update` Add new genomes as a lightweight overlay without re-running GGCAT / FM-index construction. Queries automatically search both the base index and all overlays. ```text dragon update --index --genomes [--kmer-size 31] ``` When overlays exceed ~10 % of the base index, `dragon update` warns and recommends running `dragon compact`. --- ## `dragon compact` Merge the base index and all overlays back into one optimised index. Run after `dragon update` has accumulated significant overlay growth. ```text dragon compact --index --genomes [--kmer-size 31] ``` `--genomes` should point to **all** FASTA files (base + overlays), since `compact` rebuilds from scratch with the merged genome set. --- ## `dragon summarize` Generate a per-species surveillance summary from PAF output produced by `dragon search`. ```text dragon summarize --input [--output ] [--format tsv|json] [--index ] [--total-genomes ] ``` The summary contains, per species: - Prevalence (fraction of database genomes carrying the query) - Mean / min / max alignment identity - Number of unique sequence variants Designed for AMR-gene surveillance and other epidemiological queries. --- ## `dragon export-zarr` Export a Dragon index as a Zarr v3 store (chunked, Zstd-compressed). The original index is not modified. ```text dragon export-zarr --index --output ``` Store layout under `/`: ```text zarr.json root attrs (kmer_size, num_genomes, ...) text/ u8 unitig text, 1 MiB chunks, Zstd-3 suffix_array/ u64 SA, 131 072-entry chunks, Zstd-3 unitig_lengths/ u64 per-unitig lengths colors/offsets/ u64 byte offsets per unitig colors/bitmaps/ raw RoaringBitmap bytes, 1 MiB chunks, Zstd-3 ``` Anyone with `zarr-python` and `s3fs` can open the store from a public bucket without AWS credentials — the on-disk format is the on-cloud format. --- ## `dragon search-zarr` Pattern-search a Zarr-backed index. Reads only the chunks each query touches, making it cheap over remote object stores (S3, GCS) that expose HTTP range requests. ```text dragon search-zarr --zarr --query [--output ] ``` For full alignment use the binary backend (`dragon search`). `search-zarr` is intended as the cloud-native pattern-match demo: it returns matching text positions, the underlying unitig IDs, and the genomes carrying each unitig. --- ## `dragon signal-index` Build a signal-level index from genome FASTA files by converting expected nanopore current via a pore model and discretising into a finite alphabet. ```text dragon signal-index --input --output [OPTIONS] ``` | Option | Default | Description | | --- | --- | --- | | `--num-levels` | `16` | Discretisation alphabet size | | `--threads` | `4` | Threads | | `--signal-boundaries` | — | Path to learned discretisation boundaries (JSON) | | `--pore-model` | built-in R10.4.1 | Path to a custom pore model JSON | --- ## `dragon signal-search` Align raw nanopore current signals against a signal-level index. Inputs are TSV, CSV, or SLOW5 text format with auto-detection. ```text dragon signal-search --index --query [OPTIONS] ``` | Option | Default | Description | | --- | --- | --- | | `--signal-kmer-size` | `10` | Signal k-mer size for backward search | | `--min-hits` | `3` | Minimum k-mer hits to report a genome | | `--max-seed-freq` | `10000` | Skip signal k-mers above this frequency | | `--max-results` | `50` | Maximum results per query read | | `--threads` | `4` | Threads | --- ## Environment variables | Variable | Description | | --- | --- | | `RUST_LOG` | Logging level: `error`, `warn`, `info`, `debug`, `trace` | ```bash RUST_LOG=debug dragon search -i my_index/ -q query.fa RUST_LOG=warn dragon search -i my_index/ -q query.fa ```