Dragon Documentation
A cloud-native, signal-aware aligner for surveillance-scale microbial genomics.
Dragon aligns query sequences (genes, plasmids, long/short reads, raw nanopore current) against millions of prokaryotic genomes while using dramatically less disk and RAM than existing tools. It exploits redundancy among related genomes through a coloured compacted de Bruijn graph, an FM-index over concatenated unitigs, ML-weighted graph-aware chaining, and a streaming on-disk format that mmaps the index in O(1).
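To illustrate how an FM-index over concatenated unitigs can yield seeds, the Python sketch below builds a toy index from three made-up unitigs joined by # separators (so no match can span two unitigs) and extends a seed leftwards by backward search until the suffix-array interval becomes empty, which is one common way to obtain variable-length seeds. It is a teaching sketch under stated assumptions (naive suffix sorting, uncompressed occurrence tables, hypothetical sequences), not Dragon's Rust implementation; the names build_fm_index and longest_seed_ending_at are hypothetical.

```python
def build_fm_index(text):
    """Toy FM-index over text, which must end with a unique '$' terminator."""
    # Naive suffix array: fine for toy inputs, far too slow for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # bwt[k] is the character preceding the k-th smallest suffix ('$' wraps around).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] counts characters in text that sort strictly before c.
    C, running = {}, 0
    for c in sorted(set(text)):
        C[c] = running
        running += text.count(c)
    # occ[c][k] counts occurrences of c in bwt[:k].
    occ = {c: [0] * (len(bwt) + 1) for c in C}
    for k, ch in enumerate(bwt):
        for c in C:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return sa, C, occ


def longest_seed_ending_at(query, end, C, occ, n):
    """Extend a match leftwards from query[end - 1] while the SA interval is non-empty.

    Returns (seed_length, lo, hi); sa[lo:hi] holds the text offsets of the seed.
    """
    lo, hi, length = 0, n, 0  # half-open interval covering every suffix
    for ch in reversed(query[:end]):
        if ch not in C:
            break  # character absent from the index (e.g. 'N')
        new_lo = C[ch] + occ[ch][lo]
        new_hi = C[ch] + occ[ch][hi]
        if new_lo >= new_hi:
            break  # the seed cannot be extended any further
        lo, hi, length = new_lo, new_hi, length + 1
    return length, lo, hi


# Hypothetical unitigs, concatenated with '#' separators and a final '$'.
unitigs = ["ACGTTGCA", "TTGCAGGA", "CCATG"]
text = "#".join(unitigs) + "$"
sa, C, occ = build_fm_index(text)
length, lo, hi = longest_seed_ending_at("GGTTGCA", 7, C, occ, len(text))
print(length, sorted(sa[lo:hi]))
```

For the query GGTTGCA this prints 6 [2]: the longest suffix present in the text is GTTGCA, which occurs once, at offset 2 of the first unitig. A production index would replace the naive suffix sort and dense occurrence tables with sampled, compressed structures.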
Key features
~50× smaller on disk than LexicMap (~100 GB vs 5.46 TB for 2.34 M genomes).
<4 GB query RAM at million-genome scale; --profile laptop restricts resource use further to fit consumer hardware.
Multi-shard search (--shard) for indices split across files or storage quotas.
Cloud-native Zarr v3 backend (dragon export-zarr / dragon search-zarr): chunked and Zstd-compressed; the index can be read directly from s3:// or gs:// via zarr-python.
Mmap-friendly paths.bin v2: O(1) cold load and lazy per-genome decoding from a fixed offset table.
Raw nanopore signal search (dragon signal-index / dragon signal-search): pore-model-driven discretisation of the signal, indexed by the same FM-index machinery; no basecalling required.
ML-weighted seed scoring: logistic regression over six anchor features, with inference in pure Rust.
Surveillance-ready summaries (dragon summarize, --format summary): per-species prevalence and identity tables built into the CLI.
Incremental updates (dragon update / dragon compact): overlay new genomes without a full rebuild.
Variable-length seeds via FM-index backward search.
Outputs in PAF, BLAST-tabular, surveillance summary, and graph-context GFA formats.
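The O(1) cold load of paths.bin v2 is attributed above to a fixed offset table. The Python sketch below shows the general idea with a hypothetical layout (4 magic bytes, a little-endian u64 record count, n + 1 u64 offsets, then concatenated per-genome payloads); Dragon's actual byte layout is not described here and will differ. Memory-mapping the file reads nothing up front, and fetching genome i touches only two offsets and one slice. The names write_offset_table, read_record and the PTH2 magic are hypothetical.

```python
import mmap
import os
import struct
import tempfile

MAGIC = b"PTH2"  # hypothetical magic bytes, not Dragon's real header
HEADER = struct.Struct("<4sQ")  # magic + record count (little-endian u64)


def write_offset_table(path, records):
    """Write a hypothetical layout: header | (n + 1) u64 offsets | payloads."""
    offsets = [0]
    for rec in records:
        offsets.append(offsets[-1] + len(rec))
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, len(records)))
        f.write(struct.pack(f"<{len(offsets)}Q", *offsets))
        f.writelines(records)


def read_record(buf, i):
    """Decode only record i from an mmap (or any bytes-like buffer)."""
    magic, n = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a PTH2 file")
    if not 0 <= i < n:
        raise IndexError(i)
    table_start = HEADER.size
    data_start = table_start + 8 * (n + 1)
    # Two adjacent offsets bound record i; no other record is read.
    start, end = struct.unpack_from("<2Q", buf, table_start + 8 * i)
    return bytes(buf[data_start + start : data_start + end])


# Usage with made-up per-genome payloads.
genomes = [b"genome-0 path data", b"g1", b"third genome"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "paths.bin")
    write_offset_table(path, genomes)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        third = read_record(mm, 2)
        second = read_record(mm, 1)
print(third)
```

Because records are located by arithmetic rather than by scanning, the cost of opening the file does not grow with the number of genomes; only the pages actually sliced are faulted in.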
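ML-weighted seed scoring is described above only as logistic regression over six anchor features. The features and weights are not listed, so the Python sketch below uses six hypothetical features and made-up weights purely to show the shape of the computation: a weighted sum plus bias, passed through a numerically stable sigmoid to give a per-anchor weight that a chainer could consume. Nothing here reflects Dragon's trained model.

```python
import math

# Six hypothetical anchor features; Dragon's real feature set is not documented here.
FEATURE_NAMES = (
    "seed_length",         # bases matched by the seed
    "log_sa_interval",     # log of FM-index hit count (repetitiveness)
    "log_genome_colours",  # log of genomes (colours) sharing the unitig
    "diagonal_gap",        # distance from the previous anchor's diagonal
    "at_unitig_boundary",  # 1 if the seed touches a unitig end, else 0
    "flank_identity",      # identity of short flanks around the seed, 0..1
)
# Made-up weights and bias for illustration only; real ones would be trained.
WEIGHTS = (0.15, -0.9, -0.2, -0.05, -0.4, 2.0)
BIAS = -2.5


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def anchor_score(features, weights=WEIGHTS, bias=BIAS):
    """Probability-like weight for one anchor: sigmoid(w . x + b)."""
    if len(features) != len(weights):
        raise ValueError(f"expected {len(weights)} features, got {len(features)}")
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# A long, unique seed versus a short, repetitive one (values are made up).
good = anchor_score((31, 0.0, 1.2, 0.0, 0, 0.98))
poor = anchor_score((15, 4.5, 6.0, 12.0, 1, 0.70))
print(f"{good:.3f} {poor:.3f}")
```

Since the output lies in (0, 1), it could serve directly as a multiplicative anchor weight or be converted to a log-odds bonus inside a chaining score; which form Dragon uses is not stated here.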
A 16,000-genome demo index is hosted at s3://dragon-zarr/saureus/b1/ (eu-west-2, public-read). Anyone can read it with no AWS credentials:
pip install 'zarr>=3.0' s3fs numcodecs
python scripts/zarr_demo.py s3://dragon-zarr/saureus/b1
Contents
Getting Started
User Guide
Architecture
API Reference
Citation
Cerdeira, L. (2026). Dragon: a cloud-native, signal-aware aligner for surveillance-scale microbial genomics. In preparation.
Licence
Dragon is released under the MIT Licence.