Dragon

Getting Started

  • Installation
    • Requirements
    • From source (recommended)
      • 1. Install Rust
      • 2. Clone and build
      • 3. (Optional) Install system-wide
      • 4. Verify installation
    • Optional dependencies
      • GGCAT (recommended for large databases)
      • Cloud-native (Zarr) dependencies
      • Benchmark dependencies
    • Troubleshooting
      • sux crate build failure
      • Memory issues during index construction
  • Quick start
    • Step 1: Prepare genome files
    • Step 2: Build the index
    • Step 3: Search
    • Step 4: Inspect results
    • Example output
    • Step 5 (optional): Multi-shard search
    • Step 6 (optional): Cloud-native deployment
    • Step 7 (optional): Surveillance summary
  • Tutorial: AMR gene search
    • Scenario
    • Step 1: Download test data
    • Step 2: Build index
    • Step 3: Search AMR genes
    • Step 4: BLAST-tabular output
    • Step 5: Batch analysis
    • Performance notes

User Guide

  • Indexing
    • Overview
    • Command
      • Required arguments
      • Optional arguments
    • Input format
    • Choosing k-mer size
    • Index files
    • Resource requirements
    • GGCAT integration
  • Searching
    • Overview
    • Command
      • Required arguments
      • Optional arguments
    • Query types
    • Seed finding details
    • Candidate filtering
    • Chaining
    • Memory management
  • Output formats
    • PAF (Pairwise Alignment Format)
      • Optional tags
      • Example
    • BLAST tabular (outfmt 6)
      • Example
    • Parsing output
      • Extract top hits per query
      • Filter by identity
      • Count hits per genome
  • Performance tuning
    • Hardware recommendations
    • SSD vs HDD
    • Tuning parameters
      • For maximum sensitivity
      • For maximum speed
      • For low-memory machines (4 GB total RAM)
    • Scaling guidelines
      • Number of genomes vs resources
      • Query length vs performance
    • Distributing pre-built indices

Architecture

  • Architecture overview
    • The redundancy problem
    • Pipeline overview
      • Index construction
      • Query pipeline
    • Why this is efficient
    • Module map
  • Coloured compacted de Bruijn graph
    • What is a de Bruijn graph?
    • Why use a de Bruijn graph?
    • Construction
      • GGCAT advantages
      • Fallback builder
    • Colour storage
  • Run-length FM-index
    • Background
    • Why run-length?
    • Construction
    • Backward search
    • Variable-length seed matching
    • Position-to-unitig mapping
  • Graph-aware colinear chaining
    • The chaining problem
    • Why graph-aware?
    • Algorithm
      • Step 1: Map seeds to genome coordinates
      • Step 2: Sort anchors by reference position
      • Step 3: Fenwick tree DP
      • Step 4: Gap-sensitive scoring
    • Complexity
  • Data structures
    • 2-bit DNA encoding
    • Roaring Bitmaps
    • Elias-Fano cumulative length index
    • Fenwick tree (Binary Indexed Tree)
    • Variable-length integers (varint)

Benchmark

  • Benchmark datasets
    • Tiered approach
      • Tier 1: Small (development & CI)
      • Tier 2: Medium (validation)
      • Tier 3: Large (full benchmark)
    • Query types
      • Gene-level queries (primary)
      • Long reads (Badread)
      • Challenging scenarios
  • Benchmark methodology
    • Tools compared
    • Accuracy metrics
    • Resource metrics
    • Scalability metrics
    • Read simulation
      • Gene-level queries
      • Long reads (Badread)
    • Statistical analysis
  • Benchmark results
    • Sensitivity vs divergence
    • Resource comparison
      • Index size
      • Peak query RAM
    • Batch query performance
    • Figures
  • Reproducing benchmarks
    • Quick start (synthetic data)
    • Full benchmark (Snakemake)
      • Prerequisites
      • Running
      • Configuration
    • Pipeline structure
    • Adding a new tool

API Reference

  • CLI reference
    • dragon index
      • Index examples
      • Resume
    • dragon search
      • Core options
      • Filtering & scoring
      • ML scoring & training
      • Search examples
    • dragon info
      • Example output
    • dragon download
      • Supported databases
      • Example
    • dragon update
    • dragon compact
    • dragon summarize
    • dragon export-zarr
    • dragon search-zarr
    • dragon signal-index
    • dragon signal-search
    • Environment variables
  • Module reference
    • Index modules (src/index/)
      • index::dbg
      • index::unitig
      • index::color
      • index::fm
      • index::paths
      • index::metadata
    • Query modules (src/query/)
      • query::seed
      • query::candidate
      • query::chain
      • query::align
    • Data structures (src/ds/)
      • ds::fenwick
      • ds::elias_fano
      • ds::varint
    • Utilities (src/util/)
      • util::dna
      • util::mmap
    • I/O modules (src/io/)
      • io::fasta
      • io::paf
      • io::blast

© Copyright 2026, Louise Cerdeira.