Benchmark methodology

Tools compared

Tool	Type	Version	Why included
Dragon	Graph + FM-index	0.1.0	Our tool
LexicMap	LexicHash probes	latest	Direct predecessor (Nature Biotech 2025)
Minimap2	Minimiser + chain	2.28	Gold standard for long-read alignment
BLASTn	Word + extend	2.15.0	Gold standard for sensitivity
MMseqs2	k-mer prefilter	15	Fast protein/nucleotide search
COBS	Bit-sliced signatures	latest	k-mer containment search
sourmash	FracMinHash sketches	4.8	Containment estimation
skani	Sparse chaining ANI	0.2	Fast ANI estimation

Accuracy metrics

All metrics are computed against the ground truth from the simulation:

Sensitivity (recall): TP / (TP + FN) — fraction of true hits found
Precision: TP / (TP + FP) — fraction of reported hits that are correct
F1 score: harmonic mean of precision and recall
Alignment identity error: |reported_identity - true_identity|
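
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual scoring code: it assumes hits are represented as (query, target) pairs, and the function names accuracy_metrics and identity_error are hypothetical.

```python
def accuracy_metrics(true_hits, reported_hits):
    """Sensitivity, precision and F1 from two collections of (query, target) pairs.

    The pair representation is an assumption made for this sketch.
    """
    true_hits, reported_hits = set(true_hits), set(reported_hits)
    tp = len(true_hits & reported_hits)   # true hits that were reported
    fp = len(reported_hits - true_hits)   # reported hits that are not true
    fn = len(true_hits - reported_hits)   # true hits that were missed

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


def identity_error(reported_identity, true_identity):
    """Absolute difference between reported and simulated identity."""
    return abs(reported_identity - true_identity)
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; a real pipeline might prefer to report such cases as undefined.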

Metrics are stratified by:

Divergence level (0-15%)
Query length (short/medium/long)
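
As a rough sketch of the stratification, per-query outcomes can be grouped by (divergence, length class) before computing a metric. The length thresholds below are placeholders, since this document does not define short, medium and long; the record fields and function names are likewise hypothetical.

```python
from collections import defaultdict

# Placeholder thresholds in bases; the real cut-offs are not stated in this document.
LENGTH_BINS = [(1_000, "short"), (10_000, "medium")]


def length_class(length):
    """Map a query length to 'short', 'medium' or 'long' using LENGTH_BINS."""
    for upper, label in LENGTH_BINS:
        if length < upper:
            return label
    return "long"


def stratified_sensitivity(records):
    """Sensitivity per (divergence, length class) stratum.

    Each record is a dict with 'divergence', 'length' and 'found' (bool),
    one record per true hit. This layout is an assumption of the sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [found, total]
    for rec in records:
        key = (rec["divergence"], length_class(rec["length"]))
        counts[key][0] += int(rec["found"])
        counts[key][1] += 1
    return {key: found / total for key, (found, total) in counts.items()}
```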

Resource metrics

Peak RAM: maximum resident set size via getrusage() or /usr/bin/time -v
Wall-clock time: elapsed time per query
CPU time: user + system time
Index size on disk: total bytes of all index files
Index construction time: wall-clock time for dragon index
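
One way to collect these numbers from Python is sketched below using the standard resource module (Unix only). It is an illustration under stated assumptions, not the benchmark harness itself: the helper names measure and index_size_bytes are hypothetical, and the index is assumed to live in a single directory.

```python
import resource
import subprocess
import sys
import time
from pathlib import Path


def measure(cmd):
    """Run cmd as a child process and return wall-clock time, CPU time and peak RSS."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # CPU time = user + system time consumed by the child.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return {"wall_s": wall, "cpu_s": cpu, "peak_rss_bytes": after.ru_maxrss * scale}


def index_size_bytes(index_dir):
    """Total size of all regular files under index_dir (assumed index layout)."""
    return sum(p.stat().st_size for p in Path(index_dir).rglob("*") if p.is_file())
```

Note that ru_maxrss for RUSAGE_CHILDREN is a high-water mark over every child the calling process has waited for, so each command should be measured from a fresh Python process to avoid carrying over an earlier peak.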

Scalability metrics

RAM and time vs number of indexed genomes (100, 1K, 10K, 100K, 1M, 2M)
RAM and time vs batch size (1, 10, 100, 1000 queries)
Performance on HDD vs SSD

Read simulation

Gene-level queries

# Extract random genes
python3 benchmark/simulate/extract_genes.py \
  --genome-dir data/genomes/ \
  --output queries/genes.fa \
  --num-genes 1000 --min-length 500 --max-length 5000

# Introduce mutations
python3 benchmark/simulate/mutate_sequences.py \
  --input queries/genes.fa \
  --output queries/genes_div0.05.fa \
  --divergence 0.05
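
To illustrate what a divergence of 0.05 means, the sketch below substitutes 5% of the positions in a sequence with a different base. It is a simplified, substitution-only model written for this explanation; the behaviour of mutate_sequences.py itself (for example, whether it also introduces indels) is not described here, and the function name mutate is hypothetical.

```python
import random


def mutate(seq, divergence, rng):
    """Return seq with round(len(seq) * divergence) positions substituted.

    Substitution-only sketch: each chosen position is replaced by a base
    that differs from the original, so every mutation changes the sequence.
    """
    bases = list(seq.upper())
    n_mut = round(len(bases) * divergence)
    # Sample distinct positions so no site is mutated twice.
    for pos in rng.sample(range(len(bases)), n_mut):
        alternatives = [b for b in "ACGT" if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)
```

Passing an explicit random.Random(seed) as rng keeps the simulated queries reproducible across runs.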

Long reads (Badread)

bash benchmark/simulate/run_badread.sh data/genomes/ queries/long_reads.fa

Statistical analysis

Each measurement repeated 3 times; median reported
Error bars show min/max across replicates
Statistical significance tested with Wilcoxon signed-rank test where applicable
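
The replicate summary can be sketched with the Python standard library as follows. The function name summarise and the example timings are made up for illustration.

```python
import statistics


def summarise(replicates):
    """Median plus the min/max range used for error bars."""
    return {
        "median": statistics.median(replicates),
        "min": min(replicates),
        "max": max(replicates),
    }


# Illustrative values only, not measured results.
summary = summarise([12.1, 11.8, 12.6])
print(summary)
```

The significance test is not shown here; a paired comparison of two tools over the same set of queries could, for instance, use scipy.stats.wilcoxon from the third-party SciPy package.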