Benchmark methodology

Tools compared

| Tool | Type | Version | Why included |
|---|---|---|---|
| Dragon | Graph + FM-index | 0.1.0 | Our tool |
| LexicMap | LexicHash probes | latest | Direct predecessor (Nature Biotech 2025) |
| Minimap2 | Minimiser + chain | 2.28 | Gold standard for long-read alignment |
| BLASTn | Word + extend | 2.15.0 | Gold standard for sensitivity |
| MMseqs2 | k-mer prefilter | 15 | Fast protein/nucleotide search |
| COBS | Bit-sliced signatures | latest | k-mer containment search |
| sourmash | FracMinHash sketches | 4.8 | Containment estimation |
| skani | Sparse chaining ANI | 0.2 | Fast ANI estimation |

Accuracy metrics

All metrics are computed against the ground truth recorded during simulation:

  • Sensitivity (recall): TP / (TP + FN) — fraction of true hits found

  • Precision: TP / (TP + FP) — fraction of reported hits that are correct

  • F1 score: harmonic mean of precision and recall, 2 × precision × recall / (precision + recall)

  • Alignment identity error: |reported_identity - true_identity|
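The four accuracy metrics can be computed from one tool's output and the simulation truth set. The following is a minimal sketch under assumptions not stated in the methodology. It keys each hit by a hypothetical `(query_id, target_id)` pair, stores identities as fractions, and summarises the per-hit identity error as a mean over true positives. All names (`compute_accuracy`, `AccuracyMetrics`) are illustrative.

```python
"""Sketch: accuracy metrics against simulated ground truth.

Assumptions (hypothetical): a hit is identified by a (query_id, target_id)
pair, identities are fractions in [0, 1], and the per-hit identity error is
summarised as a mean over true positives.
"""
from dataclasses import dataclass


@dataclass
class AccuracyMetrics:
    sensitivity: float
    precision: float
    f1: float
    mean_identity_error: float


def compute_accuracy(reported: dict[tuple[str, str], float],
                     truth: dict[tuple[str, str], float]) -> AccuracyMetrics:
    """Compare reported hits (pair -> identity) with the ground truth."""
    reported_pairs = set(reported)
    true_pairs = set(truth)
    tp = len(reported_pairs & true_pairs)  # true hits that were reported
    fp = len(reported_pairs - true_pairs)  # reported hits absent from the truth
    fn = len(true_pairs - reported_pairs)  # true hits that were missed

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and recall (sensitivity).
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)

    # Identity error is only defined for hits present in both sets.
    errors = [abs(reported[p] - truth[p]) for p in reported_pairs & true_pairs]
    mean_err = sum(errors) / len(errors) if errors else 0.0
    return AccuracyMetrics(sensitivity, precision, f1, mean_err)
```

For example, with three true hits and three reported hits sharing two pairs, sensitivity, precision and F1 all come out at 2/3. Only the two shared pairs contribute to the identity error.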

Metrics are stratified by:

  • Divergence level (0-15%)

  • Query length (short/medium/long)
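Stratification amounts to grouping true hits by (divergence level, length class) before computing a metric. The sketch below shows this for sensitivity. The methodology does not give the short/medium/long cut-offs, so `SHORT_MAX` and `MEDIUM_MAX` are hypothetical placeholders. The input format is also an assumption: one `(divergence, length, found)` tuple per true hit.

```python
from collections import defaultdict

# Hypothetical cut-offs: the methodology names the classes but not the bounds.
SHORT_MAX = 1_000
MEDIUM_MAX = 10_000


def length_class(length: int) -> str:
    """Assign a query to a length stratum (thresholds are assumptions)."""
    if length <= SHORT_MAX:
        return "short"
    if length <= MEDIUM_MAX:
        return "medium"
    return "long"


def stratified_sensitivity(hits):
    """hits: iterable of (divergence, query_length, found) tuples, one per true hit.

    Returns {(divergence, length_class): sensitivity} for every non-empty stratum.
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for divergence, length, was_found in hits:
        key = (divergence, length_class(length))
        total[key] += 1
        found[key] += int(was_found)
    return {key: found[key] / total[key] for key in total}
```

Keying on the exact divergence value works because queries are simulated at discrete levels (e.g. 0.05). Continuous divergences would need binning first.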

Resource metrics

  • Peak RAM: maximum resident set size via getrusage() or /usr/bin/time -v

  • Wall-clock time: elapsed time per query

  • CPU time: user + system time

  • Index size on disk: total bytes of all index files

  • Index construction time: wall-clock time of the index-building step (`dragon index` for Dragon)
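Wall-clock time, CPU time and peak RSS can be collected from Python on Unix in the same way `/usr/bin/time -v` reports them. The sketch below is one way to do it, not the benchmark's own harness; `run_and_measure` and `index_size_bytes` are illustrative names. It uses `os.wait4` instead of `getrusage(RUSAGE_CHILDREN)`, because the latter reports the largest peak RSS across *all* finished children. That would mix runs together when several commands are measured from one process.

```python
import os
import subprocess
import sys
import time


def run_and_measure(cmd: list[str]) -> dict:
    """Run one command; return wall time, CPU time and peak RSS of that child.

    Unix-only sketch: os.wait4 returns the rusage of exactly this child.
    """
    start = time.perf_counter()
    proc = subprocess.Popen(cmd)
    _, status, usage = os.wait4(proc.pid, 0)
    wall = time.perf_counter() - start
    proc.returncode = os.waitstatus_to_exitcode(status)  # mark child as reaped

    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return {
        "returncode": proc.returncode,
        "wall_s": wall,
        "cpu_s": usage.ru_utime + usage.ru_stime,  # user + system time
        "peak_rss_bytes": usage.ru_maxrss * scale,
    }


def index_size_bytes(index_dir: str) -> int:
    """Total size of all regular files under an index directory."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # skip symlinks to avoid double counting
                total += os.path.getsize(path)
    return total
```

`cmd` is whatever each tool's own command line is. Per-query wall time would then be `wall_s` divided by the number of queries in the batch.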

Scalability metrics

  • RAM and time vs number of indexed genomes (100, 1K, 10K, 100K, 1M, 2M)

  • RAM and time vs batch size (1, 10, 100, 1000 queries)

  • Performance on HDD vs SSD

Read simulation

Gene-level queries

```bash
# Extract random genes
python3 benchmark/simulate/extract_genes.py \
  --genome-dir data/genomes/ \
  --output queries/genes.fa \
  --num-genes 1000 --min-length 500 --max-length 5000

# Introduce mutations
python3 benchmark/simulate/mutate_sequences.py \
  --input queries/genes.fa \
  --output queries/genes_div0.05.fa \
  --divergence 0.05
```
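The mutation step can be pictured with a minimal substitution-only sketch. The contents of `mutate_sequences.py` are not shown here, so treat the following as assumptions: the function name `mutate`, the use of Python's `random` module, and the absence of indels. At divergence d, the sketch changes exactly round(d × length) distinct positions, and each one is always set to a different base.

```python
import random

BASES = "ACGT"


def mutate(seq: str, divergence: float, rng: random.Random) -> str:
    """Substitute exactly round(divergence * len(seq)) distinct positions.

    Hypothetical, substitution-only sketch: indels are not modelled.
    """
    seq = seq.upper()
    n_subs = round(divergence * len(seq))
    positions = rng.sample(range(len(seq)), n_subs)  # distinct positions
    out = list(seq)
    for pos in positions:
        # Pick a base that differs from the original so every chosen
        # position is a real mismatch.
        out[pos] = rng.choice([b for b in BASES if b != out[pos]])
    return "".join(out)
```

With substitutions only, a query's true identity to its source gene is 1 − d over its full length. That gives the `true_identity` against which the alignment identity error is measured. A seeded `random.Random` keeps each divergence level reproducible across replicates.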

Long reads (Badread)

```bash
bash benchmark/simulate/run_badread.sh data/genomes/ queries/long_reads.fa
```

Statistical analysis

  • Each measurement repeated 3 times; median reported

  • Error bars show min/max across replicates

  • Statistical significance tested with Wilcoxon signed-rank test where applicable
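Each reported point and its error bar follow directly from the replicate values. `summarise` below is an illustrative name, not part of the benchmark code.

```python
from statistics import median


def summarise(replicates: list[float]) -> dict[str, float]:
    """Median with min/max error bars across repeated measurements."""
    if not replicates:
        raise ValueError("need at least one replicate")
    return {
        "median": median(replicates),
        "min": min(replicates),
        "max": max(replicates),
    }
```

For the significance test, `scipy.stats.wilcoxon(x, y)` performs the signed-rank test on paired values. With only three pairs, the smallest attainable exact two-sided p-value is 2/8 = 0.25. The pairing therefore has to run over more units, such as queries or datasets, for the test to be informative.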