Benchmark results

Sensitivity vs divergence

Dragon keeps sensitivity at or above 98% up to 5% sequence divergence and still reaches 80% at 10% divergence:

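| Divergence | Dragon | LexicMap (k=31) | BLASTn (k=15) | Minimap2 (k=21) |
|------------|--------|-----------------|---------------|-----------------|
| 0%         | 100%   | 100%            | 100%          | 100%            |
| 1%         | 100%   | 100%            | 100%          | 100%            |
| 3%         | 100%   | 100%            | 100%          | 100%            |
| 5%         | 98%    | 94%             | 100%          | 100%            |
| 10%        | 80%    | 4%              | 100%          | 62%             |
| 15%        | 20%    | 0%              | 26%           | 0%              |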

Key finding: Dragon’s variable-length FM-index seeds outperform fixed k=31 matching (LexicMap proxy) from 5% divergence upward (98% vs 94% at 5%, 80% vs 4% at 10%), because shorter seeds can still match when mutations disrupt 31-mers.
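The intuition behind this finding can be sketched with a simple model. The snippet below assumes independent per-base substitutions at a uniform rate, which is an illustrative simplification and not how the benchmark was computed; the function name `kmer_survival` is hypothetical. Under that assumption, the chance that a fixed-length seed of length k contains no mutation is (1 - d)^k, which falls off quickly as k grows.

```python
def kmer_survival(k: int, divergence: float) -> float:
    """Probability that a k-mer carries no substitution.

    Assumes each base mutates independently with probability `divergence`
    (a simplified model, not the benchmark's simulation procedure).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be between 0 and 1")
    return (1.0 - divergence) ** k


if __name__ == "__main__":
    # Compare the seed lengths named in the sensitivity table.
    for d in (0.01, 0.05, 0.10, 0.15):
        cells = ", ".join(f"k={k}: {kmer_survival(k, d):.3f}" for k in (15, 21, 31))
        print(f"divergence {d:.0%} -> {cells}")
```

At 10% divergence the model gives a survival probability of roughly 0.04 for a 31-mer and roughly 0.21 for a 15-mer. These are per-seed probabilities, not whole-query sensitivities, so they only indicate the direction of the effect seen in the table.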

Resource comparison

Index size

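| Tool     | 500 genomes | 85K genomes | 2.34M genomes |
|----------|-------------|-------------|---------------|
| Dragon   | 1.5 GB      | 15 GB       | ~100 GB       |
| LexicMap | 10 GB       | 200 GB      | 5,460 GB      |
| Minimap2 | 2 GB        | 50 GB       | N/A           |
| BLASTn   | 3 GB        | 80 GB       | N/A           |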

Peak query RAM

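| Tool     | 500 genomes | 85K genomes | 2.34M genomes |
|----------|-------------|-------------|---------------|
| Dragon   | 0.3 GB      | 1.5 GB      | 3.5 GB        |
| LexicMap | 1.0 GB      | 4.0 GB      | 4-25 GB       |
| Minimap2 | 1.5 GB      | 8.0 GB      | N/A           |
| BLASTn   | 0.5 GB      | 4.0 GB      | N/A           |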

Batch query performance

Searching 1,003 AMR genes from the CARD database:

| Tool     | Time (8 threads) | Peak RAM |
|----------|------------------|----------|
| Dragon   | 12 minutes       | 1.8 GB   |
| LexicMap | ~several hours   | 11 GB    |
| BLASTn   | ~1 hour          | 4 GB     |

Dragon’s advantage comes from parallel FM-index queries over a shared memory-mapped index.
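The access pattern described above can be illustrated with a minimal sketch. It is not Dragon's implementation: the functions `count_hits` and `batch_query` are hypothetical, and a plain byte search (`mmap.find`) stands in for an FM-index query. What it shows is the sharing pattern, where one read-only memory mapping serves every worker thread, so the file contents are not copied per query. In CPython this demonstrates the structure rather than a real speedup.

```python
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def count_hits(index: mmap.mmap, pattern: bytes) -> int:
    """Count (possibly overlapping) occurrences of `pattern` in the mapping.

    Stand-in for an index lookup; a real FM-index would use backward search.
    """
    hits = 0
    pos = index.find(pattern)
    while pos != -1:
        hits += 1
        pos = index.find(pattern, pos + 1)
    return hits


def batch_query(path: str, patterns: list[bytes], threads: int = 8) -> dict[bytes, int]:
    """Run all patterns against one shared, read-only memory-mapped file."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as index:
            with ThreadPoolExecutor(max_workers=threads) as pool:
                counts = list(pool.map(lambda p: count_hits(index, p), patterns))
    return dict(zip(patterns, counts))


# Tiny demonstration on a temporary file standing in for an index.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ACGTACGTTTGACGT")
    demo_path = tmp.name

demo_counts = batch_query(demo_path, [b"ACGT", b"TTG", b"GGG"])
os.remove(demo_path)
print(demo_counts)
```

Because the mapping is opened read-only, the operating system can back all threads with the same cached pages, which is one plausible reason peak RAM stays low as the number of concurrent queries grows.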

Figures

All figures are generated by the benchmark pipeline and saved to manuscript/figures/:

  • Figure 2: Sensitivity vs divergence (line plot)

  • Figure 3: Resource comparison (3-panel bar charts)

  • Figure 4: Scalability curves (log-log)

  • Figure 5: Precision vs recall (scatter)

  • Figure 6: Batch query throughput (bar chart)