Benchmark results
Sensitivity vs divergence
Sensitivity at each level of sequence divergence. Dragon stays at or above 98% up to 5% divergence and retains 80% at 10%:
| Divergence | Dragon | LexicMap (k=31) | BLASTn (k=15) | Minimap2 (k=21) |
|---|---|---|---|---|
| 0% | 100% | 100% | 100% | 100% |
| 1% | 100% | 100% | 100% | 100% |
| 3% | 100% | 100% | 100% | 100% |
| 5% | 98% | 94% | 100% | 100% |
| 10% | 80% | 4% | 100% | 62% |
| 15% | 20% | 0% | 26% | 0% |
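Key finding: at higher divergence, Dragon’s variable-length FM-index seeds outperform fixed k=31 matching (used as a proxy for LexicMap). At 10% divergence Dragon reaches 80% sensitivity versus 4%, because shorter seeds can still match when mutations disrupt most 31-mers. BLASTn, with its shorter k=15 seeds, remains the most sensitive tool at 10% and 15% divergence.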
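The key finding can be illustrated with a toy experiment. Under a simple model where each base is substituted independently with probability p, a given k-mer survives intact with probability (1 - p)^k. At p = 0.10 that is about 3.8% for k = 31 and about 21% for k = 15. The Python sketch below also builds a tiny FM-index over a random reference (naive suffix sorting, so only suitable for small inputs) and uses backward search to find the longest exact match ending at each query position. The sequences, the every-10th-base mutation pattern, and all function names are illustrative assumptions. This is not Dragon's implementation, and Dragon's actual seed-length and chaining rules are not modeled.

```python
import random

def kmer_survival(p, k):
    """Chance that a k-mer escapes all substitutions if each base mutates independently with probability p."""
    return (1 - p) ** k

def build_fm_index(text):
    """Toy FM-index: BWT of text + '$' plus the C and Occ tables used by backward search."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array, fine for small inputs
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    c_table, total = {}, 0
    for ch in sorted(set(text)):
        c_table[ch] = total  # number of characters in text smaller than ch
        total += text.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in c_table}
    for i, b in enumerate(bwt):
        for ch in c_table:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)  # occurrences of ch in bwt[:i + 1]
    return c_table, occ, len(bwt)

def longest_suffix_match(fm, query, end):
    """Length of the longest suffix of query[:end] occurring in the indexed text (backward search)."""
    c_table, occ, n = fm
    lo, hi = 0, n  # half-open suffix-array interval; starts as the whole index (empty pattern)
    length = 0
    for pos in range(end - 1, -1, -1):
        ch = query[pos]
        if ch not in c_table:
            break
        new_lo = c_table[ch] + occ[ch][lo]
        new_hi = c_table[ch] + occ[ch][hi]
        if new_lo >= new_hi:  # prepending ch gives a pattern absent from the text
            break
        lo, hi = new_lo, new_hi
        length += 1
    return length

random.seed(0)
reference = "".join(random.choice("ACGT") for _ in range(2000))
gene = reference[500:1500]  # a hypothetical 1,000 bp "gene" taken from the reference

# Substitute every 10th base (exactly 10% divergence), so every 31-mer contains a substitution
query = list(gene)
for i in range(9, len(query), 10):
    query[i] = random.choice([b for b in "ACGT" if b != query[i]])
query = "".join(query)

# Fixed-length seeding: exact 31-mers shared between query and reference
ref_kmers = {reference[i:i + 31] for i in range(len(reference) - 30)}
shared_31mers = sum(query[i:i + 31] in ref_kmers for i in range(len(query) - 30))

# Variable-length seeding: longest exact match ending at each query position
fm = build_fm_index(reference)
seed_lengths = [longest_suffix_match(fm, query, end) for end in range(1, len(query) + 1)]

print(f"k-mer survival at 10%: k=31 {kmer_survival(0.10, 31):.3f}, k=15 {kmer_survival(0.10, 15):.3f}")
print("shared exact 31-mers:", shared_31mers)
print("longest variable-length seed:", max(seed_lengths))
```

In this toy case the mutated gene shares no exact 31-mer with its source, yet backward search still recovers exact matches of at least 9 bp between consecutive substitutions. How such short seeds would be filtered and chained into an alignment is outside the scope of this sketch.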
Resource comparison
Index size
| Tool | 500 genomes | 85K genomes | 2.34M genomes |
|---|---|---|---|
| Dragon | 1.5 GB | 15 GB | ~100 GB |
| LexicMap | 10 GB | 200 GB | 5,460 GB |
| Minimap2 | 2 GB | 50 GB | N/A |
| BLASTn | 3 GB | 80 GB | N/A |
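Dividing the 2.34M-genome index sizes by the genome count gives a rough per-genome storage cost. The Python sketch below only redoes that arithmetic on the table's numbers, assuming 1 GB = 10^9 bytes and treating Dragon's ~100 GB as exactly 100 GB:

```python
# Index sizes at 2.34M genomes, taken from the index-size table (GB, assumed to be 10**9 bytes)
INDEX_GB = {"Dragon": 100, "LexicMap": 5460}
N_GENOMES = 2_340_000

for tool, gb in INDEX_GB.items():
    per_genome_kb = gb * 1e9 / N_GENOMES / 1e3  # bytes per genome, expressed in KB
    print(f"{tool}: ~{per_genome_kb:,.0f} KB per genome")

ratio = INDEX_GB["LexicMap"] / INDEX_GB["Dragon"]
print(f"LexicMap / Dragon index size: {ratio:.1f}x")
```

Under these assumptions Dragon needs roughly 43 KB of index per genome versus about 2.3 MB for LexicMap, a difference of about 55×.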
Peak query RAM
| Tool | 500 genomes | 85K genomes | 2.34M genomes |
|---|---|---|---|
| Dragon | 0.3 GB | 1.5 GB | 3.5 GB |
| LexicMap | 1.0 GB | 4.0 GB | 4-25 GB |
| Minimap2 | 1.5 GB | 8.0 GB | N/A |
| BLASTn | 0.5 GB | 4.0 GB | N/A |
Batch query performance
Searching 1,003 antimicrobial resistance (AMR) genes from the CARD database:
| Tool | Time (8 threads) | Peak RAM |
|---|---|---|
| Dragon | 12 minutes | 1.8 GB |
| LexicMap | several hours | 11 GB |
| BLASTn | ~1 hour | 4 GB |
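Dragon finishes the batch in 12 minutes, roughly 5× faster than BLASTn, while using the least peak RAM (1.8 GB). This advantage comes from running FM-index queries in parallel over a single shared, memory-mapped index.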
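The general pattern of many query workers sharing one read-only, memory-mapped index can be sketched in Python with the standard `mmap` module and a thread pool. This is a hypothetical illustration, not Dragon's code: `write_toy_index`, `count_hits`, and `batch_query` are made-up names, the "index" is a plain sequence file, and lookups are linear scans with `mmap.find` rather than FM-index backward search. The point is that the file is mapped once and every worker reads the same pages, so adding workers does not add index copies. CPython threads may not run these lookups truly in parallel, so treat this as a demonstration of sharing rather than of speedup.

```python
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_toy_index(path, text):
    """Hypothetical stand-in for an on-disk index: just the raw sequence bytes."""
    with open(path, "wb") as f:
        f.write(text.encode("ascii"))

def count_hits(index_map, pattern):
    """Count (possibly overlapping) occurrences of pattern in the mapped file."""
    needle = pattern.encode("ascii")
    hits, start = 0, 0
    while True:
        pos = index_map.find(needle, start)
        if pos == -1:
            return hits
        hits += 1
        start = pos + 1

def batch_query(index_path, patterns, threads=8):
    """Map the index once, read-only, and let every worker thread query the same mapping."""
    with open(index_path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as index_map:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            return list(pool.map(lambda p: count_hits(index_map, p), patterns))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.idx")
    write_toy_index(path, "ACGTACGTTTACGT")
    results = batch_query(path, ["ACGT", "TTT", "GGG"])

print(results)  # [3, 1, 0]
```

Because the mapping is read-only, workers need no locking; with separate processes instead of threads, the operating system's page cache would likewise keep a single physical copy of a mapped file.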
Figures
All figures are generated by the benchmark pipeline and saved to manuscript/figures/:
Figure 2: Sensitivity vs divergence (line plot)
Figure 3: Resource comparison (3-panel bar charts)
Figure 4: Scalability curves (log-log)
Figure 5: Precision vs recall (scatter)
Figure 6: Batch query throughput (bar chart)