Benchmark datasets

Tiered approach

Dragon is benchmarked at three scales to validate scalability from laptop to server:

Tier 1

Property	Value
Genomes	500 complete E. coli / Shigella from RefSeq
Total sequence	~2.5 Gbp
Redundancy	High (~95% ANI within species)
Index time	<1 hour
Index size	~1.5 GB
Use case	Unit testing, CI, rapid iteration

Tier 2

Property	Value
Genomes	~85,000 GTDB r220 representative genomes
Total sequence	~250 Gbp
Redundancy	Medium (one genome per species)
Index time	~1 hour
Index size	~15 GB
Use case	Sensitivity/accuracy validation

Tier 3

Property	Value
Genomes	~2.34M GenBank + RefSeq prokaryotic assemblies
Total sequence	~10 Tbp
Redundancy	Very high (many strains per species)
Index time	~12 hours
Index size	~100 GB
Use case	Full-scale comparison with LexicMap
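
The three tables above can be compared more directly by deriving two ratios from the listed figures: the mean genome size and the index footprint per input base pair. The sketch below is illustrative only. The dictionary keys (small, medium, large) and the function name derived_metrics are hypothetical and not part of Dragon. It treats the approximate "~" values as exact and assumes 1 GB = 10^9 bytes.

```python
# Figures copied from the three tables above, in order of increasing scale.
# Assumptions: "~" values are taken as exact; 1 GB = 1e9 bytes; 1 Gbp = 1e9 bp.
# The keys "small", "medium", "large" are illustrative labels, not Dragon names.
TIERS = {
    "small": {"genomes": 500, "total_bp": 2.5e9, "index_bytes": 1.5e9},
    "medium": {"genomes": 85_000, "total_bp": 250e9, "index_bytes": 15e9},
    "large": {"genomes": 2_340_000, "total_bp": 10e12, "index_bytes": 100e9},
}


def derived_metrics(tier):
    """Return mean genome size (Mbp) and index bytes per input base pair."""
    return {
        "mean_genome_mbp": tier["total_bp"] / tier["genomes"] / 1e6,
        "index_bytes_per_bp": tier["index_bytes"] / tier["total_bp"],
    }


if __name__ == "__main__":
    for name, tier in TIERS.items():
        m = derived_metrics(tier)
        print(
            f"{name}: {m['mean_genome_mbp']:.2f} Mbp/genome, "
            f"{m['index_bytes_per_bp']:.3f} index bytes per bp"
        )
```

Because the inputs are rounded estimates, the resulting ratios are only order-of-magnitude guides. They are useful mainly for checking that the listed genome counts, sequence totals, and index sizes are mutually plausible.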