# Performance tuning

## Hardware recommendations

| Component | Minimum | Recommended | Notes |
|-----------|---------|-------------|-------|
| RAM (query) | 4 GB | 8 GB | Dragon stays <4 GB; extra for OS |
| RAM (index build) | 8 GB | 32-64 GB | SA-IS construction |
| Storage | SSD | NVMe SSD | Memory-mapped access; HDD 5-10x slower |
| CPU | 2 cores | 8+ cores | Parallel queries via rayon |

## SSD vs HDD

Dragon's index is memory-mapped, so I/O speed directly affects performance:

| Storage | Single gene query | Batch (1000 genes) |
|---------|-------------------|-------------------|
| NVMe SSD | ~1 second | ~5 minutes |
| SATA SSD | ~3 seconds | ~15 minutes |
| HDD | ~15 seconds | ~2 hours |

**Recommendation**: use SSD whenever possible. If using HDD, increase `--max-ram` to keep more index pages resident.

## Tuning parameters

### For maximum sensitivity

```bash
dragon search \
    --min-seed-len 12 \
    --max-seed-freq 50000 \
    --min-chain-score 20 \
    --max-target-seqs 500
```

### For maximum speed

```bash
dragon search \
    --min-seed-len 21 \
    --max-seed-freq 1000 \
    --min-chain-score 100 \
    --max-target-seqs 10 \
    --threads 16
```

### For low-memory machines (4 GB total RAM)

```bash
dragon search \
    --max-ram 2.0 \
    --threads 2
```

## Scaling guidelines

### Number of genomes vs resources

| Genomes | Index disk | Query RAM | Build time |
|---------|-----------|-----------|------------|
| 100 | 200 MB | <500 MB | 5 seconds |
| 1,000 | 1 GB | <500 MB | 30 seconds |
| 10,000 | 5 GB | 1 GB | 10 minutes |
| 100,000 | 20 GB | 2 GB | 2 hours |
| 1,000,000 | 60 GB | 3 GB | 8 hours |
| 2,340,000 | 100 GB | 3.5 GB | 12 hours |

### Query length vs performance

| Query length | Seeds found | Chaining time | Total time |
|-------------|-------------|---------------|------------|
| 150 bp | ~5 | <1 ms | ~0.1 s |
| 1,000 bp | ~30 | ~5 ms | ~0.5 s |
| 10,000 bp | ~300 | ~50 ms | ~2 s |
| 100,000 bp | ~3,000 | ~500 ms | ~15 s |

## Distributing pre-built indices

Since index construction is expensive but only done once, consider:

1. **Build once on a server** with sufficient RAM
2. **Distribute the index** to query machines (e.g., via rsync, S3, or shared filesystem)
3. **Query on laptops** with just 4 GB RAM

```bash
# On build server
dragon index -i all_genomes/ -o dragon_index/ -j 32

# Transfer to query machine
rsync -avP dragon_index/ user@laptop:~/dragon_index/

# On laptop
dragon search -i ~/dragon_index/ -q my_query.fa -o results.paf
```
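
Because a distributed index can reach tens of gigabytes, it may be worth confirming that the copy on the query machine matches what was built. The following is a minimal sketch, not a Dragon feature: it assumes GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -r`) on both machines, and the manifest name `SHA256SUMS` and the function names `write_manifest` and `verify_manifest` are hypothetical choices made for illustration.

```bash
# Sketch: checksum manifest for a transferred index directory.
# Assumption: GNU coreutils/findutils; the manifest name is hypothetical.
MANIFEST="SHA256SUMS"

# Run on the build server, after the index has been built.
write_manifest() {
    local index_dir="$1"
    (
        cd "$index_dir" || exit 1
        # Hash every regular file except the manifest itself, in a stable order.
        find . -type f ! -name "$MANIFEST" -print0 \
            | sort -z \
            | xargs -0 -r sha256sum > "$MANIFEST"
    )
}

# Run on the query machine, after the transfer has completed.
# Returns non-zero if any file is missing or its hash differs.
verify_manifest() {
    local index_dir="$1"
    (
        cd "$index_dir" || exit 1
        sha256sum --check --quiet "$MANIFEST"
    )
}
```

With these functions sourced, one would call `write_manifest dragon_index/` before the `rsync` step, so the manifest travels with the index, and `verify_manifest ~/dragon_index/` on the laptop before the first query.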