# Installation

## Requirements

- **Rust** 1.75 or later
- **Operating system**: Linux, macOS, or Windows (WSL)
- **RAM**: 4 GB minimum for querying; 8+ GB recommended for index construction
- **Disk**: depends on database size (see [Performance tuning](../user-guide/performance-tuning.md))

## From source (recommended)

### 1. Install Rust

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
```

### 2. Clone and build

```bash
git clone https://github.com/lcerdeira/dragon.git
cd dragon
cargo build --release
```

The compiled binary is at `target/release/dragon`.

### 3. (Optional) Install system-wide

```bash
cargo install --path .
# or manually:
cp target/release/dragon /usr/local/bin/
```

### 4. Verify installation

```bash
dragon --version
dragon --help
```

## Optional dependencies

### GGCAT (recommended for large databases)

[GGCAT](https://github.com/algbio/ggcat) provides optimised coloured compacted de Bruijn graph construction. Dragon will use it automatically if found in `PATH`.

```bash
# Build from source (GGCAT is not on crates.io)
git clone https://github.com/algbio/ggcat.git
cd ggcat
cargo build --release
cp target/release/ggcat ~/.cargo/bin/   # or anywhere on $PATH
```

Without GGCAT, Dragon uses a built-in graph builder that works well for datasets up to ~10,000 genomes.

### Cloud-native (Zarr) dependencies

To read Dragon Zarr stores from Python (local paths or `s3://` / `gs://`):

```bash
pip install 'zarr>=3.0' s3fs gcsfs numcodecs
```

A 16,000-genome demo store lives at `s3://dragon-zarr/saureus/b1/` (eu-west-2, public-read; no AWS credentials required).

### Benchmark dependencies

The benchmark pipeline now lives in the companion repository `lcerdeira/dragon-private` (private until publication). If you have access:

```bash
git clone https://github.com/lcerdeira/dragon-private.git
pip install snakemake matplotlib seaborn pandas numpy
```

## Troubleshooting

### `sux` crate build failure

On Rust 1.94+, the `sux` crate may fail due to `common_traits` ambiguity. Dragon does not depend on `sux` — it uses an internal Elias-Fano implementation. If you encounter this error, ensure your `Cargo.toml` does not list `sux` as a dependency.

### Memory issues during index construction

For very large databases (>100K genomes), index construction may require significant RAM for suffix array construction. Consider:

1. Using a machine with 32+ GB RAM for the one-time index build
2. Distributing the pre-built index to query machines
3. Using `--kmer-size 21` to reduce index memory (at some sensitivity cost)