Module reference

Index modules (src/index/)

index::dbg

De Bruijn graph construction via GGCAT or internal builder.

  • build_cdbg(genome_dir, output_dir, kmer_size, threads) — Build a coloured compacted de Bruijn graph. Uses GGCAT if it is available; otherwise falls back to the internal builder.

  • DbgResult — Result struct containing paths to unitig and colour files.

index::unitig

Unitig parsing and 2-bit encoding.

  • parse_and_encode_unitigs(path) — Parse a unitig FASTA file and encode sequences.

  • UnitigSet — Collection of all unitigs with concatenated text and length metadata.

  • Unitig — Single unitig with ID and 2-bit packed sequence.
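
The 2-bit encoding mentioned above can be sketched as follows. This is a minimal illustration, not the crate's actual layout: the type name `Packed2Bit`, the code assignment A=0, C=1, G=2, T=3, and the choice to store base i in the low-order bits first are all assumptions made for the example.

```rust
/// Hypothetical 2-bit packed sequence (illustration only).
/// Base `i` lives in word `i / 32`, at bit offset `(i % 32) * 2`.
pub struct Packed2Bit {
    words: Vec<u64>,
    len: usize,
}

/// Map an ASCII base to its 2-bit code; anything outside ACGT is rejected.
fn encode_base(b: u8) -> Option<u64> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

impl Packed2Bit {
    /// Pack an ASCII sequence; returns `None` on the first non-ACGT byte.
    pub fn from_ascii(seq: &[u8]) -> Option<Self> {
        let mut words = vec![0u64; (seq.len() + 31) / 32];
        for (i, &b) in seq.iter().enumerate() {
            let code = encode_base(b)?;
            words[i / 32] |= code << ((i % 32) * 2);
        }
        Some(Self { words, len: seq.len() })
    }

    /// Decode the base at position `i` back to upper-case ASCII.
    pub fn get(&self, i: usize) -> u8 {
        assert!(i < self.len, "position out of range");
        let code = (self.words[i / 32] >> ((i % 32) * 2)) & 0b11;
        b"ACGT"[code as usize]
    }

    /// Decode the whole sequence.
    pub fn to_ascii(&self) -> Vec<u8> {
        (0..self.len).map(|i| self.get(i)).collect()
    }

    /// Number of u64 words backing the sequence.
    pub fn word_count(&self) -> usize {
        self.words.len()
    }
}
```

With this layout a 33-base sequence needs two words, and ambiguous bases such as N have no representation, so a real parser has to decide how to handle them.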

index::color

Roaring Bitmap colour index.

  • build_color_index(color_file, output_dir, num_genomes) — Build and serialise the colour index.

  • load_color_index(index_dir) — Load colour index via memory mapping.

  • ColorIndex::get_colors(unitig_id) — Look up which genomes contain a unitig.

index::fm

FM-index construction and querying.

  • build_fm_index(unitigs, output_dir) — Build FM-index from a UnitigSet.

  • load_fm_index(index_dir) — Load FM-index from disk.

  • DragonFmIndex::search(pattern) — Find all occurrences of a pattern.

  • DragonFmIndex::count(pattern) — Count occurrences without locating.

  • DragonFmIndex::variable_length_search(pattern) — Extend search to maximum match length.
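
To illustrate what count and search do conceptually, here is a toy FM-index with backward search. It is a sketch under simplifying assumptions, not the `DragonFmIndex` implementation: the type `ToyFmIndex` is hypothetical, the suffix array is built by naive sorting, the full suffix array is kept in memory, and occurrence counts are computed by scanning the BWT instead of using rank structures.

```rust
/// Toy FM-index over a byte text (illustration only).
/// A sentinel byte 0 is appended internally, so the text must not contain 0.
pub struct ToyFmIndex {
    bwt: Vec<u8>,
    sa: Vec<usize>,
    /// c[b] = number of symbols in the text that are strictly smaller than b.
    c: [usize; 256],
}

impl ToyFmIndex {
    pub fn new(text: &[u8]) -> Self {
        assert!(!text.contains(&0), "byte 0 is reserved as the sentinel");
        let mut t = text.to_vec();
        t.push(0);
        let n = t.len();
        // Naive suffix array: sort suffix start positions lexicographically.
        let mut sa: Vec<usize> = (0..n).collect();
        sa.sort_by(|&a, &b| t[a..].cmp(&t[b..]));
        // BWT: the symbol preceding each sorted suffix (cyclically).
        let bwt: Vec<u8> = sa.iter().map(|&i| t[(i + n - 1) % n]).collect();
        let mut counts = [0usize; 256];
        for &b in &t {
            counts[b as usize] += 1;
        }
        let mut c = [0usize; 256];
        let mut sum = 0;
        for b in 0..256 {
            c[b] = sum;
            sum += counts[b];
        }
        Self { bwt, sa, c }
    }

    /// Occurrences of `b` in bwt[..i]; a linear scan stands in for a rank structure.
    fn occ(&self, b: u8, i: usize) -> usize {
        self.bwt[..i].iter().filter(|&&x| x == b).count()
    }

    /// Backward search: half-open suffix-array interval of suffixes prefixed by `pattern`.
    fn interval(&self, pattern: &[u8]) -> (usize, usize) {
        let (mut lo, mut hi) = (0, self.bwt.len());
        for &b in pattern.iter().rev() {
            lo = self.c[b as usize] + self.occ(b, lo);
            hi = self.c[b as usize] + self.occ(b, hi);
            if lo >= hi {
                return (0, 0);
            }
        }
        (lo, hi)
    }

    /// Count occurrences without locating them.
    pub fn count(&self, pattern: &[u8]) -> usize {
        let (lo, hi) = self.interval(pattern);
        hi - lo
    }

    /// Locate all occurrences, returned as sorted text positions.
    pub fn search(&self, pattern: &[u8]) -> Vec<usize> {
        let (lo, hi) = self.interval(pattern);
        let mut hits = self.sa[lo..hi].to_vec();
        hits.sort_unstable();
        hits
    }
}
```

Counting only needs the interval width, which is why it can be cheaper than locating: locating additionally has to map each interval entry back to a text position.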

index::paths

Genome path index.

  • build_path_index(genome_dir, unitigs, output_dir) — Build path index from genomes.

  • load_path_index(index_dir) — Load path index from disk.

  • PathIndex::extract_sequence(genome_id, start, end, unitigs) — Reconstruct a genome region.

index::metadata

Index statistics and metadata.

  • write_metadata(output_dir, dbg_result, unitigs) — Write metadata JSON.

  • load_metadata(index_dir) — Load metadata.


Query modules (src/query/)

query::seed

FM-index seed finding.

  • find_seeds(query, fm_index, min_seed_len, max_freq) — Find all seeds in a query using backward search with variable-length extension. Searches both the forward strand of the query and its reverse complement.

query::candidate

Candidate genome filtering.

  • find_candidates(seeds, color_index, min_votes) — Identify genomes sharing unitigs with query seeds. Returns candidates sorted by vote count.
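
The voting idea behind candidate filtering can be sketched as below. This is an assumption-laden illustration: the function `vote_candidates` is hypothetical, seeds are reduced to the unitig IDs they hit, and a plain `HashMap` stands in for the Roaring Bitmap colour index. Ties are broken by genome ID here, which the source does not specify.

```rust
use std::collections::HashMap;

/// Count, for every genome, how many seed hits land on a unitig it contains.
/// Returns (genome_id, votes) pairs with at least `min_votes`, highest votes first.
pub fn vote_candidates(
    seed_unitigs: &[u32],
    colors: &HashMap<u32, Vec<u32>>, // hypothetical stand-in: unitig ID -> genome IDs
    min_votes: usize,
) -> Vec<(u32, usize)> {
    let mut votes: HashMap<u32, usize> = HashMap::new();
    for unitig in seed_unitigs {
        if let Some(genomes) = colors.get(unitig) {
            for &genome in genomes {
                *votes.entry(genome).or_insert(0) += 1;
            }
        }
    }
    let mut ranked: Vec<(u32, usize)> = votes
        .into_iter()
        .filter(|&(_, v)| v >= min_votes)
        .collect();
    // Sort by descending votes, then ascending genome ID for a deterministic order.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked
}
```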

query::chain

Colinear chaining.

  • chain_candidates(seeds, candidates, path_index, min_score) — Compute optimal colinear chains for each candidate genome using Fenwick tree DP.

  • Chain — A scored chain of colinear anchors with coverage information.
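
A minimal sketch of colinear chaining with a Fenwick tree follows. It is deliberately simplified and is not the crate's scoring model: anchors are hypothetical (query_pos, ref_pos, weight) triples with positive weights, a chain must increase strictly in both coordinates, and no gap penalty is applied. The names `MaxFenwick` and `best_chain_score` are made up for the example.

```rust
/// Fenwick tree supporting point "raise to at least" updates and prefix maxima.
pub struct MaxFenwick {
    tree: Vec<i64>,
}

impl MaxFenwick {
    pub fn new(n: usize) -> Self {
        Self { tree: vec![i64::MIN; n + 1] }
    }

    /// Raise the value at 0-based position `i` to at least `value`.
    pub fn update(&mut self, i: usize, value: i64) {
        let mut idx = i + 1;
        while idx < self.tree.len() {
            self.tree[idx] = self.tree[idx].max(value);
            idx += idx & idx.wrapping_neg(); // move to the next covering node
        }
    }

    /// Maximum over positions 0..=i, or i64::MIN if none were updated.
    pub fn prefix_max(&self, i: usize) -> i64 {
        let mut idx = (i + 1).min(self.tree.len() - 1);
        let mut best = i64::MIN;
        while idx > 0 {
            best = best.max(self.tree[idx]);
            idx -= idx & idx.wrapping_neg(); // strip the lowest set bit
        }
        best
    }
}

/// Best total weight of a chain that is strictly increasing in both coordinates.
pub fn best_chain_score(anchors: &[(u32, u32, i64)]) -> i64 {
    // Rank-compress reference positions so they can index the tree.
    let mut ref_positions: Vec<u32> = anchors.iter().map(|a| a.1).collect();
    ref_positions.sort_unstable();
    ref_positions.dedup();

    // Sort by query position; ties by descending reference position so that
    // anchors sharing a query position can never extend one another.
    let mut sorted = anchors.to_vec();
    sorted.sort_by(|a, b| a.0.cmp(&b.0).then(b.1.cmp(&a.1)));

    let mut tree = MaxFenwick::new(ref_positions.len());
    let mut best = 0;
    for &(_, r, w) in &sorted {
        let rank = ref_positions.binary_search(&r).expect("rank exists");
        // Best chain ending at a strictly smaller reference position, or start fresh.
        let prev = if rank == 0 { 0 } else { tree.prefix_max(rank - 1).max(0) };
        let score = prev + w;
        tree.update(rank, score);
        best = best.max(score);
    }
    best
}
```

Each anchor costs one prefix query and one update, so the whole pass runs in O(n log n) instead of the O(n²) of the naive pairwise DP.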

query::align

Wavefront alignment.

  • align_chains(query, chains, path_index) — Align chains and produce PAF records.

  • banded_nw_align(query, reference, bandwidth) — Banded Needleman-Wunsch alignment.
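
The banding idea can be illustrated with a unit-cost global alignment restricted to a diagonal band. This is a sketch only: `banded_edit_distance` is a hypothetical helper, it uses edit-distance costs instead of whatever scoring `banded_nw_align` applies, and it returns just the cost, not a traceback or CIGAR.

```rust
/// Global alignment cost (unit mismatch/indel costs) restricted to cells with
/// |i - j| <= bandwidth. Returns `None` when the end cell lies outside the band.
/// The result equals the true edit distance whenever that distance is at most
/// `bandwidth`; otherwise it is only an upper bound.
pub fn banded_edit_distance(a: &[u8], b: &[u8], bandwidth: usize) -> Option<usize> {
    let (n, m) = (a.len(), b.len());
    if n.abs_diff(m) > bandwidth {
        return None;
    }
    // Large but overflow-safe marker for cells outside the band.
    const INF: usize = usize::MAX / 2;
    // Full-width rows keep the sketch short; a real implementation would
    // allocate only the band.
    let mut prev = vec![INF; m + 1];
    let mut curr = vec![INF; m + 1];
    for j in 0..=m.min(bandwidth) {
        prev[j] = j; // first row: j insertions
    }
    for i in 1..=n {
        curr.fill(INF);
        let lo = i.saturating_sub(bandwidth);
        let hi = (i + bandwidth).min(m);
        if lo == 0 {
            curr[0] = i; // first column: i deletions
        }
        for j in lo.max(1)..=hi {
            let sub = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            let del = prev[j] + 1;
            let ins = curr[j - 1] + 1;
            curr[j] = sub.min(del).min(ins);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    Some(prev[m])
}
```

Restricting the DP to the band reduces the work from O(nm) cells to roughly O(n · bandwidth), at the price of missing alignments that drift further from the diagonal.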


Data structures (src/ds/)

ds::fenwick

  • FenwickMax — Prefix maximum queries in O(log n).

  • FenwickSum — Prefix sum queries in O(log n).
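
For reference, a prefix-sum Fenwick tree can be written in a few lines. The sketch below is generic textbook code with a hypothetical name (`SumFenwick`) and 0-based public indices; the crate's `FenwickSum` may differ in element type and interface.

```rust
/// Fenwick (binary indexed) tree for point updates and prefix sums.
pub struct SumFenwick {
    tree: Vec<i64>, // 1-based internally; tree[0] is unused
}

impl SumFenwick {
    pub fn new(n: usize) -> Self {
        Self { tree: vec![0; n + 1] }
    }

    /// Add `delta` to the element at 0-based position `i`.
    pub fn add(&mut self, i: usize, delta: i64) {
        let mut idx = i + 1;
        while idx < self.tree.len() {
            self.tree[idx] += delta;
            idx += idx & idx.wrapping_neg(); // jump to the next node covering idx
        }
    }

    /// Sum of elements at positions 0..=i.
    pub fn prefix_sum(&self, i: usize) -> i64 {
        let mut idx = (i + 1).min(self.tree.len() - 1);
        let mut total = 0;
        while idx > 0 {
            total += self.tree[idx];
            idx -= idx & idx.wrapping_neg(); // drop the lowest set bit
        }
        total
    }
}
```

Both loops touch at most one node per bit of the index, which gives the O(log n) bound quoted above.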

ds::elias_fano

  • CumulativeLengthIndex — Maps text positions to unitig IDs via binary search on cumulative lengths.
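
The position-to-unitig mapping can be sketched with a plain vector of cumulative end offsets and a binary search. The type `CumLen` is hypothetical, and the sketch assumes unitigs are concatenated back to back with no separator symbols, which may not match the real text layout.

```rust
/// Maps a position in the concatenated text to (unitig ID, offset in unitig).
pub struct CumLen {
    /// ends[i] = total length of unitigs 0..=i (exclusive end of unitig i).
    ends: Vec<u64>,
}

impl CumLen {
    pub fn new(lengths: &[u64]) -> Self {
        let mut total = 0u64;
        let ends = lengths
            .iter()
            .map(|&len| {
                total += len;
                total
            })
            .collect();
        Self { ends }
    }

    /// Returns `None` if `pos` is past the end of the text.
    pub fn locate(&self, pos: u64) -> Option<(usize, u64)> {
        // First unitig whose exclusive end is greater than `pos`.
        let id = self.ends.partition_point(|&end| end <= pos);
        if id == self.ends.len() {
            return None;
        }
        let start = if id == 0 { 0 } else { self.ends[id - 1] };
        Some((id, pos - start))
    }
}
```

An Elias-Fano encoding of the same monotone sequence would answer the same query in less space; the plain vector is used here only to keep the example short.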

ds::varint

  • encode_varint / decode_varint — LEB128 variable-length integer encoding.

  • encode_zigzag / decode_zigzag — Zigzag encoding for signed integers.

  • delta_encode / delta_decode — Delta + varint encoding for sorted sequences.
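
The three encodings compose naturally, as the following sketch shows. The function names and signatures are hypothetical (the source lists only the names encode_varint, decode_varint, and so on, without signatures), and the varint shown is unsigned LEB128 over u64.

```rust
/// Append `v` as unsigned LEB128: 7 payload bits per byte, high bit = "more follows".
pub fn leb128_encode(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Decode one LEB128 value; returns (value, bytes consumed) or `None` if the
/// input is truncated or longer than the 10 bytes a u64 can need.
pub fn leb128_decode(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut v = 0u64;
    for (i, &b) in bytes.iter().enumerate() {
        if i >= 10 {
            return None;
        }
        v |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((v, i + 1));
        }
    }
    None
}

/// Zigzag maps small-magnitude signed values to small unsigned ones:
/// 0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, ...
pub fn zigzag_encode(v: i64) -> u64 {
    ((v << 1) ^ (v >> 63)) as u64
}

pub fn zigzag_decode(u: u64) -> i64 {
    ((u >> 1) as i64) ^ -((u & 1) as i64)
}

/// Encode a non-decreasing sequence as LEB128-coded gaps (first gap is from 0).
pub fn delta_encode_sorted(sorted: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    for &v in sorted {
        debug_assert!(v >= prev, "input must be sorted");
        leb128_encode(v - prev, &mut out);
        prev = v;
    }
    out
}

/// Inverse of `delta_encode_sorted`; `None` on malformed input or overflow.
pub fn delta_decode_sorted(mut bytes: &[u8]) -> Option<Vec<u64>> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    while !bytes.is_empty() {
        let (gap, used) = leb128_decode(bytes)?;
        prev = prev.checked_add(gap)?;
        out.push(prev);
        bytes = &bytes[used..];
    }
    Some(out)
}
```

Delta coding pays off on sorted ID lists because the gaps are usually much smaller than the values themselves, so most of them fit in a single byte.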


Utilities (src/util/)

util::dna

  • PackedSequence — 2-bit packed DNA sequence (32 bases per u64).

  • canonical_kmer(kmer, k) — Return the lexicographically smaller of a k-mer and its reverse complement.
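
A sketch of canonical k-mer selection on packed integers is given below. The representation is an assumption: a k-mer (k ≤ 32) is a u64 with the first base in the most significant used bits and codes A=0, C=1, G=2, T=3, so numeric order coincides with lexicographic order. The helper names are hypothetical.

```rust
/// Pack up to 32 upper-case bases into a u64, first base most significant.
pub fn pack_kmer(seq: &[u8]) -> Option<u64> {
    if seq.is_empty() || seq.len() > 32 {
        return None;
    }
    let mut kmer = 0u64;
    for &b in seq {
        let code = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        kmer = (kmer << 2) | code;
    }
    Some(kmer)
}

/// Reverse complement of a packed k-mer of length `k`.
pub fn revcomp_packed(kmer: u64, k: usize) -> u64 {
    assert!((1..=32).contains(&k), "k must be in 1..=32");
    let mut rest = kmer;
    let mut rc = 0u64;
    for _ in 0..k {
        // With A=0, C=1, G=2, T=3 the complement of code c is 3 - c.
        rc = (rc << 2) | (3 - (rest & 3));
        rest >>= 2;
    }
    rc
}

/// The smaller of a packed k-mer and its reverse complement.
pub fn canonical_packed(kmer: u64, k: usize) -> u64 {
    kmer.min(revcomp_packed(kmer, k))
}
```

Using the canonical form means a k-mer and its reverse complement map to the same key, so strand does not matter when looking k-mers up.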

util::mmap

  • mmap_open(path) — Memory-map a file for read-only access.

  • read_bincode / write_bincode — Deserialise/serialise via bincode.


I/O modules (src/io/)

io::fasta

  • read_sequences(path) — Read all sequences from a FASTA file.

  • FastaReader — Streaming iterator over FASTA records.

  • list_fasta_files(dir) — List FASTA files in a directory.
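
A minimal FASTA parser over any buffered reader might look like the sketch below. It is an illustration, not the crate's `FastaReader`: `parse_fasta` is a hypothetical function that collects all records eagerly, keeps only the first whitespace-delimited token of each header as the ID, and silently ignores sequence lines that appear before the first header.

```rust
use std::io::BufRead;

/// Parse FASTA records into (id, sequence) pairs.
pub fn parse_fasta<R: BufRead>(reader: R) -> std::io::Result<Vec<(String, Vec<u8>)>> {
    let mut records = Vec::new();
    for line in reader.lines() {
        let line = line?;
        let line = line.trim_end(); // also drops a trailing '\r' from CRLF files
        if let Some(header) = line.strip_prefix('>') {
            // A header starts a new record; the ID is the first token after '>'.
            let id = header.split_whitespace().next().unwrap_or("").to_string();
            records.push((id, Vec::new()));
        } else if let Some((_, seq)) = records.last_mut() {
            // Sequence lines (possibly wrapped) are appended to the current record.
            seq.extend_from_slice(line.as_bytes());
        }
    }
    Ok(records)
}
```

Because the function accepts any `BufRead`, it can be fed a `BufReader<File>` for real files or a byte slice in tests.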

io::paf

  • PafRecord — PAF alignment record with Display formatting.

  • write_paf(writer, records) — Write PAF records.
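
As an illustration of a PAF record with Display formatting, the sketch below emits the 12 mandatory tab-separated PAF columns. The struct `PafLine` and its field names are hypothetical, and optional SAM-style tags (such as a CIGAR tag) are left out.

```rust
use std::fmt;

/// The 12 mandatory PAF columns (field names are illustrative).
pub struct PafLine {
    pub query_name: String,
    pub query_len: u64,
    pub query_start: u64, // 0-based, inclusive
    pub query_end: u64,   // 0-based, exclusive
    pub reverse: bool,    // true -> '-' strand
    pub target_name: String,
    pub target_len: u64,
    pub target_start: u64,
    pub target_end: u64,
    pub matches: u64,   // number of matching bases
    pub block_len: u64, // alignment block length
    pub mapq: u8,       // mapping quality, 255 = missing
}

impl fmt::Display for PafLine {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let strand = if self.reverse { '-' } else { '+' };
        write!(
            f,
            "{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}",
            self.query_name,
            self.query_len,
            self.query_start,
            self.query_end,
            strand,
            self.target_name,
            self.target_len,
            self.target_start,
            self.target_end,
            self.matches,
            self.block_len,
            self.mapq
        )
    }
}
```

Implementing Display means a writer function can emit one record per line with `writeln!(writer, "{}", record)`.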

io::blast

  • write_blast_tabular(writer, records) — Write BLAST outfmt 6 records.