# Module reference ## Index modules (`src/index/`) ### `index::dbg` De Bruijn graph construction via GGCAT or internal builder. - `build_cdbg(genome_dir, output_dir, kmer_size, threads)` — Build a coloured compacted de Bruijn graph. Uses GGCAT if available, otherwise falls back to internal builder. - `DbgResult` — Result struct containing paths to unitig and colour files. ### `index::unitig` Unitig parsing and 2-bit encoding. - `parse_and_encode_unitigs(path)` — Parse a unitig FASTA file and encode sequences. - `UnitigSet` — Collection of all unitigs with concatenated text and length metadata. - `Unitig` — Single unitig with ID and 2-bit packed sequence. ### `index::color` Roaring Bitmap colour index. - `build_color_index(color_file, output_dir, num_genomes)` — Build and serialise the colour index. - `load_color_index(index_dir)` — Load colour index via memory mapping. - `ColorIndex::get_colors(unitig_id)` — Look up which genomes contain a unitig. ### `index::fm` FM-index construction and querying. - `build_fm_index(unitigs, output_dir)` — Build FM-index from a UnitigSet. - `load_fm_index(index_dir)` — Load FM-index from disk. - `DragonFmIndex::search(pattern)` — Find all occurrences of a pattern. - `DragonFmIndex::count(pattern)` — Count occurrences without locating. - `DragonFmIndex::variable_length_search(pattern)` — Extend search to maximum match length. ### `index::paths` Genome path index. - `build_path_index(genome_dir, unitigs, output_dir)` — Build path index from genomes. - `load_path_index(index_dir)` — Load path index from disk. - `PathIndex::extract_sequence(genome_id, start, end, unitigs)` — Reconstruct a genome region. ### `index::metadata` Index statistics and metadata. - `write_metadata(output_dir, dbg_result, unitigs)` — Write metadata JSON. - `load_metadata(index_dir)` — Load metadata. --- ## Query modules (`src/query/`) ### `query::seed` FM-index seed finding. - `find_seeds(query, fm_index, min_seed_len, max_freq)` — Find all seeds in a query using backward search with variable-length extension. Searches both forward and reverse complement. ### `query::candidate` Candidate genome filtering. - `find_candidates(seeds, color_index, min_votes)` — Identify genomes sharing unitigs with query seeds. Returns candidates sorted by vote count. ### `query::chain` Colinear chaining. - `chain_candidates(seeds, candidates, path_index, min_score)` — Compute optimal colinear chains for each candidate genome using Fenwick tree DP. - `Chain` — A scored chain of colinear anchors with coverage information. ### `query::align` Wavefront alignment. - `align_chains(query, chains, path_index)` — Align chains and produce PAF records. - `banded_nw_align(query, reference, bandwidth)` — Banded Needleman-Wunsch alignment. --- ## Data structures (`src/ds/`) ### `ds::fenwick` - `FenwickMax` — Prefix maximum queries in O(log n). - `FenwickSum` — Prefix sum queries in O(log n). ### `ds::elias_fano` - `CumulativeLengthIndex` — Maps text positions to unitig IDs via binary search on cumulative lengths. ### `ds::varint` - `encode_varint / decode_varint` — LEB128 variable-length integer encoding. - `encode_zigzag / decode_zigzag` — Zigzag encoding for signed integers. - `delta_encode / delta_decode` — Delta + varint encoding for sorted sequences. --- ## Utilities (`src/util/`) ### `util::dna` - `PackedSequence` — 2-bit packed DNA sequence (32 bases per u64). - `canonical_kmer(kmer, k)` — Lexicographically smaller of forward and reverse complement. ### `util::mmap` - `mmap_open(path)` — Memory-map a file for read-only access. - `read_bincode / write_bincode` — Serialise/deserialise via bincode. --- ## I/O modules (`src/io/`) ### `io::fasta` - `read_sequences(path)` — Read all sequences from a FASTA file. - `FastaReader` — Streaming iterator over FASTA records. - `list_fasta_files(dir)` — List FASTA files in a directory. ### `io::paf` - `PafRecord` — PAF alignment record with Display formatting. - `write_paf(writer, records)` — Write PAF records. ### `io::blast` - `write_blast_tabular(writer, records)` — Write BLAST outfmt 6 records.