Output formats
PAF (Pairwise Alignment Format)
Default output format. Tab-separated with 12 mandatory columns:
Column |
Description |
|---|---|
1 |
Query name |
2 |
Query length |
3 |
Query start (0-based) |
4 |
Query end |
5 |
Strand ( |
6 |
Target genome name |
7 |
Target genome length |
8 |
Target start (0-based) |
9 |
Target end |
10 |
Number of matching bases |
11 |
Alignment block length |
12 |
Mapping quality (0-60) |
Example
gene_001 1500 10 1490 + genome_042 4800000 123456 124946 1450 1490 60 AS:i:2900 cs:f:0.9867
BLAST tabular (outfmt 6)
Use --format blast6. Tab-separated with 12 columns:
Column |
Description |
|---|---|
1 |
Query ID |
2 |
Subject ID |
3 |
% identity |
4 |
Alignment length |
5 |
Mismatches |
6 |
Gap opens |
7 |
Query start (1-based) |
8 |
Query end |
9 |
Subject start (1-based) |
10 |
Subject end |
11 |
E-value |
12 |
Bit score |
Example
gene_001 genome_042 96.67 1490 48 2 11 1490 123457 124946 0.00e+00 2900.0
Parsing output
Extract top hits per query
# PAF: best hit per query (highest mapping quality)
sort -k1,1 -k12,12rn results.paf | awk '!seen[$1]++' > best_hits.paf
# BLAST: best hit per query (highest bit score)
sort -k1,1 -k12,12rn results.tsv | awk '!seen[$1]++' > best_hits.tsv
Filter by identity
# PAF: filter by >90% identity (matches/alignment_length)
awk -F'\t' '$10/$11 > 0.90' results.paf > filtered.paf
# BLAST: filter directly on column 3
awk -F'\t' '$3 > 90' results.tsv > filtered.tsv
Count hits per genome
cut -f6 results.paf | sort | uniq -c | sort -rn | head -20