Output formats

PAF (Pairwise Alignment Format)

Default output format. Tab-separated with 12 mandatory columns:

Column  Description
1       Query name
2       Query length
3       Query start (0-based)
4       Query end
5       Strand (+ or -)
6       Target genome name
7       Target genome length
8       Target start (0-based)
9       Target end
10      Number of matching bases
11      Alignment block length
12      Mapping quality (0-60)

Optional tags

Tag        Description
AS:i:<N>   Chain alignment score
cs:f:<F>   Query coverage fraction

Example

gene_001  1500  10  1490  +  genome_042  4800000  123456  124946  1450  1490  60  AS:i:2900  cs:f:0.9867
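
The mandatory columns are enough to derive two common per-alignment metrics: identity (column 10 divided by column 11) and query coverage (the aligned query span divided by column 2). The following is a minimal sketch, not part of the tool itself. The function name paf_stats is hypothetical, and it assumes the query end in column 4 is exclusive, as in standard PAF.

```shell
# paf_stats (hypothetical helper): print query, target, identity, query coverage.
# Assumes tab-separated PAF with a half-open query interval [start, end).
# Usage: paf_stats results.paf
paf_stats() {
  awk -F'\t' '
    $2 > 0 && $11 > 0 {
      identity = $10 / $11        # matching bases / alignment block length
      coverage = ($4 - $3) / $2   # aligned query span / query length
      printf "%s\t%s\t%.4f\t%.4f\n", $1, $6, identity, coverage
    }' "$@"
}
```

For the example record above this gives an identity of 0.9732 and a coverage of 0.9867, the latter agreeing with its cs:f tag.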

BLAST tabular (outfmt 6)

Use --format blast6. Tab-separated with 12 columns:

Column  Description
1       Query ID
2       Subject ID
3       % identity
4       Alignment length
5       Mismatches
6       Gap opens
7       Query start (1-based)
8       Query end
9       Subject start (1-based)
10      Subject end
11      E-value
12      Bit score

Example

gene_001  genome_042  96.67  1490  48  2  11  1490  123457  124946  0.00e+00  2900.0
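
The two formats use different coordinate conventions: PAF starts are 0-based, BLAST starts are 1-based. In the two example records the same alignment appears as 10..1490 in PAF and 11..1490 in BLAST, which is consistent with PAF ends being exclusive and BLAST ends being inclusive. The end convention is not stated explicitly, so treat it as an assumption. Under that assumption, a sketch that rewrites PAF intervals as 1-based inclusive coordinates could look like this (paf_to_1based is a hypothetical name, and strand-dependent reordering of subject coordinates is not handled):

```shell
# paf_to_1based (hypothetical helper): print query, qstart, qend, target, tstart, tend
# as 1-based inclusive coordinates. Assumes PAF ends are exclusive, so only
# the start positions need to be shifted by one.
# Usage: paf_to_1based results.paf
paf_to_1based() {
  awk -F'\t' -v OFS='\t' '{ print $1, $3 + 1, $4, $6, $8 + 1, $9 }' "$@"
}
```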

Parsing output

Extract top hits per query

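# Both formats are tab-separated, so set the delimiter explicitly for sort and awk.
TAB=$(printf '\t')

# PAF: best hit per query (highest mapping quality)
sort -t "$TAB" -k1,1 -k12,12rn results.paf | awk -F'\t' '!seen[$1]++' > best_hits.paf

# BLAST: best hit per query (highest bit score)
sort -t "$TAB" -k1,1 -k12,12rn results.tsv | awk -F'\t' '!seen[$1]++' > best_hits.tsv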
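
Mapping quality is capped at 60, so several hits for the same query can tie on column 12. One alternative is to rank PAF records by the AS:i chain alignment score instead. The sketch below assumes the tag appears somewhere after column 12 and skips records that lack it. The function name paf_best_by_as is hypothetical.

```shell
# paf_best_by_as (hypothetical helper): keep the record with the highest
# AS:i score for each query, printed in order of first appearance.
# Usage: paf_best_by_as results.paf > best_hits_as.paf
paf_best_by_as() {
  awk -F'\t' '
    {
      found = 0
      for (i = 13; i <= NF; i++)            # optional tags start at column 13
        if ($i ~ /^AS:i:/) { score = substr($i, 6) + 0; found = 1; break }
      if (!found) next                      # no AS tag: skip the record
      if (!($1 in best)) order[++n] = $1    # remember first-seen query order
      if (!($1 in best) || score > best[$1]) { best[$1] = score; line[$1] = $0 }
    }
    END { for (j = 1; j <= n; j++) print line[order[j]] }' "$@"
}
```

Unlike the sort-based one-liners, this makes a single pass and does not need sorted input, at the cost of holding one record per query in memory.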

Filter by identity

# PAF: keep alignments with >90% identity (matching bases / alignment block length)
awk -F'\t' '$11 > 0 && $10/$11 > 0.90' results.paf > filtered.paf

# BLAST: keep alignments with >90% identity (column 3 is already a percentage)
awk -F'\t' '$3 > 90' results.tsv > filtered.tsv
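
Identity alone can keep short, high-identity fragments. A possible refinement for PAF is to require a minimum query coverage as well. This is a sketch with both thresholds passed as arguments. The function name paf_filter is hypothetical, and coverage is computed from columns 2 to 4 on the assumption that the query end is exclusive.

```shell
# paf_filter (hypothetical helper): keep PAF records meeting both thresholds.
# Usage: paf_filter 0.90 0.80 results.paf > filtered.paf
paf_filter() {
  min_id=$1; min_cov=$2; shift 2
  awk -F'\t' -v min_id="$min_id" -v min_cov="$min_cov" '
    $2 > 0 && $11 > 0 &&
    $10 / $11 >= min_id &&           # identity: matches / block length
    ($4 - $3) / $2 >= min_cov        # coverage: query span / query length
  ' "$@"
}
```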

Count hits per genome

# PAF: the 20 target genomes with the most hits (column 6)
cut -f6 results.paf | sort | uniq -c | sort -rn | head -20
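
The count above is per alignment, so a query that hits the same genome several times is counted several times. If the number of distinct queries per genome is wanted instead, one option is to deduplicate query/target pairs first. A sketch, with queries_per_genome as a hypothetical name:

```shell
# queries_per_genome (hypothetical helper): count distinct queries per target genome.
# Usage: queries_per_genome results.paf | head -20
queries_per_genome() {
  cut -f1,6 "$@" |   # query name, target genome name
    sort -u |        # one line per distinct query/target pair
    cut -f2 |
    sort | uniq -c | sort -rn
}
```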