Bracken

Extension (optional)

Bracken (Bayesian Reestimation of Abundance with KrakEN) uses taxonomy labels assigned by Kraken2 to compute estimated abundances of taxa in a metagenomic sample.

1 Running Bracken

As with Krona, we can use the Kraken2 report files to run Bracken.

bracken -d ~/kraken2_db/ \
  -i K1.kreport2 \
  -o K1.bracken \
  -w K1.breport2 \
  -r 100 \
  -l S \
  -t 5

1.1 Parameters

-d
Specifies the Kraken2 database used for classification.
Bracken requires the full path to the database; here we use ~/kraken2_db/, which the shell expands to the full path under your home directory.
-i
Kraken2 report file used as input.
-o
Output Bracken file.
-w
Output report file in Kraken-style format.
This file is required for downstream analysis in R (e.g. using phyloseq).
-r 100
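Read length of the sequencing data.
We use 100 because the original library consisted of paired-end 100 bp reads. The database must contain a k-mer distribution file built for this read length (database100mers.kmer_distrib).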
-l S
Taxonomic rank for abundance estimation.
Other valid options include D, P, C, O, F, and G.
-t 5
Minimum number of reads a taxon must have at the specified rank before reestimation.
Taxa below this threshold do not receive additional reads from higher taxonomic levels.
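
Because -r must match a k-mer distribution file built into the database, it can help to confirm that the file exists before running Bracken. The sketch below assumes the database lives in ~/kraken2_db/ as in the command above and that the file follows Bracken's database${READ_LEN}mers.kmer_distrib naming; kmer_distrib_path is an illustrative helper, not part of Bracken.

```shell
# Hypothetical helper: build the expected k-mer distribution path
# for a given database directory and read length.
kmer_distrib_path() {
  db_dir=${1%/}      # strip one trailing slash, if present
  read_len=$2
  printf '%s/database%smers.kmer_distrib\n' "$db_dir" "$read_len"
}

# Report whether the file needed for -r 100 is present.
# Assumption: the database directory is ~/kraken2_db/.
distrib=$(kmer_distrib_path "$HOME/kraken2_db" 100)
if [ -f "$distrib" ]; then
  echo "Found: $distrib"
else
  echo "Missing: $distrib"
fi
```

If the file is reported missing, a different -r value or a rebuilt database is probably needed.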

2 Bracken output

The main Bracken output file contains the following columns:

Name – Taxon name at the specified rank
Taxonomy ID – NCBI taxonomy identifier
Level ID – Taxonomic rank code
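Kraken-assigned reads – Reads Kraken2 assigned to this taxon
Added reads – Reads added by abundance reestimation
New estimated reads – Total reads after abundance reestimation
Fraction of total reads – Proportion of all reestimated reads belonging to this taxon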

The total reads after abundance reestimation are typically used for downstream analyses.
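
For a quick look at the most abundant taxa, the output can be sorted on the total reads after reestimation. This is a minimal sketch, assuming the file is tab-separated with one header line and that the total is the sixth column, as in the column list above; top_taxa is an illustrative name.

```shell
# Illustrative helper: print the name and reestimated read count
# (column 6) for the N taxa with the most reads, skipping the header.
top_taxa() {
  file=$1
  n=${2:-10}                 # default to the top 10 taxa
  tab=$(printf '\t')         # literal tab used as the sort delimiter
  tail -n +2 "$file" | sort -t "$tab" -k6,6nr | head -n "$n" | cut -f 1,6
}

# Example: top_taxa K1.bracken 5
```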

3 Task

Repeat the Bracken command for samples K2 and W1.

# K2
bracken -d ~/kraken2_db/ \
  -i K2.kreport2 \
  -o K2.bracken \
  -w K2.breport2 \
  -r 100 \
  -l S \
  -t 5

# W1
bracken -d ~/kraken2_db/ \
  -i W1.kreport2 \
  -o W1.bracken \
  -w W1.breport2 \
  -r 100 \
  -l S \
  -t 5
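
The same commands can also be written as a loop over sample names. This is a sketch rather than part of the original workshop material; run_bracken is an illustrative function name, and it assumes each sample has a report named <sample>.kreport2 in the current directory. The -w option is included so each sample also gets the Kraken-style report described under Parameters.

```shell
# Illustrative helper: run Bracken once per sample name given as an
# argument, using the same settings as the K1 command.
run_bracken() {
  for sample in "$@"; do
    bracken -d ~/kraken2_db/ \
      -i "${sample}.kreport2" \
      -o "${sample}.bracken" \
      -w "${sample}.breport2" \
      -r 100 \
      -l S \
      -t 5
  done
}

# Example: run_bracken K2 W1
```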

4 Merging Bracken outputs

To combine Bracken results across samples, first copy the additional Bracken output files provided for the workshop.

https://cgr.liv.ac.uk/454/acdarby/LIFE750/bracken/

Merge all K and W Bracken files into a single table.

combine_bracken_outputs.py --files [KW]*.bracken -o all.bracken

The merged output contains:

name – Taxon name
taxonomy_id – NCBI taxonomy ID
taxonomy_lvl – Taxonomic rank

For each sample, two additional columns are included:

${Sample}.bracken_num – Reads after abundance reestimation
${Sample}.bracken_frac – Relative abundance
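
To see which column number each header belongs to, the header line can be printed one field per line. A small sketch, assuming the merged table is tab-separated with a single header line; list_columns is an illustrative name.

```shell
# Illustrative helper: print each header field of a tab-separated
# file together with its column number, one field per line.
list_columns() {
  head -n 1 "$1" | tr '\t' '\n' | awk '{ print NR "\t" $0 }'
}

# Example: list_columns all.bracken
```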

5 Extracting abundance columns

To create a table containing only organism names and abundance counts, we first generate a sequence of column indices corresponding to the bracken_num columns. These start at column 4 and recur every second column; an upper bound of 50 covers 24 samples.

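# Print every second column index from 4 to 50, separated by commas
seq -s , 4 2 50
# Store the same list in a variable and check it
bracken_num_columns=$(seq -s , 4 2 50)
echo "$bracken_num_columns"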

Extract the first column and all bracken_num columns.

cut -f "1,$bracken_num_columns" all.bracken > all_num.bracken

Then open this file with less to inspect its contents.
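
A hard-coded upper bound only fits a merged table with a particular number of samples. As a hedged alternative, the column indices can be read from the header instead. This sketch assumes every count column name ends in _num, as described for the merged output; num_columns is an illustrative name.

```shell
# Illustrative helper: print the comma-separated indices of all
# header fields ending in "_num" in a tab-separated file.
num_columns() {
  awk -F '\t' 'NR == 1 {
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i ~ /_num$/) out = out (out == "" ? "" : ",") i
    print out
    exit
  }' "$1"
}

# Example:
# cut -f "1,$(num_columns all.bracken)" all.bracken > all_num.bracken
```

With this approach the same command keeps working if samples are added to or removed from the merged table.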