Bracken

Extension (optional)

Bracken (Bayesian Reestimation of Abundance with KrakEN) uses taxonomy labels assigned by Kraken2 to compute estimated abundances of taxa in a metagenomic sample.

1 Running Bracken

As with Krona, we can use the Kraken2 report files to run Bracken.

bracken -d ~/kraken2_db/ \
  -i K1.kreport2 \
  -o K1.bracken \
  -w K1.breport2 \
  -r 100 \
  -l S \
  -t 5

1.1 Parameters

  • -d
    Specifies the Kraken2 database used for classification.
    Bracken requires the full path to the database; here we use ~/kraken2_db/.

  • -i
    Kraken2 report file used as input.

  • -o
    Output Bracken file.

  • -w
    Output report file in Kraken-style format.
    This file is required for downstream analysis in R (e.g. using phyloseq).

  • -r 100
    Read length of the sequencing data, which determines the k-mer distribution Bracken uses from the database.
    We use 100 bp because the original library consisted of paired-end 100 bp reads.

  • -l S
    Taxonomic rank for abundance estimation; S is species.
    Other valid options include D (domain), P (phylum), C (class), O (order), F (family), and G (genus).

  • -t 5
    Minimum number of reads Kraken2 must have assigned to a taxon at the specified rank for Bracken to include it in the abundance reestimation.
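
Because -d and -r together determine which k-mer distribution Bracken reads, it can help to check that the database contains one for the chosen read length before running. The sketch below assumes the distribution file follows the database<READ_LEN>mers.kmer_distrib naming produced when a Bracken database is built; the variable names are illustrative only.

```shell
# Assumed locations: the database directory and read length used in this practical.
db_dir=~/kraken2_db
read_len=100

# Assumed file name pattern for the Bracken k-mer distribution.
distrib_file="${db_dir}/database${read_len}mers.kmer_distrib"

# Report whether the distribution file for this read length is present.
if [ -f "$distrib_file" ]; then
  db_status="found"
else
  db_status="missing"
fi
echo "${db_status}: ${distrib_file}"
```

If the file is reported as missing, the database may not have been built for that read length.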

2 Bracken output

The main Bracken output file contains the following columns:

  1. Name – Taxon name at the specified rank
  2. Taxonomy ID – NCBI taxonomy identifier
  3. Level ID – Taxonomic rank code
  4. Kraken-assigned reads
  5. Reads added by abundance reestimation
  6. Total reads after abundance reestimation
  7. Fraction of total reads

The total reads after abundance reestimation are typically used for downstream analyses.
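
As a quick way to explore these columns, the following sketch sorts a Bracken output file by the total reads after reestimation (column 6) and prints the most abundant taxon. The file toy.bracken and every count in it are invented purely for illustration, and the header names are assumed to match those Bracken writes; with real data you would use K1.bracken instead.

```shell
# Build a tiny tab-separated file that mimics the seven-column Bracken layout.
# All read counts below are made-up illustration values, not real results.
printf 'name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\tadded_reads\tnew_est_reads\tfraction_total_reads\n' > toy.bracken
printf 'Escherichia coli\t562\tS\t80\t20\t100\t0.50000\n' >> toy.bracken
printf 'Bacillus subtilis\t1423\tS\t30\t20\t50\t0.25000\n' >> toy.bracken
printf 'Staphylococcus aureus\t1280\tS\t40\t10\t50\t0.25000\n' >> toy.bracken

# Skip the header, sort numerically by column 6 (descending),
# then keep only the name and read count of the top taxon.
top_taxon=$(tail -n +2 toy.bracken | sort -t "$(printf '\t')" -k6,6nr | cut -f 1,6 | head -n 1)
echo "$top_taxon"
```

Changing head -n 1 to a larger number lists more of the most abundant taxa.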

3 Task

Repeat the Bracken command for samples K2 and W1.

# K2
bracken -d ~/kraken2_db/ \
  -i K2.kreport2 \
  -o K2.bracken \
  -w K2.breport2 \
  -r 100 \
  -l S \
  -t 5

# W1
bracken -d ~/kraken2_db/ \
  -i W1.kreport2 \
  -o W1.bracken \
  -w W1.breport2 \
  -r 100 \
  -l S \
  -t 5
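
Rather than typing each command, the same parameters can be applied in a loop. The sketch below is a dry run: it only prints the commands so they can be checked first, and removing the echo would run them. It assumes the report files follow the SAMPLE.kreport2 naming used above and includes the -w report output described in the parameters section; bracken_commands.txt is just an illustrative file name.

```shell
# Dry run: print one Bracken command per sample using the workshop parameters.
# Remove the leading "echo" to actually run Bracken.
for sample in K2 W1; do
  echo bracken -d ~/kraken2_db/ \
    -i "${sample}.kreport2" \
    -o "${sample}.bracken" \
    -w "${sample}.breport2" \
    -r 100 -l S -t 5
done > bracken_commands.txt

# Show the commands that would be run.
cat bracken_commands.txt
```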

4 Merging Bracken outputs

To combine Bracken results across samples, first copy the additional Bracken output files provided for the workshop.

https://cgr.liv.ac.uk/454/acdarby/LIFE750/bracken/

Merge all K and W Bracken files into a single table.

combine_bracken_outputs.py --files [KW]*.bracken -o all.bracken

The merged output contains:

  • name – Taxon name
  • taxonomy_id – NCBI taxonomy ID
  • taxonomy_lvl – Taxonomic rank

For each sample, two additional columns are included:

  • ${Sample}.bracken_num – Reads after abundance reestimation
  • ${Sample}.bracken_frac – Relative abundance
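
To see which column number each field occupies, the header line can be printed one field per line. The toy header below stands in for the first line of all.bracken and assumes just two samples named K1 and W1; with the real table, replace toy_all.bracken with all.bracken.

```shell
# Toy header standing in for the first line of all.bracken (two hypothetical samples).
printf 'name\ttaxonomy_id\ttaxonomy_lvl\tK1.bracken_num\tK1.bracken_frac\tW1.bracken_num\tW1.bracken_frac\n' > toy_all.bracken

# Print each header field on its own line, prefixed by its column number.
numbered_header=$(head -n 1 toy_all.bracken | tr '\t' '\n' | awk '{ print NR ": " $0 }')
echo "$numbered_header"
```

In this layout the bracken_num columns fall on the even-numbered columns from 4 onwards.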

5 Extracting abundance columns

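To create a table containing only organism names and abundance counts, we first generate a sequence of column indices corresponding to the bracken_num columns. These start at column 4 and occur every second column; the upper bound of 50 covers up to 24 samples.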

seq -s , 4 2 50
bracken_num_columns=$(seq -s , 4 2 50)
echo $bracken_num_columns

Extract the first column and all bracken_num columns.

cut -f 1,$bracken_num_columns all.bracken > all_num.bracken

Then open all_num.bracken with less to inspect its contents.
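
The upper bound in the seq command is tied to the number of samples. As an alternative sketch, the column indices can be derived from the header by selecting every field whose name ends in _num. The toy table below, including its counts, is invented for illustration; with the real data, replace toy_merged.bracken with all.bracken.

```shell
# Toy merged table with two hypothetical samples; counts are made-up values.
printf 'name\ttaxonomy_id\ttaxonomy_lvl\tK1.bracken_num\tK1.bracken_frac\tW1.bracken_num\tW1.bracken_frac\n' > toy_merged.bracken
printf 'Escherichia coli\t562\tS\t100\t0.5\t40\t0.4\n' >> toy_merged.bracken

# Collect the indices of all header fields ending in "_num" as a comma-separated list.
num_columns=$(head -n 1 toy_merged.bracken | awk -F '\t' '{
  for (i = 1; i <= NF; i++) if ($i ~ /_num$/) list = list (list ? "," : "") i
  print list
}')
echo "$num_columns"

# Keep the name column plus every *_num column.
cut -f "1,${num_columns}" toy_merged.bracken > toy_num.bracken
cat toy_num.bracken
```

This avoids hard-coding the last column number, so the same commands work if samples are added or removed.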