Bracken
Extension (optional)
Bracken (Bayesian Reestimation of Abundance with KrakEN) uses taxonomy labels assigned by Kraken2 to compute estimated abundances of taxa in a metagenomic sample.
1 Running Bracken
As with Krona, we can use the Kraken2 report files to run Bracken.
bracken -d ~/kraken2_db/ \
-i K1.kreport2 \
-o K1.bracken \
-w K1.breport2 \
-r 100 \
-l S \
-t 5
1.1 Parameters
-d
Specifies the Kraken2 database used for classification.
Bracken requires the full path to the database; here we use the KRAKEN2_DB_PATH environment variable.
-i
Kraken2 report file used as input.
-o
Output Bracken file.
-w
Output report file in Kraken-style format.
This file is required for downstream analysis in R (e.g. using phyloseq).
-r 100
Read length used during Kraken2 classification.
We use 100 bp as the original library consisted of paired-end 100 bp reads.
-l S
Taxonomic rank for abundance estimation.
Other valid options include D, P, C, O, F, and G.
-t 5
Minimum number of reads required for classification at the specified rank.
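Because the value given to -r has to correspond to a read length the database was prepared for, it can be worth checking the database directory before running Bracken. The sketch below rests on two assumptions that may not hold on your system: that the database path is held in KRAKEN2_DB_PATH (falling back to ~/kraken2_db, as in the command above), and that Bracken keeps its k-mer distribution in a file named database<read length>mers.kmer_distrib inside that directory.

```shell
# Assumption: database path comes from KRAKEN2_DB_PATH, else ~/kraken2_db.
db_dir="${KRAKEN2_DB_PATH:-$HOME/kraken2_db}"
read_len=100   # should match the value passed to -r

# Assumption: Bracken's k-mer distribution file follows this naming pattern.
distrib="${db_dir%/}/database${read_len}mers.kmer_distrib"

if [ -f "$distrib" ]; then
    echo "Found $distrib"
else
    echo "Not found: $distrib (the database may not be prepared for -r $read_len)" >&2
fi
```

If the file is reported as not found, check the path and the read length before assuming the database is incomplete.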
2 Bracken output
The main Bracken output file contains the following columns:
- Name – Taxon name at the specified rank
- Taxonomy ID – NCBI taxonomy identifier
- Level ID – Taxonomic rank code
- Kraken-assigned reads
- Reads added by abundance reestimation
- Total reads after abundance reestimation
- Fraction of total reads
The total reads after abundance reestimation are typically used for downstream analyses.
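As a rough illustration of working with these columns, the sketch below ranks taxa by total reads after reestimation (column 6) and checks that columns 4 and 5 sum to column 6. It runs on a small toy file rather than K1.bracken: the header names are the ones Bracken is expected to write, and the read counts are invented purely for the example.

```shell
# Toy stand-in for a Bracken output file; the counts are invented.
tab="$(printf '\t')"
toy="$(mktemp)"
cat > "$toy" <<EOF
name${tab}taxonomy_id${tab}taxonomy_lvl${tab}kraken_assigned_reads${tab}added_reads${tab}new_est_reads${tab}fraction_total_reads
Escherichia coli${tab}562${tab}S${tab}800${tab}200${tab}1000${tab}0.25000
Bacteroides fragilis${tab}817${tab}S${tab}2500${tab}500${tab}3000${tab}0.75000
EOF

# Rank taxa by reestimated reads (column 6), skipping the header line,
# and keep only the name and the reestimated count.
top_taxa() {
    tail -n +2 "$1" | sort -t "$tab" -k6,6nr | cut -f 1,6
}
top_taxa "$toy"

# Report any row where assigned + added reads differ from the total;
# prints nothing when every row is consistent.
awk -F '\t' 'NR > 1 && $4 + $5 != $6 { print "mismatch: " $1 }' "$toy"
```

To try this on real data, replace "$toy" with K1.bracken and pipe the output of top_taxa through head to see only the most abundant taxa.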
3 Task
Repeat the Bracken command for samples K2 and W1.
# K2
bracken -d ~/kraken2_db/ \
-i K2.kreport2 \
-o K2.bracken \
-r 100 \
-l S \
-t 5
# W1
bracken -d ~/kraken2_db/ \
-i W1.kreport2 \
-o W1.bracken \
-r 100 \
-l S \
-t 5
4 Merging Bracken outputs
To combine Bracken results across samples, first copy the additional Bracken output files provided for the workshop.
https://cgr.liv.ac.uk/454/acdarby/LIFE750/bracken/
Merge all K and W Bracken files into a single table.
combine_bracken_outputs.py --files [KW]*.bracken -o all.bracken
The merged output contains:
- name – Taxon name
- taxonomy_id – NCBI taxonomy ID
- taxonomy_lvl – Taxonomic rank
For each sample, two additional columns are included:
- ${Sample}.bracken_num – Reads after abundance reestimation
- ${Sample}.bracken_frac – Relative abundance
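To see where each column sits in the merged table, the header can be printed one field per line with its column number. The sketch below uses a toy two-sample header (K1 and W1) built to match the layout described above; the helper name number_columns is made up for this example.

```shell
# Toy header only; a real all.bracken has one pair of columns per sample.
toy_merged="$(mktemp)"
printf 'name\ttaxonomy_id\ttaxonomy_lvl\tK1.bracken_num\tK1.bracken_frac\tW1.bracken_num\tW1.bracken_frac\n' > "$toy_merged"

# Print each header field with its 1-based column number.
number_columns() {
    head -n 1 "$1" | tr '\t' '\n' | awk '{ print NR "\t" $0 }'
}

number_columns "$toy_merged"
```

In this layout the bracken_num columns fall on columns 4, 6, and so on. Running number_columns all.bracken shows the same information for the real merged file.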
5 Extracting abundance columns
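To create a table containing only organism names and abundance counts, we first generate a sequence of column indices corresponding to the bracken_num columns. These columns start at column 4 and recur every second column; the last index, 50, corresponds to a merged table of 24 samples.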
seq -s , 4 2 50
bracken_num_columns=$(seq -s , 4 2 50)
echo $bracken_num_columns
Extract the first column and all bracken_num columns.
cut -f 1,$bracken_num_columns all.bracken > all_num.bracken
Then open this file with less to view its contents.
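The fixed indices above only fit a merged table with exactly that number of samples. As an alternative sketch, the column list can be derived from the header by picking every field whose name ends in _num. The helper name num_columns and the toy two-sample table (with invented counts) are assumptions made for illustration.

```shell
# Toy merged table with two samples; the counts are invented.
toy_merged="$(mktemp)"
printf 'name\ttaxonomy_id\ttaxonomy_lvl\tK1.bracken_num\tK1.bracken_frac\tW1.bracken_num\tW1.bracken_frac\n' > "$toy_merged"
printf 'Escherichia coli\t562\tS\t1000\t0.25000\t400\t0.10000\n' >> "$toy_merged"

# Build a comma-separated list of the columns whose header ends in "_num".
num_columns() {
    head -n 1 "$1" | awk -F '\t' '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /_num$/) { list = list sep i; sep = "," }
        print list
    }'
}

# Keep the name column plus every bracken_num column.
toy_num="$(mktemp)"
cut -f "1,$(num_columns "$toy_merged")" "$toy_merged" > "$toy_num"
cat "$toy_num"
```

Applied to the workshop data, the equivalent call would be cut -f "1,$(num_columns all.bracken)" all.bracken > all_num.bracken, which gives the same result as the fixed indices when the table has the expected number of samples.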