Contents
  1. Why this step matters
  2. What the script does in one sentence
  3. The algorithm, step by step
  4. Default parameters and why they are what they are
  5. When and why we fall back to defaults
  6. What the output file looks like
  7. Takeaway
  8. Sources

Step 13 — Beta Diversity

Script: scripts/mbx_beta_diversity_run.sh

Companion files in this folder:
  - 13_beta_diversity.html: same content with copy buttons.
  - 13_beta_diversity.pptx: slide deck for the talk.


Why this step matters

Step 12 told us how diverse each single sample is. The complementary question — and the one most microbiome papers actually lead with — is how different are the samples from each other? That's beta diversity.

Beta diversity matters because researchers usually care about comparisons: did the treatment group's communities shift away from the control group's communities in a way you wouldn't see by chance? The answer comes in three pieces:

  1. A distance matrix — for every pair of samples, a number that says how different their communities are.
  2. An ordination — a 2-D or 3-D projection (PCoA) of that distance matrix so a human can see the clustering.
  3. A statistical test — does the between-group dissimilarity exceed the within-group dissimilarity by more than chance allows?

As in Step 12, the script guards against metric-specific artifacts by reporting four distance metrics and two significance tests, so a finding that holds across them cannot easily be dismissed as a consequence of metric choice.


What the script does in one sentence

It rarefies the feature table to Step 11's depth, computes four distance matrices (Bray-Curtis, Jaccard, weighted UniFrac, unweighted UniFrac), runs PERMANOVA + PERMDISP + pairwise PERMANOVA + Adonis on each, produces PCoA plots, distance heatmaps, and UPGMA dendrograms, and writes the results as both QZVs (for interactive inspection) and XLSXs (for the report).


The algorithm, step by step

1. Gate on Step 11

The script first applies the same gate as Step 12: it proceeds only if Step 11 recorded READY_FOR_DIVERSITY=yes or the user passed --force. Step 13 also reads 12_alpha_diversity_results/mbx_alpha_diversity_info.txt to inherit the rarefied table path, so the table is not rarefied twice.
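The gate logic can be sketched in a few lines of Python. This is only an illustration, not the script's code: it assumes the info files are plain KEY=VALUE text (as the STATUS=COMPLETE entry suggests), and the file name step11_info.txt and the RAREFACTION_DEPTH key are hypothetical stand-ins.

```python
from pathlib import Path
import tempfile


def parse_info_file(path):
    """Parse KEY=VALUE lines into a dict, ignoring blank lines and # comments."""
    info = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key.strip()] = value.strip()
    return info


def may_run_diversity(info, force=False):
    """Gate is open if READY_FOR_DIVERSITY=yes was recorded or --force was given."""
    return force or info.get("READY_FOR_DIVERSITY", "").lower() == "yes"


# Demo with a throwaway file standing in for Step 11's info file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "step11_info.txt"  # hypothetical file name
    demo.write_text("READY_FOR_DIVERSITY=no\nRAREFACTION_DEPTH=10000\n")  # hypothetical keys
    info = parse_info_file(demo)

print(may_run_diversity(info))              # False
print(may_run_diversity(info, force=True))  # True
```

Keeping the parser separate from the decision makes it easy to reuse the same reader for Step 12's info file.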

2. Compute the four distance matrices

For each of the four metrics, the script then runs qiime diversity beta for the non-phylogenetic metrics (Bray-Curtis, Jaccard) or qiime diversity beta-phylogenetic for the phylogenetic ones (weighted and unweighted UniFrac).

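The four metrics span (abundance vs presence) × (non-phylogenetic vs phylogenetic): Bray-Curtis and weighted UniFrac use abundances, while Jaccard and unweighted UniFrac use presence/absence only. A finding consistent across all four is unlikely to be an artifact of metric choice.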
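To make the abundance-versus-presence distinction concrete, here is a minimal pure-Python sketch of the two non-phylogenetic metrics using their textbook definitions. It is not the script's implementation (the script delegates to QIIME 2), the sample counts are invented, and UniFrac is omitted because it additionally needs a phylogenetic tree.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity on counts: sum|u_i - v_i| / sum(u_i + v_i)."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def jaccard(u, v):
    """Jaccard distance on presence/absence: 1 - |shared| / |union| of observed features."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    if not union:
        return 0.0
    return 1 - len(present_u & present_v) / len(union)


# Two invented samples with the same features present but very different abundances.
sample_a = [90, 5, 5, 0]
sample_b = [5, 5, 90, 0]

print(bray_curtis(sample_a, sample_b))  # 0.85 (abundances differ a lot)
print(jaccard(sample_a, sample_b))      # 0.0  (same features present)
```

The two samples look very different to Bray-Curtis and identical to Jaccard, which is why reporting both families guards against conclusions that depend on one view of the data.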

3. For every categorical variable: PERMANOVA + PERMDISP

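For every categorical metadata variable, on each of the four distance matrices, the script runs PERMANOVA (do the group centroids differ?), PERMDISP (do the group dispersions differ?), and, when the variable has more than two groups, pairwise PERMANOVA.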
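For readers who want to see what PERMANOVA computes, the sketch below implements the standard pseudo-F statistic and a label-permutation p-value in plain Python. It is an illustration under stated assumptions, not the script's code: the script calls QIIME 2, the toy distances are invented one-dimensional positions, and PERMDISP is not shown.

```python
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F from a square distance matrix and one label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for label in labels:
        members = [i for i, g in enumerate(groups) if g == label]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value) with a fixed seed."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Invented data: two well-separated groups of five samples on a line.
positions = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
groups = ["control"] * 5 + ["treated"] * 5
dist = [[float(abs(x - y)) for y in positions] for x in positions]

f_obs, p = permanova(dist, groups, permutations=999, seed=42)
print(f_obs, p)
```

Because the p-value comes from random shuffles, re-running with the same seed gives the same result, which is the reason the defaults table pins a seed.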

4. Adonis (multivariable, with covariates)

The script then runs qiime diversity adonis once per distance matrix, with the metadata variables combined in a single model formula, so each variable's effect is estimated with the other variables in the model.

This catches the common case where a confounder (e.g. sex, age, batch) drives apparent treatment-group differences.
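As a rough illustration of the Adonis call, the helper below assembles a qiime diversity adonis command line as an argument list without running it. The flags are the ones the QIIME 2 plugin exposes; the file names and the formula columns (treatment, sex, batch) are hypothetical, and how the script itself builds the formula is not shown in this document.

```python
import shlex


def adonis_command(distance_matrix, metadata, formula, output,
                   permutations=999, n_jobs=1):
    """Build (but do not execute) a `qiime diversity adonis` invocation."""
    return [
        "qiime", "diversity", "adonis",
        "--i-distance-matrix", distance_matrix,
        "--m-metadata-file", metadata,
        "--p-formula", formula,
        "--p-permutations", str(permutations),
        "--p-n-jobs", str(n_jobs),
        "--o-visualization", output,
    ]


# Hypothetical inputs: treatment is the variable of interest, sex and batch are covariates.
cmd = adonis_command(
    "bray_curtis_distance_matrix.qza",
    "metadata.tsv",
    "treatment+sex+batch",
    "adonis_bray_curtis.qzv",
)
print(shlex.join(cmd))
```

Returning a list rather than a string means the command can be passed to subprocess.run without shell quoting issues.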

5. PCoAs + distance heatmaps + UPGMA dendrograms

For visualisation, the script runs qiime diversity pcoa on each distance matrix and exports both the QIIME2 Emperor 3-D interactive viewer (emperor_<Metric>.qzv) and a 2-D PNG/SVG with samples colour-coded by treatment group.

It also produces the distance heatmaps and UPGMA dendrograms.
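For readers unfamiliar with UPGMA, the sketch below shows the average-linkage clustering behind such a dendrogram in plain Python. It is a teaching example on an invented four-sample matrix, not the code the script uses to draw its figures.

```python
def upgma(dist, names):
    """Average-linkage clustering; returns a nested-tuple tree and the merge list."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters.
        (i, j), distance = min(d.items(), key=lambda kv: kv[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Distance from the new cluster to each remaining one: size-weighted mean.
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        merges.append((tree_i, tree_j, distance))
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges


# Invented distances: A1/A2 are close, B1/B2 are close, A and B are far apart.
names = ["A1", "A2", "B1", "B2"]
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
tree, merges = upgma(dist, names)
print(tree)  # (('A1', 'A2'), ('B1', 'B2'))
```

The merge distances recorded alongside the tree are what a dendrogram plots on its height axis.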

6. Write mbx_beta_diversity_info.txt

Finally, the script writes the cross-step contract file, which records every artifact and result file, the rarefaction depth used, the metric × variable combinations that ran (four metrics × N variables), and STATUS=COMPLETE.
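A contract file of this kind could be written as below. This is a sketch only: apart from STATUS=COMPLETE, every key name and value is a hypothetical placeholder, and the KEY=VALUE layout is an assumption about the file format.

```python
from pathlib import Path
import tempfile


def write_info_file(path, fields):
    """Write KEY=VALUE lines, ending with STATUS=COMPLETE as the final line."""
    lines = [f"{key}={value}" for key, value in fields.items()]
    lines.append("STATUS=COMPLETE")
    Path(path).write_text("\n".join(lines) + "\n")


# Hypothetical keys and values, for illustration only.
fields = {
    "RAREFACTION_DEPTH": 10000,
    "METRICS": "bray_curtis,jaccard,weighted_unifrac,unweighted_unifrac",
    "VARIABLES": "treatment,sex",
}

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "mbx_beta_diversity_info.txt"
    write_info_file(out, fields)
    written = out.read_text().splitlines()

print(written[-1])  # STATUS=COMPLETE
```

Putting the status on the last line lets a downstream step treat a file without it as incomplete.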


Default parameters and why they are what they are

| Default | Value | Why this default |
| --- | --- | --- |
| Distance metrics | Bray-Curtis, Jaccard, weighted UniFrac, unweighted UniFrac | The four that span (abundance vs presence) × (non-phylogenetic vs phylogenetic). |
| Rarefaction depth | from Step 11 | Single source of truth; inherited via Step 12's info file. |
| PERMANOVA permutations | 999 | QIIME2 default; gives a stable p-value with reasonable speed. |
| PERMDISP permutations | 999 | Same. |
| PERMANOVA test | --p-pairwise when > 2 groups | We always want the pairwise breakdown when it's meaningful. |
| Adonis SS type | Type III | Order-independent: robust to how the user listed variables in the metadata. |
| PCoA dimensions | top 3 components | Enough for the 2-D PNG + Emperor's 3-D view. |
| Plot formats | PNG + SVG always; PDF on --publication-figures | Publication-ready by default. |
| Seed | MBX_SEED | Permutation tests are reproducible only with a fixed seed. |
| Threads | MBX_THREADS | Single source of truth. |

When and why we fall back to defaults

| Fallback | When it triggers | Why this fallback exists |
| --- | --- | --- |
| Refuse to run | READY_FOR_DIVERSITY=no from Step 11 | Same gate as Step 12. |
| Skip UniFrac metrics | Tree missing | Bray-Curtis and Jaccard still run; UniFrac logged as skipped. |
| Skip a comparison | Variable has singleton groups after NA filter | PERMANOVA needs ≥ 2 obs per group. |
| Flag PERMDISP-significant cells in the report | PERMDISP p < 0.05 (different variances) | PERMANOVA's null is sensitive to dispersion; the report says "the significant PERMANOVA may reflect dispersion, not centroid difference". |
| Re-use distance matrices | A previous run produced them | Caching saves wall-clock on re-runs. |
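Two of these fallbacks, skipping variables with singleton groups and flagging PERMDISP-significant results, can be sketched as small checks. The NA spellings, the function names, and the extra requirement of at least two groups are assumptions made for this illustration, not details taken from the script.

```python
from collections import Counter

NA_VALUES = {"", "NA", "NaN", "nan", None}  # assumed NA spellings


def usable_for_permanova(values, min_per_group=2):
    """Drop NA values, then require >= 2 groups with >= min_per_group samples each."""
    counts = Counter(v for v in values if v not in NA_VALUES)
    ok = len(counts) >= 2 and all(n >= min_per_group for n in counts.values())
    return ok, dict(counts)


def interpret(permanova_p, permdisp_p, alpha=0.05):
    """Combine the two tests into the caveat described in the fallback table."""
    if permanova_p >= alpha:
        return "not significant"
    if permdisp_p < alpha:
        return "significant, but may reflect dispersion, not centroid difference"
    return "significant"


print(usable_for_permanova(["ctrl", "ctrl", "trt", "trt", "NA"]))  # (True, {'ctrl': 2, 'trt': 2})
print(usable_for_permanova(["ctrl", "ctrl", "trt", "NA"]))         # (False, {'ctrl': 2, 'trt': 1})
print(interpret(0.001, 0.01))
```

Running the group-size check after the NA filter matters, because a group that looks large enough in the raw metadata can shrink to a singleton once missing values are dropped.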

What the output file looks like

PERMANOVA_results_<Variable>.xlsx:

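| metric | n_groups | F | R2 | p_value | p_adj_BH | significant |
| --- | --- | --- | --- | --- | --- | --- |
| Bray-Curtis | 3 | 4.81 | 0.174 | 0.001 | 0.001 | TRUE |
| Jaccard | 3 | 2.92 | 0.118 | 0.014 | 0.018 | TRUE |
| weighted UniFrac | 3 | 5.66 | 0.197 | 0.001 | 0.001 | TRUE |
| unweighted UniFrac | 3 | 3.41 | 0.131 | 0.005 | 0.007 | TRUE |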
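The p_adj_BH column refers to Benjamini-Hochberg adjustment. The sketch below shows the standard step-up procedure on invented p-values; which family of tests the script adjusts over is not stated here, so the example does not attempt to reproduce the numbers in the table.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values, not taken from the table above.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([round(p, 4) for p in adj])  # [0.04, 0.0533, 0.0533, 0.2]
```

An adjusted p-value is never smaller than its raw p-value, which is a quick sanity check to apply when reading the output file.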

A finding shared across all four metrics, and still supported by Adonis after adjusting for the modelled confounders, is unlikely to be an artifact of metric choice or of those confounders.


Takeaway

Step 13 answers the comparative question (how different are these samples from each other?) four times (abundance vs presence, phylogenetic vs not) and tests significance in two ways (PERMANOVA for centroid differences; PERMDISP for dispersion). Adonis with Type-III SS adjusts for the confounders included in the model. PCoA plots, heatmaps, and dendrograms make the finding visual. A signal that survives all four metrics and Adonis is publication-grade.


Sources