Contents

Why this step matters
What the script does in one sentence
The algorithm, step by step
Default parameters and why they are what they are
When and why we fall back to defaults
What the output file looks like
Takeaway
Sources

Step 13 — Beta Diversity

Script: scripts/mbx_beta_diversity_run.sh

Companion files in this folder:

13_beta_diversity.html: same content with copy buttons.
13_beta_diversity.pptx: slide deck for the talk.

Why this step matters

Step 12 told us how diverse each single sample is. The complementary question — and the one most microbiome papers actually lead with — is how different are the samples from each other? That's beta diversity.

Beta diversity matters because researchers usually care about comparisons: did the treatment group's communities shift away from the control group's communities in a way you wouldn't see by chance? The answer comes in three pieces:

A distance matrix — for every pair of samples, a number that says how different their communities are.
An ordination — a 2-D or 3-D projection (PCoA) of that distance matrix so a human can see the clustering.
A statistical test — does the between-group dissimilarity exceed the within-group dissimilarity by more than chance allows?

Like Step 12, the analysis is deliberately redundant: it reports four distance metrics and two significance tests, so a finding that holds across all of them is hard to dismiss as an artifact of metric choice.

What the script does in one sentence

It takes the feature table rarefied to Step 11's depth, computes four distance matrices (Bray-Curtis, Jaccard, weighted UniFrac, unweighted UniFrac), runs PERMANOVA, PERMDISP, pairwise PERMANOVA, and Adonis on each, produces PCoA plots, distance heatmaps, and UPGMA dendrograms, and writes the results as both QZVs (for interactive inspection) and XLSXs (for the report).

The algorithm, step by step

1. Gate on Step 11

The gate is the same as in Step 12: the script proceeds only if Step 11 recorded READY_FOR_DIVERSITY=yes, or if the user passes --force. Step 13 also reads 12_alpha_diversity_results/mbx_alpha_diversity_info.txt to inherit the rarefied table path, so the table is not rarefied twice.
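The gate logic can be sketched in a few lines of Python. This is only an illustration, not the script's code: it assumes the info files are plain KEY=VALUE text, and the function names, the RAREFIED_TABLE and RAREFACTION_DEPTH keys, and the example values are all hypothetical.

```python
def parse_info(text):
    """Parse KEY=VALUE lines into a dict, ignoring blank lines and # comments."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key.strip()] = value.strip()
    return info


def may_run(step11_info, force=False):
    """Return True when Step 11 marked the data ready, or when the run is forced."""
    ready = step11_info.get("READY_FOR_DIVERSITY", "no").lower() == "yes"
    return force or ready


# Hypothetical file contents; the real key names and values may differ.
step11 = parse_info("READY_FOR_DIVERSITY=no\nRAREFACTION_DEPTH=12000\n")
step12 = parse_info("# written by Step 12\nRAREFIED_TABLE=rarefied_table.qza\n")

print(may_run(step11))              # gate closed
print(may_run(step11, force=True))  # gate overridden by --force
print(step12["RAREFIED_TABLE"])     # table path inherited from Step 12
```

Defaulting a missing READY_FOR_DIVERSITY key to "no" keeps the gate closed when the Step 11 file is incomplete, which seems the safer reading of a refuse-to-run gate.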

2. Compute the four distance matrices

Then for each of four metrics, the script runs qiime diversity beta (non-phylogenetic) or qiime diversity beta-phylogenetic (phylogenetic):

Bray-Curtis dissimilarity — 1 − 2 Σ min(a_i, b_i) / Σ (a_i + b_i). Sensitive to abundance differences; the most-reported metric in microbiome papers.
Jaccard distance — 1 − |A ∩ B| / |A ∪ B|. Presence/absence only; asks "do these two samples share the same taxa, ignoring how much?"
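Weighted UniFrac: sums the branch lengths of the phylogenetic tree, each weighted by the difference between the two samples in the relative abundance of the sequences descending from that branch. Phylogeny + abundance.
Unweighted UniFrac: the fraction of the observed branch length that leads to taxa found in only one of the two samples. Phylogeny + presence/absence.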

The four metrics span (abundance vs presence) × (non-phylogenetic vs phylogenetic). A finding consistent across all four is robust to metric choice: it does not depend on whether abundance or phylogeny is taken into account.
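To make the 2 × 2 grid concrete, here is a minimal pure-Python sketch of the four metrics on two toy samples. It is not how QIIME 2 computes them (the plugin works on full feature tables and real trees, and may normalise weighted UniFrac differently). The toy tree, the counts, and all function names are assumptions made for illustration; each branch is written as a (length, tips beneath it) pair.

```python
def bray_curtis(a, b):
    """1 - 2 * sum(min(a_i, b_i)) / sum(a_i + b_i) on count vectors."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1.0 - 2.0 * shared / (sum(a) + sum(b))


def jaccard(a, b):
    """1 - |A and B| / |A or B| on the sets of features present."""
    in_a = {i for i, x in enumerate(a) if x > 0}
    in_b = {i for i, x in enumerate(b) if x > 0}
    return 1.0 - len(in_a & in_b) / len(in_a | in_b)


def unweighted_unifrac(branches, a, b):
    """Branch length seen in exactly one sample / branch length seen in either."""
    unique = observed = 0.0
    for length, tips in branches:
        in_a = any(a[t] > 0 for t in tips)
        in_b = any(b[t] > 0 for t in tips)
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length
    return unique / observed


def weighted_unifrac(branches, a, b):
    """Sum of length * |share in A - share in B| per branch (unnormalised form)."""
    total_a, total_b = sum(a), sum(b)
    return sum(
        length * abs(sum(a[t] for t in tips) / total_a
                     - sum(b[t] for t in tips) / total_b)
        for length, tips in branches
    )


# Toy tree ((0,1),(2,3)): four tip branches of length 1.0, two internal of 0.5.
toy_branches = [(1.0, {0}), (1.0, {1}), (1.0, {2}), (1.0, {3}),
                (0.5, {0, 1}), (0.5, {2, 3})]
sample_a = [10, 10, 0, 0]  # both samples hold 20 reads, as after rarefaction
sample_b = [0, 10, 10, 0]

print(bray_curtis(sample_a, sample_b))                       # 0.5
print(jaccard(sample_a, sample_b))                           # about 0.667
print(unweighted_unifrac(toy_branches, sample_a, sample_b))  # 0.625
print(weighted_unifrac(toy_branches, sample_a, sample_b))    # 1.5
```

The two samples share one of three observed features, so Jaccard is high, while the shared feature carries half the reads, so Bray-Curtis is lower. That is the kind of disagreement the four-metric design is meant to expose.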

3. For every categorical variable: PERMANOVA + PERMDISP

For every categorical metadata variable, on each of the four distance matrices, the script runs:

PERMANOVA (qiime diversity beta-group-significance): permutational multivariate ANOVA. Tests whether the group centroids differ in the distance space. Returns a pseudo-F statistic, an R², and a p-value computed by permuting group labels 999 times, so the smallest attainable p-value is 0.001.
PERMDISP: tests whether the dispersion of each group differs. This matters because PERMANOVA can reject the null when groups differ only in spread (one tight cluster vs one diffuse cloud), even if their centroids coincide. PERMDISP catches this artifact.
Pairwise PERMANOVA when the variable has > 2 groups — tells the user which group pairs differ.
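The PERMANOVA part of this step can be sketched in plain Python, following the sums-of-squares formulation in Anderson (2001). This is an illustration under stated assumptions, not the QIIME 2 implementation: the function names, the toy one-dimensional data, and the seed handling are all hypothetical, and PERMDISP is not covered.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Return (pseudo-F, R^2) for a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    f_stat = (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))
    return f_stat, ss_among / ss_total


def permanova(dist, labels, permutations=999, seed=0):
    """Permutation p-value: how often shuffled labels reach the observed pseudo-F."""
    rng = random.Random(seed)  # fixed seed makes the p-value reproducible
    f_obs, r2 = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)


# Toy data: ten samples on a line, two well-separated groups of five.
positions = [0, 1, 2, 3, 4, 20, 21, 22, 23, 24]
toy_dist = [[abs(x - y) for y in positions] for x in positions]
toy_labels = ["ctl"] * 5 + ["trt"] * 5

f_obs, r2, p_value = permanova(toy_dist, toy_labels)
print(f_obs, r2, p_value)
```

With 999 permutations the p-value can never fall below 1/1000, and rerunning with the same seed returns the same p-value. With very few samples per group the number of distinct relabellings, not the permutation count, sets the floor on the p-value.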

4. Adonis (multivariable, with covariates)

Then the script runs qiime diversity adonis once per distance matrix, modelling the metadata as a formula. The user gets:

Type-III sums of squares so the order of variables doesn't change the test.
Univariate Adonis rows per variable (just that variable vs distance).
Multivariate Adonis with all variables together — the "after controlling for everything else" version.

This catches the common case where a confounder (e.g. sex, age, batch) drives apparent treatment-group differences.

5. PCoAs + distance heatmaps + UPGMA dendrograms

For visualisation, the script runs qiime diversity pcoa on each distance matrix and exports both the QIIME2 Emperor 3-D interactive viewer (emperor_<Metric>.qzv) and a 2-D PNG/SVG with samples colour-coded by treatment group.

It also produces:

Distance heatmap per metric — the full pairwise distance matrix shown as a heatmap with samples grouped by treatment.
UPGMA dendrogram per metric — hierarchical-clustering view that shows how samples cluster from the distance matrix.
Boxplot of distance-to-centroid per metric per variable — the visual version of the PERMDISP test.
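As an illustration of how a UPGMA dendrogram is derived from a distance matrix, the sketch below implements average-linkage clustering in plain Python. It is not the script's plotting code; the function name, the sample names, and the toy distances are hypothetical.

```python
def upgma(dist, names):
    """Average-linkage clustering; returns (nested tuple tree, merge distances)."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merge_distances = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), height = min(d.items(), key=lambda kv: kv[1])
        del d[(i, j)]
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        merge_distances.append(height)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree, merge_distances


# Toy Bray-Curtis-like distances: controls are close, treated are close.
toy_names = ["ctl_1", "ctl_2", "trt_1", "trt_2"]
toy_matrix = [
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]

tree, merge_distances = upgma(toy_matrix, toy_names)
print(tree)  # (('ctl_1', 'ctl_2'), ('trt_1', 'trt_2'))
print(merge_distances)
```

The two treatment groups join only at the last merge, at a much larger distance than the within-group merges, which is the pattern a dendrogram shows when the groups separate.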

6. Write `mbx_beta_diversity_info.txt`

Finally, the script writes the cross-step contract file, which records every artifact and result file, the rarefaction depth used, the metric × variable combinations that ran, and STATUS=COMPLETE.

Default parameters and why they are what they are

Default	Value	Why this default
Distance metrics	Bray-Curtis, Jaccard, weighted UniFrac, unweighted UniFrac	The four that span (abundance vs presence) × (non-phylogenetic vs phylogenetic).
Rarefaction depth	from Step 11	Single source of truth; inherited via Step 12's info file.
PERMANOVA permutations	999	QIIME2 default; gives a stable p-value with reasonable speed.
PERMDISP permutations	999	Same.
Pairwise PERMANOVA	`--p-pairwise` when > 2 groups	The pairwise breakdown is only meaningful with more than two groups, and then we always want it.
Adonis SS type	Type III	Order-independent — robust to how the user listed variables in the metadata.
PCoA dimensions	top 3 components	Enough for the 2-D PNG + Emperor's 3-D view.
Plot formats	PNG + SVG always; PDF on `--publication-figures`	Publication-ready by default.
Seed	`MBX_SEED`	Permutation tests are reproducible only with a fixed seed.
Threads	`MBX_THREADS`	Single source of truth.

When and why we fall back to defaults

Fallback	When it triggers	Why this fallback exists
Refuse to run	`READY_FOR_DIVERSITY=no` from Step 11	Same gate as Step 12.
Skip UniFrac metrics	Tree missing	Bray-Curtis and Jaccard still run; UniFrac logged as skipped.
Skip a comparison	Variable has singleton groups after NA filter	PERMANOVA needs ≥ 2 obs per group.
Flag PERMDISP-significant cells in the report	PERMDISP p < 0.05 (different variances)	PERMANOVA's null is sensitive to dispersion; the report says "the significant PERMANOVA may reflect dispersion, not centroid difference".
Re-use distance matrices	A previous run produced them	Caching saves wall-clock on re-runs.

What the output file looks like

PERMANOVA_results_<Variable>.xlsx:

metric	n_groups	F	R²	p_value	p_adj_BH	significant
Bray-Curtis	3	4.81	0.174	0.001	0.001	TRUE
Jaccard	3	2.92	0.118	0.014	0.018	TRUE
weighted UniFrac	3	5.66	0.197	0.001	0.001	TRUE
unweighted UniFrac	3	3.41	0.131	0.005	0.007	TRUE

A finding shared across all four metrics, and still supported by Adonis after adjusting for confounders, is robust to both metric choice and the measured covariates.
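The p_adj_BH column presumably holds Benjamini-Hochberg adjusted p-values. The sketch below shows the standard adjustment on hypothetical p-values (not those in the table above); which family of tests the script adjusts over is not stated here, so the function name and inputs are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


toy_p = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
print(benjamini_hochberg(toy_p))
```

An adjusted value is never smaller than its raw p-value, and the ordering of the raw p-values is preserved.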

Takeaway

Step 13 answers the comparative question (how different are these samples from each other?) with four metrics that span abundance vs presence and phylogenetic vs not, and tests significance two ways (PERMANOVA for centroids, PERMDISP for dispersion). Adonis with Type-III SS adjusts for measured confounders. PCoA, heatmaps, and dendrograms make the finding visual. A signal that survives all four metrics and Adonis is strong enough to report with confidence.

Sources

The script: mbXPro/scripts/mbx_beta_diversity_run.sh
PERMANOVA: Anderson (2001), A new method for non-parametric multivariate analysis of variance, Austral Ecology 26:32–46.
PERMDISP: Anderson (2006), Distance-based tests for homogeneity of multivariate dispersions, Biometrics 62:245–253.
Bray-Curtis: Bray & Curtis (1957), An ordination of the upland forest communities of southern Wisconsin, Ecological Monographs 27:325–349.
UniFrac: Lozupone & Knight (2005), UniFrac: a new phylogenetic method for comparing microbial communities, AEM 71:8228–8235.
QIIME2 q2-diversity plugin: https://docs.qiime2.org/2025.4/plugins/available/diversity/