Script: scripts/mbx_beta_diversity_run.sh
Companion files in this folder:
- 13_beta_diversity.html — same content with copy buttons.
- 13_beta_diversity.pptx — slide deck for the talk.
Step 12 told us how diverse each single sample is. The complementary question — and the one most microbiome papers actually lead with — is how different are the samples from each other? That's beta diversity.
Beta diversity matters because researchers usually care about comparisons: did the treatment group's communities shift away from the control group's communities in a way you wouldn't see by chance? The answer comes in three pieces:
Like Step 12, the algorithm is robust precisely because we report four different distance metrics and two different significance tests — so the finding can't be dismissed as an artifact of metric choice.
It rarefies the feature table to Step 11's depth, computes four distance matrices (Bray-Curtis, Jaccard, weighted UniFrac, unweighted UniFrac), runs PERMANOVA + PERMDISP + pairwise PERMANOVA + Adonis on each, produces PCoA plots, distance heatmaps, and UPGMA dendrograms, and writes the results as both QZVs (for interactive inspection) and XLSXs (for the report).
First, same gate as Step 12: READY_FOR_DIVERSITY=yes or --force.
Step 13 also reads 12_alpha_diversity_results/mbx_alpha_diversity_info.txt
to inherit the rarefied table path (so we don't rarefy twice).
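The gate and the inheritance step can be sketched as below. This is a minimal illustration, not the script's code: it assumes the info files are plain `KEY=VALUE` lines (suggested by `READY_FOR_DIVERSITY=yes` and `STATUS=COMPLETE`), and the key `RAREFIED_TABLE` and the example path are hypothetical names invented for the example.

```python
def parse_info(text):
    """Parse KEY=VALUE lines into a dict (assumed info-file format)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key.strip()] = value.strip()
    return info


def may_run(step11_info, force=False):
    """Gate: run only if Step 11 said yes, or the user passed --force."""
    ready = step11_info.get("READY_FOR_DIVERSITY", "no").lower() == "yes"
    return force or ready


# Hypothetical file contents, for illustration only.
step11 = parse_info("READY_FOR_DIVERSITY=no\n")
step12 = parse_info("RAREFIED_TABLE=12_alpha_diversity_results/rarefied_table.qza\n")

print(may_run(step11))              # False: the gate refuses to run
print(may_run(step11, force=True))  # True: --force overrides the gate
print(step12["RAREFIED_TABLE"])     # path reused instead of rarefying again
```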
Then for each of four metrics, the script runs
qiime diversity beta (non-phylogenetic) or qiime diversity
beta-phylogenetic (phylogenetic):
- Bray-Curtis: 1 − 2 Σ min(a_i, b_i) / Σ (a_i + b_i).
  Sensitive to abundance differences; the most-reported metric in
  microbiome papers.
- Jaccard: 1 − |A ∩ B| / |A ∪ B|. Presence/absence only;
  asks "do these two samples share the same taxa, ignoring how much?"

The four metrics span (abundance vs presence) × (non-phylogenetic vs phylogenetic). A finding consistent across all four is essentially unambiguous.
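The two non-phylogenetic formulas can be checked by hand with a few lines of Python. This is only a worked illustration of the formulas above, not what `qiime diversity beta` runs internally; the sample counts are made up, and both samples are assumed to be non-empty (true after rarefaction).

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 1 - 2*sum(min(a_i, b_i)) / sum(a_i + b_i)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 - 2.0 * shared / total


def jaccard(a, b):
    """Jaccard distance on presence/absence: 1 - |A ∩ B| / |A ∪ B|."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    return 1.0 - len(present_a & present_b) / len(present_a | present_b)


# Hypothetical rarefied counts for two samples over the same four features.
sample_a = [10, 0, 5, 5]
sample_b = [2, 0, 1, 17]

print(round(bray_curtis(sample_a, sample_b), 3))  # 0.6
print(round(jaccard(sample_a, sample_b), 3))      # 0.0
```

The two samples contain exactly the same features, so Jaccard sees no difference, while Bray-Curtis reports a large one because the abundances are distributed very differently. That contrast is why the abundance and presence/absence metrics are reported side by side.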
For every categorical metadata variable, on each of the four distance matrices, the script runs:
- PERMANOVA (qiime diversity beta-group-significance):
  permutational multivariate ANOVA. Tests whether the centroid of
  each group's samples differs significantly in the distance space.
  Returns an F statistic, an R², and a p-value computed by permuting
  group labels 999 times.

Then the script runs qiime diversity adonis once per distance
matrix, modelling the metadata as a formula. The user gets:
This catches the common case where a confounder (e.g. sex, age, batch) drives apparent treatment-group differences.
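To make the centroid test described above concrete, here is a minimal pure-Python sketch of a one-way PERMANOVA: a pseudo-F statistic and R² computed from a distance matrix, with a p-value from label permutations. It is an illustration of the technique under simple assumptions (one categorical factor, nonzero within-group variation), not the implementation QIIME2 calls; the function names and the toy data are invented for the example.

```python
import random
from itertools import combinations


def pseudo_f(dm, labels):
    """Return (pseudo-F, R^2) for a square distance matrix and group labels."""
    n = len(labels)
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    a = len(groups)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dm[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, one term per group.
    ss_within = sum(
        sum(dm[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    ss_among = ss_total - ss_within
    f_stat = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f_stat, ss_among / ss_total


def permanova(dm, labels, permutations=999, seed=42):
    """Permutation p-value: how often a shuffled labelling matches or beats F."""
    rng = random.Random(seed)  # fixed seed so reruns give the same p-value
    f_obs, r2 = pseudo_f(dm, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        f_perm, _r2 = pseudo_f(dm, shuffled)
        if f_perm >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)


# Toy data: six samples on a line, two well-separated groups of three.
positions = [0, 1, 2, 10, 11, 12]
dm = [[abs(p - q) for q in positions] for p in positions]
labels = ["control"] * 3 + ["treated"] * 3

f_stat, r2, p_value = permanova(dm, labels)
print(f"F={f_stat:.1f} R2={r2:.3f} p={p_value:.3f}")
```

With only three samples per group there are just 20 distinct ways to split the labels, so the p-value cannot get much below 0.1 however clean the separation is. This is one reason small groups limit what the permutation test can show.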
For visualisation, the script runs qiime diversity pcoa on each
distance matrix and exports both the QIIME2 Emperor 3-D interactive
viewer (emperor_<Metric>.qzv) and a 2-D PNG/SVG with samples
colour-coded by treatment group.
It also produces distance heatmaps and UPGMA dendrograms.
Finally, the cross-step contract (mbx_beta_diversity_info.txt) records every artifact and result file,
the rarefaction depth used, the four-metric × N-variable combinations
that ran, and STATUS=COMPLETE.
| Default | Value | Why this default |
|---|---|---|
| Distance metrics | Bray-Curtis, Jaccard, weighted UniFrac, unweighted UniFrac | The four that span (abundance vs presence) × (non-phylogenetic vs phylogenetic). |
| Rarefaction depth | from Step 11 | Single source of truth; inherited via Step 12's info file. |
| PERMANOVA permutations | 999 | QIIME2 default; gives a stable p-value with reasonable speed. |
| PERMDISP permutations | 999 | Same. |
| PERMANOVA test | --p-pairwise when > 2 groups | We always want the pairwise breakdown when it's meaningful. |
| Adonis SS type | Type III | Order-independent — robust to how the user listed variables in the metadata. |
| PCoA dimensions | top 3 components | Enough for the 2-D PNG + Emperor's 3-D view. |
| Plot formats | PNG + SVG always; PDF on --publication-figures | Publication-ready by default. |
| Seed | MBX_SEED | Permutation tests are reproducible only with a fixed seed. |
| Threads | MBX_THREADS | Single source of truth. |
| Fallback | When it triggers | Why this fallback exists |
|---|---|---|
| Refuse to run | READY_FOR_DIVERSITY=no from Step 11 | Same gate as Step 12. |
| Skip UniFrac metrics | Tree missing | Bray-Curtis and Jaccard still run; UniFrac logged as skipped. |
| Skip a comparison | Variable has singleton groups after NA filter | PERMANOVA needs ≥ 2 obs per group. |
| Flag PERMDISP-significant cells in the report | PERMDISP p < 0.05 (different variances) | PERMANOVA's null is sensitive to dispersion; the report says "the significant PERMANOVA may reflect dispersion, not centroid difference". |
| Re-use distance matrices | A previous run produced them | Caching saves wall-clock on re-runs. |
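Two of the fallbacks in the table above are simple enough to sketch in Python. This is a hedged illustration, not the script's code: the function names are hypothetical, and treating `None`, the empty string, and `"NA"` as missing is an assumption about how the metadata encodes missing values.

```python
from collections import Counter

MISSING = (None, "", "NA")  # assumed encodings of a missing metadata value


def is_testable(values, min_per_group=2):
    """True if, after dropping missing values, there are at least two
    groups and every group has at least `min_per_group` observations."""
    counts = Counter(v for v in values if v not in MISSING)
    return len(counts) >= 2 and min(counts.values()) >= min_per_group


def dispersion_flag(permdisp_p, alpha=0.05):
    """True if PERMDISP is significant, meaning group variances differ and
    a significant PERMANOVA may reflect dispersion, not centroid difference."""
    return permdisp_p < alpha


# Hypothetical metadata columns.
treatment = ["ctrl", "ctrl", "drug", "drug", "NA"]
batch = ["b1", "b1", "b1", "b2", "NA"]  # b2 is a singleton after NA filter

print(is_testable(treatment))  # True: comparison runs
print(is_testable(batch))      # False: comparison is skipped
print(dispersion_flag(0.03))   # True: cell is flagged in the report
```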
PERMANOVA_results_<Variable>.xlsx:
| metric | n_groups | F | R² | p_value | p_adj_BH | significant |
|---|---|---|---|---|---|---|
| Bray-Curtis | 3 | 4.81 | 0.174 | 0.001 | 0.001 | TRUE |
| Jaccard | 3 | 2.92 | 0.118 | 0.014 | 0.018 | TRUE |
| weighted UniFrac | 3 | 5.66 | 0.197 | 0.001 | 0.001 | TRUE |
| unweighted UniFrac | 3 | 3.41 | 0.131 | 0.005 | 0.007 | TRUE |
A finding shared across all four metrics — and supported by Adonis after adjusting for confounders — is essentially unambiguous.
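The p_adj_BH column in the table above refers to a Benjamini-Hochberg adjustment. The sketch below shows the standard procedure in plain Python as an illustration only: which set of tests the script pools for the adjustment is not stated here, so the input p-values are made up rather than taken from the table.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in bh_adjust(raw)])  # [0.04, 0.0533, 0.0533, 0.2]
```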
Step 13 answers the comparative question (how different are these samples from each other?) four times (abundance vs presence, phylogenetic vs not) and tests significance two different ways (PERMANOVA for centroid; PERMDISP for dispersion). Adonis with Type-III SS adjusts for the confounders named in its formula; it cannot account for unmeasured ones. PCoA + heatmaps + dendrograms make the finding visual. A signal that survives all four metrics and Adonis is publication-grade.
- mbXPro/scripts/mbx_beta_diversity_run.sh
- q2-diversity plugin: https://docs.qiime2.org/2025.4/plugins/available/diversity/