Script: scripts/mbx_final_report.sh
Companion files in this folder:
- 18_final_report.html — same content with copy buttons.
- 18_final_report.pptx — slide deck for the talk.
Seventeen steps have produced QZA artifacts, XLSX tables, PNG/SVG/PDF
figures, and per-step *_info.txt provenance files. Each is useful on
its own. None of them is what a user actually wants to send to a
collaborator.
That artefact is a single document (HTML for browsing, PDF for archiving) that:
- flags every fallback and warning (local-training-fallback at Step 6, REVIEW_REQUIRED at Step 11, STATUS=PARTIAL anywhere) so reviewers can audit the run.
- survives skipped steps: if --steps skipped Step 15 or Step 17, the report renders the available sections instead of crashing.

Step 18 produces that document.
It discovers which of the 18 steps actually ran in this mbX_pro_outputs_*/
directory, parses every *_info.txt to learn what each step decided,
builds the HTML report section-by-section in R using htmltools, wraps
each section in safe_section() so a single broken renderer can't kill
the document, and optionally produces a PDF via chrome headless or
weasyprint.
First the script walks the mbX_pro_outputs_*/ directory and
identifies which of the 18 expected step outputs are present. It
categorises each into one of three buckets based on its *_info.txt:
- DONE: STATUS=COMPLETE line is present (or a legacy info file with no STATUS line at all).
- PARTIAL: STATUS=PARTIAL line is present (the step ran but some inner work failed gracefully, e.g. Step 8 with species level empty).
- MISSING: no info file or no expected sentinel file present (the step didn't run).

The result is discovery.json, the source of truth for which sections
to render.
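
The three-bucket rule can be sketched in Python as follows. This is an illustration only, not the script's own code (which is shell and R). The function name `classify_step` and the `sentinel_present` argument are hypothetical, and the only file-format detail assumed is a `STATUS=...` line as described above. How an unrecognised STATUS value should be handled is not stated in the source; the sketch treats it as PARTIAL.

```python
from pathlib import Path
from typing import Optional


def classify_step(info_path: Optional[Path], sentinel_present: bool) -> str:
    """Return "DONE", "PARTIAL", or "MISSING" for one step (hypothetical helper)."""
    # MISSING: no info file, or the expected sentinel output is absent.
    if info_path is None or not info_path.is_file() or not sentinel_present:
        return "MISSING"

    # Look for a STATUS=... line; if several exist, the last one wins.
    status = None
    text = info_path.read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("STATUS="):
            status = line.split("=", 1)[1].strip()

    # DONE: STATUS=COMPLETE, or a legacy info file with no STATUS line at all.
    if status is None or status == "COMPLETE":
        return "DONE"
    # PARTIAL: explicit STATUS=PARTIAL. Assumption: any other value is also
    # reported as PARTIAL so that it is surfaced rather than hidden.
    return "PARTIAL"
```

Calling the helper once per expected step and collecting the results in a dictionary gives the kind of step-to-status mapping that discovery.json is described as holding.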
If STEP_MAX_DONE < 0 (nothing ran at all), the script exits cleanly
with a message. There's nothing to report on. Every other state — even
"only Step 0 ran" — produces a document.
Then the script ensures htmltools, base64enc, openxlsx, and
the other R packages the report needs are installed. The unified
scripts/lib/install_r_deps.R does this once for the whole pipeline; the
report just checks.
Now R takes over. The first block builds the cover header — logo + title + subtitle + run timestamp + per-step status banner showing which steps ran, which were partial, which were missing.
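
As a rough illustration of the per-step status banner, the following Python sketch joins one marker per step. The real banner is built in R with htmltools; the function name `status_banner` is hypothetical, and the marker used for MISSING steps is an assumption (the source only shows ✓ and ⚠).

```python
from typing import Dict

# ✓ and ⚠ follow the banner shown in this document; the MISSING marker is assumed.
SYMBOLS = {"DONE": "✓", "PARTIAL": "⚠", "MISSING": "✗"}


def status_banner(statuses: Dict[int, str]) -> str:
    """Render e.g. '0 ✓ · 1 ✓ · 15 ⚠' from a step -> status mapping."""
    parts = [f"{step} {SYMBOLS[statuses[step]]}" for step in sorted(statuses)]
    return " · ".join(parts)
```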
For each step that has data, build that step's section. Sections are modular:
section_step0 <- function() { ... } # primer identification
section_step12 <- function() { ... } # manifest + artifact
section_step34 <- function() { ... } # DADA2 parameters + denoising
section_step56 <- function() { ... } # classifier arrange + run
section_step7 <- function() { ... } # taxonomy + mito/chloro
section_step8 <- function() { ... } # ezclean
section_step9 <- function() { ... } # ezviz
section_step10 <- function() { ... } # ezstat
section_step11 <- function() { ... } # pre-diversity verdict
section_step12 <- function() { ... } # alpha diversity
section_step13 <- function() { ... } # beta diversity
section_step14 <- function() { ... } # ANCOMBC2
section_step15 <- function() { ... } # PICRUSt2
section_step16 <- function() { ... } # Random Forest
section_step17 <- function() { ... } # Networks
section_convergence(...) # what shows up in ANCOMBC2 AND ML AND networks
section_caveats(...) # NSTI flagging, PERMDISP traps, mito/chloro %
section_methods() # the full methods text, citable
section_reproducibility() # versions, seeds, threads, run_manifest.json link
Every section is wrapped in safe_section() (the wrapper we added
in mbX Pro 1.4.0). If one section's renderer crashes — e.g. an XLSX
file is corrupt — safe_section() substitutes an explanatory box and
the report keeps going. Without that wrapper, a single broken XLSX would
take the whole report down.
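
The idea behind the wrapper can be sketched in a few lines of Python. This is not the R implementation of safe_section(); the signature, the CSS class name, and the wording of the explanatory box are all assumptions made for illustration.

```python
import html
from typing import Callable


def safe_section(title: str, render: Callable[[], str]) -> str:
    """Run one section renderer; on failure, return an explanatory box instead."""
    try:
        return render()
    except Exception as exc:  # deliberately broad: no renderer may abort the report
        return (
            '<div class="section-error">'
            f"<h2>{html.escape(title)}</h2>"
            f"<p>This section could not be rendered: {html.escape(str(exc))}</p>"
            "</div>"
        )
```

The report body is then the concatenation of `safe_section(...)` calls, so a renderer that raises on a corrupt XLSX contributes an error box while every other section still appears.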
Every PNG referenced by the report is base64-encoded into a
<img src="data:image/png;base64,..."> tag so the resulting HTML is
a single file. The user can email the file; the recipient sees the
plots without needing the directory tree.
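
A minimal Python sketch of the embedding step is shown here. The pipeline itself does this in R; the function name `png_to_img_tag` and the `alt` parameter are hypothetical.

```python
import base64
import html
from pathlib import Path


def png_to_img_tag(png_path: Path, alt: str = "") -> str:
    """Inline a PNG file as a base64 data URI inside an <img> tag."""
    encoded = base64.b64encode(png_path.read_bytes()).decode("ascii")
    safe_alt = html.escape(alt, quote=True)
    return f'<img src="data:image/png;base64,{encoded}" alt="{safe_alt}">'
```

Base64 encodes every 3 bytes as 4 characters, so each embedded figure is about a third larger than the PNG on disk, which is the price of a single-file report.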
The most useful synthesis, after every per-step section, is the
section_convergence() block. It cross-references the per-step results
to flag the taxa that appear significant in multiple ways:
- significant in ANCOMBC2 (Step 14),
- among the top 20 Random Forest importance scores (Step 16),
- a network hub (Step 17).

A taxon that hits all three is the closest thing to a "robust biomarker" the pipeline can produce. The convergence section lists them with inline log-fold-changes, importance scores, and centrality metrics.
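
The cross-referencing amounts to a three-way set intersection. The Python sketch below assumes the per-step results have already been reduced to a set of significant taxa, a taxon-to-importance mapping, and a set of hub taxa; those input shapes and the function name `convergent_taxa` are hypothetical. The default cut-off of 20 follows the pipeline's stated default of top-20 Random Forest importance.

```python
from typing import Dict, List, Set


def convergent_taxa(
    ancombc2_significant: Set[str],
    rf_importance: Dict[str, float],
    network_hubs: Set[str],
    top_n: int = 20,
) -> List[str]:
    """Taxa that are ANCOMBC2-significant, in the RF top-N, and network hubs."""
    # Rank taxa by Random Forest importance, highest first, and keep the top N.
    ranked = sorted(rf_importance, key=lambda taxon: rf_importance[taxon], reverse=True)
    rf_top = set(ranked[:top_n])
    # Only taxa present in all three sets count as convergent.
    return sorted(ancombc2_significant & rf_top & network_hubs)
```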
Then R writes the assembled document to mbXPro_final_report.html.
Single file, self-contained, ~5–20 MB depending on figure count.
If the user did not pass --no-pdf, the script tries to render the
HTML to PDF using one of three engines, in order:
1. chrome --headless (most reliable for complex CSS, slow start).
2. weasyprint (lightweight, works offline if installed).
3. wkhtmltopdf (legacy fallback).

The PDF is for archiving; the HTML stays canonical (the PDF is necessarily a flattened representation).
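
The "first available wins" selection can be sketched in Python with `shutil.which`. The executable names tried for Chrome are assumptions (the binary is named differently across platforms), and `pick_pdf_engine` is a hypothetical helper, not part of the script. The sketch only selects an engine; it does not run it.

```python
import shutil
from typing import Callable, Optional, Tuple

# Engines in preference order. Binary names are assumptions for illustration.
PDF_ENGINES = (
    ("chrome", ("google-chrome", "chromium", "chromium-browser", "chrome")),
    ("weasyprint", ("weasyprint",)),
    ("wkhtmltopdf", ("wkhtmltopdf",)),
)


def pick_pdf_engine(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[Tuple[str, str]]:
    """Return (engine, executable path) for the first engine found on PATH."""
    for engine, binaries in PDF_ENGINES:
        for binary in binaries:
            path = which(binary)
            if path:
                return engine, path
    return None  # no engine installed: only the HTML report is produced
```

Passing `which` as a parameter keeps the lookup testable without any of the engines installed.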
Finally the cross-step contract, mbx_final_report_info.txt, records the HTML + PDF paths, every
step's verdict + sub-status, and STATUS=COMPLETE. The orchestrator
runs scripts/lib/build_run_manifest.py right after this step to produce
the per-run run_manifest.json for publication supplementary materials.
| Default | Value | Why this default |
|---|---|---|
| HTML format | self-contained (figures inlined as base64) | Single-file shareability is the main UX win over per-step artifacts. |
| Default PDF engine | chrome --headless | Best CSS support of the three. |
| PDF engine fallbacks | weasyprint → wkhtmltopdf | Tried in order; first available wins. |
| Section wrapper | safe_section() | Mandatory (mbX Pro 1.4.0+); never let a single broken section kill the report. |
| Convergence threshold | significant in ANCOMBC2 AND in top-20 RF importance AND a network hub | Conservative — taxa that meet all three are robust. |
| Embed figure resolution | PNG (already 300 DPI from the source steps) | We use the existing PNGs; no re-rendering. |
| Methods text | citable, every tool with version + citation | The reviewer can copy the methods straight into a manuscript. |
| mbXPro_final_report_info.txt | always written | Final cross-step contract; run_manifest.json depends on it. |

| Fallback | When it triggers | Why this fallback exists |
|---|---|---|
| Skip section | A step's data isn't present (it didn't run) | The corresponding section gets a "this step was skipped" box; the rest of the report keeps building. |
| safe_section() substitution | A section's renderer throws | The whole report would otherwise abort. The user sees an explanation; everything else still renders. |
| --no-pdf mode | User opts out | PDF rendering can be slow on large reports; HTML is always produced. |
| Engine fallback | Chrome not installed | Try weasyprint, then wkhtmltopdf. |
| local-training-fallback flagged | Step 6 used the Zenodo-fallback path | The report banner says so. Reviewers can audit which classifier was actually used. |
| REVIEW_REQUIRED flagged | Step 11's depth verdict was weak | Diversity sections are still rendered but with a warning banner. |
| STATUS=PARTIAL flagged | Any step finished some but not all sub-work | Section header carries the partial badge; the missing sub-results are noted. |
18_final_report/mbXPro_final_report.html — a single self-contained
document with this structure:
COVER
├── Logo + title + run timestamp
├── Per-step status banner: 0 ✓ · 1 ✓ · 2 ✓ · … · 14 ✓ · 15 ⚠ · 16 ✓ · 17 ✓
└── Reproducibility quick-reference (versions, seed, threads)
INTRO
├── What the pipeline does in two paragraphs
└── How to read the report
PER-STEP SECTIONS (one per step that ran)
├── Step 0 — primer verdict + confidence + plot
├── Step 1+2 — manifest + artifact
├── Step 3+4 — DADA2 params + per-sample retention
├── Step 5+6 — classifier mode + source + provenance
├── Step 7 — mito/chloro removal rate + barplot embed
├── Step 8 — ezclean per-level table sizes
├── Step 9 — embedded stacked-bar plots, one per (level × variable)
├── Step 10 — KW + CLD per-taxon results
├── Step 11 — depth verdict + rarefaction curve + 3-criterion table
├── Step 12 — alpha boxplots + KW results
├── Step 13 — PCoA + PERMANOVA tables
├── Step 14 — ANCOMBC2 volcano + heatmap + top hits
├── Step 15 — PICRUSt2 functional report excerpt + NSTI flagging
├── Step 16 — RF accuracy/AUC summary + ROC + importance
└── Step 17 — networks + hub taxa
CONVERGENCE (taxa significant in all three of: ANCOMBC2, RF, network hub)
CAVEATS (NSTI flagging, PERMDISP traps, mito/chloro %, retention)
METHODS (citable text with every tool + version + reference)
REPRODUCIBILITY (versions, seeds, threads, link to run_manifest.json)
Plus the PDF version when --no-pdf wasn't passed.
Step 18 turns seventeen scattered artifact directories into one shareable document. The
safe_section() wrapper (added in mbX Pro 1.4.0) means a single broken renderer can't kill the report. Figures are base64-embedded so the HTML is self-contained. The convergence section is where the pipeline's claim of "robust biomarker" finally gets evidenced: a taxon flagged by ANCOMBC2, important to the Random Forest, AND a network hub is hard to dismiss.
- mbXPro/scripts/mbx_final_report.sh
- htmltools: Cheng et al., R package on CRAN.
- safe_section() wrapper: mbX Pro 1.4.0, see CHANGELOG.md.
- run_manifest.json builder (which Step 18 hands off to): scripts/lib/build_run_manifest.py.