Runner Overview
Use the pipeline runner when you want a full disk-backed PanelMMM run instead
of only an in-memory fit.
The runner loads a YAML config and a CSV dataset, builds the model, executes a fixed stage sequence, and writes each stage’s artefacts into a structured run directory. When validation is enabled, the runner performs a second train-window fit for the blocked holdout stage, so the run takes longer than a pure full-sample fit.
If you want a quick first run, start with Quickstart: Pipeline Runner.
Public entry points
The public Python API is:
- abacus.pipeline.PipelineRunConfig
- abacus.pipeline.run_pipeline
- abacus.pipeline.PipelineRunResult
The thin CLI wraps the same code path:
Basic Python example
PipelineRunResult contains:
| Field | Meaning |
|---|---|
| run_dir | The created run directory |
| manifest_path | The path to run_manifest.json inside that directory |
What the runner does
run_pipeline(...) performs these steps:
- Load the YAML config with load_yaml_config(...).
- Load X and y from CSV using load_pipeline_data(...).
- Merge CLI sampler overrides with YAML fit through build_model_kwargs(...).
- Create the output directory tree and initialise run_manifest.json.
- Run the retained stages in order, updating the manifest after every stage.
The model is built in Stage 00 by build_mmm_from_yaml(...), then stored in the
shared PipelineContext for the remaining stages. Runner-only roots such as
diagnostics and validation stay on the pipeline context and are stripped
before the public MMM builder validates the model YAML.
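
The separation of runner-only roots from the model YAML can be pictured with a small sketch. This is not the runner's code: the helper name split_runner_roots, the RUNNER_ONLY_ROOTS tuple, and the sample keys in raw are assumptions made for illustration. The text names diagnostics and validation only as examples, so the real set of runner-only roots may be larger.

```python
from typing import Any

# Assumption: only the two roots named in the text are treated as runner-only.
RUNNER_ONLY_ROOTS = ("diagnostics", "validation")


def split_runner_roots(
    raw_config: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (model_config, runner_config) without mutating the input."""
    model_config = {
        key: value for key, value in raw_config.items()
        if key not in RUNNER_ONLY_ROOTS
    }
    runner_config = {
        key: value for key, value in raw_config.items()
        if key in RUNNER_ONLY_ROOTS
    }
    return model_config, runner_config


# Placeholder config; the keys under "model" and "fit" are illustrative only.
raw = {
    "model": {"kind": "PanelMMM"},
    "fit": {"draws": 1000},
    "validation": {"enabled": True},
}
model_config, runner_config = split_runner_roots(raw)
```

In this sketch the model builder would only ever see model_config, while runner_config stays available to later stages.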
Stage order
The runner uses a fixed stage list.
| Stage key | Directory | Purpose | Optional |
|---|---|---|---|
| metadata | 00_run_metadata | Build the model and write resolved config and dataset metadata | No |
| preflight | 10_pre_diagnostics | Prior predictive draws and plot | No |
| fit | 20_model_fit | Fit the model, save InferenceData, write trace and summary | No |
| assessment | 30_model_assessment | In-sample posterior predictive checks, fitted values, residual outputs | No |
| validation | 35_holdout_validation | Blocked holdout scoring on a train-window refit | Yes |
| decomposition | 40_decomposition | Contribution tables and decomposition plots | No |
| diagnostics | 50_diagnostics | Raw input screening, MCMC, predictive, and residual diagnostics | No |
| curves | 60_response_curves | Saturation-only, forward-pass direct contribution, and adstock curve artefacts | No |
| optimisation | 70_optimisation | Budget optimisation artefacts | Yes |
The validation stage is marked skipped when the YAML config does not contain a
validation block or when that block is disabled. The optimisation stage is also
optional; it returns None and is marked skipped when the YAML config does not
contain an optimization block.
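
For scripts that post-process a run, the fixed stage list can be mirrored as plain data. The sketch below only transcribes the stage keys, directory names, and optional flags from the table; the StageSpec dataclass and the STAGES tuple are hypothetical names, not part of the abacus API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """One row of the stage table (hypothetical helper type)."""
    key: str
    directory: str
    optional: bool = False


# Transcribed from the stage table, in execution order.
STAGES = (
    StageSpec("metadata", "00_run_metadata"),
    StageSpec("preflight", "10_pre_diagnostics"),
    StageSpec("fit", "20_model_fit"),
    StageSpec("assessment", "30_model_assessment"),
    StageSpec("validation", "35_holdout_validation", optional=True),
    StageSpec("decomposition", "40_decomposition"),
    StageSpec("diagnostics", "50_diagnostics"),
    StageSpec("curves", "60_response_curves"),
    StageSpec("optimisation", "70_optimisation", optional=True),
)

# Stages that may legitimately end up marked as skipped.
optional_keys = [stage.key for stage in STAGES if stage.optional]

# The numeric prefixes make the directory names sort in execution order.
directories = [stage.directory for stage in STAGES]
```

Because the directory prefixes are zero-padded numbers, a plain lexical sort of the run directory's children reproduces the stage order.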
See Output Directory Schema for the stage folders and artefact layout.
Data and model assumptions
The retained runner is designed around PanelMMM.
- The flow-oriented public YAML is expected to describe a PanelMMM.
- The data loader reads CSV only.
- Later stages call PanelMMM plotting, summary, diagnostics, and optimisation methods directly.
If you need the exact YAML keys, see YAML Configuration.
PipelineRunConfig
PipelineRunConfig controls runtime settings that sit outside the YAML model
specification.
| Field | Purpose |
|---|---|
| config_path | YAML file to load |
| output_dir | Root directory under which the run directory is created |
| run_name | Optional run-name override; otherwise the config filename stem |
| dataset_path | Optional combined dataset CSV override |
| x_path, y_path | Optional feature and target CSV overrides |
| holidays_path | Optional holiday CSV override |
| target_column | Target column name used during CSV loading |
| prior_samples | Number of prior predictive samples for Stage 10 |
| draws, tune, chains, cores, random_seed | Sampler overrides merged onto YAML fit |
| curve_samples, curve_points | Curve sampling settings for Stage 60 |
Only sampler settings are merged into model construction. Other overrides are used by the runner itself during data loading, holiday resolution, diagnostics reporting, and output setup.
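
The merge of sampler overrides onto the YAML fit block might look like the sketch below. It is not the body of build_model_kwargs(...): the function name merge_sampler_overrides is hypothetical, and it assumes that an override left as None means "keep the YAML value". The target_accept key in the example is only a placeholder for an untouched YAML setting.

```python
from typing import Any, Optional

# The sampler fields listed in the PipelineRunConfig table.
SAMPLER_KEYS = ("draws", "tune", "chains", "cores", "random_seed")


def merge_sampler_overrides(
    yaml_fit: dict[str, Any],
    overrides: dict[str, Optional[int]],
) -> dict[str, Any]:
    """Return a copy of yaml_fit with non-None sampler overrides applied."""
    merged = dict(yaml_fit)
    for key in SAMPLER_KEYS:
        value = overrides.get(key)
        # Assumption: None means the override was not supplied.
        if value is not None:
            merged[key] = value
    return merged


yaml_fit = {"draws": 1000, "tune": 500, "target_accept": 0.9}
# curve_points is not a sampler setting, so it never reaches the fit kwargs.
overrides = {"draws": 200, "tune": None, "curve_points": 50}
merged_fit = merge_sampler_overrides(yaml_fit, overrides)
```

Here merged_fit keeps tune and target_accept from the YAML and takes draws from the override, while the non-sampler curve_points value is ignored.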
Run directory naming
The runner creates the run directory as:
The timestamp is generated in UTC.
All stage directories are created up front, even if a later stage is skipped or the run aborts.
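
Creating every stage directory before any stage runs can be sketched with pathlib. The function name create_stage_dirs and the example_run folder are hypothetical; the sketch takes the run directory as given and does not attempt to reproduce the runner's naming pattern.

```python
import tempfile
from pathlib import Path

# Directory names from the stage table.
STAGE_DIRS = (
    "00_run_metadata",
    "10_pre_diagnostics",
    "20_model_fit",
    "30_model_assessment",
    "35_holdout_validation",
    "40_decomposition",
    "50_diagnostics",
    "60_response_curves",
    "70_optimisation",
)


def create_stage_dirs(run_dir: Path) -> list[Path]:
    """Create all stage directories under run_dir and return their paths."""
    paths = []
    for name in STAGE_DIRS:
        path = run_dir / name
        # parents=True also creates run_dir itself on first use.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    created = create_stage_dirs(Path(tmp) / "example_run")
    created_names = [path.name for path in created]
    all_exist = all(path.is_dir() for path in created)
```

One consequence of this layout is that an empty stage directory does not by itself mean the stage ran; the manifest is the place to check stage status.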
Failure and skip behaviour
If a stage raises an exception:
- the current stage is marked failed
- the run manifest is marked failed
- all still-pending later stages are marked not_reached
- run_pipeline(...) re-raises the exception
If a stage returns None:
- the stage is marked skipped
- the manifest warning records that no configuration was supplied for that optional stage
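
The failure and skip rules above can be summarised in a minimal stage loop. This is a sketch under stated assumptions, not the runner's implementation: the manifest is modelled as a plain dict, the function name run_stages is hypothetical, and the pending, running, and completed status strings as well as the warning text are invented for illustration. Only failed, skipped, and not_reached come from the text.

```python
from typing import Callable, Optional

Stage = tuple[str, Callable[[], Optional[dict]]]


def run_stages(stages: list[Stage], manifest: dict) -> None:
    """Run stages in order, recording per-stage status in manifest."""
    manifest["status"] = "running"
    manifest["stages"] = {key: "pending" for key, _ in stages}
    manifest["warnings"] = []
    for key, stage_fn in stages:
        try:
            result = stage_fn()
        except Exception:
            manifest["stages"][key] = "failed"
            manifest["status"] = "failed"
            # Everything that has not started yet is marked not_reached.
            for other, status in list(manifest["stages"].items()):
                if status == "pending":
                    manifest["stages"][other] = "not_reached"
            raise  # the caller still sees the original exception
        if result is None:
            manifest["stages"][key] = "skipped"
            manifest["warnings"].append(f"{key}: no configuration supplied")
        else:
            manifest["stages"][key] = "completed"
    manifest["status"] = "completed"


def failing_stage() -> dict:
    raise RuntimeError("stage failed")


manifest: dict = {}
demo_stages: list[Stage] = [
    ("fit", lambda: {"ok": True}),
    ("validation", lambda: None),      # optional stage with no config
    ("decomposition", failing_stage),  # raises
    ("diagnostics", lambda: {"ok": True}),
]
try:
    run_stages(demo_stages, manifest)
except RuntimeError:
    pass  # re-raised by run_stages after the manifest is updated
```

After this run the sketch's manifest records fit as completed, validation as skipped, decomposition as failed, and diagnostics as not_reached.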
Reporter hook
run_pipeline(...) accepts an optional reporter that implements the
PipelineReporter protocol.
The reporter can observe:
- pipeline start
- stage start
- stage end
- pipeline end
- pipeline failure
See Extending the Runner for the callback contract.
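
As a rough picture of what a reporter could look like, the sketch below defines a protocol with one method per observable event and a reporter that records events in a list. Every method name and signature here is hypothetical; the actual PipelineReporter contract is the one documented in Extending the Runner.

```python
from typing import Protocol


class ReporterSketch(Protocol):
    """Hypothetical stand-in for PipelineReporter; names are assumptions."""

    def on_pipeline_start(self, run_dir: str) -> None: ...
    def on_stage_start(self, stage_key: str) -> None: ...
    def on_stage_end(self, stage_key: str, status: str) -> None: ...
    def on_pipeline_end(self, run_dir: str) -> None: ...
    def on_pipeline_failure(self, stage_key: str, error: BaseException) -> None: ...


class ListReporter:
    """Collects events in memory, e.g. for tests or a progress log."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def on_pipeline_start(self, run_dir: str) -> None:
        self.events.append(f"pipeline_start {run_dir}")

    def on_stage_start(self, stage_key: str) -> None:
        self.events.append(f"stage_start {stage_key}")

    def on_stage_end(self, stage_key: str, status: str) -> None:
        self.events.append(f"stage_end {stage_key} {status}")

    def on_pipeline_end(self, run_dir: str) -> None:
        self.events.append(f"pipeline_end {run_dir}")

    def on_pipeline_failure(self, stage_key: str, error: BaseException) -> None:
        self.events.append(f"pipeline_failure {stage_key} {error}")


def notify(reporter: ReporterSketch) -> None:
    """Simulate the callbacks a single-stage successful run might emit."""
    reporter.on_pipeline_start("runs/example")
    reporter.on_stage_start("fit")
    reporter.on_stage_end("fit", "completed")
    reporter.on_pipeline_end("runs/example")


reporter = ListReporter()
notify(reporter)
```

Using a structural protocol means a reporter does not need to inherit from anything; any object with matching methods satisfies the type.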