Runner Overview

Use the pipeline runner when you want a full disk-backed PanelMMM run instead of only an in-memory fit.

The runner loads a YAML config and a CSV dataset, builds the model, executes a fixed stage sequence, and writes each stage’s artefacts into a structured run directory. When validation is enabled, the runner performs a second train-window fit for the blocked holdout stage, so the run takes longer than a pure full-sample fit.

If you want a quick first run, start with Quickstart: Pipeline Runner.

Public entry points

The public Python API is:

abacus.pipeline.PipelineRunConfig
abacus.pipeline.run_pipeline
abacus.pipeline.PipelineRunResult

The thin CLI wraps the same code path:

python -m abacus.pipeline.runner --config path/to/config.yml

Basic Python example

from pathlib import Path

from abacus.pipeline import PipelineRunConfig, run_pipeline

result = run_pipeline(
    PipelineRunConfig(
        config_path=Path("data/demo/geo_panel/config.yml"),
        output_dir=Path("results"),
        run_name="geo_panel_baseline",
        prior_samples=10,
        draws=500,
        tune=500,
        chains=2,
        cores=2,
        random_seed=42,
        curve_samples=100,
        curve_points=100,
    )
)

print(result.run_dir)
print(result.manifest_path)

PipelineRunResult contains:

| Field | Meaning |
| --- | --- |
| `run_dir` | The created run directory |
| `manifest_path` | The path to `run_manifest.json` inside that directory |
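
Because the manifest is a JSON file, it can be inspected with the standard library once a run finishes. The helper below is a minimal sketch: `load_manifest` is a hypothetical name, not part of the `abacus` API, and it makes no assumption about which keys the manifest contains.

```python
import json
from pathlib import Path


def load_manifest(manifest_path: Path) -> dict:
    """Read run_manifest.json into a plain dict.

    Hypothetical helper for illustration; the manifest schema is not
    assumed here, so the caller decides which keys to look at.
    """
    return json.loads(Path(manifest_path).read_text(encoding="utf-8"))
```

With a finished run, `load_manifest(result.manifest_path)` would return the manifest contents as a dictionary.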

What the runner does

run_pipeline(...) performs these steps:

Load the YAML config with load_yaml_config(...).
Load X and y from CSV using load_pipeline_data(...).
Merge the sampler overrides from PipelineRunConfig (set in Python or through the CLI) onto the YAML fit settings through build_model_kwargs(...).
Create the output directory tree and initialise run_manifest.json.
Run the retained stages in order, updating the manifest after every stage.

The model is built in Stage 00 by build_mmm_from_yaml(...), then stored in the shared PipelineContext for the remaining stages. Runner-only roots such as diagnostics and validation stay on the pipeline context and are stripped before the public MMM builder validates the model YAML.
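
The separation of runner-only roots from the model YAML can be pictured as a simple dictionary split. This is a sketch under stated assumptions, not the runner's actual code: `split_config` and `RUNNER_ONLY_ROOTS` are hypothetical names, and only the two roots named above are listed, so the real set may be larger.

```python
# Roots named in the text as runner-only; the real list may contain more.
RUNNER_ONLY_ROOTS = ("diagnostics", "validation")


def split_config(raw_config: dict) -> tuple[dict, dict]:
    """Split a loaded YAML mapping into (runner_config, model_config).

    Hypothetical helper: runner-only roots are kept for the pipeline
    context, and everything else is what the model builder would validate.
    """
    runner_config = {k: raw_config[k] for k in RUNNER_ONLY_ROOTS if k in raw_config}
    model_config = {k: v for k, v in raw_config.items() if k not in RUNNER_ONLY_ROOTS}
    return runner_config, model_config
```

Keeping the split in one place means the model builder never sees keys it does not know, while later stages can still read their own settings from the context.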

Stage order

The runner uses a fixed stage list.

| Stage key | Directory | Purpose | Optional |
| --- | --- | --- | --- |
| `metadata` | `00_run_metadata` | Build the model and write resolved config and dataset metadata | No |
| `preflight` | `10_pre_diagnostics` | Prior predictive draws and plot | No |
| `fit` | `20_model_fit` | Fit the model, save `InferenceData`, write trace and summary | No |
| `assessment` | `30_model_assessment` | In-sample posterior predictive checks, fitted values, residual outputs | No |
| `validation` | `35_holdout_validation` | Blocked holdout scoring on a train-window refit | Yes |
| `decomposition` | `40_decomposition` | Contribution tables and decomposition plots | No |
| `diagnostics` | `50_diagnostics` | Raw input screening, MCMC, predictive, and residual diagnostics | No |
| `curves` | `60_response_curves` | Saturation-only, forward-pass direct contribution, and adstock curve artefacts | No |
| `optimisation` | `70_optimisation` | Budget optimisation artefacts | Yes |

The validation stage is marked skipped when the YAML config has no validation block or the block is disabled. The optimisation stage is also optional: it returns None and is marked skipped when the YAML config has no optimization block.
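
To make the skip rules concrete, the sketch below encodes the stage table as data and predicts which optional stages would be skipped for a given config mapping. It is illustrative only: `STAGES` and `expected_skips` are hypothetical names, and the `enabled: false` convention for a disabled validation block is an assumption, since the exact key is defined in the YAML reference rather than here.

```python
# (stage key, directory, optional) in the fixed order from the stage table.
STAGES = [
    ("metadata", "00_run_metadata", False),
    ("preflight", "10_pre_diagnostics", False),
    ("fit", "20_model_fit", False),
    ("assessment", "30_model_assessment", False),
    ("validation", "35_holdout_validation", True),
    ("decomposition", "40_decomposition", False),
    ("diagnostics", "50_diagnostics", False),
    ("curves", "60_response_curves", False),
    ("optimisation", "70_optimisation", True),
]


def expected_skips(config: dict) -> list[str]:
    """Return the optional stage keys that would be skipped for `config`."""
    skipped = []
    validation = config.get("validation")
    # Assumption: a disabled block looks like `validation: {enabled: false}`.
    if not isinstance(validation, dict) or not validation.get("enabled", True):
        skipped.append("validation")
    # The stage key is spelled `optimisation`; the YAML root is `optimization`.
    if "optimization" not in config:
        skipped.append("optimisation")
    return skipped
```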

See Output Directory Schema for the stage folders and artefact layout.

Data and model assumptions

The retained runner is designed around PanelMMM.

The flow-oriented public YAML is expected to describe a PanelMMM.
The data loader reads CSV only.
Later stages call PanelMMM plotting, summary, diagnostics, and optimisation methods directly.

If you need the exact YAML keys, see YAML Configuration.

`PipelineRunConfig`

PipelineRunConfig controls runtime settings that sit outside the YAML model specification.

| Field | Purpose |
| --- | --- |
| `config_path` | YAML file to load |
| `output_dir` | Root directory under which the run directory is created |
| `run_name` | Optional run-name override; otherwise the config filename stem |
| `dataset_path` | Optional combined dataset CSV override |
| `x_path`, `y_path` | Optional feature and target CSV overrides |
| `holidays_path` | Optional holiday CSV override |
| `target_column` | Target column name used during CSV loading |
| `prior_samples` | Number of prior predictive samples for Stage 10 |
| `draws`, `tune`, `chains`, `cores`, `random_seed` | Sampler overrides merged onto YAML `fit` |
| `curve_samples`, `curve_points` | Curve sampling settings for Stage 60 |

Only sampler settings are merged into model construction. Other overrides are used by the runner itself during data loading, holiday resolution, diagnostics reporting, and output setup.
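
The merge of sampler overrides onto the YAML `fit` block can be sketched as a dictionary update restricted to the five sampler fields. This is not the body of `build_model_kwargs(...)`: the function name `merge_sampler_overrides` is hypothetical, and treating `None` as "no override" is an assumption about how unset fields behave.

```python
# The sampler fields listed in the PipelineRunConfig table.
SAMPLER_KEYS = ("draws", "tune", "chains", "cores", "random_seed")


def merge_sampler_overrides(yaml_fit: dict, overrides: dict) -> dict:
    """Return a copy of the YAML fit settings with sampler overrides applied.

    Assumption: an override of None means "not set", so the YAML value
    (if any) is kept. Keys outside SAMPLER_KEYS are never overridden.
    """
    merged = dict(yaml_fit)
    for key in SAMPLER_KEYS:
        value = overrides.get(key)
        if value is not None:
            merged[key] = value
    return merged
```

For example, merging `{"draws": 500, "tune": None}` onto a YAML `fit` of `{"draws": 1000, "tune": 1000}` would change only `draws`.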

Run directory naming

The runner creates the run directory as:

<output_dir>/<effective_run_name>_<YYYYMMDD_HHMMSS>

The timestamp is generated in UTC.

All stage directories are created up front, even if a later stage is skipped or the run aborts.
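
The naming rule and the up-front directory creation can be sketched with the standard library. `create_run_dir` is a hypothetical helper, not the runner's own function; the `now` parameter exists only so the timestamp can be fixed in tests.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Stage directory names from the stage table.
STAGE_DIRS = (
    "00_run_metadata", "10_pre_diagnostics", "20_model_fit",
    "30_model_assessment", "35_holdout_validation", "40_decomposition",
    "50_diagnostics", "60_response_curves", "70_optimisation",
)


def create_run_dir(output_dir, config_path, run_name: Optional[str] = None,
                   now: Optional[datetime] = None) -> Path:
    """Create <output_dir>/<effective_run_name>_<YYYYMMDD_HHMMSS> with all stage folders."""
    # Fall back to the config filename stem when no run name is given.
    effective_run_name = run_name or Path(config_path).stem
    # The timestamp is taken in UTC, matching the naming rule.
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(output_dir) / f"{effective_run_name}_{stamp}"
    # Every stage directory is created before any stage runs.
    for name in STAGE_DIRS:
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir
```

One consequence of creating everything up front is that an empty stage folder does not by itself show that a stage ran; the manifest is the place to check stage status.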

Failure and skip behaviour

If a stage raises an exception:

the current stage is marked failed
the run manifest is marked failed
all still-pending later stages are marked not_reached
run_pipeline(...) re-raises the exception

If a stage returns None:

the stage is marked skipped
a warning in the manifest records that no configuration was supplied for that optional stage
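
The failure and skip rules above amount to a small state machine over stage statuses. The loop below is a minimal sketch of that behaviour, not the runner's implementation: `run_stages`, the manifest layout, and the `pending`, `running`, and `completed` status names are assumptions, while `failed`, `skipped`, and `not_reached` come from the rules above.

```python
def run_stages(stages, manifest):
    """Run (key, callable) pairs in order, recording statuses in `manifest`.

    `manifest` is assumed to be a dict with "status", "stages", and
    "warnings" entries; the real manifest schema may differ.
    """
    for key, _ in stages:
        manifest["stages"][key] = "pending"
    manifest["status"] = "running"
    for key, stage_fn in stages:
        try:
            result = stage_fn()
        except Exception:
            manifest["stages"][key] = "failed"
            manifest["status"] = "failed"
            # Everything that has not started yet is marked not_reached.
            for other in list(manifest["stages"]):
                if manifest["stages"][other] == "pending":
                    manifest["stages"][other] = "not_reached"
            raise  # the caller still sees the original exception
        if result is None:
            manifest["stages"][key] = "skipped"
            manifest["warnings"].append(f"{key}: no configuration supplied")
        else:
            manifest["stages"][key] = "completed"
    manifest["status"] = "completed"
```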

Reporter hook

run_pipeline(...) accepts an optional reporter that implements the PipelineReporter protocol.

The reporter can observe:

pipeline start
stage start
stage end
pipeline end
pipeline failure

See Extending the Runner for the callback contract.
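
As a rough picture of what a reporter might look like, the sketch below defines a protocol with one method per observable event and a reporter that records events in a list. All method names and signatures here are hypothetical and chosen only for illustration; the real `PipelineReporter` contract is the one described in Extending the Runner.

```python
from typing import Protocol


class ReporterSketch(Protocol):
    """Hypothetical shape of a reporter: one callback per observable event."""

    def on_pipeline_start(self, run_dir: str) -> None: ...
    def on_stage_start(self, stage_key: str) -> None: ...
    def on_stage_end(self, stage_key: str, status: str) -> None: ...
    def on_pipeline_end(self, run_dir: str) -> None: ...
    def on_pipeline_failure(self, error: BaseException) -> None: ...


class ListReporter:
    """Records each event as an (event_name, detail) pair, e.g. for tests."""

    def __init__(self) -> None:
        self.events = []

    def on_pipeline_start(self, run_dir: str) -> None:
        self.events.append(("pipeline_start", run_dir))

    def on_stage_start(self, stage_key: str) -> None:
        self.events.append(("stage_start", stage_key))

    def on_stage_end(self, stage_key: str, status: str) -> None:
        self.events.append(("stage_end", f"{stage_key}:{status}"))

    def on_pipeline_end(self, run_dir: str) -> None:
        self.events.append(("pipeline_end", run_dir))

    def on_pipeline_failure(self, error: BaseException) -> None:
        self.events.append(("pipeline_failure", type(error).__name__))
```

A structural protocol keeps the reporter decoupled from the runner: any object with matching methods qualifies, with no base class to inherit from.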