Runner Overview

Use the pipeline runner when you want a full disk-backed PanelMMM run instead of only an in-memory fit.

The runner loads a YAML config and a CSV dataset, builds the model, executes a fixed stage sequence, and writes each stage’s artefacts into a structured run directory. When validation is enabled, the runner performs a second train-window fit for the blocked holdout stage, so the run takes longer than a pure full-sample fit.

If you want a quick first run, start with Quickstart: Pipeline Runner.

Public entry points

The public Python API is:

  • abacus.pipeline.PipelineRunConfig
  • abacus.pipeline.run_pipeline
  • abacus.pipeline.PipelineRunResult

The thin CLI wraps the same code path:

python -m abacus.pipeline.runner --config path/to/config.yml

Basic Python example

from pathlib import Path

from abacus.pipeline import PipelineRunConfig, run_pipeline

result = run_pipeline(
    PipelineRunConfig(
        config_path=Path("data/demo/geo_panel/config.yml"),
        output_dir=Path("results"),
        run_name="geo_panel_baseline",
        prior_samples=10,
        draws=500,
        tune=500,
        chains=2,
        cores=2,
        random_seed=42,
        curve_samples=100,
        curve_points=100,
    )
)

print(result.run_dir)
print(result.manifest_path)

PipelineRunResult contains:

Field | Meaning
run_dir | The created run directory
manifest_path | The path to run_manifest.json inside that directory
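
As a minimal sketch of how the result might be consumed, the helper below reads the manifest back from disk. It assumes only that run_manifest.json is ordinary JSON; load_manifest is a hypothetical helper, not part of abacus, and no particular manifest keys are assumed.

```python
import json
from pathlib import Path


def load_manifest(manifest_path: Path) -> dict:
    """Read run_manifest.json and return its contents as a dict.

    Hypothetical helper for illustration; not part of abacus.pipeline.
    """
    with Path(manifest_path).open(encoding="utf-8") as fh:
        return json.load(fh)
```

After a run, load_manifest(result.manifest_path) would return the manifest contents for inspection.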

What the runner does

run_pipeline(...) performs these steps:

  1. Load the YAML config with load_yaml_config(...).
  2. Load X and y from CSV using load_pipeline_data(...).
  3. Merge the sampler overrides from PipelineRunConfig (or the CLI) onto the YAML fit settings through build_model_kwargs(...).
  4. Create the output directory tree and initialise run_manifest.json.
  5. Run the retained stages in order, updating the manifest after every stage.

The model is built in Stage 00 by build_mmm_from_yaml(...), then stored in the shared PipelineContext for the remaining stages. Runner-only roots such as diagnostics and validation stay on the pipeline context and are stripped before the public MMM builder validates the model YAML.
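
The stripping of runner-only roots can be pictured with a small sketch. This is not the runner's own code: split_runner_roots and RUNNER_ONLY_ROOTS are hypothetical names, and the root list is assumed to contain only the two roots named above.

```python
from copy import deepcopy

# Assumption: only the two roots named in the text are runner-only.
RUNNER_ONLY_ROOTS = ("diagnostics", "validation")


def split_runner_roots(raw_config: dict) -> tuple[dict, dict]:
    """Return (model_config, runner_config) without mutating the input.

    model_config is what a model builder would validate; runner_config
    holds the roots that stay on the pipeline context.
    """
    model_config = deepcopy(raw_config)
    runner_config = {
        key: model_config.pop(key)
        for key in RUNNER_ONLY_ROOTS
        if key in model_config
    }
    return model_config, runner_config
```

Working on a copy keeps the loaded YAML intact, so the original config can still be written out in full by the metadata stage.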

Stage order

The runner uses a fixed stage list.

Stage key | Directory | Purpose | Optional
metadata | 00_run_metadata | Build the model and write resolved config and dataset metadata | No
preflight | 10_pre_diagnostics | Prior predictive draws and plot | No
fit | 20_model_fit | Fit the model, save InferenceData, write trace and summary | No
assessment | 30_model_assessment | In-sample posterior predictive checks, fitted values, residual outputs | No
validation | 35_holdout_validation | Blocked holdout scoring on a train-window refit | Yes
decomposition | 40_decomposition | Contribution tables and decomposition plots | No
diagnostics | 50_diagnostics | Raw input screening, MCMC, predictive, and residual diagnostics | No
curves | 60_response_curves | Saturation-only, forward-pass direct contribution, and adstock curve artefacts | No
optimisation | 70_optimisation | Budget optimisation artefacts | Yes

The validation stage is marked skipped when the YAML config has no validation block or the block is disabled. The optimisation stage is also optional: it returns None and is marked skipped when the YAML config does not contain an optimization block.
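
To check in advance which optional stages a config would trigger, something like the sketch below could be used. It is illustrative only: optional_stage_plan is a hypothetical helper, and the enabled flag inside the validation block is an assumed key name (see YAML Configuration for the real keys).

```python
def optional_stage_plan(config: dict) -> dict[str, bool]:
    """Guess which optional stages would run for a loaded YAML config."""
    validation = config.get("validation")
    # Assumption: an `enabled` flag (defaulting to True) disables validation.
    run_validation = isinstance(validation, dict) and bool(
        validation.get("enabled", True)
    )
    # The text states that optimisation is skipped without an
    # `optimization` block in the YAML.
    run_optimisation = config.get("optimization") is not None
    return {"validation": run_validation, "optimisation": run_optimisation}
```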

See Output Directory Schema for the stage folders and artefact layout.

Data and model assumptions

The retained runner is designed around PanelMMM.

  • The flow-oriented public YAML is expected to describe a PanelMMM.
  • The data loader reads CSV only.
  • Later stages call PanelMMM plotting, summary, diagnostics, and optimisation methods directly.

If you need the exact YAML keys, see YAML Configuration.

PipelineRunConfig

PipelineRunConfig controls runtime settings that sit outside the YAML model specification.

Field | Purpose
config_path | YAML file to load
output_dir | Root directory under which the run directory is created
run_name | Optional run-name override; otherwise the config filename stem
dataset_path | Optional combined dataset CSV override
x_path, y_path | Optional feature and target CSV overrides
holidays_path | Optional holiday CSV override
target_column | Target column name used during CSV loading
prior_samples | Number of prior predictive samples for Stage 10
draws, tune, chains, cores, random_seed | Sampler overrides merged onto YAML fit
curve_samples, curve_points | Curve sampling settings for Stage 60

Only sampler settings are merged into model construction. Other overrides are used by the runner itself during data loading, holiday resolution, diagnostics reporting, and output setup.
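
The merge of sampler overrides onto the YAML fit settings might look roughly like this. It is a sketch, not the behaviour of build_model_kwargs(...): merge_sampler_overrides is a hypothetical name, and it assumes that an override left as None means "keep the YAML value".

```python
SAMPLER_KEYS = ("draws", "tune", "chains", "cores", "random_seed")


def merge_sampler_overrides(yaml_fit: dict, overrides: dict) -> dict:
    """Return a copy of yaml_fit with non-None sampler overrides applied."""
    merged = dict(yaml_fit)
    for key in SAMPLER_KEYS:
        value = overrides.get(key)
        # Assumption: None means the caller did not override this setting.
        if value is not None:
            merged[key] = value
    return merged
```

For example, merging {"draws": 500, "chains": None} onto {"draws": 1000, "chains": 4} would change draws to 500 and leave chains at 4.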

Run directory naming

The runner creates the run directory as:

<output_dir>/<effective_run_name>_<YYYYMMDD_HHMMSS>

The timestamp is generated in UTC.

All stage directories are created up front, even if a later stage is skipped or the run aborts.
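
The naming rule and up-front directory creation can be sketched as follows. make_run_dir is a hypothetical helper and the now parameter exists only to make the example testable. The stage directory names are those listed under Stage order.

```python
from datetime import datetime, timezone
from pathlib import Path

STAGE_DIRS = (
    "00_run_metadata", "10_pre_diagnostics", "20_model_fit",
    "30_model_assessment", "35_holdout_validation", "40_decomposition",
    "50_diagnostics", "60_response_curves", "70_optimisation",
)


def make_run_dir(output_dir, config_path, run_name=None, now=None) -> Path:
    """Create <output_dir>/<effective_run_name>_<YYYYMMDD_HHMMSS> and stage dirs."""
    # The run name falls back to the config filename stem.
    effective_run_name = run_name or Path(config_path).stem
    # The timestamp is taken in UTC.
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(output_dir) / f"{effective_run_name}_{stamp}"
    # Every stage directory is created before any stage runs.
    for name in STAGE_DIRS:
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir
```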

Failure and skip behaviour

If a stage raises an exception:

  • the current stage is marked failed
  • the run manifest is marked failed
  • all still-pending later stages are marked not_reached
  • run_pipeline(...) re-raises the exception

If a stage returns None:

  • the stage is marked skipped
  • a warning in the manifest records that no configuration was supplied for that optional stage
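
The failure and skip rules above can be sketched as a simple stage loop. This is an illustration under stated assumptions, not the runner's implementation: run_stages is a hypothetical function, the manifest is modelled as a plain dict, and the status names pending, running, and completed are assumed (failed, skipped, and not_reached come from the rules above).

```python
from typing import Callable, Optional


def run_stages(
    stages: list[tuple[str, Callable[[], Optional[object]]]],
    manifest: dict,
) -> None:
    """Run stages in order, recording each outcome in the manifest dict."""
    manifest["status"] = "running"
    manifest["stages"] = {key: "pending" for key, _ in stages}
    for key, stage_fn in stages:
        try:
            outcome = stage_fn()
        except Exception:
            # Mark the current stage and the whole run as failed.
            manifest["stages"][key] = "failed"
            manifest["status"] = "failed"
            # Everything still pending was never reached.
            for later, state in manifest["stages"].items():
                if state == "pending":
                    manifest["stages"][later] = "not_reached"
            raise  # re-raise so the caller sees the original exception
        # A stage that returns None counts as skipped.
        manifest["stages"][key] = "skipped" if outcome is None else "completed"
    manifest["status"] = "completed"
```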

Reporter hook

run_pipeline(...) accepts an optional reporter that implements the PipelineReporter protocol.

The reporter can observe:

  • pipeline start
  • stage start
  • stage end
  • pipeline end
  • pipeline failure

See Extending the Runner for the callback contract.
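
To give a feel for the shape of such a reporter, here is a sketch that records the five events in a list. Every method name and signature below is a placeholder chosen for illustration; the actual PipelineReporter protocol is defined in Extending the Runner and may differ.

```python
from typing import Protocol


class ReporterSketch(Protocol):
    """Hypothetical protocol shape; not the real PipelineReporter."""

    def on_pipeline_start(self) -> None: ...
    def on_stage_start(self, stage_key: str) -> None: ...
    def on_stage_end(self, stage_key: str, status: str) -> None: ...
    def on_pipeline_end(self) -> None: ...
    def on_pipeline_failure(self, error: BaseException) -> None: ...


class RecordingReporter:
    """Collects (event, detail) pairs, for example for tests or logging."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str]] = []

    def on_pipeline_start(self) -> None:
        self.events.append(("pipeline_start", ""))

    def on_stage_start(self, stage_key: str) -> None:
        self.events.append(("stage_start", stage_key))

    def on_stage_end(self, stage_key: str, status: str) -> None:
        self.events.append(("stage_end", f"{stage_key}:{status}"))

    def on_pipeline_end(self) -> None:
        self.events.append(("pipeline_end", ""))

    def on_pipeline_failure(self, error: BaseException) -> None:
        self.events.append(("pipeline_failure", repr(error)))
```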