Pipeline Runner — Abacus Documentation

Runner Overview

Mon, 01 Jan 0001 00:00:00 +0000

Use the pipeline runner when you want a full disk-backed PanelMMM run instead of only an in-memory fit. The runner loads a YAML config and a CSV dataset, builds the model, executes a fixed stage sequence, and writes each stage’s artefacts into a structured run directory. When validation is enabled, the runner performs a second train-window fit for the blocked holdout stage, so the run takes longer than a pure full-sample fit.

YAML Configuration

The pipeline runner reads the same YAML model specification used by build_mmm_from_yaml(...), then adds a small set of runner-specific conventions for data loading, optional blocked holdout validation, and Stage 70 optimisation. This page documents the keys that the runner actually consumes.

Root keys

| Key | Required | Used for |
|---|---|---|
| data | Usually | Resolve dataset paths when you do not pass dataset_path, x_path, or y_path through PipelineRunConfig |
| target | Yes | Define the target column and business target type |
| dimensions | No | Declare panel-dimension columns such as geo or brand |
| media | Yes | Define channel/control columns and transform types |
| scaling | No | Configure target/channel scaling rules |
| effects | No | Append additive effects in YAML order before build_model(...) |
| priors | No | Override model-level priors and prefixed transform priors |
| fit | No | Default sampler settings for Stage 20 fitting |
| holidays | No | Add holiday events before model build |
| original_scale_vars | No | Add original-scale contribution variables before fitting |
| inference_data | No | Attach existing InferenceData when the file exists |
| validation | No | Enable optional Stage 35 blocked holdout validation |
| optimization | No | Enable Stage 70 budget optimisation |
| diagnostics | No | Override Stage 50 runner diagnostics thresholds |

Minimal runner config

    data:
      dataset_path: dataset.csv
      date_column: date
    target:
      column: revenue
      type: revenue
    dimensions:
      panel: [geo]
    media:
      channels: [channel_1, channel_2]
      adstock:
        type: geometric
        l_max: 4
      saturation:
        type: logistic
    fit:
      draws: 1000
      tune: 1000
      chains: 4
      cores: 4
      random_seed: 42

Relative paths in YAML are resolved relative to the YAML file’s directory.
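The path rule above can be illustrated with a short sketch. The helper name resolve_config_path is hypothetical and is not part of the Abacus API; it only shows what "relative to the YAML file's directory" means in practice, assuming absolute paths are left unchanged.

```python
from pathlib import Path


def resolve_config_path(yaml_path: str, value: str) -> Path:
    """Resolve a path found in the YAML against the YAML file's directory.

    Hypothetical helper for illustration only; the runner's own
    implementation may differ.
    """
    candidate = Path(value)
    if candidate.is_absolute():
        # Assumption: absolute paths are used as given.
        return candidate
    # Relative paths are anchored at the directory holding the YAML file,
    # not at the current working directory.
    return Path(yaml_path).parent / candidate
```

With a config at configs/geo_panel.yml and dataset_path: dataset.csv, this sketch would look for configs/dataset.csv regardless of where the command is launched from.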

Output Directory Schema

Each pipeline run creates a timestamped directory under the configured output_dir, named after the run with a timestamp suffix (for example, results/geo_panel_baseline_20260308_153000/ in the tree below). The timestamp is generated in UTC. The runner creates every stage directory up front, then updates run_manifest.json as stages start, complete, skip, or fail.

Directory tree

    results/
      geo_panel_baseline_20260308_153000/
        run_manifest.json
        00_run_metadata/
        10_pre_diagnostics/
        20_model_fit/
        30_model_assessment/
        35_holdout_validation/
        40_decomposition/
        50_diagnostics/
        60_response_curves/
        70_optimisation/

Stage directories

| Stage | Directory | Typical artefacts |
|---|---|---|
| metadata | 00_run_metadata | config.resolved.yaml, model_metadata.json, spec_summary.csv |
| preflight | 10_pre_diagnostics | prior_predictive.nc, prior_predictive.png |
| fit | 20_model_fit | model.nc, trace.png, posterior_summary.csv |
| assessment | 30_model_assessment | in-sample posterior predictive checks and residual outputs |
| validation | 35_holdout_validation | blocked holdout scoring, uncertainty-aware metrics, and residual diagnostics |
| decomposition | 40_decomposition | contribution CSVs and decomposition plots |
| diagnostics | 50_diagnostics | raw input screening, MCMC, predictive, and residual diagnostic reports |
| curves | 60_response_curves | saturation-only, forward-pass direct contribution, and adstock NetCDF, summaries, and plots |
| optimisation | 70_optimisation | allocation, response, optimisation summary, and bounds audit artefacts |

See Runner Overview for the stage order and optionality.
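As a rough sketch of the layout described above, the following creates a run directory with a UTC timestamp and all stage directories up front. The function create_run_dir, the timestamp format (inferred from the example name 20260308_153000), and the manifest fields are assumptions for illustration, not the runner's actual code or manifest schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Stage directory names taken from the documented directory tree.
STAGE_DIRS = [
    "00_run_metadata",
    "10_pre_diagnostics",
    "20_model_fit",
    "30_model_assessment",
    "35_holdout_validation",
    "40_decomposition",
    "50_diagnostics",
    "60_response_curves",
    "70_optimisation",
]


def create_run_dir(output_dir, run_name, now=None):
    """Create <output_dir>/<run_name>_<UTC timestamp> with every stage directory."""
    now = now or datetime.now(timezone.utc)
    # Assumed format, inferred from the example directory name.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    run_dir = Path(output_dir) / f"{run_name}_{stamp}"
    for stage in STAGE_DIRS:
        (run_dir / stage).mkdir(parents=True, exist_ok=True)
    # Hypothetical manifest layout: one status entry per stage.
    manifest = {"run_name": run_name, "stages": {s: "pending" for s in STAGE_DIRS}}
    (run_dir / "run_manifest.json").write_text(json.dumps(manifest, indent=2))
    return run_dir
```

Creating all directories before any stage runs means a skipped or failed stage still leaves an (empty) directory behind, so the manifest, rather than directory presence, is the place to check stage status.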

CLI Reference

The pipeline exposes a thin CLI through abacus.pipeline.runner.

Entry point

    python -m abacus.pipeline.runner --config path/to/config.yml

On success, the CLI prints the final run directory:

    Structured pipeline completed: results/my_run_20260308_153000

Arguments

| Flag | Required | Default | Meaning |
|---|---|---|---|
| --config | Yes | None | YAML config path |
| --output-dir | No | results | Root directory for pipeline runs |
| --run-name | No | Config filename stem | Optional run-name override |
| --dataset-path | No | None | Combined dataset CSV override |
| --x-path | No | None | Feature CSV override when not using --dataset-path |
| --y-path | No | None | Target CSV override when not using --dataset-path |
| --holidays-path | No | None | Holiday CSV override |
| --target-column | No | None | Target column used when reading CSV input |
| --prior-samples | No | 20 | Prior predictive samples for Stage 10 |
| --draws | No | None | Posterior draws override |
| --tune | No | None | Posterior tuning steps override |
| --chains | No | None | Posterior chains override |
| --cores | No | None | Posterior cores override |
| --random-seed | No | 42 | Shared random seed |
| --curve-samples | No | 100 | Posterior samples for Stage 60 curves |
| --curve-points | No | 100 | Number of x-values for saturation curves |

Common command patterns

Use the dataset path from YAML

    python -m abacus.pipeline.runner \
      --config data/demo/geo_panel/config.yml

Override the combined dataset path

    python -m abacus.pipeline.runner \
      --config configs/geo_panel.yml \
      --dataset-path /data/geo_panel_latest.csv \
      --run-name geo_panel_latest

Use separate feature and target files

    python -m abacus.pipeline.runner \
      --config configs/panel.yml \
      --x-path /data/X.csv \
      --y-path /data/y.csv \
      --target-column revenue

Override sampler settings for one run

    python -m abacus.pipeline.runner \
      --config configs/panel.yml \
      --draws 1000 \
      --tune 1000 \
      --chains 4 \
      --cores 4 \
      --random-seed 42

Override the holiday CSV

    python -m abacus.pipeline.runner \
      --config configs/panel.yml \
      --holidays-path /data/holidays_uk_fr.csv

How CLI overrides interact with YAML

The CLI does not replace the full YAML config. It only overrides the runtime fields exposed through PipelineRunConfig.
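One way to picture this override behaviour is the sketch below. It assumes that a sampler flag left at its default of None falls back to the value in the YAML fit block; that fallback is inferred from the defaults in the arguments table, not confirmed runner behaviour. The functions build_parser and merge_fit_settings are hypothetical, and --random-seed is left out because its interaction with the YAML random_seed is not described here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only a few of the documented flags."""
    parser = argparse.ArgumentParser(description="Override sketch")
    parser.add_argument("--config", required=True)
    for flag in ("--draws", "--tune", "--chains", "--cores"):
        # Default None means "no override was given on the command line".
        parser.add_argument(flag, type=int, default=None)
    return parser


def merge_fit_settings(yaml_fit: dict, args: argparse.Namespace) -> dict:
    """Overlay CLI sampler overrides on the YAML fit block (assumed semantics)."""
    merged = dict(yaml_fit)  # copy so the YAML-derived dict is not mutated
    for key in ("draws", "tune", "chains", "cores"):
        value = getattr(args, key)
        if value is not None:
            merged[key] = value
    return merged
```

Under these assumptions, passing only --draws 500 against a YAML fit block with draws: 1000 and tune: 1000 would yield draws of 500 while tune stays at 1000.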

Extending the Runner

The retained runner is static, not plugin-based. To add a stage or integrate custom status reporting, extend the existing runner surfaces instead of bypassing them.

Stage contract

A stage function has this contract:

    def run_some_stage(context: PipelineContext) -> dict[str, str] | None:
        ...

Return values:

- return a dict[str, str] of artefact labels to root-relative paths when the stage succeeds
- return None when the stage is intentionally skipped
- raise an exception when the stage fails and should abort the run

The runner handles manifest updates around the stage call. Do not update context.manifest directly from a normal stage implementation unless you are changing core runner behaviour.
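The contract can be sketched end to end as follows. Everything here is illustrative: the PipelineContext class is a minimal stand-in with assumed fields (run_dir, manifest), 80_example is a hypothetical stage directory, "root-relative" is read as relative to the run directory, and execute_stage is only a guess at how a runner might wrap a stage call with manifest updates.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class PipelineContext:
    """Hypothetical stand-in; the real PipelineContext has its own fields."""
    run_dir: Path
    manifest: dict = field(default_factory=dict)


def run_example_stage(context: PipelineContext) -> dict[str, str] | None:
    """Write one artefact and report it by label, relative to the run root."""
    stage_dir = context.run_dir / "80_example"  # hypothetical stage directory
    stage_dir.mkdir(parents=True, exist_ok=True)
    summary = stage_dir / "summary.txt"
    summary.write_text("example stage output\n")
    # The stage does not touch context.manifest; the wrapper below does.
    return {"summary": summary.relative_to(context.run_dir).as_posix()}


def execute_stage(name, stage_fn, context: PipelineContext):
    """Assumed runner-side wrapper: record start, then completed/skipped/failed."""
    context.manifest[name] = {"status": "running"}
    try:
        artefacts = stage_fn(context)
    except Exception as exc:
        context.manifest[name] = {"status": "failed", "error": str(exc)}
        raise  # a failing stage aborts the run
    if artefacts is None:
        context.manifest[name] = {"status": "skipped"}
    else:
        context.manifest[name] = {"status": "completed", "artefacts": artefacts}
    return artefacts
```

Keeping manifest writes in the wrapper is what lets a stage stay a plain function: it signals success, skip, or failure purely through its return value or an exception.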