YAML Configuration

The pipeline runner reads the same YAML model specification used by build_mmm_from_yaml(...), then adds a small set of runner-specific conventions for data loading, optional blocked holdout validation, and Stage 70 optimisation.

This page documents the keys that the runner actually consumes.

Root keys

Key	Required	Used for
`data`	Usually	Resolve dataset paths when you do not pass `dataset_path`, `x_path`, or `y_path` through `PipelineRunConfig`
`target`	Yes	Define the target column and business target type
`dimensions`	No	Declare panel-dimension columns such as `geo` or `brand`
`media`	Yes	Define channel/control columns and transform types
`scaling`	No	Configure target/channel scaling rules
`effects`	No	Append additive effects in YAML order before `build_model(...)`
`priors`	No	Override model-level priors and prefixed transform priors
`fit`	No	Default sampler settings for Stage 20 fitting
`holidays`	No	Add holiday events before model build
`original_scale_vars`	No	Add original-scale contribution variables before fitting
`inference_data`	No	Attach existing `InferenceData` when the file exists
`validation`	No	Enable optional Stage 35 blocked holdout validation
`optimization`	No	Enable Stage 70 budget optimisation
`diagnostics`	No	Override Stage 50 runner diagnostics thresholds
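
As a quick pre-flight check, the required root keys in the table above can be verified before a run. This is a minimal sketch: `missing_root_keys` is a hypothetical helper, not part of the runner, and it assumes the YAML has already been parsed into a plain mapping.

```python
# Hypothetical pre-flight helper; not part of the runner's API.
REQUIRED_ROOT_KEYS = ("target", "media")  # the keys marked "Yes" in the table


def missing_root_keys(config: dict) -> list[str]:
    """Return required root keys that are absent from a parsed YAML mapping."""
    return [key for key in REQUIRED_ROOT_KEYS if key not in config]


config = {"data": {"dataset_path": "dataset.csv"}, "target": {"column": "revenue"}}
print(missing_root_keys(config))
```

An empty list means both required blocks are present. It says nothing about whether their contents are valid.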

Minimal runner config

data:
  dataset_path: dataset.csv
  date_column: date

target:
  column: revenue
  type: revenue

dimensions:
  panel: [geo]

media:
  channels: [channel_1, channel_2]
  adstock:
    type: geometric
    l_max: 4
  saturation:
    type: logistic

fit:
  draws: 1000
  tune: 1000
  chains: 4
  cores: 4
  random_seed: 42

Relative paths in YAML are resolved relative to the YAML file’s directory.
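
The path rule can be illustrated with `pathlib`. This is a sketch of the behaviour described above, not the runner's own code: `resolve_from_yaml` is a hypothetical name, and returning absolute paths unchanged is an assumption.

```python
from pathlib import Path


def resolve_from_yaml(yaml_path: str, configured: str) -> Path:
    """Resolve a configured path against the YAML file's directory.

    Hypothetical helper. Assumption: absolute paths are returned unchanged.
    """
    candidate = Path(configured)
    if candidate.is_absolute():
        return candidate
    # Relative paths are anchored at the YAML file's directory,
    # not at the shell working directory.
    return Path(yaml_path).parent / candidate


print(resolve_from_yaml("configs/uk/model.yml", "dataset.csv"))
```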

diagnostics is runner-only. The structured pipeline reads it, but build_mmm_from_yaml(...) still validates only the public MMM model schema.

validation is also runner-only. The structured pipeline reads it for Stage 35 blocked holdout scoring, but the public MMM YAML builder never sees it.

Core modelling blocks

The runner always builds a PanelMMM, so the public YAML no longer exposes a model.class field. Instead, the runner reads:

data.date_column
target.column
target.type
media.channels
media.controls, if any
dimensions.panel, if any
media.adstock
media.saturation
fit

`data`

The runner loads data before building the model. It supports two CSV layouts.

Combined dataset

data:
  dataset_path: "dataset.csv"

The runner reads the CSV, removes the target column from X, and uses that column as y.

Separate feature and target files

data:
  x_path: "X.csv"
  y_path: "y.csv"

When loading y_path:

if the configured target column exists, the runner uses that column
otherwise, if the file has exactly one column, the runner uses that column and renames it to the target name
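
The two y_path rules above can be sketched as a column-selection function. `pick_target` is a hypothetical helper, and raising an error when neither rule applies is an assumption, since this page does not describe that case.

```python
def pick_target(columns: list[str], target_column: str) -> str:
    """Return the column of the y file that should be used as the target."""
    if target_column in columns:
        return target_column
    if len(columns) == 1:
        # The caller would rename this column to target_column.
        return columns[0]
    # Assumption: anything else is treated as an error.
    raise ValueError(
        f"cannot find {target_column!r} among columns {columns!r}"
    )


print(pick_target(["date", "revenue"], "revenue"))
print(pick_target(["sales"], "revenue"))
```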

Target column resolution

The runner resolves the target column in this order:

PipelineRunConfig.target_column or CLI --target-column
target.column
"y"

Use the CLI override only when you want to change how the runner reads the CSV. Keep it consistent with target.column in YAML.
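
The resolution order can be sketched as a small fallback chain. `resolve_target_column` is a hypothetical name, and `override` stands in for either PipelineRunConfig.target_column or the CLI value.

```python
from typing import Optional


def resolve_target_column(override: Optional[str], config: dict) -> str:
    """Pick the target column: override first, then target.column, then "y"."""
    if override:
        return override
    yaml_value = config.get("target", {}).get("column")
    if yaml_value:
        return yaml_value
    return "y"


config = {"target": {"column": "revenue", "type": "revenue"}}
print(resolve_target_column(None, config))
print(resolve_target_column("net_revenue", config))
```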

`fit`

The fit block controls Stage 20 fitting. The fit stage calls:

context.model.fit(X=context.X, y=context.y, progressbar=False)

The runner merges these CLI or PipelineRunConfig overrides onto the YAML fit block when they are provided:

draws
tune
chains
cores
random_seed
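
A minimal sketch of the merge, assuming an override that was not provided is represented as `None`. `merge_fit_overrides` is a hypothetical helper, not the runner's function.

```python
def merge_fit_overrides(yaml_fit: dict, overrides: dict) -> dict:
    """Overlay provided overrides onto a copy of the YAML fit block.

    Assumption: an override set to None means "not provided".
    """
    merged = dict(yaml_fit)  # leave the parsed YAML untouched
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged


yaml_fit = {"draws": 1000, "tune": 1000, "chains": 4, "cores": 4, "random_seed": 42}
print(merge_fit_overrides(yaml_fit, {"draws": 200, "chains": None}))
```

Here only draws changes. chains keeps its YAML value because no override was supplied for it.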

The public YAML schema currently supports these fit keys:

draws
tune
chains
cores
random_seed
target_accept
progressbar
compute_convergence_checks

Unknown fit keys are rejected when the YAML is loaded.
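
To catch a rejected key before loading, the fit block can be compared against the supported list above. This is a sketch with a hypothetical helper name; the real validation happens inside the YAML loader, and `nuts_sampler` is only an example of a key that is not on the list.

```python
SUPPORTED_FIT_KEYS = frozenset({
    "draws", "tune", "chains", "cores", "random_seed",
    "target_accept", "progressbar", "compute_convergence_checks",
})


def unknown_fit_keys(fit_block: dict) -> list[str]:
    """Return fit keys that are not in the supported list, sorted by name."""
    return sorted(set(fit_block) - SUPPORTED_FIT_KEYS)


print(unknown_fit_keys({"draws": 500, "nuts_sampler": "numpyro"}))
```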

`effects`

effects is an optional list of additive effect specifications:

effects:
  - type: linear_trend
    prefix: trend
    n_changepoints: 8
  - type: weekly_fourier
    order: 3

The builder appends each effect to model.mu_effects in YAML order before calling build_model(...).

`holidays`

The holidays block is optional.

Supported keys used by the builder include:

Key	Meaning
`path`	Holiday CSV path
`enabled`	Set to `false` to disable holiday loading
`prefix`	Prefix for generated holiday effect coordinates
`countries`	Optional country filter for catalogue-style holiday CSV input

Example:

holidays:
  path: "holidays.csv"
  prefix: "holiday"

The CLI --holidays-path option or PipelineRunConfig.holidays_path overrides holidays.path.

If you omit both path and the override but still configure holidays, Abacus falls back to the bundled abacus.data:holidays.csv.
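
The holiday path rules can be sketched as a fallback chain. `resolve_holidays_path` is a hypothetical helper, and it covers only the case where a holidays block is present and enabled.

```python
from typing import Optional

BUNDLED_HOLIDAYS = "abacus.data:holidays.csv"


def resolve_holidays_path(holidays_block: dict, override: Optional[str] = None) -> str:
    """Pick the holiday source: override, then holidays.path, then the bundled file.

    Assumes the holidays block is configured and not disabled.
    """
    return override or holidays_block.get("path") or BUNDLED_HOLIDAYS


print(resolve_holidays_path({"path": "holidays.csv", "prefix": "holiday"}))
print(resolve_holidays_path({"prefix": "holiday"}))
```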

`original_scale_vars`

Use original_scale_vars when you want specific contribution variables to be available on the original target scale:

original_scale_vars:
  - channel_contribution
  - y

The builder applies these through model.add_original_scale_contribution_variable(...) before fitting.

`inference_data`

inference_data.path is passed through to the YAML builder. If the file exists, Abacus attaches that InferenceData to the built model during Stage 00.

Important: the structured runner still executes Stage 20 and fits the model again. inference_data.path does not currently skip fitting.

`optimization`

Add an optimization block when you want Stage 70 to run. If this block is absent, Stage 70 is marked as skipped.

The YAML builder validates this block when the config is loaded. The required scalar fields below must be present, and unknown top-level optimization keys are rejected.

Required keys:

optimization:
  start_date: "2024-11-11"
  end_date: "2025-01-27"
  total_budget: 1289000000.0

Optional keys read by Stage 70:

Key	Default	Meaning
`response_variable`	`total_media_contribution_original_scale`	Optimisation objective variable
`budget_distribution_over_period`	None	Time weights over the optimisation window
`budget_bounds`	Derived or default	Explicit spend bounds
`spend_constraint_lower`	`0.3` when deriving bounds	Relative lower bound around scaled reference spend
`spend_constraint_upper`	`0.3` when deriving bounds	Relative upper bound around scaled reference spend
`default_constraints`	`true`	Whether to add the default equality budget constraint
`noise_level`	`0.001`	Noise level for simulated response samples
`include_last_observations`	`false`	Whether posterior predictive sampling includes trailing observed rows
`include_carryover`	`true`	Whether simulated response sampling extends the window for carryover
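
The required and optional keys can be checked ahead of a run. This is a sketch that assumes the allowed set is exactly the keys listed on this page; `check_optimization_block` is a hypothetical helper, and the builder's own validation remains authoritative.

```python
REQUIRED_KEYS = ("start_date", "end_date", "total_budget")
OPTIONAL_KEYS = (
    "response_variable", "budget_distribution_over_period", "budget_bounds",
    "spend_constraint_lower", "spend_constraint_upper", "default_constraints",
    "noise_level", "include_last_observations", "include_carryover",
)


def check_optimization_block(block: dict) -> list[str]:
    """Return a list of problems found in an optimization block."""
    problems = [f"missing required key: {k}" for k in REQUIRED_KEYS if k not in block]
    allowed = set(REQUIRED_KEYS) | set(OPTIONAL_KEYS)
    problems += [f"unknown key: {k}" for k in sorted(set(block) - allowed)]
    return problems


block = {"start_date": "2024-11-11", "end_date": "2025-01-27", "budget": 1.0}
print(check_optimization_block(block))
```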

Important budget-unit note

In the structured pipeline, optimization.total_budget is passed straight to PanelBudgetOptimizerWrapper.optimize_budget(...).

That means Stage 70 uses the wrapper’s per-period spend contract, not the scenario planner’s total-horizon spend contract.

See Budget Optimisation.
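
Because Stage 70 treats the value as per-period spend, a planning figure for the whole horizon needs converting first. The sketch below uses hypothetical helpers and assumes weekly data, a window that includes both start_date and end_date, and an even split across periods. None of these assumptions comes from the runner itself, so confirm the exact contract on the Budget Optimisation page.

```python
from datetime import date


def weekly_periods(start: date, end: date) -> int:
    """Count weekly periods from start to end, including both dates."""
    return (end - start).days // 7 + 1


def per_period_budget(horizon_total: float, start: date, end: date) -> float:
    """Spread a horizon total evenly over the weekly periods in the window."""
    return horizon_total / weekly_periods(start, end)


start, end = date(2024, 11, 11), date(2025, 1, 27)
print(weekly_periods(start, end))
print(per_period_budget(1_200_000.0, start, end))
```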

Xarray-like optimisation values in YAML

For panel bounds or time distributions, use the xarray-like mapping shape that Stage 70 expects:

optimization:
  start_date: "2025-02-03"
  end_date: "2025-02-24"
  total_budget: 100000.0
  budget_distribution_over_period:
    values:
      - [[0.25, 0.25], [0.25, 0.25]]
      - [[0.25, 0.25], [0.25, 0.25]]
      - [[0.25, 0.25], [0.25, 0.25]]
      - [[0.25, 0.25], [0.25, 0.25]]
    dims: ["date", "geo", "channel"]
    coords:
      date: [0, 1, 2, 3]
      geo: ["UK", "FR"]
      channel: ["channel_1", "channel_2"]

The same shape works for budget_bounds, but with an additional "bound" dimension containing "lower" and "upper".
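
A common mistake with this mapping is a values array whose nesting does not match dims and coords. The sketch below checks that with plain lists. It is a hypothetical helper that assumes rectangular nesting and does not use xarray or reproduce Stage 70's own parsing.

```python
def nested_shape(values) -> tuple:
    """Return the shape of a nested list, inspecting the first item per level."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        if not values:
            break
        values = values[0]
    return tuple(shape)


def shape_matches(spec: dict) -> bool:
    """Check that values has one axis per dim, sized like that dim's coords."""
    expected = tuple(len(spec["coords"][dim]) for dim in spec["dims"])
    return nested_shape(spec["values"]) == expected


spec = {
    "values": [[[0.25, 0.25], [0.25, 0.25]] for _ in range(4)],
    "dims": ["date", "geo", "channel"],
    "coords": {
        "date": [0, 1, 2, 3],
        "geo": ["UK", "FR"],
        "channel": ["channel_1", "channel_2"],
    },
}
print(nested_shape(spec["values"]), shape_matches(spec))
```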

`diagnostics`

Use the optional diagnostics block when you want Stage 50 to use warn/fail thresholds that differ from the built-in defaults.

diagnostics:
  thresholds:
    design_max_vif:
      warn: 8.0
      fail: 12.0
    mcmc_max_rhat:
      warn: 1.02
      fail: 1.08

Supported threshold keys:

design_max_vif
design_condition_number
mcmc_divergence_count
mcmc_max_rhat
mcmc_min_ess_bulk
mcmc_bfmi_min
predictive_nrmse
residual_ljung_box_p
residual_max_abs_acf

Validation rules:

upper-bound checks require warn <= fail
lower-bound checks require warn >= fail
omit the block entirely to use the built-in defaults

This block affects only the structured runner. It is stripped before Stage 00 model build so the public MMM YAML schema remains unchanged.
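
The two ordering rules can be sketched as a single check. `thresholds_are_ordered` is a hypothetical helper, and which keys count as upper-bound or lower-bound checks is not listed on this page. Treating design_max_vif as an upper bound follows the example above, while treating mcmc_min_ess_bulk as a lower bound is an assumption based on its name.

```python
def thresholds_are_ordered(warn: float, fail: float, direction: str) -> bool:
    """Check the warn/fail ordering for an upper-bound or lower-bound check."""
    if direction == "upper":
        # Larger values are worse, so the warning must trigger first.
        return warn <= fail
    if direction == "lower":
        # Smaller values are worse, so the warning sits above the failure.
        return warn >= fail
    raise ValueError(f"direction must be 'upper' or 'lower', got {direction!r}")


print(thresholds_are_ordered(8.0, 12.0, "upper"))    # design_max_vif example
print(thresholds_are_ordered(400.0, 100.0, "lower")) # assumed mcmc_min_ess_bulk values
```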

`validation`

Use the optional validation block when you want Stage 35 blocked holdout scoring.

validation:
  enabled: true
  holdout_observations: 8
  include_last_observations: true
  coverage_levels: [0.5, 0.8, 0.94]
  sampler:
    draws: 500
    tune: 500
    chains: 2
    cores: 2
    random_seed: 42

Supported keys:

Key	Meaning
`enabled`	Set to `false` to skip Stage 35 while keeping the stage in the manifest
`holdout_observations`	Number of unique dates to reserve for the blocked holdout window
`include_last_observations`	Keep lag history for carryover-sensitive holdout scoring
`coverage_levels`	Coverage levels reported in Phase 10; use the fixed `50`, `80`, and `94` percent defaults
`sampler`	Optional validation-only sampler overrides for the train-window refit

Phase 10 reports coverage as coverage_50, coverage_80, and coverage_94. Keep those defaults unless the implementation and tests are updated together.

The validation stage builds a clean train-window model for holdout scoring and ignores inference_data.path so the refit does not inherit attached posterior state from Stage 00.
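
To see what holdout_observations means for panel data, the sketch below splits on unique dates, not on rows. `split_holdout` is a hypothetical helper, and it assumes ISO-formatted date strings and that the holdout window is the trailing block of dates. The runner's own split logic is not shown on this page.

```python
def split_holdout(dates: list[str], holdout_observations: int) -> tuple[list[str], list[str]]:
    """Split ISO dates into train and holdout using the last N unique dates."""
    unique = sorted(set(dates))  # ISO dates sort correctly as strings
    if not 0 < holdout_observations < len(unique):
        raise ValueError("holdout_observations must leave at least one training date")
    return unique[:-holdout_observations], unique[-holdout_observations:]


# Two geos share the same three dates, so there are six rows but three unique dates.
dates = ["2024-01-01", "2024-01-01", "2024-01-08", "2024-01-08", "2024-01-15", "2024-01-15"]
train_dates, holdout_dates = split_holdout(dates, 1)
print(train_dates, holdout_dates)
```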

Override precedence

For the runner, precedence is:

Setting	Higher precedence	Lower precedence
Combined dataset path	`dataset_path` / `--dataset-path`	`data.dataset_path`
Split CSV paths	`x_path`, `y_path` / `--x-path`, `--y-path`	`data.x_path`, `data.y_path`
Holiday CSV path	`holidays_path` / `--holidays-path`	`holidays.path`
Sampler settings	`PipelineRunConfig` or CLI overrides	`fit`
Target column for CSV loading	`target_column` / `--target-column`	`target.column`, then `"y"`
Diagnostics thresholds	`diagnostics.thresholds`	built-in Stage 50 defaults

Common pitfalls

Using Parquet paths in the pipeline data block. The runner data loader reads CSV only.
Providing only one of data.x_path or data.y_path.
Treating optimization.total_budget as total horizon spend instead of per-period spend.
Assuming diagnostics is part of the public MMM builder schema. It is a runner-only block.
Assuming inference_data.path skips Stage 20 fitting. It does not.
Forgetting that relative paths are resolved from the YAML file directory, not from the shell working directory.
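
The first two pitfalls can be caught with a small check on the data block. This is a sketch: `data_block_problems` is a hypothetical helper, the `.csv` extension test is only a heuristic, and it does not account for paths supplied through PipelineRunConfig or the CLI.

```python
def data_block_problems(data: dict) -> list[str]:
    """Return likely problems in a parsed data block."""
    problems = []
    for key in ("dataset_path", "x_path", "y_path"):
        value = data.get(key)
        # Heuristic: the loader reads CSV only, so flag other extensions.
        if value and not str(value).lower().endswith(".csv"):
            problems.append(f"{key} is not a CSV path: {value}")
    if ("x_path" in data) != ("y_path" in data):
        problems.append("x_path and y_path must be provided together")
    return problems


print(data_block_problems({"x_path": "X.parquet"}))
```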