YAML Configuration

The pipeline runner reads the same YAML model specification used by build_mmm_from_yaml(...). It then adds a small set of runner-specific conventions for data loading, optional Stage 35 blocked holdout validation, Stage 50 diagnostics thresholds, and Stage 70 optimisation.

This page documents the keys that the runner actually consumes.

Root keys

Key | Required | Used for
--- | --- | ---
data | Usually | Resolve dataset paths when you do not pass dataset_path, x_path, or y_path through PipelineRunConfig
target | Yes | Define the target column and business target type
dimensions | No | Declare panel-dimension columns such as geo or brand
media | Yes | Define channel/control columns and transform types
scaling | No | Configure target/channel scaling rules
effects | No | Append additive effects in YAML order before build_model(...)
priors | No | Override model-level priors and prefixed transform priors
fit | No | Default sampler settings for Stage 20 fitting
holidays | No | Add holiday events before model build
original_scale_vars | No | Add original-scale contribution variables before fitting
inference_data | No | Attach existing InferenceData when the file exists
validation | No | Enable optional Stage 35 blocked holdout validation
optimization | No | Enable Stage 70 budget optimisation
diagnostics | No | Override Stage 50 runner diagnostics thresholds

Minimal runner config

data:
  dataset_path: dataset.csv
  date_column: date

target:
  column: revenue
  type: revenue

dimensions:
  panel: [geo]

media:
  channels: [channel_1, channel_2]
  adstock:
    type: geometric
    l_max: 4
  saturation:
    type: logistic

fit:
  draws: 1000
  tune: 1000
  chains: 4
  cores: 4
  random_seed: 42

Relative paths in YAML are resolved relative to the YAML file’s directory.
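The following is a minimal sketch of that rule. It uses a hypothetical helper, resolve_config_path, which is not part of Abacus. The sketch relies on pathlib's join semantics, so an absolute value in the YAML is kept unchanged. Whether the runner treats absolute paths the same way is an assumption.

```python
from pathlib import Path


def resolve_config_path(yaml_file: str, value: str) -> Path:
    """Resolve a path written in the YAML against the YAML file's directory.

    Hypothetical helper illustrating the documented rule; not Abacus code.
    """
    # The base is the YAML file's directory, not the shell working directory
    # (absolute() only consults the working directory if yaml_file is relative).
    base = Path(yaml_file).absolute().parent
    # Joining with an absolute right-hand side returns that absolute path unchanged.
    return base / value
```

For example, dataset_path: dataset.csv in /project/configs/run.yaml would point at /project/configs/dataset.csv no matter where you launch the runner from.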

diagnostics is runner-only. The structured pipeline reads it, but build_mmm_from_yaml(...) still validates only the public MMM model schema.

validation is also runner-only. The structured pipeline reads it for Stage 35 blocked holdout scoring, but the public MMM YAML builder never sees it.

Core modeling blocks

The runner always builds a PanelMMM, so the public YAML no longer exposes a model.class field. Instead, it reads:

  • data.date_column
  • target.column
  • target.type
  • media.channels
  • media.controls, if any
  • dimensions.panel, if any
  • media.adstock
  • media.saturation
  • fit

data

The runner loads data before building the model. It supports two CSV layouts.

Combined dataset

data:
  dataset_path: "dataset.csv"

The runner reads the CSV, takes the target column as y, and keeps the remaining columns as X.
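The column split can be sketched with the standard library alone. split_combined_csv is a hypothetical helper, and the real runner loads into its own data structures. Raising KeyError when the target column is missing is an assumption, not documented behaviour.

```python
import csv
import io


def split_combined_csv(text: str, target_column: str):
    """Split a combined CSV into feature rows (X) and target values (y).

    Hypothetical stdlib sketch; only mirrors the documented column split.
    """
    reader = csv.DictReader(io.StringIO(text))
    # Assumption: a missing target column is an error.
    if target_column not in (reader.fieldnames or []):
        raise KeyError(f"target column {target_column!r} not found")
    X, y = [], []
    for row in reader:
        # Move the target value out of the row; the rest stays as features.
        y.append(float(row.pop(target_column)))
        X.append(row)
    return X, y
```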

Separate feature and target files

data:
  x_path: "X.csv"
  y_path: "y.csv"

When loading y_path:

  • if the configured target column exists, the runner uses that column
  • otherwise, if the file has exactly one column, the runner uses that column and renames it to the target name
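The two y_path rules can be sketched as follows. read_target_column is hypothetical. The page does not say what happens when neither rule applies, so raising ValueError in that case is an assumption.

```python
import csv
import io


def read_target_column(text: str, target_column: str):
    """Pick the target series from a y.csv file (hypothetical sketch).

    Returns (target_name, values). The name is always the configured target.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    if target_column in columns:
        # Rule 1: the configured target column exists, so use it.
        source = target_column
    elif len(columns) == 1:
        # Rule 2: a single-column file is used and renamed to the target name.
        source = columns[0]
    else:
        # Assumption: any other layout is an error.
        raise ValueError(f"cannot find target column {target_column!r} in {columns}")
    values = [float(row[source]) for row in reader]
    return target_column, values
```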

Target column resolution

The runner resolves the target column in this order:

  1. PipelineRunConfig.target_column or CLI --target-column
  2. target.column
  3. "y"

Use the CLI override only when you want to change how the runner reads the CSV. Keep it consistent with target.column in YAML.
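The resolution order can be written as a small function. resolve_target_column is a hypothetical helper, not part of the Abacus API. Its override argument stands in for PipelineRunConfig.target_column or --target-column.

```python
from typing import Any, Mapping, Optional


def resolve_target_column(override: Optional[str], yaml_config: Mapping[str, Any]) -> str:
    """Return the target column name using the documented precedence."""
    # 1. An explicit runner override wins.
    if override is not None:
        return override
    # 2. Otherwise use target.column from the YAML, if present.
    column = (yaml_config.get("target") or {}).get("column")
    # 3. Fall back to the literal name "y".
    return column if column is not None else "y"
```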

fit

fit controls Stage 20 fitting. The fit stage passes only the data and progressbar=False, so the remaining sampler settings come from the fit block:

context.model.fit(X=context.X, y=context.y, progressbar=False)

The runner merges these CLI or PipelineRunConfig overrides onto the YAML fit block when they are provided:

  • draws
  • tune
  • chains
  • cores
  • random_seed
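The merge can be sketched as an overlay of non-missing values. merge_fit_overrides is hypothetical. Treating None as "not provided" is an assumption about how the runner represents absent overrides.

```python
from typing import Any, Dict, Mapping, Optional

# The five keys that CLI / PipelineRunConfig can override, per the list above.
OVERRIDABLE_FIT_KEYS = ("draws", "tune", "chains", "cores", "random_seed")


def merge_fit_overrides(yaml_fit: Mapping[str, Any],
                        overrides: Mapping[str, Optional[Any]]) -> Dict[str, Any]:
    """Overlay provided override values onto a copy of the YAML fit block."""
    merged = dict(yaml_fit)  # copy, so the loaded YAML is not mutated
    for key in OVERRIDABLE_FIT_KEYS:
        value = overrides.get(key)
        if value is not None:  # None means "not provided": keep the YAML value
            merged[key] = value
    return merged
```

Keys that are not overridable, such as target_accept, pass through from YAML untouched.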

The public YAML schema currently supports these fit keys:

  • draws
  • tune
  • chains
  • cores
  • random_seed
  • target_accept
  • progressbar
  • compute_convergence_checks

Unknown fit keys are rejected when the YAML is loaded.
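A sketch of the load-time check, using the key list above. check_fit_keys is hypothetical, and the real schema validator's error type and message may differ.

```python
from typing import Any, Mapping

ALLOWED_FIT_KEYS = frozenset({
    "draws", "tune", "chains", "cores", "random_seed",
    "target_accept", "progressbar", "compute_convergence_checks",
})


def check_fit_keys(fit_block: Mapping[str, Any]) -> None:
    """Raise if the fit block contains keys outside the documented schema."""
    unknown = sorted(set(fit_block) - ALLOWED_FIT_KEYS)
    if unknown:
        raise ValueError(f"unknown fit keys: {unknown}")
```

A typo such as draw: 1000 would therefore fail at load time rather than being silently ignored.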

effects

effects is an optional list of additive effect specifications:

effects:
  - type: linear_trend
    prefix: trend
    n_changepoints: 8
  - type: weekly_fourier
    order: 3

The builder appends each effect to model.mu_effects in YAML order before calling build_model(...).

holidays

The holidays block is optional.

Supported keys used by the builder include:

Key | Meaning
--- | ---
path | Holiday CSV path
enabled | Set to false to disable holiday loading
prefix | Prefix for generated holiday effect coordinates
countries | Optional country filter for catalogue-style holiday CSV input

Example:

holidays:
  path: "holidays.csv"
  prefix: "holiday"

The --holidays-path CLI flag or PipelineRunConfig.holidays_path overrides holidays.path.

If you omit both path and the override but still configure holidays, Abacus falls back to the bundled abacus.data:holidays.csv.
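The source selection can be sketched as follows, with resolve_holidays_source as a hypothetical helper. The sketch makes two assumptions, since the page does not specify them. First, enabled: false wins even over an override. Second, an override alone does not enable holidays when no holidays block is configured.

```python
from typing import Any, Mapping, Optional

# The bundled fallback named on this page, kept as an opaque string.
BUNDLED_HOLIDAYS = "abacus.data:holidays.csv"


def resolve_holidays_source(holidays_cfg: Optional[Mapping[str, Any]],
                            override_path: Optional[str] = None) -> Optional[str]:
    """Return the holiday CSV to load, or None when holidays are off.

    override_path stands in for --holidays-path / PipelineRunConfig.holidays_path.
    """
    # Assumption: no holidays block, or enabled: false, means no holidays.
    if holidays_cfg is None or holidays_cfg.get("enabled", True) is False:
        return None
    if override_path is not None:
        return override_path
    # Fall back to the bundled CSV when holidays.path is omitted.
    return holidays_cfg.get("path") or BUNDLED_HOLIDAYS
```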

original_scale_vars

Use original_scale_vars when you want specific contribution variables to be available on the original target scale:

original_scale_vars:
  - channel_contribution
  - y

The builder applies these through model.add_original_scale_contribution_variable(...) before fitting.

inference_data

inference_data.path is passed through to the YAML builder. If the file exists, Abacus attaches that InferenceData to the built model during Stage 00.

Important: the structured runner still executes Stage 20 and fits the model again. inference_data.path does not currently skip fitting.

optimization

Add an optimization block when you want Stage 70 to run. If this block is absent, Stage 70 is marked skipped.

The YAML builder validates this block when the config is loaded. The required scalar fields below must be present, and unknown top-level optimization keys are rejected.

Required keys:

optimization:
  start_date: "2024-11-11"
  end_date: "2025-01-27"
  total_budget: 1289000000.0

Optional keys read by Stage 70:

Key | Default | Meaning
--- | --- | ---
response_variable | total_media_contribution_original_scale | Optimisation objective variable
budget_distribution_over_period | None | Time weights over the optimisation window
budget_bounds | Derived or default | Explicit spend bounds
spend_constraint_lower | 0.3 (when deriving bounds) | Relative lower bound around scaled reference spend
spend_constraint_upper | 0.3 (when deriving bounds) | Relative upper bound around scaled reference spend
default_constraints | true | Whether to add the default equality budget constraint
noise_level | 0.001 | Noise level for simulated response samples
include_last_observations | false | Whether posterior predictive sampling includes trailing observed rows
include_carryover | true | Whether simulated response sampling extends the window for carryover
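The validation and default-filling described above can be sketched in a single function. normalise_optimization is hypothetical. The sketch assumes that the required and optional keys listed on this page are the only accepted top-level keys.

```python
from typing import Any, Dict, Mapping

REQUIRED_OPTIMIZATION_KEYS = ("start_date", "end_date", "total_budget")

# Defaults copied from the table above. None for budget_bounds stands in for
# "derived or default"; the 0.3 spend constraints only matter when bounds are derived.
OPTIONAL_OPTIMIZATION_DEFAULTS: Dict[str, Any] = {
    "response_variable": "total_media_contribution_original_scale",
    "budget_distribution_over_period": None,
    "budget_bounds": None,
    "spend_constraint_lower": 0.3,
    "spend_constraint_upper": 0.3,
    "default_constraints": True,
    "noise_level": 0.001,
    "include_last_observations": False,
    "include_carryover": True,
}


def normalise_optimization(block: Mapping[str, Any]) -> Dict[str, Any]:
    """Validate an optimization block and fill in documented defaults."""
    missing = [key for key in REQUIRED_OPTIMIZATION_KEYS if key not in block]
    if missing:
        raise ValueError(f"missing required optimization keys: {missing}")
    allowed = set(REQUIRED_OPTIMIZATION_KEYS) | set(OPTIONAL_OPTIMIZATION_DEFAULTS)
    unknown = sorted(set(block) - allowed)
    if unknown:
        raise ValueError(f"unknown optimization keys: {unknown}")
    # Explicit YAML values take precedence over the defaults.
    return {**OPTIONAL_OPTIMIZATION_DEFAULTS, **block}
```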

Important budget-unit note

In the structured pipeline, optimization.total_budget is passed straight to PanelBudgetOptimizerWrapper.optimize_budget(...).

That means Stage 70 uses the wrapper’s per-period spend contract. total_budget is the spend for one time period at the data’s frequency, not the scenario planner’s total spend across the whole optimisation window.

See Budget Optimisation.
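If you start from a total-horizon figure, you can divide it by the number of periods in the window. The sketch below is a hypothetical helper, not part of Abacus. It assumes a regular cadence (weekly by default) and counts both start_date and end_date as periods. Check both assumptions against your data before relying on the result.

```python
from datetime import date, timedelta


def per_period_budget(total_horizon_spend: float, start: date, end: date,
                      period: timedelta = timedelta(weeks=1)) -> float:
    """Convert total spend over [start, end] into a per-period total_budget."""
    if end < start:
        raise ValueError("end must not be before start")
    # timedelta // timedelta gives an int; +1 counts both endpoints.
    n_periods = (end - start) // period + 1
    return total_horizon_spend / n_periods
```

Under these assumptions, 400,000 spread over the four weekly periods from 2025-02-03 to 2025-02-24 corresponds to total_budget: 100000.0.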

Xarray-like optimisation values in YAML

For panel bounds or time distributions, use the xarray-like mapping shape that Stage 70 expects:

optimization:
  start_date: "2025-02-03"
  end_date: "2025-02-24"
  total_budget: 100000.0
  budget_distribution_over_period:
    values:
      - [[0.25, 0.25], [0.25, 0.25]]
      - [[0.25, 0.25], [0.25, 0.25]]
      - [[0.25, 0.25], [0.25, 0.25]]
      - [[0.25, 0.25], [0.25, 0.25]]
    dims: ["date", "geo", "channel"]
    coords:
      date: [0, 1, 2, 3]
      geo: ["UK", "FR"]
      channel: ["channel_1", "channel_2"]

The same values/dims/coords mapping format works for budget_bounds, with an additional "bound" dimension whose coordinates are "lower" and "upper".
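A common mistake in these mappings is a values array whose nesting does not match dims and coords. The stdlib sketch below checks that the nesting depth and lengths line up. Both helpers are hypothetical and assume rectangular nested lists. If xarray is installed, xarray.DataArray(data, coords=..., dims=...) takes the same three pieces and performs a similar consistency check.

```python
from typing import Any, List, Mapping


def nested_shape(values: Any) -> List[int]:
    """Return the shape of a rectangular nested list."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0] if values else None
    return shape


def check_xarray_mapping(mapping: Mapping[str, Any]) -> None:
    """Raise if values does not have one axis per dim, sized by its coords."""
    dims = mapping["dims"]
    expected = [len(mapping["coords"][dim]) for dim in dims]
    actual = nested_shape(mapping["values"])
    if actual != expected:
        raise ValueError(f"values shape {actual} does not match dims {dims} -> {expected}")
```

For the budget_distribution_over_period example above, the expected shape is [4, 2, 2]: four dates, two geos, two channels.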

diagnostics

Use the optional diagnostics block when you want Stage 50 to use different warn/fail thresholds than the retained defaults.

diagnostics:
  thresholds:
    design_max_vif:
      warn: 8.0
      fail: 12.0
    mcmc_max_rhat:
      warn: 1.02
      fail: 1.08

Supported threshold keys:

  • design_max_vif
  • design_condition_number
  • mcmc_divergence_count
  • mcmc_max_rhat
  • mcmc_min_ess_bulk
  • mcmc_bfmi_min
  • predictive_nrmse
  • residual_ljung_box_p
  • residual_max_abs_acf

Validation rules:

  • upper-bound checks require warn <= fail
  • lower-bound checks require warn >= fail
  • omit the block entirely to use the built-in defaults
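The ordering rules can be sketched as below. The split of keys into upper-bound and lower-bound checks is an assumption inferred from the key names (max, min, and a p-value, where low values are bad). It is not taken from the Abacus source. check_thresholds is hypothetical.

```python
from typing import Mapping

# Assumed classification, inferred from the threshold names.
UPPER_BOUND_CHECKS = frozenset({
    "design_max_vif", "design_condition_number", "mcmc_divergence_count",
    "mcmc_max_rhat", "predictive_nrmse", "residual_max_abs_acf",
})
LOWER_BOUND_CHECKS = frozenset({
    "mcmc_min_ess_bulk", "mcmc_bfmi_min", "residual_ljung_box_p",
})


def check_thresholds(thresholds: Mapping[str, Mapping[str, float]]) -> None:
    """Raise if a threshold key is unsupported or warn/fail are misordered."""
    for key, levels in thresholds.items():
        warn, fail = levels["warn"], levels["fail"]
        if key in UPPER_BOUND_CHECKS:
            ok = warn <= fail  # larger values are worse
        elif key in LOWER_BOUND_CHECKS:
            ok = warn >= fail  # smaller values are worse
        else:
            raise ValueError(f"unsupported threshold key: {key}")
        if not ok:
            raise ValueError(f"{key}: warn={warn} and fail={fail} are in the wrong order")
```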

This block affects only the structured runner. It is stripped before Stage 00 model build so the public MMM YAML schema remains unchanged.

validation

Use the optional validation block when you want Stage 35 blocked holdout scoring.

validation:
  enabled: true
  holdout_observations: 8
  include_last_observations: true
  coverage_levels: [0.5, 0.8, 0.94]
  sampler:
    draws: 500
    tune: 500
    chains: 2
    cores: 2
    random_seed: 42

Supported keys:

Key | Meaning
--- | ---
enabled | Set to false to skip Stage 35 while keeping the stage in the manifest
holdout_observations | Number of unique dates to reserve for the blocked holdout window
include_last_observations | Keep lag history for carryover-sensitive holdout scoring
coverage_levels | Coverage levels reported in Phase 10; use the fixed 50, 80, and 94 percent defaults
sampler | Optional validation-only sampler overrides for the train-window refit

Phase 10 reports coverage as coverage_50, coverage_80, and coverage_94. Keep those defaults unless the implementation and tests are updated together.

The validation stage builds a clean train-window model for holdout scoring and ignores inference_data.path so the refit does not inherit attached posterior state from Stage 00.
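Because holdout_observations counts unique dates rather than rows, every panel row on a holdout date goes to the holdout set. The sketch below illustrates this. It assumes the holdout block is the trailing window of dates and that at least one training date must remain. blocked_holdout_split is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def blocked_holdout_split(dates: Sequence[str],
                          holdout_observations: int) -> Tuple[List[int], List[int]]:
    """Return (train_indices, holdout_indices) split by unique date.

    ISO-8601 date strings sort chronologically, so plain sorting works.
    """
    unique_dates = sorted(set(dates))
    # Assumption: the holdout must leave at least one training date.
    if not 0 < holdout_observations < len(unique_dates):
        raise ValueError("holdout_observations must leave at least one training date")
    holdout_dates = set(unique_dates[-holdout_observations:])
    train = [i for i, d in enumerate(dates) if d not in holdout_dates]
    holdout = [i for i, d in enumerate(dates) if d in holdout_dates]
    return train, holdout
```

With two geos per date, holdout_observations: 8 therefore holds out 16 rows, not 8.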

Override precedence

For the runner, precedence is:

Setting | Higher precedence | Lower precedence
--- | --- | ---
Combined dataset path | dataset_path / --dataset-path | data.dataset_path
Split CSV paths | x_path, y_path / --x-path, --y-path | data.x_path, data.y_path
Holiday CSV path | holidays_path / --holidays-path | holidays.path
Sampler settings | PipelineRunConfig or CLI overrides | fit
Target column for CSV loading | target_column / --target-column | target.column, then "y"
Diagnostics thresholds | diagnostics.thresholds | retained Stage 50 defaults

Common pitfalls

  • Using Parquet paths in the pipeline data block. The runner data loader reads CSV only.
  • Providing only one of data.x_path or data.y_path.
  • Treating optimization.total_budget as total horizon spend instead of per-period spend.
  • Assuming diagnostics is part of the public MMM builder schema. It is a runner-only block.
  • Assuming inference_data.path skips Stage 20 fitting. It does not.
  • Forgetting that relative paths are resolved from the YAML file directory, not from the shell working directory.