YAML Configuration
The pipeline runner reads the same YAML model specification used by
build_mmm_from_yaml(...), then adds a small set of runner-specific conventions
for data loading, optional blocked holdout validation, and Stage 70
optimisation.
This page documents the keys that the runner actually consumes.
Root keys
| Key | Required | Used for |
|---|---|---|
| data | Usually | Resolve dataset paths when you do not pass dataset_path, x_path, or y_path through PipelineRunConfig |
| target | Yes | Define the target column and business target type |
| dimensions | No | Declare panel-dimension columns such as geo or brand |
| media | Yes | Define channel/control columns and transform types |
| scaling | No | Configure target/channel scaling rules |
| effects | No | Append additive effects in YAML order before build_model(...) |
| priors | No | Override model-level priors and prefixed transform priors |
| fit | No | Default sampler settings for Stage 20 fitting |
| holidays | No | Add holiday events before model build |
| original_scale_vars | No | Add original-scale contribution variables before fitting |
| inference_data | No | Attach existing InferenceData when the file exists |
| validation | No | Enable optional Stage 35 blocked holdout validation |
| optimization | No | Enable Stage 70 budget optimisation |
| diagnostics | No | Override Stage 50 runner diagnostics thresholds |
Minimal runner config
Relative paths in YAML are resolved relative to the YAML file’s directory.
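The path rule above can be sketched as follows. This is a minimal illustration, not the runner's code: `resolve_config_path` is a hypothetical helper, the file names are made up, and leaving absolute paths unchanged is an assumption.

```python
from pathlib import Path


def resolve_config_path(yaml_file: str, configured: str) -> Path:
    """Resolve a path written in the YAML against the YAML file's directory.

    Hypothetical helper. Assumption: absolute paths are returned unchanged.
    """
    candidate = Path(configured)
    if candidate.is_absolute():
        return candidate
    # The anchor is the YAML file's directory, not the shell working directory.
    return Path(yaml_file).parent / candidate


resolved = resolve_config_path("configs/demo/model.yml", "data/sales.csv")
print(resolved.as_posix())
```

Under these assumptions the result does not depend on the directory the command is launched from.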
diagnostics is runner-only. The structured pipeline reads it, but
build_mmm_from_yaml(...) still validates only the public MMM model schema.
validation is also runner-only. The structured pipeline reads it for Stage 35
blocked holdout scoring, but the public MMM YAML builder never sees it.
Core modeling blocks
The runner always builds a PanelMMM, so the public YAML no longer exposes a
model.class field. Instead, it reads:
- data.date_column
- target.column
- target.type
- media.channels
- media.controls, if any
- dimensions.panel, if any
- media.adstock
- media.saturation
- fit
data
The runner loads data before building the model. It supports two CSV layouts.
Combined dataset
The runner reads the CSV, removes the target column from X, and uses that
column as y.
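The split described above might look like the sketch below. It uses only the standard library and made-up column names; `split_combined` is a hypothetical helper, and the real loader may differ in types and error handling.

```python
import csv
import io


def split_combined(rows, target_column):
    """Split row dicts into feature rows (X) and target values (y).

    Hypothetical helper: the target column is removed from X and used as y.
    """
    if rows and target_column not in rows[0]:
        raise KeyError(f"target column {target_column!r} not found")
    X = [{k: v for k, v in row.items() if k != target_column} for row in rows]
    y = [row[target_column] for row in rows]
    return X, y


# Illustrative combined CSV; the column names are assumptions.
csv_text = "date,tv,search,sales\n2024-01-01,100,50,900\n2024-01-08,120,40,950\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
X, y = split_combined(rows, "sales")
print(list(X[0]), y)
```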
Separate feature and target files
When loading y_path:
- if the configured target column exists, the runner uses that column
- otherwise, if the file has exactly one column, the runner uses that column and renames it to the target name
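The two rules above can be sketched as a small selection function. `pick_target_source` is a hypothetical name, and raising an error when neither rule applies is an assumption; the text does not say what the runner does in that case.

```python
def pick_target_source(y_columns, target_name):
    """Return the column of the y file that should supply the target.

    Hypothetical helper mirroring the two rules in the text.
    """
    if target_name in y_columns:
        return target_name
    if len(y_columns) == 1:
        # The caller would rename this column to target_name.
        return y_columns[0]
    # Assumption: anything else is treated as an error.
    raise ValueError(
        f"cannot find {target_name!r} among {len(y_columns)} columns"
    )


print(pick_target_source(["date", "sales"], "sales"))
print(pick_target_source(["revenue"], "sales"))
```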
Target column resolution
The runner resolves the target column in this order:
1. PipelineRunConfig.target_column or CLI --target-column
2. target.column
3. "y"
Use the CLI override only when you want to change how the runner reads the CSV.
Keep it consistent with target.column in YAML.
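The resolution order can be sketched as a simple fallback chain. `resolve_target_column` is a hypothetical helper, and the plain dict stands in for the parsed YAML; only the order itself comes from the text.

```python
def resolve_target_column(override, yaml_config):
    """Pick the target column: explicit override, then target.column, then "y".

    Hypothetical helper; `override` stands for PipelineRunConfig.target_column
    or the --target-column CLI value.
    """
    if override:
        return override
    column = (yaml_config.get("target") or {}).get("column")
    return column or "y"


print(resolve_target_column(None, {"target": {"column": "sales"}}))
print(resolve_target_column("revenue", {"target": {"column": "sales"}}))
print(resolve_target_column(None, {}))
```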
fit
fit controls Stage 20 fitting because the fit stage
calls:
The runner merges these CLI or PipelineRunConfig overrides onto the YAML
fit block when they are provided:
- draws
- tune
- chains
- cores
- random_seed
The public YAML schema currently supports these fit keys:
- draws
- tune
- chains
- cores
- random_seed
- target_accept
- progressbar
- compute_convergence_checks
Unknown fit keys are rejected when the YAML is loaded.
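The merge and key check described in this section can be sketched as below. `merge_fit_settings` is a hypothetical function, and treating `None` as "override not provided" is an assumption; the key names are the ones listed above.

```python
SUPPORTED_FIT_KEYS = {
    "draws", "tune", "chains", "cores", "random_seed",
    "target_accept", "progressbar", "compute_convergence_checks",
}
OVERRIDABLE_FIT_KEYS = ("draws", "tune", "chains", "cores", "random_seed")


def merge_fit_settings(yaml_fit, overrides):
    """Layer provided overrides on top of the YAML fit block.

    Hypothetical helper. Assumption: an override of None means "not provided".
    """
    unknown = set(yaml_fit) - SUPPORTED_FIT_KEYS
    if unknown:
        raise ValueError(f"unknown fit keys: {sorted(unknown)}")
    merged = dict(yaml_fit)
    for key in OVERRIDABLE_FIT_KEYS:
        value = overrides.get(key)
        if value is not None:
            merged[key] = value
    return merged


merged = merge_fit_settings(
    {"draws": 1000, "target_accept": 0.9},
    {"draws": 500, "chains": None},
)
print(merged)
```

Keys such as target_accept stay under YAML control in this sketch because they are not in the overridable list.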
effects
effects is an optional list of additive effect specifications:
The builder appends each effect to model.mu_effects in YAML order before
calling build_model(...).
holidays
The holidays block is optional.
Supported keys used by the builder include:
| Key | Meaning |
|---|---|
| path | Holiday CSV path |
| enabled | Set to false to disable holiday loading |
| prefix | Prefix for generated holiday effect coordinates |
| countries | Optional country filter for catalogue-style holiday CSV input |
Example:
The CLI or PipelineRunConfig.holidays_path overrides holidays.path.
If you omit both path and the override but still configure holidays,
Abacus falls back to the bundled abacus.data:holidays.csv.
original_scale_vars
Use original_scale_vars when you want specific contribution variables to be
available on the original target scale:
The builder applies these through
model.add_original_scale_contribution_variable(...) before fitting.
inference_data
inference_data.path is passed through to the YAML builder. If the file exists, Abacus
attaches that InferenceData to the built model during Stage 00.
Important: the structured runner still executes Stage 20 and fits the model
again. inference_data.path does not currently skip fitting.
optimization
Add an optimization block when you want Stage 70 to run. If this block is
absent, Stage 70 is marked skipped.
The YAML builder validates this block when the config is loaded. The required
scalar fields below must be present, and unknown top-level optimization keys
are rejected.
Required keys:
Optional keys read by Stage 70:
| Key | Default | Meaning |
|---|---|---|
| response_variable | total_media_contribution_original_scale | Optimisation objective variable |
| budget_distribution_over_period | None | Time weights over the optimisation window |
| budget_bounds | Derived or default | Explicit spend bounds |
| spend_constraint_lower | 0.3 when deriving bounds | Relative lower bound around scaled reference spend |
| spend_constraint_upper | 0.3 when deriving bounds | Relative upper bound around scaled reference spend |
| default_constraints | true | Whether to add the default equality budget constraint |
| noise_level | 0.001 | Noise level for simulated response samples |
| include_last_observations | false | Whether posterior predictive sampling includes trailing observed rows |
| include_carryover | true | Whether simulated response sampling extends the window for carryover |
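One plausible reading of the spend_constraint_lower and spend_constraint_upper rows is a symmetric relative band around each channel's reference spend. The sketch below assumes that interpretation; `derive_bounds` and the formula are illustrative guesses, not the confirmed Stage 70 logic.

```python
def derive_bounds(reference_spend, lower=0.3, upper=0.3):
    """Return (low, high) spend bounds as a relative band around a reference.

    Assumption: low = reference * (1 - lower), high = reference * (1 + upper).
    """
    if not 0 <= lower <= 1:
        raise ValueError("lower must be between 0 and 1")
    if upper < 0:
        raise ValueError("upper must be non-negative")
    return reference_spend * (1 - lower), reference_spend * (1 + upper)


low, high = derive_bounds(100.0)
print(round(low, 6), round(high, 6))
```

Setting budget_bounds explicitly avoids relying on derived bounds altogether.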
Important budget-unit note
In the structured pipeline, optimization.total_budget is passed straight to
PanelBudgetOptimizerWrapper.optimize_budget(...).
That means Stage 70 uses the wrapper’s per-period spend contract, not the scenario planner’s total-horizon spend contract.
See Budget Optimisation.
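If you start from a total-horizon figure, you need to convert it before writing optimization.total_budget. The sketch below assumes the simplest case, an even spread across periods; `per_period_budget` is a hypothetical helper and the numbers are made up.

```python
def per_period_budget(total_horizon_budget, num_periods):
    """Convert a total-horizon budget into a per-period budget.

    Assumption: spend is spread evenly over the optimisation window.
    """
    if num_periods <= 0:
        raise ValueError("num_periods must be positive")
    return total_horizon_budget / num_periods


# 120,000 over a 12-period window corresponds to 10,000 per period.
print(per_period_budget(120_000, 12))
```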
Xarray-like optimisation values in YAML
For panel bounds or time distributions, use the xarray-like mapping shape that Stage 70 expects:
The same shape works for budget_bounds, but with an additional "bound"
dimension containing "lower" and "upper".
diagnostics
Use the optional diagnostics block when you want Stage 50 to use different
warn/fail thresholds than the retained defaults.
Supported threshold keys:
- design_max_vif
- design_condition_number
- mcmc_divergence_count
- mcmc_max_rhat
- mcmc_min_ess_bulk
- mcmc_bfmi_min
- predictive_nrmse
- residual_ljung_box_p
- residual_max_abs_acf
Validation rules:
- upper-bound checks require warn <= fail
- lower-bound checks require warn >= fail
- omit the block entirely to use the built-in defaults
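The ordering rules above can be sketched as a small check. `validate_threshold` is a hypothetical function, the numeric values are made up, and classifying mcmc_max_rhat as an upper-bound check and mcmc_min_ess_bulk as a lower-bound check is an assumption based on their names.

```python
def validate_threshold(name, warn, fail, direction):
    """Check that warn/fail thresholds are ordered consistently.

    direction is "upper" (larger values are worse) or "lower" (smaller
    values are worse). Hypothetical helper, not the runner's validator.
    """
    if direction == "upper":
        ok = warn <= fail
    elif direction == "lower":
        ok = warn >= fail
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    if not ok:
        raise ValueError(f"{name}: warn={warn} and fail={fail} are misordered")


# Illustrative values only.
validate_threshold("mcmc_max_rhat", warn=1.01, fail=1.05, direction="upper")
validate_threshold("mcmc_min_ess_bulk", warn=400, fail=100, direction="lower")
print("thresholds are consistently ordered")
```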
This block affects only the structured runner. It is stripped before Stage 00 model build so the public MMM YAML schema remains unchanged.
validation
Use the optional validation block when you want Stage 35 blocked holdout
scoring.
Supported keys:
| Key | Meaning |
|---|---|
| enabled | Set to false to skip Stage 35 while keeping the stage in the manifest |
| holdout_observations | Number of unique dates to reserve for the blocked holdout window |
| include_last_observations | Keep lag history for carryover-sensitive holdout scoring |
| coverage_levels | Coverage levels reported in Phase 10; use the fixed 50, 80, and 94 percent defaults |
| sampler | Optional validation-only sampler overrides for the train-window refit |
Phase 10 reports coverage as coverage_50, coverage_80, and
coverage_94. Keep those defaults unless the implementation and tests are
updated together.
The validation stage builds a clean train-window model for holdout scoring and
ignores inference_data.path so the refit does not inherit attached posterior
state from Stage 00.
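A blocked holdout that reserves holdout_observations unique dates can be sketched as below. `blocked_holdout_split` is a hypothetical helper; taking the most recent dates as the holdout window and using ISO date strings are assumptions, and the real stage may handle lag history differently.

```python
def blocked_holdout_split(dates, holdout_observations):
    """Split row indices into train and holdout by unique date.

    Assumption: the holdout window is the last N unique dates. ISO-formatted
    date strings sort chronologically, so plain sorting is enough here.
    """
    unique_dates = sorted(set(dates))
    if not 0 < holdout_observations < len(unique_dates):
        raise ValueError("holdout_observations must leave at least one train date")
    holdout_dates = set(unique_dates[-holdout_observations:])
    train_idx = [i for i, d in enumerate(dates) if d not in holdout_dates]
    holdout_idx = [i for i, d in enumerate(dates) if d in holdout_dates]
    return train_idx, holdout_idx


# Two panel rows per date, for example two geos.
dates = [
    "2024-01-01", "2024-01-01",
    "2024-01-08", "2024-01-08",
    "2024-01-15", "2024-01-15",
]
train_idx, holdout_idx = blocked_holdout_split(dates, holdout_observations=1)
print(train_idx, holdout_idx)
```

Counting unique dates rather than rows keeps every panel slice for a given date on the same side of the split.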
Override precedence
For the runner, precedence is:
| Setting | Higher precedence | Lower precedence |
|---|---|---|
| Combined dataset path | dataset_path / --dataset-path | data.dataset_path |
| Split CSV paths | x_path, y_path / --x-path, --y-path | data.x_path, data.y_path |
| Holiday CSV path | holidays_path / --holidays-path | holidays.path |
| Sampler settings | PipelineRunConfig or CLI overrides | fit |
| Target column for CSV loading | target_column / --target-column | target.column, then "y" |
| Diagnostics thresholds | diagnostics.thresholds | retained Stage 50 defaults |
Common pitfalls
- Using Parquet paths in the pipeline data block. The runner data loader reads CSV only.
- Providing only one of data.x_path or data.y_path.
- Treating optimization.total_budget as total horizon spend instead of per-period spend.
- Assuming diagnostics is part of the public MMM builder schema. It is a runner-only block.
- Assuming inference_data.path skips Stage 20 fitting. It does not.
- Forgetting that relative paths are resolved from the YAML file directory, not from the shell working directory.