Quickstart: YAML Builder

Use the YAML builder when you want the model specification to live in a config file instead of Python code.

The builder entry point is:

from abacus.mmm.builders.yaml import build_mmm_from_yaml

Smallest useful workflow

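The bundled demo config at data/demo/timeseries/config.yml is a working starting point. It already points to a combined dataset through data.dataset_path. The example below loads the dataset in Python and passes X and y explicitly, so the same objects can be reused when fitting.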

import pandas as pd

from abacus.mmm.builders.yaml import build_mmm_from_yaml

dataset = pd.read_csv("data/demo/timeseries/dataset.csv")
X = dataset.drop(columns=["revenue"])
y = dataset["revenue"].rename("revenue")

mmm = build_mmm_from_yaml(
    "data/demo/timeseries/config.yml",
    X=X,
    y=y,
)

build_mmm_from_yaml(...) returns a PanelMMM instance with the PyMC graph already built.

Minimal config structure

At minimum, the YAML config needs the flow-oriented blocks data, target, and media, which describe the dataset, target, and media specification directly. The fit block shown here is optional and sets sampler defaults.

data:
  dataset_path: dataset.csv
  date_column: date

target:
  column: revenue
  type: revenue

media:
  channels:
    - channel_1
    - channel_2
    - channel_3
  adstock:
    type: geometric
    l_max: 4
  saturation:
    type: logistic

fit:
  draws: 1000
  tune: 1000
  chains: 4
  random_seed: 42
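
Before handing a config to the builder, it can help to check that the required blocks are present. The sketch below is not part of abacus: `REQUIRED_BLOCKS` and `missing_blocks` are hypothetical names, and it assumes the config has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`).

```python
from __future__ import annotations

from collections.abc import Mapping

# Hypothetical pre-flight check; not part of the abacus API.
# Top-level blocks this quickstart treats as required.
REQUIRED_BLOCKS = ("data", "target", "media")


def missing_blocks(config: Mapping) -> list[str]:
    """Return the required top-level blocks that are absent from config."""
    return [name for name in REQUIRED_BLOCKS if name not in config]


# A parsed config equivalent to the YAML above, minus the optional fit block.
config = {
    "data": {"dataset_path": "dataset.csv", "date_column": "date"},
    "target": {"column": "revenue", "type": "revenue"},
    "media": {
        "channels": ["channel_1", "channel_2", "channel_3"],
        "adstock": {"type": "geometric", "l_max": 4},
        "saturation": {"type": "logistic"},
    },
}

complete = missing_blocks(config)          # no required block is missing
incomplete = missing_blocks({"data": {}})  # target and media are missing
```

A check like this only confirms that the blocks exist. It does not validate their contents, which remains the builder's job.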

How data loading works

The builder supports two data-loading patterns.

| Pattern | What you provide |
| --- | --- |
| Combined dataset | `data.dataset_path` in YAML, or `X` and `y` already split in Python |
| Separate files | `data.x_path` and `data.y_path` in YAML |

If you use data.dataset_path, the target column must be present in that file. The builder splits it out into X and y before building the model.

The builder also normalises X[date_column] with pd.to_datetime(...) after loading the data.
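
The split-and-parse behaviour described above can be pictured with plain pandas. This is a sketch of the documented behaviour, not the builder's actual code: `split_combined` is a hypothetical helper, and the inline CSV is made-up sample data.

```python
import io

import pandas as pd


def split_combined(dataset: pd.DataFrame, target_column: str, date_column: str):
    """Split a combined frame into X and y and parse the date column."""
    if target_column not in dataset.columns:
        raise KeyError(f"target column {target_column!r} not found in dataset")
    X = dataset.drop(columns=[target_column])
    y = dataset[target_column]
    # Mirror the documented normalisation of X[date_column].
    X = X.assign(**{date_column: pd.to_datetime(X[date_column])})
    return X, y


# Made-up rows, only to exercise the helper.
csv_text = "date,channel_1,revenue\n2024-01-01,10.0,100.0\n2024-01-08,12.5,130.0\n"
X, y = split_combined(pd.read_csv(io.StringIO(csv_text)), "revenue", "date")
```

After the call, `X` holds every column except the target, with the date column parsed to datetimes, and `y` is the target series.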

Configured relative paths are resolved relative to the YAML file location.
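
Resolving against the YAML file rather than the current working directory means a config keeps working wherever the script is launched from. A minimal sketch of that rule with pathlib follows. `resolve_config_path` is a hypothetical helper, not an abacus function, and the handling of absolute paths is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def resolve_config_path(config_file: str | Path, configured: str | Path) -> Path:
    """Resolve a path from the config against the YAML file's directory."""
    configured = Path(configured)
    if configured.is_absolute():
        # Assumption: absolute paths are used unchanged.
        return configured
    return Path(config_file).parent / configured


resolved = resolve_config_path("data/demo/timeseries/config.yml", "dataset.csv")
```

Under this rule, a `dataset_path` of `dataset.csv` in a config stored in `data/demo/timeseries/` resolves to a file in that same directory.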

Fit after building

The builder does not fit the model for you. Fit it in the usual way:

idata = mmm.fit(
    X,
    y,
    draws=200,
    tune=200,
    chains=2,
    cores=2,
    progressbar=False,
    compute_convergence_checks=False,
    random_seed=42,
)

If you rely on data.dataset_path, the builder loads and splits the data itself, but fit(...) still needs X and y. Either split the combined dataset in Python before fitting, or load it once in Python and pass the same X and y to both build_mmm_from_yaml(...) and fit(...).

Optional top-level YAML blocks

The builder recognises several optional top-level sections in addition to data, target, and media.

| Key | Purpose |
| --- | --- |
| `dimensions` | Panel-dimension columns such as `geo` or `brand` |
| `scaling` | Optional scaling rules for target and channels |
| `effects` | Additive effects to attach before model build |
| `priors` | Model-level priors passed into `PanelMMM` |
| `fit` | Sampler defaults used by the runner or by Python overrides |
| `holidays` | Holiday/event configuration applied before build |
| `original_scale_vars` | Add original-scale deterministic variables after build |
| `inference_data` | Attach existing inference data if the file exists |
| `calibration` | Apply calibration steps after the model is built |

Override config values from Python

Use model_kwargs when you want to keep most settings in YAML but override a subset from Python.

For example, you can override the fit config through the sampler_config key for a lighter quickstart run:

mmm = build_mmm_from_yaml(
    "data/demo/timeseries/config.yml",
    X=X,
    y=y,
    model_kwargs={
        "sampler_config": {
            "draws": 200,
            "tune": 200,
            "chains": 2,
            "cores": 2,
            "progressbar": False,
            "compute_convergence_checks": False,
            "random_seed": 42,
        }
    },
)

model_kwargs takes precedence over the YAML defaults.
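
As a mental model for this precedence rule, treat the YAML values as defaults and the Python values as overrides applied on top. The sketch below uses plain dicts and is an illustration only: `apply_overrides` is a hypothetical helper, and the source does not say whether the builder merges a block key by key (as here) or replaces it wholesale.

```python
from __future__ import annotations


def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults updated with overrides; overrides win."""
    merged = dict(defaults)   # copy so the defaults stay untouched
    merged.update(overrides)  # Python-side values replace YAML values
    return merged


# Values taken from the fit block and the override example in this guide.
yaml_fit = {"draws": 1000, "tune": 1000, "chains": 4, "random_seed": 42}
python_overrides = {"draws": 200, "tune": 200, "chains": 2, "cores": 2}

effective = apply_overrides(yaml_fit, python_overrides)
```

In this picture, keys set only in YAML (such as `random_seed`) survive, keys set in both take the Python value, and keys set only in Python (such as `cores`) are added.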

Next steps

Read Quickstart: Pipeline Runner if you want staged artefacts and manifests instead of an in-memory fit only.
Read Data Preparation for dataset column and layout requirements.