Getting Started

This section helps you install Abacus and run your first model.

Start here if you want to:

set up a local environment from this repository
fit PanelMMM directly from Python
build a model from YAML
run the structured pipeline against one of the bundled demo configs

Installation

These instructions assume you are working from a local checkout of the Abacus repository.

Prerequisites

Item	Notes
Python	The package requires Python 3.11 or later. The repo development environment uses Python 3.12.
Local checkout	Install from the repository root, not from a published package index.
Writable temp/cache directory	Useful for PyTensor compiledir and local verification commands.
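
The Python and temp-directory prerequisites can be checked from Python before you install anything. This is a minimal sketch: the helper names are hypothetical and not part of Abacus, and only the 3.11 minimum comes from the table above.

```python
import sys
import tempfile
from pathlib import Path

MIN_PYTHON = (3, 11)  # minimum version stated in the prerequisites table


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the documented minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def directory_is_writable(directory: Path) -> bool:
    """Return True when a temporary file can be created in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers missing directories and permission errors alike.
        return False


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Temp dir writable:", directory_is_writable(Path(tempfile.gettempdir())))
```

If the second check fails, see the runtime defaults for restricted environments later in this section.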

Recommended setup: Conda + editable install

This is the supported local development path for the repository.

conda env create -f environment.yml
conda activate abacus-dev
python3 -m pip install -e .

This gives you:

the repo-managed development environment from environment.yml
an editable install, so local code changes are picked up immediately

Minimal pip install from source

If you do not want the full Conda environment, you can still install Abacus directly from the repository root.

Standard install

python3 -m pip install .

Editable install

python3 -m pip install -e .

Use the editable install if you are changing code, configs, or docs locally.

Optional extras

Abacus defines a small set of optional extras in pyproject.toml.

Extra	Install command	Use when you need
`lint`	`python3 -m pip install ".[lint]"`	Ruff, MyPy, and related local linting tools
`test`	`python3 -m pip install ".[test]"`	Pytest and test-only dependencies
`planner`	`python3 -m pip install -e ".[planner]"`	Dash and Plotly for the scenario planner surfaces

If you created the environment from environment.yml, most development dependencies are already present.

Verify the install

A quick smoke check from the repository root:

python3 -c "from abacus.mmm.panel import PanelMMM; print(PanelMMM.__name__)"
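
If you would rather get a yes/no answer than a traceback when the package is missing, the same check can be written with importlib. A minimal sketch, assuming only the module path used in the command above; module_is_importable is a hypothetical helper, not an Abacus function.

```python
import importlib.util


def module_is_importable(name: str) -> bool:
    """Return True when `name` can be located on the current Python path.

    Parent packages are imported as part of the lookup, but the target
    module itself is not executed.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (for example "abacus") is missing.
        return False


if __name__ == "__main__":
    print("abacus.mmm.panel importable:", module_is_importable("abacus.mmm.panel"))
```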

For a real end-to-end verification path, use the repo smoke target:

make smoke_mmm

If you are working on the repo itself, the main local verification commands are:

make test
make verify_local
make verify_package

Runtime defaults for restricted environments

Some local runs need writable cache directories. If you hit PyTensor compiledir or cache-permission issues, export the same defaults used by the repo verification scripts:

export PYTENSOR_FLAGS="base_compiledir=/tmp/pytensor,linker=py"
export JAX_PLATFORMS=cpu
export XDG_CACHE_HOME=/tmp
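
In a notebook or script you may prefer to set the same defaults from Python. The sketch below copies the three values from the exports above and sets a variable only when it is not already defined. apply_runtime_defaults is a hypothetical helper, not part of Abacus. PyTensor and JAX generally read these settings when they are first imported, so call it before importing Abacus.

```python
import os

# Values copied from the shell exports above.
RUNTIME_DEFAULTS = {
    "PYTENSOR_FLAGS": "base_compiledir=/tmp/pytensor,linker=py",
    "JAX_PLATFORMS": "cpu",
    "XDG_CACHE_HOME": "/tmp",
}


def apply_runtime_defaults(env=os.environ) -> dict:
    """Set each default only if the variable is not already defined.

    Returns the entries that were actually added to `env`.
    """
    applied = {}
    for name, value in RUNTIME_DEFAULTS.items():
        if name not in env:
            env[name] = value
            applied[name] = value
    return applied


if __name__ == "__main__":
    # Run this before the first `import abacus`.
    print(apply_runtime_defaults())
```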

Next steps

Read Quickstart: Python API if you want to fit a model directly from pandas data.
Read Quickstart: YAML Builder if you want configuration-driven model construction.
Read Quickstart: Pipeline Runner if you want a full structured run with staged artefacts.

Quickstart: Python API

This page shows the fastest direct path from a pandas dataset to a fitted PanelMMM.

If you have not prepared your dataset yet, read Data Preparation first.

Load a dataset

The repository includes bundled demo datasets under data/demo/. The timeseries bundle is the simplest starting point because it has no extra panel dimensions.

import pandas as pd

dataset = pd.read_csv("data/demo/timeseries/dataset.csv")
dataset["date"] = pd.to_datetime(dataset["date"])

X = dataset.drop(columns=["revenue"])
y = dataset["revenue"].rename("revenue")

Construct `PanelMMM`

from abacus.mmm import GeometricAdstock, LogisticSaturation
from abacus.mmm.panel import PanelMMM

mmm = PanelMMM(
    date_column="date",
    target_column="revenue",
    channel_columns=[
        "channel_1",
        "channel_2",
        "channel_3",
        "channel_4",
        "channel_5",
        "channel_6",
    ],
    yearly_seasonality=2,
    adstock=GeometricAdstock(l_max=4),
    saturation=LogisticSaturation(),
)

This example uses a plain timeseries. If your dataset has panel dimensions such as geo or brand, add them with dims=(...) and keep those columns in X.
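
Before passing dims=(...), it can help to confirm that every dimension column is still present in X. A minimal sketch with a hypothetical helper and illustrative column names; with pandas you would pass X.columns as the first argument.

```python
def missing_dim_columns(columns, dims):
    """Return the entries of `dims` that are absent from `columns`, in order."""
    available = set(columns)
    return [dim for dim in dims if dim not in available]


# Column names below are illustrative, not taken from a bundled dataset.
x_columns = ["date", "geo", "channel_1", "channel_2"]
print(missing_dim_columns(x_columns, dims=("geo",)))          # []
print(missing_dim_columns(x_columns, dims=("geo", "brand")))  # ['brand']
```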

Fit the model

You can call fit() directly. If the model graph has not been built yet, Abacus builds it for you.

idata = mmm.fit(
    X,
    y,
    draws=200,
    tune=200,
    chains=2,
    cores=2,
    progressbar=False,
    compute_convergence_checks=False,
    random_seed=42,
)

fit() returns an arviz.InferenceData object and also stores it on the model instance as mmm.idata.

Prior and posterior predictive checks

You can sample prior predictive draws before fitting:

prior = mmm.sample_prior_predictive(
    X=X,
    y=y,
    samples=50,
    random_seed=42,
)

After fitting, you can sample posterior predictive draws:

post = mmm.sample_posterior_predictive(
    X=X,
    progressbar=False,
    random_seed=42,
)

By default, this also stores posterior predictive draws on mmm.idata.

When to call `build_model()`

Call build_model(X, y) explicitly when you want to inspect or modify the PyMC graph before sampling.

For example, you might build first so that you can add stored original-scale deterministics:

mmm.build_model(X, y)
mmm.add_original_scale_contribution_variable(
    var=["channel_contribution", "y"]
)

After that, fit the already-built model:

idata = mmm.fit(
    X,
    y,
    draws=200,
    tune=200,
    chains=2,
    cores=2,
    progressbar=False,
    compute_convergence_checks=False,
    random_seed=42,
)

Basic outputs

After fitting, common next steps are:

mmm.save("mmm.nc")
fig, axes = mmm.plot.posterior_predictive()

You can also inspect:

mmm.posterior
mmm.posterior_predictive
mmm.summary
mmm.diagnostics

Next steps

Read Quickstart: YAML Builder if you want to move model configuration into YAML.
Read Model Fitting for fitting, save/load, and predictive-check workflows in more detail.

Quickstart: YAML Builder

Use the YAML builder when you want the model specification to live in a config file instead of Python code.

The builder entry point is:

from abacus.mmm.builders.yaml import build_mmm_from_yaml

Smallest useful workflow

The bundled demo config at data/demo/timeseries/config.yml is a working starting point. It already points to a combined dataset with data.dataset_path.

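import pandas as pd

from abacus.mmm.builders.yaml import build_mmm_from_yaml

dataset = pd.read_csv("data/demo/timeseries/dataset.csv")
# Parse dates up front, as in the Python API quickstart, so fit() later receives datetime values.
dataset["date"] = pd.to_datetime(dataset["date"])
X = dataset.drop(columns=["revenue"])
y = dataset["revenue"].rename("revenue")

mmm = build_mmm_from_yaml(
    "data/demo/timeseries/config.yml",
    X=X,
    y=y,
)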

build_mmm_from_yaml(...) returns a PanelMMM instance with the PyMC graph already built.

Minimal config structure

At minimum, the YAML config needs data, target, and media blocks, which describe the dataset, the target, and the media specification. The example below also includes an optional fit block with sampler defaults.

data:
  dataset_path: dataset.csv
  date_column: date

target:
  column: revenue
  type: revenue

media:
  channels:
    - channel_1
    - channel_2
    - channel_3
  adstock:
    type: geometric
    l_max: 4
  saturation:
    type: logistic

fit:
  draws: 1000
  tune: 1000
  chains: 4
  random_seed: 42
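
This page treats data, target, and media as the required top-level blocks. A quick pre-flight check on a parsed config could look like the sketch below. missing_required_blocks is a hypothetical helper, not an Abacus function, and the builder's own validation may be stricter.

```python
REQUIRED_BLOCKS = ("data", "target", "media")


def missing_required_blocks(config: dict) -> list:
    """Return the required top-level blocks that `config` does not define."""
    return [name for name in REQUIRED_BLOCKS if name not in config]


# The minimal config above, written as the dict a YAML parser would produce.
config = {
    "data": {"dataset_path": "dataset.csv", "date_column": "date"},
    "target": {"column": "revenue", "type": "revenue"},
    "media": {
        "channels": ["channel_1", "channel_2", "channel_3"],
        "adstock": {"type": "geometric", "l_max": 4},
        "saturation": {"type": "logistic"},
    },
    "fit": {"draws": 1000, "tune": 1000, "chains": 4, "random_seed": 42},
}
print(missing_required_blocks(config))  # []
```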

How data loading works

The builder supports two data-loading patterns.

Pattern	What you provide
Combined dataset	`data.dataset_path` in YAML, or `X` and `y` already split in Python
Separate files	`data.x_path` and `data.y_path` in YAML

If you use data.dataset_path, the target column must be present in that file. The builder splits it out into X and y before building the model.

The builder also normalises X[date_column] with pd.to_datetime(...) after loading the data.

Configured relative paths are resolved relative to the YAML file location.
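
The path rule can be illustrated with pathlib. This is a sketch of the documented behaviour, not the builder's actual code. resolve_configured_path is a hypothetical name, and returning absolute paths unchanged is an assumption.

```python
from pathlib import Path


def resolve_configured_path(config_file: Path, configured: str) -> Path:
    """Resolve `configured` against the directory that holds `config_file`.

    Assumption of this sketch: absolute paths are returned unchanged.
    """
    candidate = Path(configured)
    if candidate.is_absolute():
        return candidate
    return config_file.parent / candidate


# dataset.csv is looked up next to config.yml, not in the working directory.
print(resolve_configured_path(Path("data/demo/timeseries/config.yml"), "dataset.csv"))
```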

Fit after building

The builder does not fit the model for you. Fit it in the usual way:

idata = mmm.fit(
    X,
    y,
    draws=200,
    tune=200,
    chains=2,
    cores=2,
    progressbar=False,
    compute_convergence_checks=False,
    random_seed=42,
)

If the config relies on data.dataset_path, fit(...) still needs X and y in Python. Either let the builder load the data and split the combined dataset yourself before fitting, or load the dataset once in Python and pass the same X and y to both build_mmm_from_yaml(...) and fit(...).

Optional top-level YAML blocks

The builder recognises several optional top-level sections in addition to data, target, and media.

Key	Purpose
`dimensions`	Panel-dimension columns such as `geo` or `brand`
`scaling`	Optional scaling rules for target and channels
`effects`	Additive effects to attach before model build
`priors`	Model-level priors passed into `PanelMMM`
`fit`	Sampler defaults used by the runner or by Python overrides
`holidays`	Holiday/event configuration applied before build
`original_scale_vars`	Add original-scale deterministic variables after build
`inference_data`	Attach existing inference data if the file exists
`calibration`	Apply calibration steps after the model is built

Override config values from Python

Use model_kwargs when you want to keep most settings in YAML but override a subset from Python.

For example, you can override the fit config for a lighter quickstart run:

mmm = build_mmm_from_yaml(
    "data/demo/timeseries/config.yml",
    X=X,
    y=y,
    model_kwargs={
        "sampler_config": {
            "draws": 200,
            "tune": 200,
            "chains": 2,
            "cores": 2,
            "progressbar": False,
            "compute_convergence_checks": False,
            "random_seed": 42,
        }
    },
)

model_kwargs takes precedence over the YAML defaults.
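
The precedence rule behaves like a dictionary update in which the Python side wins. The sketch below only illustrates that idea and does not reproduce the builder's internals. apply_overrides is a hypothetical helper, the values are illustrative, and replacing whole top-level entries rather than merging nested keys is an assumption.

```python
def apply_overrides(yaml_settings: dict, overrides: dict) -> dict:
    """Return a new dict in which `overrides` win over `yaml_settings`.

    Assumption: top-level keys are replaced wholesale, not merged recursively.
    """
    merged = dict(yaml_settings)
    merged.update(overrides)
    return merged


# Illustrative values: heavier defaults from YAML, lighter overrides from Python.
yaml_settings = {"sampler_config": {"draws": 1000, "tune": 1000, "chains": 4}}
overrides = {"sampler_config": {"draws": 200, "tune": 200, "chains": 2}}
print(apply_overrides(yaml_settings, overrides)["sampler_config"]["draws"])  # 200
```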

Next steps

Read Quickstart: Pipeline Runner if you want staged artefacts and manifests instead of an in-memory fit only.
Read Data Preparation for dataset column and layout requirements.

Quickstart: Pipeline Runner

Use the pipeline runner when you want a full staged run instead of only an in-memory model fit.

The runner writes:

a run manifest
copied and resolved config files
fitted model artefacts
posterior predictive assessment outputs
decomposition, diagnostics, and response-curve artefacts

Fastest first run: bundled demo

From the repository root, the quickest way to see a real structured run is the demo launcher:

python3 runme.py --demo timeseries

Other bundled demos are:

geo_panel
geo_brand_panel

List them explicitly with:

python3 runme.py --list-demos

runme.py is a convenience wrapper around the structured pipeline. It resolves the demo config under data/demo/<demo_name>/config.yml and runs the pipeline for you.
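
The demo layout described above maps a demo name to a config path as follows. This sketch mirrors the documented directory pattern only; it is not the code in runme.py, and demo_config_path is a hypothetical helper.

```python
from pathlib import Path

# Demo names listed on this page.
BUNDLED_DEMOS = ("timeseries", "geo_panel", "geo_brand_panel")


def demo_config_path(demo_name: str, repo_root: Path = Path(".")) -> Path:
    """Return data/demo/<demo_name>/config.yml under `repo_root`."""
    if demo_name not in BUNDLED_DEMOS:
        raise ValueError(f"unknown demo {demo_name!r}; expected one of {BUNDLED_DEMOS}")
    return repo_root / "data" / "demo" / demo_name / "config.yml"


print(demo_config_path("geo_panel"))
```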

Run the pipeline from Python

The direct Python API is:

from pathlib import Path

from abacus.pipeline import PipelineRunConfig, run_pipeline

result = run_pipeline(
    PipelineRunConfig(
        config_path=Path("data/demo/geo_panel/config.yml"),
        output_dir=Path("results"),
        run_name="geo_panel_quickstart",
        prior_samples=10,
        draws=200,
        tune=200,
        chains=2,
        cores=2,
        random_seed=42,
        curve_samples=50,
        curve_points=50,
    )
)

print(result.run_dir)
print(result.manifest_path)

If the YAML config already contains data.dataset_path, you do not need to pass dataset_path again.

Run the thin CLI directly

The pipeline also exposes a thin CLI in abacus.pipeline.runner:

python3 -m abacus.pipeline.runner \
  --config data/demo/geo_panel/config.yml \
  --output-dir results \
  --run-name geo_panel_quickstart \
  --prior-samples 10 \
  --draws 200 \
  --tune 200 \
  --chains 2 \
  --cores 2 \
  --random-seed 42 \
  --curve-samples 50 \
  --curve-points 50

The CLI prints the final run directory when the pipeline completes.
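
If you need to launch the CLI from another Python process, you can assemble the same command with subprocess. A minimal sketch that uses only the flags shown above. build_runner_command and run_runner are hypothetical helpers, and the mapping from keyword names to flags (underscores to hyphens) is an assumption of the sketch.

```python
import subprocess
import sys


def build_runner_command(config: str, output_dir: str, run_name: str, **options) -> list:
    """Build an argv list for `python -m abacus.pipeline.runner`.

    Keyword options become flags, e.g. prior_samples=10 -> --prior-samples 10.
    """
    command = [
        sys.executable, "-m", "abacus.pipeline.runner",
        "--config", config,
        "--output-dir", output_dir,
        "--run-name", run_name,
    ]
    for name, value in options.items():
        command += ["--" + name.replace("_", "-"), str(value)]
    return command


def run_runner(command: list) -> None:
    """Run the assembled command and raise if the pipeline exits with an error."""
    subprocess.run(command, check=True)


command = build_runner_command(
    "data/demo/geo_panel/config.yml", "results", "geo_panel_quickstart",
    prior_samples=10, draws=200, tune=200, chains=2, cores=2,
    random_seed=42, curve_samples=50, curve_points=50,
)
print(" ".join(command[1:]))
```

Call run_runner(command) from the repository root to start the run.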

Override data paths

Use one of these patterns:

Pattern	Arguments
Combined dataset override	`dataset_path=` in Python or `--dataset-path` in the CLI
Separate feature and target files	`x_path=` and `y_path=` in Python or `--x-path` and `--y-path` in the CLI
Target column override	`target_column=` in Python or `--target-column` in the CLI

Configured relative paths are resolved relative to the YAML config directory.

If you want Stage 50 (50_diagnostics) to use different warn/fail cutoffs, add a runner-only diagnostics.thresholds block to the YAML. See YAML Configuration.

What you get back

run_pipeline(...) returns a PipelineRunResult with:

run_dir
manifest_path

The output directory contains stage folders such as:

00_run_metadata
20_model_fit
30_model_assessment
50_diagnostics
60_response_curves
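
To see which stage folders a run actually produced, you can list the run directory. A minimal sketch, assuming only that stage folders start with a two-digit prefix as in the names above; list_stage_dirs is a hypothetical helper.

```python
from pathlib import Path


def list_stage_dirs(run_dir) -> list:
    """Return the names of stage folders in `run_dir`, sorted by name.

    A stage folder is taken to be any directory whose name starts with two
    digits, so sorting by name also sorts by stage number.
    """
    return sorted(
        entry.name
        for entry in Path(run_dir).iterdir()
        if entry.is_dir() and entry.name[:2].isdigit()
    )


# Usage after a run, where `result` is the value returned by run_pipeline(...):
# print(list_stage_dirs(result.run_dir))
```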

60_response_curves includes three complementary curve families:

saturation-only transformation artefacts
forward-pass direct contribution artefacts built from scaled observed history
adstock carryover artefacts

When to use the runner

Choose the runner when you want:

a reproducible run directory on disk
structured metadata and manifest files
staged artefacts for diagnostics and reporting
a config-driven workflow for repeated runs

If you only need to fit a model interactively in a notebook or script, start with Quickstart: Python API or Quickstart: YAML Builder.
