Getting Started

This section helps you install Abacus and run your first model.

Start here if you want to:

  • set up a local environment from this repository
  • fit PanelMMM directly from Python
  • build a model from YAML
  • run the structured pipeline against one of the bundled demo configs

Installation

These instructions assume you are working from a local checkout of the Abacus repository.

Prerequisites

| Item | Notes |
| --- | --- |
| Python | The package requires Python 3.11 or later. The repo development environment uses Python 3.12. |
| Local checkout | Install from the repository root, not from a published package index. |
| Writable temp/cache directory | Useful for the PyTensor compiledir and local verification commands. |
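
Before installing, it can help to confirm that the active interpreter meets the Python 3.11 minimum listed above. The check below is an illustrative sketch and not part of Abacus; the helper name meets_minimum is hypothetical.

```python
import sys

MINIMUM = (3, 11)  # minimum version stated in the prerequisites


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if a (major, minor) version satisfies the minimum."""
    if version is None:
        # Default to the interpreter that is running this script.
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

Run it with the same python3 you plan to use for the pip install commands.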

The Conda environment is the supported local development path for the repository:

conda env create -f environment.yml
conda activate abacus-dev
python3 -m pip install -e .

This gives you:

  • the repo-managed development environment from environment.yml
  • an editable install, so local code changes are picked up immediately

Minimal pip install from source

If you do not want the full Conda environment, you can still install Abacus directly from the repository root.

Standard install

python3 -m pip install .

Editable install

python3 -m pip install -e .

Use the editable install if you are changing code, configs, or docs locally.

Optional extras

Abacus defines a small set of optional extras in pyproject.toml.

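| Extra | Install command | Use when you need |
| --- | --- | --- |
| lint | python3 -m pip install ".[lint]" | Ruff, MyPy, and related local linting tools |
| test | python3 -m pip install ".[test]" | Pytest and test-only dependencies |
| planner | python3 -m pip install -e ".[planner]" | Dash and Plotly for the scenario planner surfaces |

Quote the extras argument: shells such as zsh treat unquoted square brackets as glob patterns.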

If you created the environment from environment.yml, most development dependencies are already present.

Verify the install

A quick smoke check from the repository root:

python3 -c "from abacus.mmm.panel import PanelMMM; print(PanelMMM.__name__)"

For a real end-to-end verification path, use the repo smoke target:

make smoke_mmm

If you are working on the repo itself, the main local verification commands are:

make test
make verify_local
make verify_package

Runtime defaults for restricted environments

Some local runs need writable cache directories. If you hit PyTensor compiledir or cache-permission issues, export the same defaults used by the repo verification scripts:

export PYTENSOR_FLAGS="base_compiledir=/tmp/pytensor,linker=py"
export JAX_PLATFORMS=cpu
export XDG_CACHE_HOME=/tmp
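
If you launch runs from a notebook or script rather than a shell, the same defaults can be applied from Python. The sketch below assumes it runs before anything imports PyTensor or JAX, because PyTensor reads PYTENSOR_FLAGS when it is first imported. The helper name apply_restricted_defaults is hypothetical and not part of Abacus.

```python
import os

# Same values as the shell exports above.
RESTRICTED_DEFAULTS = {
    "PYTENSOR_FLAGS": "base_compiledir=/tmp/pytensor,linker=py",
    "JAX_PLATFORMS": "cpu",
    "XDG_CACHE_HOME": "/tmp",
}


def apply_restricted_defaults(env=os.environ):
    """Set each default only if the variable is not already defined."""
    for name, value in RESTRICTED_DEFAULTS.items():
        env.setdefault(name, value)
    return env


apply_restricted_defaults()
# Import abacus (and therefore PyTensor) only after this point.
```

Using setdefault keeps any value you have already exported, so the helper never overrides an explicit setting.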

Quickstart: Python API

This page shows the fastest direct path from a pandas dataset to a fitted PanelMMM.

If you have not prepared your dataset yet, read Data Preparation first.

Load a dataset

The repository includes bundled demo datasets under data/demo/. The timeseries bundle is the simplest starting point because it has no extra panel dimensions.

import pandas as pd

dataset = pd.read_csv("data/demo/timeseries/dataset.csv")
dataset["date"] = pd.to_datetime(dataset["date"])

X = dataset.drop(columns=["revenue"])
y = dataset["revenue"].rename("revenue")
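
When you swap in your own file, a missing date, target, or channel column is an easy mistake to catch before building a model. The helper below is a minimal sketch that reads only the CSV header with the standard library; the name missing_columns is hypothetical and not part of Abacus.

```python
import csv


def missing_columns(csv_path, required):
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    return [name for name in required if name not in header]


# Example call (column names taken from the quickstart on this page):
# missing_columns("data/demo/timeseries/dataset.csv",
#                 ["date", "revenue", "channel_1"])
```

An empty list means every required column is present.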

Construct PanelMMM

from abacus.mmm import GeometricAdstock, LogisticSaturation
from abacus.mmm.panel import PanelMMM

mmm = PanelMMM(
    date_column="date",
    target_column="revenue",
    channel_columns=[
        "channel_1",
        "channel_2",
        "channel_3",
        "channel_4",
        "channel_5",
        "channel_6",
    ],
    yearly_seasonality=2,
    adstock=GeometricAdstock(l_max=4),
    saturation=LogisticSaturation(),
)

This example uses a plain timeseries. If your dataset has panel dimensions such as geo or brand, add them with dims=(...) and keep those columns in X.

Fit the model

You can call fit() directly. If the model graph has not been built yet, Abacus builds it for you.

idata = mmm.fit(
    X,
    y,
    draws=200,
    tune=200,
    chains=2,
    cores=2,
    progressbar=False,
    compute_convergence_checks=False,
    random_seed=42,
)

fit() returns an arviz.InferenceData object and also stores it on the model instance as mmm.idata.

Prior and posterior predictive checks

You can sample prior predictive draws before fitting:

prior = mmm.sample_prior_predictive(
    X=X,
    y=y,
    samples=50,
    random_seed=42,
)

After fitting, you can sample posterior predictive draws:

post = mmm.sample_posterior_predictive(
    X=X,
    progressbar=False,
    random_seed=42,
)

By default, this also stores posterior predictive draws on mmm.idata.

When to call build_model()

Call build_model(X, y) explicitly when you want to inspect or modify the PyMC graph before sampling.

For example, you might build first so that you can add stored original-scale deterministics:

mmm.build_model(X, y)
mmm.add_original_scale_contribution_variable(
    var=["channel_contribution", "y"]
)

After that, fit the already-built model:

idata = mmm.fit(
    X,
    y,
    draws=200,
    tune=200,
    chains=2,
    cores=2,
    progressbar=False,
    compute_convergence_checks=False,
    random_seed=42,
)

Basic outputs

After fitting, common next steps are:

mmm.save("mmm.nc")
fig, axes = mmm.plot.posterior_predictive()

You can also inspect:

  • mmm.posterior
  • mmm.posterior_predictive
  • mmm.summary
  • mmm.diagnostics

Quickstart: YAML Builder

Use the YAML builder when you want the model specification to live in a config file instead of Python code.

The builder entry point is:

from abacus.mmm.builders.yaml import build_mmm_from_yaml

Smallest useful workflow

The bundled demo config at data/demo/timeseries/config.yml is a working starting point. It already points to a combined dataset with data.dataset_path.

import pandas as pd

from abacus.mmm.builders.yaml import build_mmm_from_yaml

dataset = pd.read_csv("data/demo/timeseries/dataset.csv")
X = dataset.drop(columns=["revenue"])
y = dataset["revenue"].rename("revenue")

mmm = build_mmm_from_yaml(
    "data/demo/timeseries/config.yml",
    X=X,
    y=y,
)

build_mmm_from_yaml(...) returns a PanelMMM instance with the PyMC graph already built.

Minimal config structure

At minimum, the YAML config needs the data, target, and media blocks, which describe the dataset, the target, and the media specification. The fit block in the example below is optional.

data:
  dataset_path: dataset.csv
  date_column: date

target:
  column: revenue
  type: revenue

media:
  channels:
    - channel_1
    - channel_2
    - channel_3
  adstock:
    type: geometric
    l_max: 4
  saturation:
    type: logistic

fit:
  draws: 1000
  tune: 1000
  chains: 4
  random_seed: 42
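
A quick pre-flight check on a parsed config can catch a missing block before you call the builder. The sketch below assumes that data, target, and media are the required top-level blocks, as the text above suggests. It is not the builder's own validation, and the helper name missing_required_blocks is hypothetical.

```python
REQUIRED_BLOCKS = ("data", "target", "media")  # assumed from the text above


def missing_required_blocks(config):
    """Return required top-level keys missing from a parsed config mapping."""
    return [key for key in REQUIRED_BLOCKS if key not in config]


# The minimal config above, as a YAML loader would return it.
config = {
    "data": {"dataset_path": "dataset.csv", "date_column": "date"},
    "target": {"column": "revenue", "type": "revenue"},
    "media": {
        "channels": ["channel_1", "channel_2", "channel_3"],
        "adstock": {"type": "geometric", "l_max": 4},
        "saturation": {"type": "logistic"},
    },
    "fit": {"draws": 1000, "tune": 1000, "chains": 4, "random_seed": 42},
}

print(missing_required_blocks(config))  # → []
```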

How data loading works

The builder supports two data-loading patterns.

| Pattern | What you provide |
| --- | --- |
| Combined dataset | data.dataset_path in YAML, or X and y already split in Python |
| Separate files | data.x_path and data.y_path in YAML |

If you use data.dataset_path, the target column must be present in that file. The builder splits it out into X and y before building the model.

The builder also normalises X[date_column] with pd.to_datetime(...) after loading the data.

Configured relative paths are resolved relative to the YAML file location.
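
The path rule above can be pictured with a few lines of pathlib. This is a sketch of the described behaviour, not the builder's actual code. The helper name resolve_relative_to_config is hypothetical, and leaving absolute paths unchanged is an assumption.

```python
from pathlib import Path


def resolve_relative_to_config(config_file, configured_path):
    """Resolve a configured path against the directory holding the YAML file."""
    configured = Path(configured_path)
    if configured.is_absolute():
        # Assumption: absolute paths are used as given.
        return configured
    return Path(config_file).parent / configured


print(resolve_relative_to_config("data/demo/timeseries/config.yml", "dataset.csv"))
```

With the bundled timeseries config, dataset_path: dataset.csv therefore points at the dataset.csv that sits next to config.yml, regardless of the directory you run Python from.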

Fit after building

The builder does not fit the model for you. Fit it in the usual way:

idata = mmm.fit(
    X,
    y,
    draws=200,
    tune=200,
    chains=2,
    cores=2,
    progressbar=False,
    compute_convergence_checks=False,
    random_seed=42,
)

fit(...) still needs X and y. If the config relies on data.dataset_path, load the combined dataset once in Python, split it into X and y, and pass the same X and y to both build_mmm_from_yaml(...) and fit(...).

Optional top-level YAML blocks

The builder recognises several optional top-level sections in addition to data, target, and media.

| Key | Purpose |
| --- | --- |
| dimensions | Panel-dimension columns such as geo or brand |
| scaling | Optional scaling rules for target and channels |
| effects | Additive effects to attach before model build |
| priors | Model-level priors passed into PanelMMM |
| fit | Sampler defaults used by the runner or by Python overrides |
| holidays | Holiday/event configuration applied before build |
| original_scale_vars | Add original-scale deterministic variables after build |
| inference_data | Attach existing inference data if the file exists |
| calibration | Apply calibration steps after the model is built |

Override config values from Python

Use model_kwargs when you want to keep most settings in YAML but override a subset from Python.

For example, you can override the fit config for a lighter quickstart run:

mmm = build_mmm_from_yaml(
    "data/demo/timeseries/config.yml",
    X=X,
    y=y,
    model_kwargs={
        "sampler_config": {
            "draws": 200,
            "tune": 200,
            "chains": 2,
            "cores": 2,
            "progressbar": False,
            "compute_convergence_checks": False,
            "random_seed": 42,
        }
    },
)

model_kwargs takes precedence over the YAML defaults.
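
Precedence of this kind is commonly implemented as a dictionary merge in which the override is applied last. The sketch below illustrates that idea with a key-by-key merge of sampler settings. Whether Abacus merges nested blocks key by key or replaces them whole is not stated here, so treat the merge granularity as an assumption; merge_overrides is a hypothetical helper.

```python
def merge_overrides(yaml_defaults, overrides):
    """Return a new dict where override values win over YAML defaults."""
    merged = dict(yaml_defaults)  # copy so the defaults stay untouched
    merged.update(overrides)      # overrides are applied last, so they win
    return merged


yaml_fit = {"draws": 1000, "tune": 1000, "chains": 4, "random_seed": 42}
python_fit = {"draws": 200, "tune": 200, "chains": 2, "cores": 2}

print(merge_overrides(yaml_fit, python_fit))
# → {'draws': 200, 'tune': 200, 'chains': 2, 'random_seed': 42, 'cores': 2}
```

Keys that appear only in the YAML defaults (random_seed here) survive the merge, and keys that appear only in the override (cores) are added.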

Quickstart: Pipeline Runner

Use the pipeline runner when you want a full staged run instead of only an in-memory model fit.

The runner writes:

  • a run manifest
  • copied and resolved config files
  • fitted model artefacts
  • posterior predictive assessment outputs
  • decomposition, diagnostics, and response-curve artefacts

Fastest first run: bundled demo

From the repository root, the quickest way to see a real structured run is the demo launcher:

python3 runme.py --demo timeseries

Other bundled demos are:

  • geo_panel
  • geo_brand_panel

List them explicitly with:

python3 runme.py --list-demos

runme.py is a convenience wrapper around the structured pipeline. It resolves the demo config under data/demo/<demo_name>/config.yml and runs the pipeline for you.
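
The lookup that runme.py performs can be pictured as a simple path construction. The sketch below follows the data/demo/<demo_name>/config.yml layout described above. It is not the code in runme.py, and the helper name demo_config_path is hypothetical.

```python
from pathlib import Path

# Demo names listed on this page.
BUNDLED_DEMOS = ("timeseries", "geo_panel", "geo_brand_panel")


def demo_config_path(demo_name, repo_root="."):
    """Return data/demo/<demo_name>/config.yml under the repository root."""
    if demo_name not in BUNDLED_DEMOS:
        raise ValueError(
            f"unknown demo {demo_name!r}; expected one of {BUNDLED_DEMOS}"
        )
    return Path(repo_root) / "data" / "demo" / demo_name / "config.yml"


print(demo_config_path("geo_panel"))
```

The resulting path is also what you would pass as config_path when calling the pipeline from Python.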

Run the pipeline from Python

The direct Python API is:

from pathlib import Path

from abacus.pipeline import PipelineRunConfig, run_pipeline

result = run_pipeline(
    PipelineRunConfig(
        config_path=Path("data/demo/geo_panel/config.yml"),
        output_dir=Path("results"),
        run_name="geo_panel_quickstart",
        prior_samples=10,
        draws=200,
        tune=200,
        chains=2,
        cores=2,
        random_seed=42,
        curve_samples=50,
        curve_points=50,
    )
)

print(result.run_dir)
print(result.manifest_path)

If the YAML config already contains data.dataset_path, you do not need to pass dataset_path again.

Run the thin CLI directly

The pipeline also exposes a thin CLI in abacus.pipeline.runner:

python3 -m abacus.pipeline.runner \
  --config data/demo/geo_panel/config.yml \
  --output-dir results \
  --run-name geo_panel_quickstart \
  --prior-samples 10 \
  --draws 200 \
  --tune 200 \
  --chains 2 \
  --cores 2 \
  --random-seed 42 \
  --curve-samples 50 \
  --curve-points 50

The CLI prints the final run directory when the pipeline completes.
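
If you need to launch the CLI from another Python process, for example a scheduler, you can assemble the same command as an argument list. The sketch below only builds the list and uses only the flags shown above. The helper name build_runner_command is hypothetical, and keyword names outside that set would produce flags the CLI may not accept.

```python
import sys


def build_runner_command(config, output_dir, run_name, **options):
    """Build an argv list for `python -m abacus.pipeline.runner`.

    Keyword options become flags, e.g. prior_samples=10 -> --prior-samples 10.
    """
    command = [
        sys.executable, "-m", "abacus.pipeline.runner",
        "--config", str(config),
        "--output-dir", str(output_dir),
        "--run-name", run_name,
    ]
    for name, value in options.items():
        command += [f"--{name.replace('_', '-')}", str(value)]
    return command


command = build_runner_command(
    "data/demo/geo_panel/config.yml", "results", "geo_panel_quickstart",
    prior_samples=10, draws=200, tune=200, chains=2, cores=2,
    random_seed=42, curve_samples=50, curve_points=50,
)
print(" ".join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing a list to subprocess.run avoids shell quoting issues with paths that contain spaces.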

Override data paths

Use one of these patterns:

| Pattern | Arguments |
| --- | --- |
| Combined dataset override | dataset_path= in Python or --dataset-path in the CLI |
| Separate feature and target files | x_path= and y_path= in Python or --x-path and --y-path in the CLI |
| Target column override | target_column= in Python or --target-column in the CLI |

Configured relative paths are resolved relative to the YAML config directory.

If you want Stage 50 (50_diagnostics) to use different warn/fail cutoffs, add a runner-only diagnostics.thresholds block to the YAML. See YAML Configuration.

What you get back

run_pipeline(...) returns a PipelineRunResult with:

  • run_dir
  • manifest_path

The output directory contains stage folders such as:

  • 00_run_metadata
  • 20_model_fit
  • 30_model_assessment
  • 50_diagnostics
  • 60_response_curves
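
To see which stages a given run actually produced, you can list the numbered folders in the run directory. The sketch below assumes stage folders follow the NN_name pattern shown above; the helper name list_stage_dirs is hypothetical and not part of Abacus.

```python
from pathlib import Path


def list_stage_dirs(run_dir):
    """Return stage folder names (NN_name) in a run directory, ordered by prefix."""
    stages = []
    for entry in Path(run_dir).iterdir():
        prefix, _, _ = entry.name.partition("_")
        # Keep only directories whose name starts with a numeric prefix.
        if entry.is_dir() and prefix.isdigit():
            stages.append((int(prefix), entry.name))
    return [name for _, name in sorted(stages)]
```

After a pipeline run, list_stage_dirs(result.run_dir) would return the stage folders in execution order, where result is the PipelineRunResult described above.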

60_response_curves includes three complementary curve families:

  • saturation-only transformation artefacts
  • forward-pass direct contribution artefacts built from scaled observed history
  • adstock carryover artefacts

When to use the runner

Choose the runner when you want:

  • a reproducible run directory on disk
  • structured metadata and manifest files
  • staged artefacts for diagnostics and reporting
  • a config-driven workflow for repeated runs

If you only need to fit a model interactively in a notebook or script, start with Quickstart: Python API or Quickstart: YAML Builder.