Panel Data Layout

This page explains how PanelMMM expects panel rows to be organised in X. For the column-level contract, see Input Data Requirements.

What “panel” means in Abacus

In Abacus, a panel dataset repeats the same time axis across one or more categorical dimensions in dims.

Each row represents:

  • one date_column value
  • one combination of dims values, if any
  • one set of channel and optional control values for that slice

With no extra panel dims, each date appears once. With dims=("geo",), each date appears once per geo. With dims=("geo", "brand"), each date appears once per geo + brand combination.

How dims work

Pass panel dimensions when you construct the model:

from abacus.mmm import GeometricAdstock, LogisticSaturation
from abacus.mmm.panel import PanelMMM

mmm = PanelMMM(
    date_column="date",
    channel_columns=["tv", "search"],
    target_column="sales",
    dims=("geo", "brand"),
    adstock=GeometricAdstock(l_max=8),
    saturation=LogisticSaturation(),
)

dims columns stay in X. They are not moved into y.

Abacus reserves these names for internal coordinates, so do not use them in dims:

  • date
  • channel
  • control
  • fourier_mode
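
A quick pre-flight check can catch a reserved name before you construct the model. The sketch below is only an illustration: `RESERVED_DIM_NAMES` and `find_reserved_dims` are hypothetical names that are not part of Abacus, and the set simply copies the four names listed above.

```python
# Names this page lists as reserved for internal coordinates.
RESERVED_DIM_NAMES = {"date", "channel", "control", "fourier_mode"}


def find_reserved_dims(dims):
    """Return the entries of dims that collide with a reserved name.

    Hypothetical helper for illustration; it is not an Abacus API.
    """
    return [dim for dim in dims if dim in RESERVED_DIM_NAMES]


print(find_reserved_dims(("geo", "brand")))    # []
print(find_reserved_dims(("geo", "channel")))  # ['channel']
```

If the returned list is not empty, rename those columns before passing them in dims.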

No extra panel dims

If dims=(), X should have one row per date.

date        tv   search  sales
2025-01-06  120  40      820
2025-01-13  125  42      835
2025-01-20  130  45      850

Internally, Abacus reshapes this into:

  • channels: (date, channel)
  • target: (date,)
  • controls, if present: (date, control)

Single panel dim example: geo

If dims=("geo",), each date should appear once for each geo value.

date        geo  tv   search  sales
2025-01-06  UK   120  40      820
2025-01-06  US   150  55      910
2025-01-13  UK   125  42      835
2025-01-13  US   152  58      925

Internally, Abacus reshapes this into:

  • channels: (date, geo, channel)
  • target: (date, geo)
  • controls, if present: (date, geo, control)

Multiple panel dims example: geo and brand

If dims=("geo", "brand"), each row identifies one date, one geo, and one brand.

import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2025-01-06",
                "2025-01-06",
                "2025-01-06",
                "2025-01-06",
                "2025-01-13",
                "2025-01-13",
                "2025-01-13",
                "2025-01-13",
            ]
        ),
        "geo": ["UK", "UK", "US", "US", "UK", "UK", "US", "US"],
        "brand": ["A", "B", "A", "B", "A", "B", "A", "B"],
        "tv": [80.0, 55.0, 92.0, 60.0, 82.0, 58.0, 95.0, 63.0],
        "search": [20.0, 18.0, 24.0, 19.0, 21.0, 18.5, 25.0, 20.0],
    }
)

y = pd.Series(
    [510.0, 370.0, 590.0, 405.0, 520.0, 380.0, 605.0, 418.0],
    name="sales",
)

For a rectangular panel, the row count is:

n_dates * n_geo * n_brand
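
As a worked check of that formula, the sketch below rebuilds only the key columns of the example above and compares the actual row count with the product of unique values. It uses plain pandas, not an Abacus API. A matching count is necessary but not sufficient: a duplicate row can hide a missing one.

```python
import pandas as pd

# Key columns of the geo + brand example: 2 dates x 2 geos x 2 brands.
X = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-06"] * 4 + ["2025-01-13"] * 4),
        "geo": ["UK", "UK", "US", "US"] * 2,
        "brand": ["A", "B"] * 4,
    }
)
dims = ("geo", "brand")

# n_dates * n_geo * n_brand
expected_rows = X["date"].nunique()
for dim in dims:
    expected_rows *= X[dim].nunique()

has_expected_size = len(X) == expected_rows
print(expected_rows, has_expected_size)  # 8 True
```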

Internal reshape

Abacus converts the pandas inputs into xarray datasets before building the PyMC model.

Input role          Internal variable  xarray dims
X[channel_columns]  _channel           (date, *dims, channel)
X[control_columns]  _control           (date, *dims, control)
y                   _target            (date, *dims)

The channel and control dimensions come from the configured column names, not from row values.
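
To make the (date, *dims, channel) layout concrete, here is a minimal sketch that reproduces the same axis order with pandas and NumPy for dims=("geo",). It is not how Abacus performs the reshape internally. It only shows how a sorted, rectangular table maps onto a three-dimensional array.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-06", "2025-01-06", "2025-01-13", "2025-01-13"]
        ),
        "geo": ["UK", "US", "UK", "US"],
        "tv": [120.0, 150.0, 125.0, 152.0],
        "search": [40.0, 55.0, 42.0, 58.0],
    }
)
channel_columns = ["tv", "search"]

# Sort date-major, then geo, so a rectangular panel reshapes cleanly.
ordered = X.sort_values(["date", "geo"])
n_dates = ordered["date"].nunique()
n_geo = ordered["geo"].nunique()

# Axis order mirrors (date, geo, channel).
channel_array = (
    ordered[channel_columns]
    .to_numpy()
    .reshape(n_dates, n_geo, len(channel_columns))
)

print(channel_array.shape)  # (2, 2, 2)
print(channel_array[0, 1])  # tv and search for 2025-01-06, US
```

The last axis follows the order of channel_columns, which is why the channel coordinate comes from the configured names and not from row values.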

Rectangularity, duplicates, and missing rows

Abacus builds xarray coordinates from the unique values it sees in:

  • date_column
  • each configured dimension column
  • the configured channel or control names

That has four practical consequences:

  • Keep the panel rectangular. Provide one row for every expected date_column + dims combination.
  • Use explicit zeroes for structural no-spend or no-activity rows.
  • Provide an observed value for every declared channel, control, and target cell in those rows. Abacus rejects missing metric cells rather than silently converting them to zeroes.
  • Do not use missing rows to mean “unknown”. Abacus validates panel shape before reshape and raises an error if panel cells are missing.

Abacus also requires each date_column + dims combination to appear exactly once. It does not aggregate duplicates for you. If you have duplicate rows, deduplicate or aggregate them before fitting or posterior prediction.
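
You can look for both problems yourself before calling the model. The following is a minimal sketch in plain pandas, assuming the expected panel is the full cross product of the values observed in each key column. `panel_problems` is a hypothetical helper and not an Abacus function.

```python
import pandas as pd


def panel_problems(X, date_column, dims):
    """Return (duplicated key rows, missing key combinations).

    Hypothetical helper for illustration; it is not an Abacus API.
    """
    keys = [date_column, *dims]
    # Every row whose date + dims key occurs more than once.
    duplicated = X.loc[X.duplicated(subset=keys, keep=False), keys]
    # Full cross product of observed key values versus what is present.
    expected = pd.MultiIndex.from_product(
        [X[key].unique() for key in keys], names=keys
    )
    observed = pd.MultiIndex.from_frame(X[keys].drop_duplicates())
    missing = expected.difference(observed)
    return duplicated, missing


X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-06", "2025-01-06", "2025-01-13", "2025-01-13"]
        ),
        "geo": ["UK", "US", "UK", "UK"],  # UK repeated, US missing on 01-13
        "tv": [120.0, 150.0, 125.0, 126.0],
    }
)

duplicated, missing = panel_problems(X, "date", ("geo",))
print(len(duplicated))  # 2
print(list(missing))    # the single absent (date, geo) combination
```

This example has the expected four rows in total, so a row-count check alone would not flag it.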

Sorting and uniqueness

Sort your data before fitting:

  • first by date_column
  • then by each entry in dims

Abacus keeps dates in the order they appear in X, and time-varying features infer time resolution from adjacent rows. A sorted dataset makes the time axis deterministic and easier to reason about.

Also make sure that each date_column + dims combination appears once in the prepared table, and that every expected panel slice is present for every date.
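
Because y is aligned to X by row, any reordering has to be applied to both. The sketch below uses plain pandas and assumes X and y share the same default index. It sorts by date and then geo, and reuses the resulting order for y.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-13", "2025-01-06", "2025-01-13", "2025-01-06"]
        ),
        "geo": ["US", "UK", "UK", "US"],
        "tv": [152.0, 120.0, 125.0, 150.0],
    }
)
y = pd.Series([925.0, 820.0, 835.0, 910.0], name="sales")

date_column = "date"
dims = ("geo",)

# Sort by date first, then by each dim, and keep the resulting row order.
order = X.sort_values([date_column, *dims]).index

# Apply the same order to X and y so they stay row-aligned.
X_sorted = X.loc[order].reset_index(drop=True)
y_sorted = y.loc[order].reset_index(drop=True)

print(X_sorted)
print(y_sorted)
```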

DataFrame versus MultiIndex handling

For normal fitting:

  • use a regular DataFrame for X
  • keep date_column and any dims as columns in that DataFrame
  • use a row-aligned Series for y

Abacus does have internal helpers that can align a MultiIndex target Series indexed by [date_column, *dims], but that is not the main user-facing data preparation pattern for fit().
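
If your target already lives in a MultiIndex Series, one option is to convert it to a row-aligned Series yourself before fitting. This sketch uses plain pandas and does not call the internal Abacus helpers. It assumes the index levels are named after date_column and the dims, and that each key appears once.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-06", "2025-01-06", "2025-01-13", "2025-01-13"]
        ),
        "geo": ["UK", "US", "UK", "US"],
        "tv": [120.0, 150.0, 125.0, 152.0],
    }
)

# Target indexed by [date, geo], deliberately in a different order from X.
y_indexed = pd.Series(
    [925.0, 835.0, 910.0, 820.0],
    index=pd.MultiIndex.from_arrays(
        [
            pd.to_datetime(
                ["2025-01-13", "2025-01-13", "2025-01-06", "2025-01-06"]
            ),
            ["US", "UK", "US", "UK"],
        ],
        names=["date", "geo"],
    ),
    name="sales",
)

# Look up each X row's key in the indexed target, then drop the index
# so that y lines up with X by position.
row_keys = pd.MultiIndex.from_frame(X[["date", "geo"]])
y = y_indexed.reindex(row_keys).reset_index(drop=True)

# A NaN here would mean some X row has no matching target value.
assert not y.isna().any()
print(y)
```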

Practical checklist

  • One row per date_column + dims combination
  • No duplicate rows for the same panel cell
  • Same set of dates for every panel slice
  • Explicit zeroes for true zero activity
  • No missing observed channel, control, or target values
  • Sorted rows before fitting

For scaling choices once the layout is correct, see Scaling and Preprocessing.