Panel Data Layout

This page explains how PanelMMM expects panel rows to be organised in X. For the column-level contract, see Input Data Requirements.

What “panel” means in Abacus

In Abacus, a panel dataset repeats the same time axis across one or more categorical dimensions in dims.

Each row represents:

one date_column value
one combination of dims values, if any
one set of channel and optional control values for that slice

With no extra panel dims, each date appears once. With dims=("geo",), each date appears once per geo. With dims=("geo", "brand"), each date appears once per geo + brand combination.

How `dims` work

Pass panel dimensions when you construct the model:

from abacus.mmm import GeometricAdstock, LogisticSaturation
from abacus.mmm.panel import PanelMMM

mmm = PanelMMM(
    date_column="date",
    channel_columns=["tv", "search"],
    target_column="sales",
    dims=("geo", "brand"),
    adstock=GeometricAdstock(l_max=8),
    saturation=LogisticSaturation(),
)

dims columns stay in X. They are not moved into y.

Abacus reserves these names for internal coordinates, so do not use them in dims:

date
channel
control
fourier_mode
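
A pre-flight check along these lines can catch a clash before you construct the model. This is a minimal sketch: `RESERVED_DIM_NAMES` and `find_reserved_dims` are hypothetical names that are not part of Abacus, and the set simply copies the list above.

```python
# Names this page lists as reserved for internal coordinates.
# Hypothetical constant for illustration; not an Abacus export.
RESERVED_DIM_NAMES = {"date", "channel", "control", "fourier_mode"}


def find_reserved_dims(dims):
    """Return the entries of `dims` that collide with a reserved name."""
    return [name for name in dims if name in RESERVED_DIM_NAMES]


print(find_reserved_dims(("geo", "brand")))    # []
print(find_reserved_dims(("geo", "channel")))  # ['channel']
```

If the returned list is non-empty, rename those columns in X before passing them in dims.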

No extra panel dims

If dims=(), X should have one row per date.

date	tv	search	sales
2025-01-06	120	40	820
2025-01-13	125	42	835
2025-01-20	130	45	850

Internally, Abacus reshapes this into:

channels: (date, channel)
target: (date,)
controls, if present: (date, control)

Single panel dim example: `geo`

If dims=("geo",), each date should appear once for each geo value.

date	geo	tv	search	sales
2025-01-06	UK	120	40	820
2025-01-06	US	150	55	910
2025-01-13	UK	125	42	835
2025-01-13	US	152	58	925

Internally, Abacus reshapes this into:

channels: (date, geo, channel)
target: (date, geo)
controls, if present: (date, geo, control)

Multiple panel dims example: `geo` and `brand`

If dims=("geo", "brand"), each row identifies one date, one geo, and one brand.

import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2025-01-06",
                "2025-01-06",
                "2025-01-06",
                "2025-01-06",
                "2025-01-13",
                "2025-01-13",
                "2025-01-13",
                "2025-01-13",
            ]
        ),
        "geo": ["UK", "UK", "US", "US", "UK", "UK", "US", "US"],
        "brand": ["A", "B", "A", "B", "A", "B", "A", "B"],
        "tv": [80.0, 55.0, 92.0, 60.0, 82.0, 58.0, 95.0, 63.0],
        "search": [20.0, 18.0, 24.0, 19.0, 21.0, 18.5, 25.0, 20.0],
    }
)

y = pd.Series(
    [510.0, 370.0, 590.0, 405.0, 520.0, 380.0, 605.0, 418.0],
    name="sales",
)

For a rectangular panel, the row count is:

n_dates * n_geo * n_brand
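
For the example above, that is 2 dates * 2 geos * 2 brands = 8 rows. A quick way to compare the expected and actual row counts with pandas is sketched below; `X_keys` and `expected_rows` are illustrative names, and the check assumes every geo is meant to be paired with every brand.

```python
import pandas as pd

# Same key columns as the example above: 2 dates x 2 geos x 2 brands.
X_keys = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-06"] * 4 + ["2025-01-13"] * 4),
        "geo": ["UK", "UK", "US", "US"] * 2,
        "brand": ["A", "B"] * 4,
    }
)

# Product of the number of unique values in each key column.
expected_rows = (
    X_keys["date"].nunique()
    * X_keys["geo"].nunique()
    * X_keys["brand"].nunique()
)

print(expected_rows, len(X_keys))  # 8 8
```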

Internal reshape

Abacus converts the pandas inputs into xarray datasets before building the PyMC model.

Input role	Internal variable	xarray dims
`X[channel_columns]`	`_channel`	`(date, *dims, channel)`
`X[control_columns]`	`_control`	`(date, *dims, control)`
`y`	`_target`	`(date, *dims)`

The channel and control dimensions come from the configured column names, not from row values.
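
To make the `(date, *dims, channel)` shape concrete, the sketch below rebuilds the channel block of the single-dim geo example as a plain array. It illustrates the target layout only: it uses pandas and a reshape rather than xarray, it is not the code Abacus runs internally, and it relies on the panel being rectangular, duplicate-free, and sorted by its keys.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-06"] * 2 + ["2025-01-13"] * 2),
        "geo": ["UK", "US", "UK", "US"],
        "tv": [120.0, 150.0, 125.0, 152.0],
        "search": [40.0, 55.0, 42.0, 58.0],
    }
)
dims = ("geo",)
channel_columns = ["tv", "search"]

keys = ["date", *dims]
ordered = X.sort_values(keys)

# One axis per key column, plus a final axis for the channel names.
shape = [ordered[k].nunique() for k in keys] + [len(channel_columns)]

# Row-major reshape: valid only because the rows are sorted by the keys
# and every date x geo cell appears exactly once.
channel_array = ordered[channel_columns].to_numpy().reshape(shape)

print(channel_array.shape)           # (2, 2, 2)
print(channel_array[0, 1].tolist())  # first date, geo "US": [150.0, 55.0]
```

The last axis is indexed by position in `channel_columns`, which mirrors the point above that the channel coordinate comes from configured names rather than row values.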

Rectangularity, duplicates, and missing rows

Abacus builds xarray coordinates from the unique values it sees in:

date_column
each configured dimension column
the configured channel or control names

That has four practical consequences:

Keep the panel rectangular. Provide one row for every expected date_column + dims combination.
Use explicit zeroes for structural no-spend or no-activity rows.
Provide an observed value for every declared channel, control, and target cell in those rows. Abacus rejects missing metric cells instead of silently converting them to zeroes.
Do not use missing rows to mean “unknown”. Abacus validates the panel shape before reshaping and raises an error if panel cells are missing.
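
You can find both kinds of gap yourself before Abacus raises. The following is a minimal pandas sketch, not the validation Abacus performs; `missing_cells` and `rows_with_nan` are illustrative names, and the "expected" grid is assumed to be the full product of the unique key values present in X.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-06", "2025-01-06", "2025-01-13"]),
        "geo": ["UK", "US", "UK"],   # the 2025-01-13 / US row is absent
        "tv": [120.0, 150.0, None],  # and one metric cell is missing
    }
)
keys = ["date", "geo"]
metric_columns = ["tv"]

# Every date x geo combination a rectangular panel would contain.
full_index = pd.MultiIndex.from_product(
    [X[k].unique() for k in keys], names=keys
)
present = pd.MultiIndex.from_frame(X[keys])
missing_cells = full_index.difference(present)

# Rows that exist but carry a missing metric value.
rows_with_nan = X[X[metric_columns].isna().any(axis=1)]

print(list(missing_cells))
print(len(rows_with_nan))
```

How to fill a gap is a modelling decision: use an explicit zero only when the activity was truly zero, as described above.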

Abacus also requires each date_column + dims combination to appear exactly once. It does not aggregate duplicates for you. If you have duplicate rows, deduplicate or aggregate them before fitting or posterior prediction.
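
Duplicates can be detected and collapsed with pandas before fitting. In this sketch the duplicate rows are summed, which is an assumption that suits additive metrics such as spend; choose the aggregation that matches your own data.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-06", "2025-01-06", "2025-01-06"]),
        "geo": ["UK", "UK", "US"],  # the UK cell appears twice
        "tv": [70.0, 50.0, 150.0],
    }
)
keys = ["date", "geo"]

# keep=False flags every row that belongs to a duplicated panel cell.
duplicate_rows = X[X.duplicated(subset=keys, keep=False)]

# Assumption: duplicate rows represent partial spend and should be summed.
X_unique = X.groupby(keys, as_index=False).sum()

print(len(duplicate_rows))  # 2
print(len(X_unique))        # 2
```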

Sorting and uniqueness

Sort your data before fitting:

first by date_column
then by each entry in dims

Abacus keeps dates in the order they appear in X, and time-varying features infer time resolution from adjacent rows. A sorted dataset makes the time axis deterministic and easier to reason about.

Also make sure that each date_column + dims combination appears once in the prepared table, and that every expected panel slice is present for every date.
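
A sort along these lines, sketched with pandas, keeps y aligned with X. The variable names are illustrative, and the approach assumes X and y share the same row labels before sorting.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-13", "2025-01-06", "2025-01-13", "2025-01-06"]
        ),
        "geo": ["US", "US", "UK", "UK"],
        "tv": [152.0, 150.0, 125.0, 120.0],
    }
)
y = pd.Series([925.0, 910.0, 835.0, 820.0], name="sales")
dims = ("geo",)

# Sort by the date column first, then by each entry in dims.
X_sorted = X.sort_values(["date", *dims])

# Reorder y by the same row labels, then reset both indexes together.
y_sorted = y.loc[X_sorted.index].reset_index(drop=True)
X_sorted = X_sorted.reset_index(drop=True)

print(X_sorted["geo"].tolist())  # ['UK', 'US', 'UK', 'US']
print(y_sorted.tolist())         # [820.0, 910.0, 835.0, 925.0]
```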

DataFrame versus MultiIndex handling

For normal fitting:

use a regular DataFrame for X
keep date_column and any dims as columns in that DataFrame
use a row-aligned Series for y

Abacus has internal helpers that can align a MultiIndex target Series indexed by [date_column, *dims], but that is not the main user-facing data preparation pattern for fit().
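
If your target arrives as a MultiIndex Series, you can convert it to a row-aligned Series yourself before calling fit(). The sketch below uses plain pandas and is not the internal Abacus helper; `y_multi` and `row_keys` are illustrative names, and the lookup assumes each [date, geo] key is unique in the target.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-06"] * 2 + ["2025-01-13"] * 2),
        "geo": ["UK", "US", "UK", "US"],
        "tv": [120.0, 150.0, 125.0, 152.0],
    }
)
keys = ["date", "geo"]

# A target indexed by [date, geo], deliberately ordered differently from X.
y_multi = pd.Series(
    [925.0, 835.0, 910.0, 820.0],
    index=pd.MultiIndex.from_arrays(
        [
            pd.to_datetime(["2025-01-13"] * 2 + ["2025-01-06"] * 2),
            ["US", "UK", "US", "UK"],
        ],
        names=keys,
    ),
    name="sales",
)

# Look up each X row's (date, geo) key, then drop the MultiIndex so that
# y lines up with X row by row.
row_keys = pd.MultiIndex.from_frame(X[keys])
y = y_multi.reindex(row_keys).reset_index(drop=True)

print(y.tolist())  # [820.0, 910.0, 835.0, 925.0]
```

Any key present in X but absent from the target would come back as NaN from `reindex`, so check `y.isna().any()` before fitting.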

Practical checklist

One row per date_column + dims combination
No duplicate rows for the same panel cell
Same set of dates for every panel slice
Explicit zeroes for true zero activity
No missing observed channel, control, or target values
Sorted rows before fitting

For scaling choices once the layout is correct, see Scaling and Preprocessing.