Panel Data Layout
This page explains how PanelMMM expects panel rows to be organised in X.
For the column-level contract, see
Input Data Requirements.
What “panel” means in Abacus
In Abacus, a panel dataset repeats the same time axis across one or more
categorical dimensions in dims.
Each row represents:
- one date_column value
- one combination of dims values, if any
- one set of channel and optional control values for that slice
With no extra panel dims, each date appears once. With dims=("geo",), each
date appears once per geo. With dims=("geo", "brand"), each date appears
once per geo + brand combination.
How dims work
Pass panel dimensions through the dims argument when you construct the model.
The dims columns stay in X. They are not moved into y.
Abacus reserves these names for internal coordinates, so do not use them in
dims:
- date
- channel
- control
- fourier_mode
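A quick pre-flight check can catch a reserved name before the model is built. The sketch below is illustrative only: the helper find_reserved_dims is hypothetical and not part of Abacus, and the reserved set is copied from the list above.

```python
# Reserved coordinate names, copied from the list above.
RESERVED_DIM_NAMES = {"date", "channel", "control", "fourier_mode"}


def find_reserved_dims(dims):
    """Return the entries of dims that collide with a reserved name.

    Hypothetical helper for illustration; not an Abacus function.
    """
    return [name for name in dims if name in RESERVED_DIM_NAMES]


print(find_reserved_dims(("geo", "brand")))    # []
print(find_reserved_dims(("geo", "channel")))  # ['channel']
```

If the returned list is not empty, rename the offending column in X before passing it in dims.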
No extra panel dims
If dims=(), X should have one row per date.
| date | tv | search | sales |
|---|---|---|---|
| 2025-01-06 | 120 | 40 | 820 |
| 2025-01-13 | 125 | 42 | 835 |
| 2025-01-20 | 130 | 45 | 850 |
Internally, Abacus reshapes this into:
- channels: (date, channel)
- target: (date,)
- controls, if present: (date, control)
Single panel dim example: geo
If dims=("geo",), each date should appear once for each geo value.
| date | geo | tv | search | sales |
|---|---|---|---|---|
| 2025-01-06 | UK | 120 | 40 | 820 |
| 2025-01-06 | US | 150 | 55 | 910 |
| 2025-01-13 | UK | 125 | 42 | 835 |
| 2025-01-13 | US | 152 | 58 | 925 |
Internally, Abacus reshapes this into:
- channels: (date, geo, channel)
- target: (date, geo)
- controls, if present: (date, geo, control)
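To build intuition for these shapes, the reshape can be imitated with plain pandas and NumPy. This is only a sketch of the idea, not how Abacus implements it (Abacus uses xarray), and it assumes a rectangular, duplicate-free panel sorted by date and then geo. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-06", "2025-01-06", "2025-01-13", "2025-01-13"]
        ),
        "geo": ["UK", "US", "UK", "US"],
        "tv": [120, 150, 125, 152],
        "search": [40, 55, 42, 58],
        "sales": [820, 910, 835, 925],
    }
)
channel_columns = ["tv", "search"]

# The reshape below relies on date-major, geo-minor row order.
df = df.sort_values(["date", "geo"]).reset_index(drop=True)
n_dates = df["date"].nunique()
n_geo = df["geo"].nunique()

# (date, geo, channel) for the channels, (date, geo) for the target.
channels = df[channel_columns].to_numpy().reshape(
    n_dates, n_geo, len(channel_columns)
)
target = df["sales"].to_numpy().reshape(n_dates, n_geo)

print(channels.shape)           # (2, 2, 2)
print(target.shape)             # (2, 2)
print(channels[0, 1].tolist())  # [150, 55] -> 2025-01-06, US, (tv, search)
```

A missing or duplicated row would make the row count differ from n_dates * n_geo, and the reshape would fail, which is one way to see why the layout rules on this page matter.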
Multiple panel dims example: geo and brand
If dims=("geo", "brand"), each row identifies one date, one geo, and one
brand.
For a rectangular panel, the row count is:
n_dates * n_geo * n_brand
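As a small worked example of that count, the sketch below builds the full set of expected cells with pandas. The dates, geos, and brand labels are made up for illustration.

```python
import pandas as pd

# Illustrative values only.
dates = pd.date_range("2025-01-06", periods=3, freq="W-MON")
geos = ["UK", "US"]
brands = ["A", "B"]

# Every (date, geo, brand) cell a rectangular panel must contain.
full_index = pd.MultiIndex.from_product(
    [dates, geos, brands], names=["date", "geo", "brand"]
)

expected_rows = len(dates) * len(geos) * len(brands)
print(expected_rows)    # 12
print(len(full_index))  # 12
```

Comparing len(X) with this product is a cheap first test of whether a prepared table is rectangular.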
Internal reshape
Abacus converts the pandas inputs into xarray datasets before building the PyMC model.
| Input role | Internal variable | xarray dims |
|---|---|---|
| X[channel_columns] | _channel | (date, *dims, channel) |
| X[control_columns] | _control | (date, *dims, control) |
| y | _target | (date, *dims) |
The channel and control dimensions come from the configured column names,
not from row values.
Rectangularity, duplicates, and missing rows
Abacus builds xarray coordinates from the unique values it sees in:
- date_column
- each configured dimension column
- the configured channel or control names
That has four practical consequences:
- Keep the panel rectangular. Provide one row for every expected date_column + dims combination.
- Use explicit zeroes for structural no-spend or no-activity rows.
- Provide an observed value for every declared channel, control, and target column in those rows. Abacus rejects missing metric cells instead of silently converting them to zeroes.
- Do not use missing rows to mean “unknown”. Abacus validates panel shape before reshape and raises an error if panel cells are missing.
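Before handing data to Abacus, you can look for missing panel cells yourself. The sketch below is one possible approach with pandas, not Abacus's own validation. The helper missing_panel_cells is hypothetical, and it derives the expected cells from the unique values present in X, so it cannot detect a date or geo that is absent from every row.

```python
import pandas as pd


def missing_panel_cells(X, date_column, dims=()):
    """Return the (date, *dims) combinations that are absent from X.

    Hypothetical helper for illustration; not an Abacus function.
    """
    keys = [date_column, *dims]
    # Expected cells: cross product of the unique values in each key column.
    expected = pd.MultiIndex.from_product(
        [X[k].unique() for k in keys], names=keys
    )
    observed = pd.MultiIndex.from_frame(X[keys])
    return expected.difference(observed)


X = pd.DataFrame(
    {
        "date": ["2025-01-06", "2025-01-06", "2025-01-13"],
        "geo": ["UK", "US", "UK"],
        "tv": [120, 150, 125],
    }
)

missing = missing_panel_cells(X, "date", ("geo",))
print(list(missing))  # [('2025-01-13', 'US')]
```

Add a zero-filled row for a reported cell only when the absence really means no activity. If the value is unknown, resolve that upstream before fitting.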
Abacus also requires each date_column + dims combination to appear exactly
once. It does not aggregate duplicates for you. If you have duplicate rows,
deduplicate or aggregate them before fitting or posterior prediction.
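A minimal sketch of finding and aggregating duplicates with pandas follows. It assumes the metric columns are additive (spend, for example), so summing is appropriate. Rates, prices, and other non-additive controls need a different rule.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": ["2025-01-06", "2025-01-06", "2025-01-06"],
        "geo": ["UK", "UK", "US"],
        "tv": [100, 20, 150],
        "search": [30, 10, 55],
    }
)
keys = ["date", "geo"]  # date_column followed by dims

# keep=False flags every row that shares its key with another row.
dupes = X[X.duplicated(subset=keys, keep=False)]
print(len(dupes))  # 2

# Assumption: these columns are additive, so duplicates are summed.
X_unique = X.groupby(keys, as_index=False, sort=True)[["tv", "search"]].sum()
print(X_unique["tv"].tolist())      # [120, 150]
print(X_unique["search"].tolist())  # [40, 55]
```

If the target lives in a separate Series, aggregate it with the same keys so that X and y stay row-aligned.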
Sorting and uniqueness
Sort your data before fitting:
- first by date_column
- then by each entry in dims
Abacus keeps dates in the order they appear in X, and time-varying features
infer time resolution from adjacent rows. A sorted dataset makes the time axis
deterministic and easier to reason about.
Also make sure that each date_column + dims combination appears once in the
prepared table, and that every expected panel slice is present for every date.
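The sort described above might look like this in pandas. The column names and values are illustrative. The date column is parsed to datetimes first so the order is chronological rather than lexical, and y is reordered with the same index so it stays aligned with X.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-13", "2025-01-06", "2025-01-13", "2025-01-06"]
        ),
        "geo": ["US", "US", "UK", "UK"],
        "tv": [152, 150, 125, 120],
    }
)
y = pd.Series([925, 910, 835, 820], name="sales")

date_column = "date"
dims = ("geo",)

# Sort by date first, then by each panel dimension.
order = X.sort_values([date_column, *dims], kind="stable").index

# Apply the same row order to X and y, then reset to a clean index.
X_sorted = X.loc[order].reset_index(drop=True)
y_sorted = y.loc[order].reset_index(drop=True)

print(X_sorted["geo"].tolist())  # ['UK', 'US', 'UK', 'US']
print(y_sorted.tolist())         # [820, 910, 835, 925]
```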
DataFrame versus MultiIndex handling
For normal fitting:
- use a regular DataFrame for X
- keep date_column and any dims as columns in that DataFrame
- use a row-aligned Series for y
Abacus does have internal helpers that can align a MultiIndex target Series
indexed by [date_column, *dims], but that is not the main user-facing data
preparation pattern for fit().
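One way to prepare inputs in that shape is sketched below with plain pandas. The column names are illustrative, and the call to fit() itself is omitted because its exact signature is not covered on this page. The second half shows how a target that arrives as a MultiIndex Series could be flattened into a row-aligned Series before fitting.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-06", "2025-01-06", "2025-01-13", "2025-01-13"]
        ),
        "geo": ["UK", "US", "UK", "US"],
        "tv": [120, 150, 125, 152],
        "sales": [820, 910, 835, 925],
    }
)
keys = ["date", "geo"]  # date_column followed by dims

# Regular DataFrame for X (keys stay as columns), row-aligned Series for y.
X = df.drop(columns=["sales"])
y = df["sales"]

# A MultiIndex target, deliberately in a different row order than X.
y_multi = df.set_index(keys)["sales"].iloc[::-1]

# Flatten it and align it to X by key rather than by position.
y_flat = X[keys].merge(
    y_multi.reset_index(), on=keys, how="left", validate="one_to_one"
)["sales"]

print(y_flat.tolist())  # [820, 910, 835, 925]
```

The validate="one_to_one" argument makes the merge raise if either side contains duplicate keys, which doubles as a uniqueness check.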
Practical checklist
- One row per date_column + dims combination
- No duplicate rows for the same panel cell
- Same set of dates for every panel slice
- Explicit zeroes for true zero activity
- No missing observed channel, control, or target values
- Sorted rows before fitting
For scaling choices once the layout is correct, see Scaling and Preprocessing.