Input Data Requirements
Use this page together with Panel Data Layout and
Scaling and Preprocessing when you prepare a
dataset for PanelMMM.
Core contract
For direct Python use, PanelMMM expects:
- X as a pandas.DataFrame
- y as a pandas.Series named target_column, or a one-dimensional NumPy array of the same length as X
X must contain the date column, all media columns, and any configured
control_columns or dims columns. y carries only the target values.
| Role | Where it must be present | Required | Notes |
|---|---|---|---|
| date_column | X | Yes | Normalise to datetimes or parseable date strings. |
| channel_columns | X | Yes | Every listed channel column must exist in X. |
| target_column | y | Yes | y.name should match target_column. |
| control_columns | X | No | If configured, every listed control column must exist in X. |
| dims | X | No | One column per configured panel dimension, such as geo or brand. |
X and y
When you call fit(X, y) or build_model(X, y):
- Keep the target out of X.
- Keep X and y row-aligned.
- If both are pandas objects, keep the same index on both. The shared regression builder checks index equality before fitting.
- If you pass y as a NumPy array, its length must match len(X).
- For panel models, each date_column + dims combination must appear exactly once. Duplicate rows are rejected.
Abacus uses target_column as the target name throughout the panel reshape.
If y is a Series, its name must match target_column.
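The rules above can be checked before fitting. The following is a minimal sketch in plain pandas, not Abacus's own validation; the helper check_alignment and the column names (date, geo, tv, sales) are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical configuration, chosen only for this illustration.
date_column = "date"
dims = ("geo",)
target_column = "sales"

X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-01", "2024-01-08", "2024-01-08"]
        ),
        "geo": ["north", "south", "north", "south"],
        "tv": [100.0, 80.0, 120.0, 90.0],
    }
)
y = pd.Series([10.0, 8.0, 12.0, 9.0], name=target_column)


def check_alignment(X, y, date_column, dims, target_column):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if target_column in X.columns:
        problems.append("target column is present in X")
    if len(y) != len(X):
        problems.append("X and y have different lengths")
    if isinstance(y, pd.Series):
        if y.name != target_column:
            problems.append("y.name does not match target_column")
        if not X.index.equals(y.index):
            problems.append("X and y have different indexes")
    # Each date + dims combination may appear only once.
    if X.duplicated(subset=[date_column, *dims]).any():
        problems.append("duplicate date + dims rows")
    return problems


problems_ok = check_alignment(X, y, date_column, dims, target_column)
# Reversing y keeps its values and name but changes its index order.
problems_bad = check_alignment(X, y.iloc[::-1], date_column, dims, target_column)
# A NumPy target is only checked for length.
problems_short = check_alignment(
    X, y.to_numpy()[:3], date_column, dims, target_column
)
```

Returning a list rather than raising on the first failure makes it easier to report every problem in a dataset at once.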
Date column
date_column is required in X.
Abacus expects calendar dates, not integer date codes. In practice:
- Use datetime64[ns] where possible.
- Parse string dates with pd.to_datetime(...) before fitting when you use the Python API.
- Do not rely on numeric date values such as 0, 1, 2. Pandas can interpret them as offsets from the Unix epoch, which is usually not what you want.
The YAML builder normalises X[date_column] with pd.to_datetime(...) after
loading the dataset. Direct Python use does not add an equivalent preprocessing
step for you.
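For direct Python use, the date handling described above might look like the sketch below. The column names are hypothetical. The second part shows the numeric pitfall: with its default unit, pd.to_datetime reads integers as nanoseconds since the Unix epoch, so the codes 0, 1, 2 all land on 1 January 1970.

```python
import pandas as pd

# Hypothetical raw data with dates stored as strings.
raw = pd.DataFrame(
    {
        "date": ["2024-01-01", "2024-01-08", "2024-01-15"],
        "tv": [100.0, 110.0, 95.0],
    }
)

# Parse the strings into real datetimes before calling fit.
X = raw.copy()
X["date"] = pd.to_datetime(X["date"])

# Pitfall: integer codes are treated as nanosecond offsets from the epoch,
# so every value collapses onto the same calendar day in 1970.
codes = pd.to_datetime(pd.Series([0, 1, 2]))
distinct_days = codes.dt.normalize().nunique()
```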
Channel columns
channel_columns is a required constructor argument and must be a non-empty
list.
Each listed channel:
- must be present in X
- must be fully observed for every row you pass into fit or posterior prediction; Abacus does not silently convert missing channel values to zero
- should represent the raw media variable that you want the adstock and saturation transformations to consume
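Because missing channel values are not converted to zero, it can help to look for them explicitly before fitting. This is a minimal sketch with hypothetical column names; how to fill or drop the affected rows is a modelling decision that it leaves open.

```python
import numpy as np
import pandas as pd

# Hypothetical channel list and data; one tv value is missing on purpose.
channel_columns = ["tv", "search"]
X = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15"]),
        "tv": [100.0, np.nan, 120.0],
        "search": [30.0, 25.0, 28.0],
    }
)

# Channels that are configured but absent from X.
missing_columns = [c for c in channel_columns if c not in X.columns]

# Number of missing values per channel, keeping only channels that have any.
missing_counts = X[channel_columns].isna().sum()
incomplete = missing_counts[missing_counts > 0]
```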
Target column
target_column names the dependent variable. It defaults to "y", but you can
set a different name such as "sales" or "conversions".
For direct Python use:
- pass the target as y
- name the Series with target_column
- keep the target fully observed; missing target values are rejected rather than zero-filled
For combined-file YAML or pipeline flows:
- keep the target column in the source dataset
- Abacus splits it out of the combined dataset before fitting
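If you start from a combined table but want to call the Python API directly, you can split the target out yourself. The sketch below does this by hand with pandas; it is not the splitting logic that the YAML or pipeline flows use, and the column names are hypothetical.

```python
import pandas as pd

target_column = "sales"  # hypothetical target name

# Hypothetical combined dataset holding predictors and the target together.
combined = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15"]),
        "tv": [100.0, 110.0, 95.0],
        "sales": [650.0, 700.0, 620.0],
    }
)

# The target must be fully observed, so fail early on missing values.
if combined[target_column].isna().any():
    raise ValueError(f"{target_column!r} contains missing values")

# Selecting a single column keeps its name, so y.name == target_column.
y = combined[target_column].copy()
# Dropping the column keeps the target out of X and preserves the index.
X = combined.drop(columns=[target_column])
```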
Control columns
control_columns is optional.
If you configure it, every listed control column must be present in X.
Controls stay in the design matrix as separate regressors; they are not part of
y.
Like channels, configured controls must be fully observed for every row passed into fit or posterior prediction.
Abacus does not automatically scale controls. See Scaling and Preprocessing.
Panel dimensions with dims
dims is optional. Use it when you want a panel model, for example by geo,
brand, or market.
If you set dims=("geo", "brand"):
- X must contain geo and brand columns
- each row in X represents one date + geo + brand observation
- each new date must include every fitted panel slice when you later call posterior-predictive methods with new data
Do not use reserved internal names in dims:
- date
- channel
- control
- fourier_mode
For row layout and rectangularity guidance, see Panel Data Layout.
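The two dims rules above, no reserved names and every panel slice present on every date, can be sketched as a pre-fit check. This is plain pandas with hypothetical data, not an Abacus function, and the slice count assumes that duplicate rows have already been ruled out.

```python
import pandas as pd

# Reserved internal names listed above.
RESERVED = {"date", "channel", "control", "fourier_mode"}

date_column = "date"
dims = ("geo", "brand")

# Hypothetical panel: the south / a slice is missing on 2024-01-08.
X = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-08"]),
        "geo": ["north", "south", "north"],
        "brand": ["a", "a", "a"],
        "tv": [100.0, 80.0, 120.0],
    }
)

# Any dims entry that collides with a reserved name.
reserved_used = sorted(RESERVED.intersection(dims))

# Distinct panel slices, then the dates that do not cover all of them.
slices = X[list(dims)].drop_duplicates()
rows_per_date = X.groupby(date_column).size()
incomplete_dates = rows_per_date[rows_per_date < len(slices)].index.tolist()
```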
Supported shapes and alignment
| Workflow | Supported shape |
|---|---|
| Direct PanelMMM.fit() / build_model() | X: DataFrame; y: Series or 1D ndarray |
| YAML builder with data.dataset_path | One tabular file containing both predictors and the target column |
| Pipeline runner with dataset_path | Same as above |
| Pipeline runner with x_path and y_path | Separate feature and target files; the runner extracts target_column from the target file |
Abacus also has an internal alignment helper that can work with a MultiIndex
target Series indexed by [date_column, *dims], but that is mainly used in
fit-data rebuild and load flows. For normal fitting, keep y row-aligned with
X.
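If you do hold a target Series indexed by [date_column, *dims], one way to bring it back into row alignment with X yourself is sketched below. This does not reproduce Abacus's internal helper; it is an assumption-laden illustration using hypothetical column names and standard pandas reindexing, and it requires the target index to be unique.

```python
import pandas as pd

date_column = "date"
dims = ("geo",)
target_column = "sales"

X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-01", "2024-01-08", "2024-01-08"]
        ),
        "geo": ["north", "south", "north", "south"],
        "tv": [100.0, 80.0, 120.0, 90.0],
    }
)

# Target indexed by (date, geo), deliberately in a different row order.
y_multi = pd.Series(
    [9.0, 10.0, 12.0, 8.0],
    index=pd.MultiIndex.from_arrays(
        [
            pd.to_datetime(
                ["2024-01-08", "2024-01-01", "2024-01-08", "2024-01-01"]
            ),
            ["south", "north", "north", "south"],
        ],
        names=[date_column, *dims],
    ),
    name=target_column,
)

# Build the same keys from X, in X's row order, and look the target up by key.
row_keys = pd.MultiIndex.from_frame(X[[date_column, *dims]])
y_aligned = y_multi.reindex(row_keys)

# Re-attach X's own index so that X and y share it.
y = pd.Series(y_aligned.to_numpy(), index=X.index, name=target_column)
```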
Python example
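The sketch below builds an X and y pair that satisfies the contract on this page. All values are synthetic and the column names (geo, tv, search, price_index, sales) are hypothetical. The model call appears only as a comment, because the import path and the remaining constructor arguments of PanelMMM are not covered here.

```python
import numpy as np
import pandas as pd

# Hypothetical configuration.
date_column = "date"
channel_columns = ["tv", "search"]
control_columns = ["price_index"]
dims = ("geo",)
target_column = "sales"

dates = pd.date_range("2024-01-01", periods=4, freq="W-MON")
geos = ["north", "south"]

# One row per date + geo combination, so the panel is rectangular.
keys = pd.MultiIndex.from_product([dates, geos], names=[date_column, "geo"])
X = keys.to_frame(index=False)

# Synthetic, fully observed media, control, and target values.
rng = np.random.default_rng(0)
X["tv"] = rng.uniform(50, 150, size=len(X))
X["search"] = rng.uniform(10, 40, size=len(X))
X["price_index"] = rng.normal(1.0, 0.05, size=len(X))

# y shares X's index and is named after target_column; it is not a column of X.
y = pd.Series(
    rng.uniform(500, 900, size=len(X)), index=X.index, name=target_column
)

# With a configured PanelMMM instance (construction not shown here):
# mmm.fit(X, y)
```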
YAML note
If you use a combined dataset in YAML, the file at data.dataset_path must
contain every configured column:
- date_column
- every entry in channel_columns
- every entry in control_columns, if any
- every entry in dims, if any
- target_column
Example:
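A minimal Python sketch of this column check, using an in-memory CSV in place of the file at data.dataset_path. The config dictionary and the column names are hypothetical stand-ins for your own configuration, and this is not the YAML builder's own validation.

```python
import io

import pandas as pd

# Hypothetical configuration values, one entry per role listed above.
config = {
    "date_column": "date",
    "channel_columns": ["tv", "search"],
    "control_columns": ["price_index"],
    "dims": ["geo"],
    "target_column": "sales",
}

# Stand-in for the combined dataset file.
csv_text = """date,geo,tv,search,price_index,sales
2024-01-01,north,100,30,1.00,650
2024-01-01,south,80,25,1.10,540
"""
combined = pd.read_csv(io.StringIO(csv_text))

# Every configured column, including the optional roles when they are set.
required = [
    config["date_column"],
    *config["channel_columns"],
    *config.get("control_columns", []),
    *config.get("dims", []),
    config["target_column"],
]

missing = [c for c in required if c not in combined.columns]

# The same check against a file that lacks the target column.
without_target = combined.drop(columns=["sales"])
missing_without_target = [c for c in required if c not in without_target.columns]
```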
Common pitfalls
- Missing date_column, channel, control, or dimension columns in X
- Passing a y Series whose name does not match target_column
- Passing pandas X and y with different indexes
- Passing a NumPy y with a different length from X
- Passing duplicate panel rows or incomplete panel slices for a given date
- Passing missing observed channel, control, or target values and expecting Abacus to treat them as structural zeroes
- Expecting the YAML builder or pipeline to find a target column that is not present in the combined dataset
- Leaving date values as numeric codes instead of normalising them first