Scaling and Preprocessing

Abacus scales channels and the target automatically before it builds the PyMC graph for PanelMMM. This page explains what is scaled, how the Scaling configuration works, and what you still need to preprocess yourself.

What Abacus scales automatically

Abacus computes scales from the reshaped xarray dataset immediately before model construction.

Variable role	Automatic scaling	Notes
Target (`y`)	Yes	Divided by `target_scale` before the likelihood is built.
Channels (`channel_columns`)	Yes	Divided by `channel_scale` before adstock and saturation.
Controls (`control_columns`)	No	Controls enter the model on their original scale.
Date and `dims` columns	No	These define coordinates, not modelled numeric inputs.

Abacus stores the resulting scalers in the model as xarray data:

the target scale in model.scalers["_target"]
the channel scale in model.scalers["_channel"]

Default behaviour

If you do not pass scaling, PanelMMM uses:

Scaling(
    target=VariableScaling(method="max", dims=dims),
    channel=VariableScaling(method="max", dims=dims),
)

This means:

the target is divided by the maximum over date and all configured dims
each channel is divided by its maximum over date and all configured dims

With no extra panel dims:

target_scale is a scalar
channel_scale has dimension channel

With dims=("geo",) and the default scaling:

target_scale is still a scalar, because scaling reduces over both date and geo
channel_scale still has dimension channel, so each channel is pooled across all geos
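To make the pooled default concrete, here is a minimal NumPy sketch of the arithmetic described above. It does not call Abacus; the toy arrays and variable names are made up for illustration, and the axis order (date, geo, channel) is an assumption that mirrors the layout described on this page.

```python
import numpy as np

# Hypothetical toy data: 3 dates, 2 geos, 2 channels -> shape (date, geo, channel).
channels = np.array(
    [
        [[10.0, 1.0], [100.0, 4.0]],
        [[20.0, 2.0], [200.0, 8.0]],
        [[5.0, 3.0], [50.0, 6.0]],
    ]
)
# Target with shape (date, geo).
target = np.array([[50.0, 400.0], [80.0, 500.0], [40.0, 300.0]])

# Default-style pooling: reduce over date and geo.
target_scale = target.max()                # a single scalar: 500.0
channel_scale = channels.max(axis=(0, 1))  # one value per channel: 200.0 and 8.0

scaled_target = target / target_scale
scaled_channels = channels / channel_scale  # broadcasts along the channel axis

# The first geo never reaches 1.0 on the first channel,
# because the scale is shared with the larger second geo.
print(scaled_channels[:, 0, 0].max())  # 0.1
```

Pooling keeps the relative size of geos visible to the model: a small geo stays small after scaling. Per-geo scaling removes that difference, which is why the choice deserves a deliberate decision for panel data.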

If you want per-panel scales instead of pooled scales, set dims=() inside the relevant VariableScaling. See Dimension semantics.

`Scaling` and `VariableScaling`

Use abacus.mmm.scaling.Scaling and abacus.mmm.scaling.VariableScaling to control automatic scaling.

Setting	Purpose	Allowed values
`VariableScaling.method`	Reduction used to compute the scale	`"max"` or `"mean"`
`VariableScaling.dims`	Extra dimensions to reduce across, in addition to `date`	String or tuple of strings
`Scaling.target`	Scaling rule for the target	`VariableScaling`
`Scaling.channel`	Scaling rule for channels	`VariableScaling`

Rules enforced by the implementation:

date is always part of the reduction and must not be listed in VariableScaling.dims.
Duplicate scaling dims are not allowed.
Target scaling dims must come from the model dims.
Channel scaling dims must come from the model dims and may additionally include channel.
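The four rules can be sketched as a small standalone check. This is a hypothetical helper written for illustration, not the Abacus implementation; the function name, the role argument, and the error messages are all assumptions.

```python
def check_scaling_dims(scaling_dims, model_dims, role):
    """Hypothetical validator mirroring the rules above; not Abacus code."""
    # Accept a single string or a tuple of strings.
    dims = (scaling_dims,) if isinstance(scaling_dims, str) else tuple(scaling_dims)
    if "date" in dims:
        raise ValueError("date is always reduced and must not be listed")
    if len(set(dims)) != len(dims):
        raise ValueError("duplicate scaling dims are not allowed")
    # Channel rules may also reduce over the channel dimension.
    allowed = set(model_dims)
    if role == "channel":
        allowed.add("channel")
    unknown = [d for d in dims if d not in allowed]
    if unknown:
        raise ValueError(f"{role} scaling dims not allowed: {unknown}")
    return dims


print(check_scaling_dims("geo", ("geo",), "target"))               # ('geo',)
print(check_scaling_dims(("geo", "channel"), ("geo",), "channel"))  # ('geo', 'channel')
```

Passing ("date",) for either role, or ("channel",) for the target, raises ValueError in this sketch.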

You can pass either:

a Scaling object
a plain dictionary with target and channel keys

If the dictionary omits one side, Abacus fills the missing target or channel rule with the default method="max", dims=dims configuration.
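The fill-in behaviour for a partial dictionary can be pictured with the sketch below. It is a hypothetical helper, not Abacus code, and it assumes each rule is a nested dictionary with method and dims keys, as in the YAML form.

```python
def fill_scaling_defaults(scaling, model_dims):
    """Hypothetical helper: complete a partial scaling dict with the default rule."""
    default = {"method": "max", "dims": tuple(model_dims)}
    scaling = dict(scaling or {})
    return {
        "target": scaling.get("target", dict(default)),
        "channel": scaling.get("channel", dict(default)),
    }


# Only the target rule is given; the channel rule falls back to the default.
partial = {"target": {"method": "mean", "dims": ()}}
full = fill_scaling_defaults(partial, model_dims=("geo",))
print(full["channel"])  # {'method': 'max', 'dims': ('geo',)}
```

Note the consequence: the omitted side is pooled across all model dims, even when the side you did specify uses per-panel scales.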

Dimension semantics

VariableScaling.dims tells Abacus which dimensions to reduce across in addition to date. It does not tell Abacus which dimensions to keep.

Assume a model with dims=("geo",) so channel data has dimensions (date, geo, channel) and target data has dimensions (date, geo).

Configuration	Reduction performed	Resulting scale dims	Meaning
`target.dims=()`	over `date`	`(geo,)`	One target scale per geo
`target.dims=("geo",)`	over `date`, `geo`	`()`	One pooled target scale
`channel.dims=()`	over `date`	`(geo, channel)`	One scale per geo-channel pair
`channel.dims=("geo",)`	over `date`, `geo`	`(channel,)`	One pooled scale per channel
`channel.dims=("geo", "channel")`	over `date`, `geo`, `channel`	`()`	One pooled scale for all channels
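The channel rows of the table can be reproduced with plain NumPy. The sketch below is illustrative only: compute_scale and scale_dims are hypothetical helpers, and the random array stands in for real channel data with dimensions (date, geo, channel).

```python
import numpy as np


def scale_dims(data_dims, scaling_dims):
    """Dims that remain after reducing over date plus scaling_dims."""
    reduced = {"date", *scaling_dims}
    return tuple(d for d in data_dims if d not in reduced)


def compute_scale(data, data_dims, scaling_dims, method="max"):
    """Reduce an array over date plus scaling_dims (illustrative only)."""
    axes = tuple(
        i for i, d in enumerate(data_dims) if d == "date" or d in scaling_dims
    )
    reducer = np.max if method == "max" else np.mean
    return reducer(data, axis=axes)


rng = np.random.default_rng(0)
channel_dims = ("date", "geo", "channel")
channel_data = rng.uniform(1.0, 10.0, size=(52, 3, 2))  # 52 dates, 3 geos, 2 channels

for scaling_dims in [(), ("geo",), ("geo", "channel")]:
    scale = compute_scale(channel_data, channel_dims, scaling_dims)
    print(scaling_dims, "->", scale_dims(channel_dims, scaling_dims), np.shape(scale))
# () -> ('geo', 'channel') (3, 2)
# ('geo',) -> ('channel',) (2,)
# ('geo', 'channel') -> () ()
```

The more dims you list, the fewer dims the scale keeps, which is the opposite of what a "dims to keep" reading would suggest.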

Python example

This example keeps separate scales for each geo by reducing only over date:

from abacus.mmm import GeometricAdstock, LogisticSaturation
from abacus.mmm.panel import PanelMMM
from abacus.mmm.scaling import Scaling, VariableScaling

mmm = PanelMMM(
    date_column="date",
    channel_columns=["tv", "search"],
    target_column="sales",
    dims=("geo",),
    scaling=Scaling(
        target=VariableScaling(method="mean", dims=()),
        channel=VariableScaling(method="max", dims=()),
    ),
    adstock=GeometricAdstock(l_max=8),
    saturation=LogisticSaturation(),
)

In that configuration:

the target is divided by the per-geo mean over time
each channel is divided by the per-geo, per-channel maximum over time

YAML example

The YAML builder accepts the same structure through a top-level scaling block:

data:
  date_column: date

target:
  column: y
  type: revenue

dimensions:
  panel: [market]

media:
  channels: [channel_1, channel_2]
  adstock:
    type: geometric
    l_max: 8
  saturation:
    type: logistic

scaling:
  target:
    method: max
    dims: []
  channels:
    method: max
    dims: [market]

In this example:

the target is scaled separately for each market
each channel is scaled across date and market, leaving one pooled scale per channel

Original units versus model scale

The model is fit on scaled target and channel data.

That affects downstream interpretation:

the likelihood and many contribution variables are expressed in scaled target units
channel inputs are transformed after scaling, not in raw units
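As a rough illustration of the relationship between the two scales, the sketch below assumes that moving a quantity from scaled target space back to original units is a multiplication by the target scale, which follows from the target being divided by target_scale before fitting. The numbers and variable names are hypothetical.

```python
import numpy as np

target = np.array([120.0, 300.0, 240.0])  # original units
target_scale = target.max()               # 300.0 with method="max"
scaled_target = target / target_scale     # what the likelihood is built on

# Hypothetical contributions expressed in scaled target space.
scaled_contribution = np.array([0.125, 0.5, 0.25])

# Assumed back-transform: multiply by the same scale the target was divided by.
original_contribution = scaled_contribution * target_scale
print(original_contribution.tolist())  # [37.5, 150.0, 75.0]
```

With per-panel target scales, the scale is an array rather than a scalar, so the multiplication has to broadcast along the matching panel dimension.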

If you want stored deterministics in original target units, add them explicitly after build_model(...):

mmm.add_original_scale_contribution_variable(
    var=["channel_contribution", "y"]
)

The YAML builder supports the same workflow through original_scale_vars:

original_scale_vars:
  - channel_contribution
  - y

original_scale_vars adds extra original-scale deterministic variables. It does not change how the model is fit.

What Abacus does not preprocess for you

Abacus does not automatically:

scale controls
impute missing data in a domain-aware way
reinterpret missing observed channel, control, or target values as zeroes
sort the dataset for you
repair non-rectangular panel layouts
tolerate duplicate panel rows or incomplete panel slices
coerce Python-API dates to datetimes before fitting

Practical preprocessing advice

Before fitting:

normalise date_column with pd.to_datetime(...)
sort by date_column and then by dims
make panel gaps explicit instead of leaving missing rows
ensure every date_column + dims panel cell appears exactly once
impute missing observed channel, control, and target values before fitting or posterior prediction instead of expecting them to be zero-filled
decide whether controls should be centred, standardised, log-transformed, or otherwise prepared before they go into control_columns
choose scaling dims deliberately instead of relying on the default when you use panel data
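The first few items on that checklist can be sketched with pandas. The column names (date, geo, tv, sales) and the toy rows are hypothetical; adapt the key list to your own date_column and dims.

```python
import pandas as pd

# Hypothetical raw input: string dates, unsorted rows.
raw = pd.DataFrame(
    {
        "date": ["2024-01-08", "2024-01-01", "2024-01-01", "2024-01-08"],
        "geo": ["north", "south", "north", "south"],
        "tv": [12.0, 7.0, 10.0, 9.0],
        "sales": [130.0, 90.0, 120.0, 95.0],
    }
)

keys = ["date", "geo"]
df = raw.copy()
df["date"] = pd.to_datetime(df["date"])              # normalise dates
df = df.sort_values(keys).reset_index(drop=True)      # sort by date, then dims

# Every date + geo cell should appear exactly once.
n_duplicates = int(df.duplicated(subset=keys).sum())
expected = pd.MultiIndex.from_product(
    [df["date"].unique(), df["geo"].unique()], names=keys
)
missing_cells = expected.difference(pd.MultiIndex.from_frame(df[keys]))

# Missing observed values should be handled before fitting.
n_missing_values = int(df[["tv", "sales"]].isna().sum().sum())

if n_duplicates or len(missing_cells) or n_missing_values:
    raise ValueError("panel is not clean: fix duplicates, gaps, or missing values")
```

The sketch only detects problems. How to fill a gap or impute a value depends on the domain, so that step is left to you.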

Common pitfalls

Expecting the default scaling to be per-group when it actually pools across the configured panel dims
Adding date to VariableScaling.dims; Abacus rejects this
Forgetting that controls are left on their original scale
Treating VariableScaling.dims as dimensions to keep rather than dimensions to reduce across
Assuming original_scale_vars changes fitting scale rather than adding extra outputs

For the input table shape that scaling operates on, see Panel Data Layout.