Causal Identification

If you are a classically trained econometrician, you have every right to be sceptical of Marketing Mix Models. The causal identification strategy underpinning MMM is weaker than the methods you were taught to trust. This document confronts that reality head-on: we explain what MMM can and cannot claim causally, where the identifying assumptions break down, and how modern calibration techniques partially rescue the framework. We also place MMM on the “causal ladder” relative to the gold-standard methods you already know.

Our goal is not to oversell MMM. It is to give you an honest accounting of the trade-offs, so you can deploy the tool where it is defensible and flag where it is not.

1. The Identification Problem, Plainly Stated

Every causal claim rests on an identification strategy — a logical argument for why the estimated relationship reflects a true causal effect rather than a statistical artefact. In classical econometrics, you learned several strategies, each with a well-understood set of assumptions. Consider three that you know well.

A randomised controlled trial (RCT) identifies a causal effect by physically randomising treatment assignment. Because randomisation breaks the link between treatment and all confounders (observed and unobserved), the simple difference in means is an unbiased estimator of the average treatment effect. The assumption is minimal: the randomisation was executed correctly.

An instrumental variables (IV/2SLS) estimator identifies a causal effect by exploiting an instrument: a variable that affects the outcome only through the endogenous treatment. The identifying assumptions are relevance (the instrument predicts the treatment) and the exclusion restriction (the instrument has no direct effect on the outcome). Relevance is directly testable through the first-stage regression. The exclusion restriction is not directly testable, although overidentification tests can probe it when there are more instruments than endogenous regressors.

A difference-in-differences (DiD) estimator identifies a causal effect by comparing the change in outcomes over time between a treated and control group. The identifying assumption is parallel trends: absent treatment, the two groups would have followed the same trajectory. Again, this assumption is partially testable using pre-treatment data.

Now consider what MMM does. An MMM estimates media effects by regressing sales (or another KPI) on media spend and controls over time. The variation it exploits is temporal: weeks when TV spend was high are compared to weeks when TV spend was low, after controlling for seasonality, trend, and other observables.

The identifying assumption is strict exogeneity of the media regressors, conditional on the controls. In plain language: after we account for trend, seasonality, holidays, and any included control variables, the remaining variation in media spend is “as good as random” with respect to the error term. If an unobserved, time-varying confounder drives both media spend and sales simultaneously — and we have not controlled for it — the media coefficient is biased.
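
In symbols, one stylised way to write the regression and its identifying assumption is shown below. The notation is ours and chosen for illustration only: y_t is the KPI in week t, x_{m,t} is spend on channel m, f_m is whatever media transform the model applies, and z_t collects trend, seasonality, and controls.

```latex
y_t = \alpha + \sum_{m=1}^{M} \beta_m \, f_m\!\left(x_{m,\le t}\right) + \gamma^{\top} z_t + \varepsilon_t,
\qquad
\mathbb{E}\!\left[\varepsilon_t \mid x_{1:M,\,1:T},\; z_{1:T}\right] = 0 .
```

The conditioning set runs over all periods 1 to T, which is what makes the exogeneity "strict": the error in week t must be unrelated to past, current, and future media spend.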

This is a strong assumption. And unlike IV relevance or the DiD parallel trends assumption, which can at least be probed with first-stage diagnostics or pre-treatment data, it is essentially untestable. You cannot run a placebo check on an unobserved confounder you have not measured.

2. Where the Assumptions Break Down

The strict exogeneity assumption fails in practice more often than MMM practitioners care to admit. Consider three common violations.

The first is simultaneity. Media planners increase spend during periods when they expect sales to be high (Christmas, product launches, promotional windows). Sales are high in those periods not because of the advertising but because of the underlying demand shock. The MMM attributes the demand shock to the media channel, inflating its estimated effect. This is textbook endogeneity, identical to the problem that motivates IV estimation in labour economics or IO.
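
A small simulation makes the size of this bias concrete. The sketch below is a toy data-generating process of our own, not a description of any real dataset: an unobserved demand shock raises both spend and sales, and the function names and parameter values are hypothetical. In this setup the naive slope converges to true_beta + 0.5, so the estimated effect is roughly double the truth.

```python
import numpy as np

def simulate_simultaneity(n_weeks=5000, true_beta=0.5, planner_response=1.0, seed=0):
    """Simulate weekly data in which planners raise spend when demand is high."""
    rng = np.random.default_rng(seed)
    demand = rng.normal(0.0, 1.0, n_weeks)  # unobserved demand shock
    spend = planner_response * demand + rng.normal(0.0, 1.0, n_weeks)
    sales = true_beta * spend + demand + rng.normal(0.0, 1.0, n_weeks)
    return spend, sales, demand

def ols_slope(y, *regressors):
    """Return the OLS coefficient on the first regressor (intercept included)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

spend, sales, demand = simulate_simultaneity()

# Naive regression: the demand shock is omitted, as it would be in practice.
naive = ols_slope(sales, spend)

# Oracle regression: infeasible in real data because demand is unobserved.
oracle = ols_slope(sales, spend, demand)

print(f"true effect 0.50, naive estimate {naive:.2f}, oracle estimate {oracle:.2f}")
```

The oracle regression is included only to show that the bias disappears once the confounder is controlled for; the practical difficulty is that the demand shock is rarely measured.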

The second is omitted variable bias from time-varying confounders. Suppose a competitor launches an aggressive pricing campaign in Q3, simultaneously causing your sales to drop and your marketing team to increase defensive spend. The MMM sees high spend coinciding with low sales and may underestimate the media effect. If instead the competitor withdraws, the reverse happens. Without a “competitor activity” control, the media coefficient absorbs the confounding variation.
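
The direction of the bias follows from the standard omitted variable formula. As a sketch in the single-regressor case, with notation that is ours rather than the framework's (x_t is media spend, c_t is the omitted competitor activity):

```latex
y_t = \beta x_t + \gamma c_t + \varepsilon_t
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat{\beta}_{\text{omit } c}
  = \beta + \gamma \, \frac{\operatorname{Cov}(x_t, c_t)}{\operatorname{Var}(x_t)} .
```

In the competitor example, competitor activity lowers sales (γ < 0) and raises defensive spend (Cov(x_t, c_t) > 0), so the second term is negative and the media coefficient is biased downward.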

The third is functional form misspecification. Even if the true data-generating process satisfies strict exogeneity, specifying the wrong functional form (linear when the truth is concave, or missing an interaction between channels) introduces bias. MMM frameworks like Abacus mitigate this with flexible non-linear transforms (adstock, saturation), but no parametric family can guarantee correct specification.
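
For readers unfamiliar with these transforms, the sketch below shows one common textbook form of each: geometric adstock for carry-over and a Hill curve for saturation. These are generic illustrations with hypothetical function and parameter names; we do not claim they match the exact parameterisation used in Abacus.

```python
import numpy as np

def geometric_adstock(spend, decay):
    """Carry a fraction `decay` of the previous period's adstocked value forward."""
    spend = np.asarray(spend, dtype=float)
    out = np.zeros_like(spend)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out

def hill_saturation(x, half_saturation, slope=1.0):
    """Map spend to a response in [0, 1) with diminishing returns.

    The response equals 0.5 when x equals `half_saturation`.
    """
    x = np.asarray(x, dtype=float)
    return x**slope / (x**slope + half_saturation**slope)

# A single burst of spend decays geometrically over the following weeks.
adstocked = geometric_adstock([100.0, 0.0, 0.0], decay=0.5)

# The saturated response is concave in the adstocked spend.
response = hill_saturation(adstocked, half_saturation=50.0)
```

Both transforms are still parametric choices. If the true response has a different shape, for example an S-curve with a threshold that this family cannot represent well, the misspecification bias described above remains.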

3. How Lift Test Calibration Partially Rescues MMM

Modern Bayesian MMM frameworks, including Abacus, address the endogeneity problem through calibration with incrementality experiments (lift tests or geo-experiments). The logic works as follows.

A lift test is a controlled experiment — typically a geo-randomised or matched-market design — in which media exposure is deliberately varied across treatment and control regions. Because the variation is experimentally induced, the resulting incremental estimate is causally identified in the RCT sense, at least for the specific channel, time window, and geography tested.

When you feed this lift test estimate into the MMM (via the EventAdditiveEffect or lift test calibration API in Abacus), you inject an external piece of causal evidence into the model’s likelihood. The Bayesian machinery then updates the media coefficient posterior to be consistent with both the observational time-series data and the experimental result. In effect, the lift test acts as an anchor: it constrains the media coefficient to a causally credible region, even if the observational data alone would have produced a biased estimate.
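
The anchoring effect can be illustrated with a deliberately simplified calculation. The sketch below assumes that both the observational estimate and the lift test estimate of a channel effect are normally distributed, and combines them by precision weighting. This is our own toy example with hypothetical numbers, not the Abacus calibration API, and a full Bayesian MMM would typically apply the experimental evidence to the response curve inside the model rather than to a single coefficient.

```python
import numpy as np

def calibrate_normal(obs_mean, obs_sd, lift_mean, lift_sd):
    """Combine two normal estimates of the same effect by precision weighting."""
    w_obs = 1.0 / obs_sd**2    # precision of the observational estimate
    w_lift = 1.0 / lift_sd**2  # precision of the lift test estimate
    post_var = 1.0 / (w_obs + w_lift)
    post_mean = post_var * (w_obs * obs_mean + w_lift * lift_mean)
    return post_mean, np.sqrt(post_var)

# Hypothetical inputs: the observational estimate is inflated and imprecise,
# while the lift test is lower and more precise.
post_mean, post_sd = calibrate_normal(
    obs_mean=1.0, obs_sd=0.4, lift_mean=0.5, lift_sd=0.2
)
print(f"calibrated effect: {post_mean:.2f} (sd {post_sd:.2f})")
```

With these inputs the calibrated mean is 0.6, much closer to the experimental estimate than to the observational one, because the lift test carries four times the precision. Note that precision weighting treats the observational estimate as noisy but unbiased; when it is biased by endogeneity, the calibrated value is pulled toward the truth but not all the way to it.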

Think of the lift test as playing a role analogous to an instrumental variable. The IV provides exogenous variation that identifies the causal effect. The lift test provides exogenous variation (from the experiment) that calibrates the observational estimate. The difference is that the IV is embedded inside the estimator, whereas the lift test enters as an informative prior or likelihood penalty.

This approach does not eliminate all bias. The lift test identifies the causal effect for one channel in one time window. Extrapolating that result across all channels and all time periods requires additional assumptions (stability of the effect over time, no interaction between the calibrated and uncalibrated channels). But it is a genuine improvement over pure observational MMM, and it brings the framework closer to the causal credibility that econometricians demand.

4. MMM on the Causal Ladder

We can place MMM relative to the methods you trust by thinking about a hierarchy of identification strategies, ordered from those that require the weakest assumptions (and so support the most credible causal claims) to those that require the strongest.

At the top sits the RCT. Randomisation eliminates all confounding, and the only threat to validity is implementation failure (non-compliance, attrition, spillovers). For media measurement, the RCT analogue is a well-executed geo-experiment or a randomised holdout test. When you can run one, run one.

One rung below sits IV/2SLS. The instrument provides exogenous variation, but only if the exclusion restriction holds. In media measurement, genuine instruments are rare. Weather shocks that affect outdoor advertising exposure, or regulatory changes that force abrupt spend shifts, occasionally qualify. But most marketing datasets lack a credible instrument.

Below IV sit DiD and synthetic control methods. These exploit a treatment event (a campaign launch, a market entry) and compare treated versus control units under a parallel trends assumption. Geo-experiments with a staggered rollout fit naturally into this framework. The assumption is partially testable using pre-treatment data, but it is never guaranteed to hold.

Below DiD sits regression discontinuity (RD), which exploits a sharp threshold in treatment assignment. Media applications are uncommon because advertising spend rarely exhibits the kind of sharp discontinuity that RD requires.

And then we arrive at the observational regression, which is where standard MMM lives. The identifying assumptions are the most demanding in the hierarchy: conditional exogeneity given controls, correct functional form, and no unobserved time-varying confounders. Without external calibration, this is the least credible causal claim on the ladder.

However, MMM calibrated with lift tests occupies a hybrid position. The observational regression provides the structure and the time-series variation. The lift test provides a causally identified anchor point. Together, they produce an estimate that is stronger than pure observational regression but weaker than a full RCT across all channels. In practice, this hybrid is the best that most marketing organisations can achieve at scale, because running a separate RCT for every channel, every quarter, in every market, is prohibitively expensive.

5. The Role of DAGs and Structural Thinking

If you are trained in the Pearlian causal inference tradition (directed acyclic graphs, do-calculus, the structural causal model), you will recognise that MMM implicitly assumes a particular DAG. The assumed structure looks roughly like this: media spend causes sales, seasonality and trend cause sales, controls cause sales, and (critically) nothing unobserved simultaneously causes both media spend and sales after conditioning on the included controls.

Drawing this DAG explicitly is a powerful exercise. It forces you to articulate every backdoor path between media and sales, and to verify that your control set blocks them all. If you identify a backdoor path that your controls do not block — for example, “competitor pricing → our media spend” and “competitor pricing → our sales” — you have found a source of bias that the MMM cannot resolve without either adding a control for competitor pricing or calibrating with a lift test.
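
The backdoor check can also be done mechanically once the DAG is written down. The sketch below is a minimal pure-Python illustration on a toy graph of our own; the node names and helper functions are hypothetical. It lists every path that enters media through an incoming arrow and reports which of them remain open given a control set. It is a simplification: it assumes the control set contains no descendants of media, which the full backdoor criterion also requires.

```python
def backdoor_paths(edges, treatment, outcome):
    """List simple paths from treatment to outcome that start with an arrow into treatment."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        node = path[-1]
        if node == outcome:
            paths.append(list(path))
            return
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in path:
                walk(path + [nxt])

    for parent in sorted(p for p, c in edges if c == treatment):
        walk([treatment, parent])
    return paths

def descendants(edges, node):
    """Return all nodes reachable from `node` along directed edges."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def is_blocked(path, edges, conditioned):
    """Apply the d-separation rules to each interior node of the path."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if collider:
            # A collider blocks unless it or one of its descendants is conditioned on.
            if mid not in conditioned and not (descendants(edges, mid) & conditioned):
                return True
        elif mid in conditioned:
            # A chain or fork node blocks when it is conditioned on.
            return True
    return False

def open_backdoor_paths(edges, treatment, outcome, conditioned):
    return [p for p in backdoor_paths(edges, treatment, outcome)
            if not is_blocked(p, edges, conditioned)]

# Toy DAG: each pair (a, b) means a causes b.
edges = [
    ("seasonality", "media"), ("seasonality", "sales"),
    ("competitor_pricing", "media"), ("competitor_pricing", "sales"),
    ("media", "sales"),
]

open_without = open_backdoor_paths(edges, "media", "sales", {"seasonality"})
open_with = open_backdoor_paths(edges, "media", "sales", {"seasonality", "competitor_pricing"})
print(open_without)  # the competitor pricing path is still open
print(open_with)     # no open backdoor paths remain
```

Controlling for seasonality alone leaves the path through competitor pricing open, which is exactly the bias described above. Adding competitor pricing to the control set closes it.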

We strongly recommend that every MMM engagement begins with a causal DAG workshop, even an informal one. The DAG does not make the model causal. But it forces the team to be explicit about what they are assuming, and it provides a framework for discussing where the model’s causal claims are credible and where they are not.

6. Honest Counsel for Sceptical Econometricians

We close with five points of honest counsel.

First, do not treat MMM outputs as causal estimates with the same confidence you would place in a well-identified IV or DiD result. They are not. They are conditional associations, regularised by Bayesian priors and (ideally) anchored by experimental calibration.

Second, always ask: “What is the identifying variation?” If the answer is “weeks when spend was high versus weeks when spend was low,” follow up with: “Why was spend high in those weeks? Could the same factor that drove high spend also have driven high sales independently?” If the answer is “yes” or “maybe,” the estimate is potentially confounded.

Third, calibrate wherever possible. A single well-executed lift test for your largest channel does more for the credibility of the entire model than any amount of prior tuning or functional form experimentation.

Fourth, use the model for what it does well. MMM excels at relative channel comparison (channel A versus channel B), at budget allocation (given a fixed total budget, how should we distribute it?), and at scenario planning (what happens if we increase TV spend by 20%?). Channel comparison and allocation depend mainly on the relative size of media effects rather than on unbiased point estimates, and even a moderately biased MMM can rank channels correctly if the bias is roughly proportional across channels. Scenario forecasts depend on the level of the effect as well, so treat their absolute magnitudes with more caution where calibration data is missing.
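
The proportionality condition matters, and a few lines of arithmetic show why. The numbers below are hypothetical returns for three hypothetical channels, chosen only to illustrate the point: a uniform inflation preserves the ranking, while a bias concentrated in one channel can reverse it.

```python
import numpy as np

def ranking(values):
    """Channel indices ordered from highest to lowest value."""
    return np.argsort(-np.asarray(values)).tolist()

# Hypothetical true returns for three channels: index 0, 1, 2.
true_roi = np.array([1.8, 1.2, 0.6])

# Every channel inflated by the same factor: ranking is unchanged.
proportional = 1.5 * true_roi

# Only channel 1 inflated (for example, because its spend tracks demand): ranking flips.
uneven = true_roi * np.array([1.0, 1.8, 1.0])

print(ranking(true_roi), ranking(proportional), ranking(uneven))
```

Channels whose spend responds most strongly to demand are the ones most likely to carry a disproportionate bias, so they are natural first candidates for a lift test.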

Fifth, be transparent with stakeholders. Present posterior credible intervals alongside point estimates, never point estimates alone. Discuss the assumptions openly. Flag where calibration data exists and where it does not. The credibility of the framework depends not on pretending the model is an RCT, but on demonstrating that the team understands its limitations and has taken concrete steps to mitigate them.