Posterior Predictive Checks
Posterior predictive checking asks a simple question:
Once fitted, can the model reproduce the main features of the observed data?
For a classically trained econometrician, this is the Bayesian analogue of residual diagnostics, fitted-versus-observed checks, and out-of-sample sanity-checking, but with one important difference: the checks are based on the full posterior distribution, not a single point estimate.
1. What the check actually is
After fitting, you sample from the posterior predictive distribution:
Conceptually, each posterior draw says:
- here is one plausible parameter vector
- given that parameter vector, here is one plausible target path
If the fitted model is adequate, the observed data should look like a credible member of that posterior predictive family.
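In symbols, the posterior predictive distribution is p(y_rep | y) = ∫ p(y_rep | θ) p(θ | y) dθ, and the two bullets above are the two sampling steps that approximate it. A minimal NumPy sketch of that mechanism follows. It is not the Abacus API: the toy regression, the stand-in posterior draws, and every variable name are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2 + 3x + noise (illustrative values only).
n_obs = 50
x = rng.uniform(0.0, 1.0, size=n_obs)
y_obs = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, size=n_obs)

# Stand-in for MCMC output: hypothetical posterior draws of
# (intercept, slope, sigma). A real workflow would take these from the sampler.
n_draws = 1000
intercept = rng.normal(2.0, 0.1, size=n_draws)
slope = rng.normal(3.0, 0.2, size=n_draws)
sigma = np.abs(rng.normal(0.5, 0.05, size=n_draws))

# Step 1: each posterior draw is one plausible parameter vector,
# which implies one mean path for the target.
mu = intercept[:, None] + slope[:, None] * x[None, :]  # (n_draws, n_obs)

# Step 2: given that parameter vector, simulate one plausible target path
# by adding observation noise.
y_rep = rng.normal(mu, sigma[:, None])  # (n_draws, n_obs)

print(y_rep.shape)
```

Each row of `y_rep` is one replicated dataset. The checks in the rest of this page compare `y_obs` against the spread of those rows.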
2. Why this matters
A model can have:
- clean MCMC diagnostics
- seemingly sensible coefficient signs
- elegant priors
and still fail to reproduce basic features of the target series.
Posterior predictive checks catch that mismatch.
This matters because a model that cannot reproduce the observed target well enough is usually not ready for:
- decomposition narratives
- ROI or CPA interpretation
- budget optimisation
- strong causal storytelling
3. How Abacus supports it
Abacus exposes posterior predictive sampling directly:
It also exposes retained plotting helpers such as:
In the structured runner, Stage 30 assessment writes a fuller set of artefacts:
- 30_model_assessment/posterior_predictive.nc
- 30_model_assessment/posterior_predictive.png
- 30_model_assessment/posterior_predictive_summary.csv
- 30_model_assessment/observed.csv
- 30_model_assessment/fitted.csv
- 30_model_assessment/fit_timeseries.png
- 30_model_assessment/fit_scatter.png
- 30_model_assessment/residuals.csv
- 30_model_assessment/residuals_timeseries.png
- 30_model_assessment/residuals_hist.png
- 30_model_assessment/residuals_vs_fitted.png
That assessment stage is the closest Abacus comes to a retained, systematically produced posterior predictive diagnostics bundle.
4. What to inspect
Observed versus fitted over time
Start with the time-series overlay.
Ask:
- Does the fitted mean track the major movements in the target?
- Do the predictive intervals cover the observed series about as often as their nominal level implies (a 90% interval should contain roughly 90% of the observations)?
- Does the model systematically lag turning points or seasonal peaks?
If the observed line keeps sitting outside the predictive interval in structured ways, the model is missing something systematic rather than merely being noisy.
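Interval coverage can be quantified rather than judged by eye. The sketch below assumes you have an array of posterior predictive draws with shape (draws, time) and an observed series of matching length. The function name `interval_coverage` and the simulated inputs are hypothetical and not part of Abacus.

```python
import numpy as np

def interval_coverage(y_obs, y_rep, level=0.9):
    """Share of observations inside the central predictive interval.

    y_rep has shape (n_draws, n_obs); y_obs has shape (n_obs,).
    Also returns the boolean mask so misses can be inspected over time.
    """
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(y_rep, alpha, axis=0)
    upper = np.quantile(y_rep, 1.0 - alpha, axis=0)
    inside = (y_obs >= lower) & (y_obs <= upper)
    return float(inside.mean()), inside

rng = np.random.default_rng(1)
n_draws, n_obs = 2000, 200
mu = np.linspace(10.0, 20.0, n_obs)

# Simulated predictive draws (illustrative stand-in for real model output).
y_rep = rng.normal(mu, 1.0, size=(n_draws, n_obs))

# Case 1: observed data generated by the same process as the draws.
y_ok = rng.normal(mu, 1.0)
coverage_ok, _ = interval_coverage(y_ok, y_rep, level=0.9)

# Case 2: observed data with a level shift the model does not know about.
y_shifted = y_ok + 3.0
coverage_bad, inside_bad = interval_coverage(y_shifted, y_rep, level=0.9)

print(coverage_ok, coverage_bad)
```

Coverage far below the nominal level points to a systematic miss, and the returned mask shows whether the misses cluster in particular periods. Coverage far above it suggests intervals that are wider than the data warrant.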
Residual structure
Residuals should not show strong unresolved patterns.
In practice, look for:
- long runs of positive residuals followed by long runs of negative residuals
- clear seasonality left in the residuals
- residual variance increasing with fitted values
- one panel slice fitting much worse than the others
The presence of structure in the residuals usually means the model is still under-specified for the data.
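Two of the patterns listed above, long same-sign runs and leftover seasonality, can be screened with simple summaries. This is a rough sketch on simulated residuals, assuming a weekly series with a 52-period cycle. The helper names are hypothetical, and these summaries are screening aids rather than formal tests.

```python
import numpy as np

def lag1_autocorr(resid):
    """Lag-1 autocorrelation of a residual series."""
    r = resid - resid.mean()
    return float(np.sum(r[1:] * r[:-1]) / np.sum(r * r))

def longest_same_sign_run(resid):
    """Length of the longest run of consecutive residuals with the same sign."""
    signs = np.sign(resid)
    longest = current = 1
    for prev, cur in zip(signs[:-1], signs[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest

rng = np.random.default_rng(2)
t = np.arange(200)

# Residuals with no structure left in them.
white = rng.normal(0.0, 1.0, size=t.size)

# Residuals that still contain a seasonal cycle the model failed to capture.
seasonal = white + 2.0 * np.sin(2.0 * np.pi * t / 52.0)

print(lag1_autocorr(white), lag1_autocorr(seasonal))
print(longest_same_sign_run(white), longest_same_sign_run(seasonal))
```

A lag-1 autocorrelation near zero is what unstructured residuals tend to produce. A clearly positive value, as in the seasonal case, indicates that neighbouring residuals share information the model has not absorbed.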
Scatter of fitted versus observed
The fitted-versus-observed scatter is not a formal test, but it quickly shows:
- compression toward the mean
- systematic underprediction at high values
- systematic overprediction at low values
This is the Bayesian cousin of the fitted-value plots you would inspect after a classical regression.
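Compression toward the mean can be summarised by regressing observed on fitted values: a slope near 1 is consistent with a well-scaled fit, while a slope well above 1 means the fitted values span too narrow a range. The sketch below uses simulated data, and the helper name `calibration_slope` is an assumption for illustration.

```python
import numpy as np

def calibration_slope(observed, fitted):
    """Slope and intercept of the least-squares line observed ~ fitted."""
    slope, intercept = np.polyfit(fitted, observed, deg=1)
    return float(slope), float(intercept)

rng = np.random.default_rng(3)
truth = rng.gamma(shape=4.0, scale=25.0, size=300)
observed = truth + rng.normal(0.0, 5.0, size=truth.size)

# A fit that tracks the signal, and one shrunk halfway toward the mean.
well_fitted = truth
compressed = truth.mean() + 0.5 * (truth - truth.mean())

slope_good, _ = calibration_slope(observed, well_fitted)
slope_comp, _ = calibration_slope(observed, compressed)

# Mean residual in the top and bottom quartile of the compressed fit.
resid = observed - compressed
top = compressed >= np.quantile(compressed, 0.75)
bottom = compressed <= np.quantile(compressed, 0.25)

print(slope_good, slope_comp)
print(resid[top].mean(), resid[bottom].mean())
```

For the compressed fit, the positive mean residual in the top quartile is the underprediction at high values, and the negative mean residual in the bottom quartile is the overprediction at low values.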
5. What “good” posterior predictive behaviour looks like
A good posterior predictive check does not mean the model matches every wiggle exactly.
You are looking for something more practical:
- the main level and variation are captured
- the observed series falls inside the predictive intervals about as often as their nominal coverage implies
- residuals are not strongly structured
- panel slices are not failing in obviously asymmetric ways
The question is whether the model is adequate for interpretation, not whether it is perfect.
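Whether "the main level and variation are captured" can be checked with a posterior predictive p-value: compute a summary statistic on the observed data and on every replicated dataset, then see where the observed value falls. This is a minimal sketch on simulated inputs, and the function name `ppc_pvalue` is hypothetical.

```python
import numpy as np

def ppc_pvalue(y_obs, y_rep, stat):
    """Share of replicated datasets whose statistic is >= the observed one.

    Values very close to 0 or 1 mean the model rarely reproduces that
    feature of the data.
    """
    t_obs = stat(y_obs)
    t_rep = np.apply_along_axis(stat, 1, y_rep)
    return float(np.mean(t_rep >= t_obs))

rng = np.random.default_rng(4)

# Replicated datasets from a model that assumes a standard deviation of 10.
y_rep = rng.normal(100.0, 10.0, size=(1000, 150))

# Observed data with the same level but twice the spread.
y_obs = rng.normal(100.0, 20.0, size=150)

p_mean = ppc_pvalue(y_obs, y_rep, np.mean)  # checks the level
p_std = ppc_pvalue(y_obs, y_rep, np.std)    # checks the variation

print(p_mean, p_std)
```

Here the p-value for the standard deviation sits at or near 0, because no replicated dataset is as dispersed as the observed one. The choice of statistic matters: pick statistics that reflect features you need the model to reproduce, such as the level, the spread, or the peak.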
6. What posterior predictive checks cannot prove
This is the most important warning.
A model can pass posterior predictive checks and still fail as a causal model.
Why? Because posterior predictive checks evaluate prediction of the target, not causal attribution of the components.
Two models can predict sales equally well while assigning very different shares of those sales to:
- baseline
- media
- controls
- seasonality
- events
That is why posterior predictive checking must be paired with:
- Causal Identification in Marketing Mix Modelling
- Baseline vs Media Trade-Offs in MMM
- Model Comparison for Econometricians
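The point that equal predictive fit does not pin down attribution can be made concrete with a deliberately extreme, stylised case in which media spend follows the seasonal pattern exactly. All numbers and component names below are invented for illustration.

```python
import numpy as np

t = np.arange(104)
season = 1.0 + np.sin(2.0 * np.pi * t / 52.0)  # seasonal index
media = 50.0 * season                          # spend that tracks the season exactly

# Model A credits all seasonal movement to media.
fit_a = {
    "baseline": np.full(t.size, 200.0),
    "media": 2.0 * media,
    "seasonality": np.zeros(t.size),
}

# Model B credits the same movement to seasonality.
fit_b = {
    "baseline": np.full(t.size, 200.0),
    "media": np.zeros(t.size),
    "seasonality": 100.0 * season,
}

pred_a = sum(fit_a.values())
pred_b = sum(fit_b.values())

media_share_a = fit_a["media"].sum() / pred_a.sum()
media_share_b = fit_b["media"].sum() / pred_b.sum()

print(np.allclose(pred_a, pred_b))   # identical predictions
print(media_share_a, media_share_b)  # very different media shares
```

Any check that looks only at the predicted target treats these two models as identical. Real data is rarely this collinear, but the closer media spend tracks time patterns, the less a posterior predictive check can say about which decomposition is right.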
7. Common failure patterns
The model is too rigid
If the fitted line misses broad movements or regime changes, the model may need more structural flexibility, for example in trend, seasonality, controls, or events.
The model is too flexible in the wrong place
You may see good in-sample fit but strange residual behaviour or unstable attribution because the model is fitting noise through components that should remain more constrained.
Media is carrying baseline structure
If media spend is strongly correlated with time patterns, the model may let media soak up baseline variation that should have been handled by intercept, seasonality, controls, or other additive structure.
Baseline is carrying media structure
The reverse can also happen: a very flexible baseline can absorb variation that you would otherwise attribute to media.
8. What to do when checks fail
If posterior predictive checks look bad, resist the temptation to jump straight to interpreting coefficients anyway.
Instead:
- Check convergence first.
- Inspect residual structure rather than only aggregate fit.
- Revisit baseline specification, controls, seasonality, events, and media transformation choices.
- Refit and compare again.
In other words, use posterior predictive checking as a model-development tool, not just as a reporting plot.
9. Practical recommendation
In Abacus, the robust sequence is:
- Run prior predictive checks before fitting.
- Fit the model and verify MCMC diagnostics.
- Run posterior predictive checks and inspect residuals.
- Only then move to contributions, optimisation, or causal interpretation.
That order mirrors how a careful econometrician would already work, except that the Bayesian workflow makes the predictive-check step much richer and more honest about uncertainty.