HSGP

This document answers common questions econometricians may have when encountering HSGP (Hilbert Space Gaussian Process) approximations in the codebase, particularly regarding model flexibility and the number of basis functions.

1. Does a Hilbert Space Gaussian Process use up degrees of freedom when modelling?

Yes, but not in the strict $N - k$ counting sense used in classical OLS econometrics. Instead, Gaussian Processes (and their HSGP approximations) use “effective degrees of freedom” (EDF) due to Bayesian regularization.

Here is how to map HSGPs to classical econometrics concepts:

The Mechanical Setup (Looks like it uses $m$ degrees of freedom)

In classical econometrics, if you want to model a non-linear time trend, you might add polynomial terms or a Fourier series (sines and cosines). If you add $m$ sine/cosine terms to your OLS model, you lose exactly $m$ degrees of freedom.

An HSGP is mathematically very similar to a Fourier series. It approximates an infinite-dimensional Gaussian Process by using $m$ basis functions (the m parameter in the code, often set to 50–200). If this were OLS, estimating those 200 basis function coefficients would cost 200 degrees of freedom, potentially breaking your model if $N < 200$.

The Bayesian Reality (Effective Degrees of Freedom)

In an HSGP, those $m$ coefficients are not freely estimated. They are bound together by a hierarchical prior structure governed by hyperparameters, specifically the lengthscale ($\ell$) and the amplitude/variance ($\eta$).

Because the coefficients share a prior that heavily shrinks most of them toward zero, we measure the flexibility using Effective Degrees of Freedom (EDF).

  • Like Ridge Regression: Think of HSGP as running a Ridge Regression (L2 regularization) on 200 Fourier terms. Even though there are 200 parameters, the L2 penalty restricts their variance. The “effective” degrees of freedom might only be 4 or 5.
  • Data-driven penalty: The amount of shrinkage is controlled by the lengthscale ($\ell$).
    • If the data shows a smooth, slowly moving trend, the model learns a large lengthscale. This imposes massive shrinkage on the high-frequency (wiggly) basis functions, meaning the HSGP uses very few effective degrees of freedom (acting almost like a simple linear trend).
    • If the data is highly volatile, the model learns a short lengthscale, relaxing the shrinkage, allowing the curve to wiggle, and consuming more effective degrees of freedom.

Summary: While you might instantiate an HSGP with 100 basis functions ($m=100$), it does not subtract 100 from your denominator. It dynamically consumes exactly as much “effective” flexibility as the data proves is necessary, heavily penalizing unnecessary complexity (wiggliness) via its priors. You are completely safe from the classical $N - k < 0$ matrix inversion failures.


2. Is it up to the analyst to decide how many basis functions to set? Will this result in specification hunting?

This is a very valid concern. In standard OLS, if Analyst A uses a 5th-order Fourier series and Analyst B uses a 20th-order Fourier series, they will get wildly different results, opening the door for specification hunting.

In the Abacus HSGP implementation, this risk is mitigated in two ways: Automated Heuristics (code design) and Approximation Limits (mathematical design).

1. Automated Selection (The Code Design)

The library is specifically designed so analysts do not have to guess or manually set the number of basis functions ($m$).

In the HSGP class, the factory method parameterize_from_data calculates $m$ automatically using an algorithm (approx_hsgp_hyperparams) based on published literature (Ruitort-Mayol et al., 2022).

It calculates $m$ deterministically based on two things:

  1. The span of the time-series data (e.g., 3 years of weekly data).
  2. The lower bound of the lengthscale prior (the shortest time-span over which we believe the effect could realistically change).

This guarantees that two analysts modeling the same dataset with the same assumptions will end up with the exact same $m$.

2. $m$ dictates “Resolution”, not “Complexity” (The Mathematical Design)

Even if an analyst decided to bypass the automation and manually force a massive number of basis functions, it would not result in overfitting or specification hunting.

In an HSGP, $m$ is just the resolution limit of the approximation to the true infinite-dimensional Gaussian Process.

  • If $m$ is too small: The model lacks the resolution to capture fast-moving trends (it will artificially smooth things out).
  • If $m$ is exactly right (e.g., $m=50$): The model perfectly approximates the true Gaussian Process.
  • If $m$ is absurdly large (e.g., $m=500$): The model will yield the exact same curve as $m=50$.

Why? Because the extra 450 basis functions represent very high-frequency, rapid wiggles. The Bayesian lengthscale prior mathematically forces the coefficients for those extra high-frequency basis functions exactly to zero.

The only penalty for setting $m$ too high is computation time. The MCMC sampler will run much slower because it has to drag around useless matrices, but the statistical fit will remain identical. Therefore, an analyst cannot “p-hack” or specification-hunt by artificially inflating $m$.

3. We often model trend/seasonality using explicit Fourier terms (e.g., sin52_1 + cos52_1 + ...). This uses up degrees of freedom and often causes severe multicollinearity (high VIF) with our media or control variables. Does HSGP solve this?

Yes. Explicitly adding Fourier terms to a linear formula creates structural problems that HSGP elegantly sidesteps.

1. The Degrees of Freedom Problem

As discussed in Section 1, explicitly adding 10 sine/cosine terms to a regression permanently burns 10 degrees of freedom. The model is forced to independently estimate an unpenalized coefficient for every single wave, regardless of whether that specific frequency is actually present in the data.

The HSGP Solution: HSGP uses Effective Degrees of Freedom (EDF). It evaluates a large number of basis functions (which are essentially Fourier terms), but ties them all together under a single hierarchical Gaussian Process prior. If the data doesn’t exhibit a certain high-frequency wiggle, the GP lengthscale prior dynamically crushes the coefficients of those specific basis functions toward zero. You get the flexibility of 100 sine waves, but only “pay” for the effective degrees of freedom the data actually demands.

2. The Multicollinearity (High VIF) Problem

When you add explicit Fourier terms, they act as independent regressors. If one of your media channels (e.g., m_tv) happens to have a seasonal spending pattern that correlates strongly with sin52_1, the model suffers from classic multicollinearity. The VIF skyrockets, standard errors blow up, and the coefficient for m_tv becomes completely unstable (the “backdoor” bias).

The HSGP Solution: HSGP mitigates this through structured regularization.

  • Orthogonal Basis: The basis functions generated internally by the HSGP are orthogonal to each other.
  • Shared Shrinkage: More importantly, the coefficients for the HSGP basis functions are not estimated independently. They are strictly regularized by the GP’s lengthscale ($\ell$) and variance ($\eta$) hyperparameters. Because the GP is mathematically constrained to behave like a smooth, cohesive curve, it cannot arbitrarily spike a single basis function’s coefficient just to “steal” variance from a highly correlated m_tv variable. The GP prior strongly penalizes such isolated, un-smooth coefficient spikes. Consequently, the model focuses on capturing the true underlying baseline trend, leaving the media coefficients much more stable than they would be against unpenalized, explicit Fourier regressors.

4. Should we feed in holiday dummy variables instead?

No. You do not need to manually construct binary 1/0 dummy variables (e.g., is_black_friday) or step functions in your input data.

The recommendation is to pass the raw dates of the holidays directly into the model via a separate DataFrame. Abacus’s EventAdditiveEffect API will internally calculate the distance in days from your time series to the holiday, and wrap that in a continuous basis function (like a Gaussian curve). This provides a smoother, more realistic “build-up and cool-down” effect compared to the harsh structural breaks of traditional dummy variables.

Example: Ingesting a Holidays DataFrame into Abacus

If you have a CSV of holidays (like data-config/holidays.csv), you load it as a standard Pandas DataFrame and inject it into the model configuration before building.

import pandas as pd
from pymc_extras.prior import Prior
from abacus.mmm.panel import PanelMMM
from abacus.mmm.events import EventEffect, GaussianBasis

# 1. Load your raw holidays
# The dataframe must contain exactly: "name", "start_date", "end_date"
df_holidays = pd.DataFrame({
    "name": ["Black Friday 2023", "Black Friday 2024", "Christmas 2023"],
    "start_date": ["2023-11-24", "2024-11-29", "2023-12-25"],
    "end_date": ["2023-11-25", "2024-11-30", "2023-12-26"]
})

# 2. Define the mathematical shape of the holiday effect
# We use a GaussianBasis so the effect smoothly ramps up and down
holiday_effect = EventEffect(
    basis=GaussianBasis(),
    effect_size=Prior("Normal", mu=0, sigma=1),
    dims="holiday"
)

# 3. Initialize your MMM
mmm = PanelMMM(
    date_column="date",
    target_column="sales",
    channel_columns=["tv", "social"],
    dims=("country",)
)

# 4. Inject the raw dataframe into the API
# Abacus handles all the distance calculations and basis mappings internally
mmm.add_events(
    df_events=df_holidays,
    prefix="holiday",
    effect=holiday_effect
)

# 5. Build and fit as normal
mmm.build_model(X, y)
mmm.fit()

5. If HSGP is statistically superior for seasonality, why does the fourier.py module still exist?

This is not a contradiction. Model building requires balancing statistical elegance with computational constraints and structural assumptions. There are four reasons explicit Fourier terms are retained alongside HSGP in the library:

1. Computation Speed

HSGPs are statistically efficient but computationally expensive. The PyMC engine must invert and multiply large matrices to solve the Gaussian Process approximation. Explicit Fourier terms, by contrast, are just static columns in the design matrix. Estimating a Bayesian regression with 4 sine/cosine columns takes seconds; fitting an HSGPPeriodic can be substantially slower. For analysts iterating rapidly on a prototype or running models on large datasets, explicit Fourier terms offer a fast, “good enough” approximation.

2. Static vs. Drifting Seasonality

  • HSGPPeriodic allows the seasonal shape to drift slowly over time (e.g., consumer behaviour shifting gradually across 5 years). This is more realistic but requires learning extra GP hyperparameters.
  • Explicit Fourier forces the seasonality to be completely static: the December peak in 2021 is mathematically identical to the December peak in 2024. If the econometrician has a strong prior belief that the seasonal structure is structurally invariant, explicit Fourier terms enforce that belief more rigidly than an HSGP can.

3. The “Trend = HSGP, Seasonality = Fourier” Hybrid Pattern

A very common and practically effective architecture in Bayesian MMMs is:

  • Standard HSGP for the baseline trend, because trend is unbound, unpredictable, and highly prone to overfitting.
  • A low-order YearlyFourier (e.g., n_order=2 or 3) for seasonality, because seasonality is bounded, predictable, and structurally repetitive.

By keeping the Fourier order very low, the degrees of freedom penalty is minimal (only 4–6 parameters), and the analyst avoids the computational overhead of running two separate HSGPs simultaneously. This hybrid is often the most practical choice for weekly marketing data.

4. Backwards Compatibility and Migration

Many teams migrate to Abacus from legacy OLS frameworks or tools like Prophet, which relies heavily on explicit Fourier terms. To build trust in the new Bayesian framework, econometricians often want to first build a “baseline” model that perfectly mirrors their old model’s architecture and verify they obtain comparable results. The fourier.py module enables this 1:1 apples-to-apples comparison before upgrading the architecture to use HSGPs.