Bayes Linear Regression

Bayes Linear Regression is a probabilistic approach that combines Bayes' Theorem with linear regression. Instead of producing fixed point estimates of the model parameters (the regression coefficients), it treats the parameters as random variables and quantifies the uncertainty about them with probability distributions.

Mathematical Formulation

Consider the linear regression model where the target variable $y$ is predicted from a vector of features $\mathbf{x} \in \mathbb{R}^p$ (where $p$ is the number of features):

$$y_i = \beta^T \mathbf{x}_i + \epsilon_i$$

where:

  • $y_i$ is the target value for the $i$-th observation,
  • $\mathbf{x}_i$ is the feature vector for the $i$-th observation,
  • $\beta$ is the vector of unknown regression coefficients (parameters),
  • $\epsilon_i$ is the error term (residual), assumed to be normally distributed: $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, i.e., the errors are independent and identically distributed with mean 0 and variance $\sigma^2$.

Thus, for each observation $i$, the conditional density of $y_i$ given the feature vector $\mathbf{x}_i$ and the parameters $\beta$ is:

$$P(y_i \mid \mathbf{x}_i, \beta, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(y_i - \beta^T \mathbf{x}_i)^2}{2 \sigma^2}\right)$$
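
This density is simply a Gaussian evaluated at the residual $y_i - \beta^T \mathbf{x}_i$. A minimal sketch in NumPy; the feature vector, coefficients, noise level, and target below are made-up toy values used only for illustration:

import numpy as np

# Toy values (illustrative assumptions, not from the text)
x_i = np.array([1.0, 2.0])    # feature vector for one observation
beta = np.array([0.5, 1.0])   # candidate coefficient vector
sigma = 1.0                   # noise standard deviation
y_i = 2.3                     # observed target

# Gaussian density of the residual y_i - beta^T x_i
residual = y_i - beta @ x_i
likelihood = np.exp(-residual**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
print(likelihood)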

Prior Distribution

In Bayes Linear Regression, we assume a prior distribution for the parameters $\beta$. A common choice is a Gaussian prior:

$$\beta \sim \mathcal{N}(\mathbf{0}, \tau^2 I)$$

where $\tau^2$ is the prior variance and $I$ is the identity matrix. This prior expresses the belief that the coefficients $\beta$ are likely to be close to zero, but with some uncertainty.
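
To make the role of $\tau$ concrete, the sketch below draws a few coefficient vectors from this prior; the values of $\tau$ and $p$ are arbitrary choices for illustration. Larger $\tau$ means weaker shrinkage toward zero.

import numpy as np

rng = np.random.default_rng(0)

tau = 1.0   # prior standard deviation (assumed value)
p = 2       # number of features (assumed value)

# Draw a few coefficient vectors from the prior N(0, tau^2 I)
beta_samples = rng.normal(loc=0.0, scale=tau, size=(5, p))
print(beta_samples)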

Likelihood Function

Given the assumption of normally distributed errors, the likelihood function for the observed data $\mathbf{y} = (y_1, y_2, \dots, y_n)^T$ given the feature matrix $\mathbf{X} = (\mathbf{x}_1^T, \mathbf{x}_2^T, \dots, \mathbf{x}_n^T)^T$ and parameters $\beta$ is:

$$P(\mathbf{y} \mid \mathbf{X}, \beta, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(y_i - \beta^T \mathbf{x}_i)^2}{2 \sigma^2}\right)$$

This represents the likelihood of observing the target values $\mathbf{y}$ given the feature vectors $\mathbf{X}$ and parameters $\beta$, with noise variance $\sigma^2$.
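
In practice this product of densities is evaluated on the log scale to avoid numerical underflow. A small sketch of the log-likelihood on toy data (all numbers below are illustrative):

import numpy as np

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])  # toy feature matrix
y = np.array([3.0, 5.0, 7.0])                       # toy targets
beta = np.array([1.0, 1.0])                         # candidate coefficients
sigma = 1.0                                         # noise standard deviation

# log P(y | X, beta, sigma^2) = -n/2 log(2 pi sigma^2) - sum(residuals^2) / (2 sigma^2)
residuals = y - X @ beta
n = len(y)
log_likelihood = -0.5 * n * np.log(2 * np.pi * sigma**2) - np.sum(residuals**2) / (2 * sigma**2)
print(log_likelihood)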

Posterior Distribution

By Bayes' Theorem, the posterior distribution of $\beta$ given the data $(\mathbf{X}, \mathbf{y})$ is proportional to the product of the likelihood and the prior:

$$P(\beta \mid \mathbf{X}, \mathbf{y}, \sigma^2) \propto P(\mathbf{y} \mid \mathbf{X}, \beta, \sigma^2)\, P(\beta)$$

Substituting the expressions for the likelihood and prior:

$$P(\beta \mid \mathbf{X}, \mathbf{y}, \sigma^2) \propto \exp\left(-\frac{1}{2 \sigma^2} \sum_{i=1}^{n} (y_i - \beta^T \mathbf{x}_i)^2\right) \exp\left(-\frac{1}{2 \tau^2} \beta^T \beta \right)$$
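
Both factors are exponentials of expressions that are quadratic in $\beta$, so their product is again the exponential of a quadratic form. Collecting the terms in $\beta$ (and dropping factors that do not depend on $\beta$) gives:

$$P(\beta \mid \mathbf{X}, \mathbf{y}, \sigma^2) \propto \exp\left(-\frac{1}{2}\,\beta^T \left(\frac{1}{\sigma^2}\mathbf{X}^T\mathbf{X} + \frac{1}{\tau^2} I\right) \beta + \frac{1}{\sigma^2}\,\beta^T \mathbf{X}^T \mathbf{y}\right)$$

which is the kernel of a Gaussian in $\beta$; its covariance and mean are read off in the next section.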

Posterior Mean and Covariance

The posterior distribution of $\beta$ is Gaussian, and its mean and covariance can be obtained by completing the square in the exponent:

$$P(\beta \mid \mathbf{X}, \mathbf{y}, \sigma^2) = \mathcal{N}(\beta \mid \hat{\beta}_{\text{post}}, \Sigma_{\text{post}})$$

where the posterior mean $\hat{\beta}_{\text{post}}$ is given by:

$$\hat{\beta}_{\text{post}} = \frac{1}{\sigma^2}\,\Sigma_{\text{post}}\,\mathbf{X}^T \mathbf{y} = \left(\mathbf{X}^T \mathbf{X} + \frac{\sigma^2}{\tau^2} I\right)^{-1} \mathbf{X}^T \mathbf{y}$$

and the posterior covariance $\Sigma_{\text{post}}$ is:

$$\Sigma_{\text{post}} = \left(\frac{1}{\sigma^2}\mathbf{X}^T \mathbf{X} + \frac{1}{\tau^2} I\right)^{-1}$$
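
These two expressions translate directly into a few lines of NumPy. The sketch below uses arbitrary toy data and hyperparameters, and forms the posterior precision first, exactly as in the formula above:

import numpy as np

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])  # toy features
y = np.array([3.0, 5.0, 7.0, 9.0])                              # toy targets
sigma, tau = 1.0, 1.0                                           # assumed noise / prior scales

# Posterior precision: (1/sigma^2) X^T X + (1/tau^2) I
precision = X.T @ X / sigma**2 + np.eye(X.shape[1]) / tau**2

# Posterior covariance and mean
Sigma_post = np.linalg.inv(precision)
beta_post = Sigma_post @ X.T @ y / sigma**2

print(beta_post)
print(Sigma_post)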

Prediction

For a new observation $\mathbf{x}_{\text{new}}$, the predictive distribution of the target $y_{\text{new}}$ is obtained by integrating out $\beta$ over its posterior:

$$P(y_{\text{new}} \mid \mathbf{x}_{\text{new}}, \mathbf{X}, \mathbf{y}) = \int P(y_{\text{new}} \mid \mathbf{x}_{\text{new}}, \beta)\, P(\beta \mid \mathbf{X}, \mathbf{y}, \sigma^2)\, d\beta$$

This integral can be evaluated, leading to the following Gaussian predictive distribution:

$$P(y_{\text{new}} \mid \mathbf{x}_{\text{new}}, \mathbf{X}, \mathbf{y}) = \mathcal{N}\left(y_{\text{new}} \mid \mathbf{x}_{\text{new}}^T \hat{\beta}_{\text{post}},\ \sigma^2 + \mathbf{x}_{\text{new}}^T \Sigma_{\text{post}} \mathbf{x}_{\text{new}}\right)$$

This provides a probabilistic prediction: the predicted value $\mathbf{x}_{\text{new}}^T \hat{\beta}_{\text{post}}$ together with the uncertainty in that prediction, represented by the variance $\sigma^2 + \mathbf{x}_{\text{new}}^T \Sigma_{\text{post}} \mathbf{x}_{\text{new}}$.
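
Continuing the toy example from the previous section, the predictive mean and variance for a hypothetical new point follow in two lines; the class-based implementation below does the same thing in a reusable form:

import numpy as np

# Same toy setup as the previous snippet (illustrative values only)
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
sigma, tau = 1.0, 1.0

Sigma_post = np.linalg.inv(X.T @ X / sigma**2 + np.eye(X.shape[1]) / tau**2)
beta_post = Sigma_post @ X.T @ y / sigma**2

# Predictive mean and variance for a hypothetical new point
x_new = np.array([5.0, 6.0])
pred_mean = x_new @ beta_post
pred_var = sigma**2 + x_new @ Sigma_post @ x_new
print(pred_mean, pred_var)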

Implementation

import numpy as np

class BayesLinearRegression:
    def __init__(self, tau=1.0, sigma=1.0):
        self.tau = tau           # prior standard deviation of the coefficients
        self.sigma = sigma       # noise standard deviation
        self.beta_post = None    # posterior mean of beta
        self.sigma_post = None   # posterior covariance of beta

    def fit(self, X, y):
        # Prior precision: (1 / tau^2) I
        prior_precision = np.eye(X.shape[1]) / self.tau**2

        # Posterior covariance: (X^T X / sigma^2 + I / tau^2)^{-1}
        XTX = X.T @ X
        self.sigma_post = np.linalg.inv(prior_precision + XTX / self.sigma**2)

        # Posterior mean: (1 / sigma^2) Sigma_post X^T y
        self.beta_post = self.sigma_post @ X.T @ y / self.sigma**2

    def predict(self, X_new):
        # Predictive mean: X_new beta_post
        y_pred_mean = X_new @ self.beta_post

        # Predictive variance: sigma^2 + x^T Sigma_post x for each row of X_new
        y_pred_var = self.sigma**2 + np.sum(X_new @ self.sigma_post * X_new, axis=1)

        return y_pred_mean, y_pred_var

x = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [7, 8]])
y = np.array([3, 5, 7, 9, 11, 15])
b_lr = BayesLinearRegression(tau=1.0, sigma=1.0)
b_lr.fit(x, y)
y_pred_mean, y_pred_cov = b_lr.predict(np.array([[0, 1]]))
print(y_pred_mean, y_pred_cov)
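
As a quick sanity check, the posterior mean under this zero-mean Gaussian prior coincides with the ridge regression solution with penalty $\lambda = \sigma^2 / \tau^2$. The snippet below, continuing the example above, verifies this numerically:

# Ridge solution with lambda = sigma^2 / tau^2 should match the posterior mean
lam = b_lr.sigma**2 / b_lr.tau**2
ridge_beta = np.linalg.solve(x.T @ x + lam * np.eye(x.shape[1]), x.T @ y)
print(np.allclose(ridge_beta, b_lr.beta_post))  # should print True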