Bayes Classification

The Bayes algorithm is a simple method based on Bayes' theorem. It can be used for both classification and regression problems.

Idea

Suppose we have a training dataset,

$$\mathcal{D} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$$

For Bayes-based methods, we look for the best-fitting probability distribution instead of a plain function. For a classification problem, we use the following prediction,

$$\hat{y} = \arg\max_{y \in \mathbb{Y}} P(y|x)$$

where $\mathbb{Y}$ is the set of all possible labels.
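
For example, if we already had the posterior $P(y|x)$ for a single input, the prediction is simply the label with the highest posterior probability. A minimal sketch (the labels and probability values below are made up for illustration):

# The posterior P(y|x) for one input x, as a dict of hypothetical values.
posterior = {"cat": 0.2, "dog": 0.7, "bird": 0.1}

y_hat = max(posterior, key=posterior.get)  # label with the highest posterior
print(y_hat)  # dog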

$x$ can be either continuous or discrete, but the two cases need different treatment. We treat them in two separate sections below, writing $x_c$ for continuous input, that is, $x_c \in \mathbb{R}^n$, and $x_d$ for discrete input, that is, $x_d \in \mathbb{X}^n$, where $\mathbb{X}$ is the set of all possible values of $x$.

The key to the Bayes algorithm is to find the probability distribution $P(y|x)$. By Bayes' theorem, we have,

$$P(y|x) = \frac{P(x|y)P(y)}{P(x)}$$

For classification problems, we usually use a uniform distribution as the prior, that is, $P(y) = \frac{1}{|\mathbb{Y}|}$.
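
As a quick numerical check of the formula, consider two hypothetical labels with a uniform prior (the likelihood values below are made up for illustration):

# Toy check of Bayes' theorem with a uniform prior over two labels.
likelihood = {"spam": 0.8, "ham": 0.1}        # hypothetical P(x|y) for one observed x
prior = {label: 0.5 for label in likelihood}  # uniform prior P(y) = 1/|Y|

evidence = sum(likelihood[label] * prior[label] for label in likelihood)  # P(x)
posterior = {label: likelihood[label] * prior[label] / evidence for label in likelihood}
print(posterior)  # {'spam': 0.888..., 'ham': 0.111...}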

Distribution of Discrete Input

We typically also use a uniform distribution as the prior for $x$, that is, $P(x) = \frac{1}{|\mathbb{X}|}$. Of course, if you have information about the distribution of $x$, you can use it to get a better result.

To fit $P(x|y)$ from the dataset, we can simply count,

$$P(x|y) = \frac{\mathrm{count}(x, y)}{\mathrm{count}(y)}$$

where $\mathrm{count}(x, y)$ is the number of times $x$ appears in the dataset with label $y$, and $\mathrm{count}(y)$ is the number of times label $y$ appears in the dataset.
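
A minimal sketch of this counting estimate, using a made-up discrete feature and labels:

import numpy as np

# Made-up discrete feature values and labels for illustration.
x = np.array(["red", "red", "blue", "red", "blue"])
y = np.array([1, 1, 1, 0, 0])

def conditional_prob(value, label):
    mask = (y == label)
    return np.sum(x[mask] == value) / np.sum(mask)  # count(x, y) / count(y)

print(conditional_prob("red", 1))  # 0.666... (2 of the 3 points with label 1 are "red")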

Distribution of Continuous Input

For continuous input, $P(x)$ at any exact value is zero, so we work with probability densities instead. We can safely drop the denominator, because it is the same for every class and we only care about the relative value.

So we only focus on $P(x|y)$. A Gaussian distribution is widely assumed, that is,

$$P(x|y) = \frac{1}{\sqrt{2\pi}\,\sigma_y} e^{-\frac{(x-\mu_y)^2}{2\sigma_y^2}}$$

We need to estimate $\mu_y$ and $\sigma_y$ from the dataset. The natural choices are the sample mean and the unbiased sample standard deviation,

$$\mu_y = \mathbb{E}[x \mid y] = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \sigma_y = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu_y)^2}{n - 1}}$$

where $x_i$ is the $i$-th data point among the $n$ points with label $y$.
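
A minimal sketch of fitting these parameters with NumPy, using made-up values; ddof=1 gives the divide-by-$(n-1)$ estimator used above:

import numpy as np

x = np.array([4.9, 5.1, 5.0, 6.3, 6.7])  # made-up feature values
y = np.array([0, 0, 0, 1, 1])            # made-up labels

mu_0 = x[y == 0].mean()                  # sample mean of the points with label 0
sigma_0 = x[y == 0].std(ddof=1)          # unbiased standard deviation (divide by n - 1)
print(mu_0, sigma_0)  # 5.0 0.1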

Implementation

The implementation is straightforward: we estimate the prior $P(y)$ and the class-conditional distributions $P(x|y)$ from the training data, and then apply the formula above to make predictions. Note that the code below estimates $P(y)$ from the class frequencies in the training set rather than assuming a uniform prior.

import numpy as np

class NaiveBayesClassifier:
    def __init__(self, feature_types):
        self.feature_types = feature_types  # List indicating type ('continuous'/'discrete') for each feature
        self.classes = None
        self.prior = {}
        self.params = []  # For storing feature parameters (mean/std for continuous, probabilities for discrete)

    def fit(self, X, y):
        self.classes = np.unique(y)
        n_features = X.shape[1]
        self.params = [{} for _ in range(n_features)]

        # Calculate prior probabilities
        for cls in self.classes:
            self.prior[cls] = np.mean(y == cls)

        # Calculate likelihood parameters for each feature and class
        for i in range(n_features):
            feature_type = self.feature_types[i]
            for cls in self.classes:
                X_cls = X[y == cls, i]
                if feature_type == 'discrete':
                    # Discrete feature: calculate probability of each value
                    values, counts = np.unique(X_cls, return_counts=True)
                    self.params[i][cls] = {v: c / len(X_cls) for v, c in zip(values, counts)}
                else:
                    # Continuous feature: calculate mean and standard deviation
                    mean = np.mean(X_cls)
                    std = np.std(X_cls, ddof=1)  # Unbiased estimator
                    self.params[i][cls] = (mean, std)

    def predict(self, X):
        predictions = []
        for sample in X:
            max_log_prob = -np.inf
            best_class = None
            for cls in self.classes:
                log_prob = np.log(self.prior[cls])
                valid = True
                for i in range(len(self.feature_types)):
                    feature_val = sample[i]
                    param = self.params[i][cls]
                    if self.feature_types[i] == 'discrete':
                        # Handle unseen discrete values with 0 probability
                        prob = param.get(feature_val, 0.0)
                        if prob == 0:
                            valid = False
                            break
                        log_prob += np.log(prob)
                    else:
                        # Gaussian log-density
                        mean, std = param
                        if std == 0:
                            if feature_val != mean:
                                valid = False
                                break
                        else:
                            log_prob -= 0.5 * (np.log(2 * np.pi) + 2 * np.log(std)) + ((feature_val - mean) ** 2) / (2 * std ** 2)
                if valid and log_prob > max_log_prob:
                    max_log_prob = log_prob
                    best_class = cls
            predictions.append(best_class if best_class is not None else self.classes[0])
        return np.array(predictions)

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

feature_types = ['continuous'] * X.shape[1]

nb_classifier = NaiveBayesClassifier(feature_types=feature_types)
nb_classifier.fit(X_train, y_train)

y_pred = nb_classifier.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")