Bayes Classification

The Bayes algorithm is a simple method based on Bayes' theorem. It can be used for both classification and regression problems.

Idea

Suppose we have a training dataset,

$$\mathcal{D} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$$

For Bayes-based methods, we look for the best-fitting probability distribution instead of a plain function. For a classification problem, we use the following prediction,

$$\hat{y} = \arg\max_{y \in \mathbb{Y}} P(y|x)$$

where $\mathbb{Y}$ is the set of all possible labels.
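
For example, if we already had the posterior $P(y|x)$ for a single input, the prediction is simply the label with the highest posterior probability. A minimal sketch (the labels and probability values below are made up for illustration):

# The posterior P(y|x) for one input x, as a dict of hypothetical values.
posterior = {"cat": 0.2, "dog": 0.7, "bird": 0.1}

y_hat = max(posterior, key=posterior.get)  # label with the highest posterior
print(y_hat)  # dog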

$x$ can be either continuous or discrete, but the two cases need different treatment. We treat them in two separate sections below, writing $x_c$ for continuous input, that is, $x_c \in \mathbb{R}^n$, and $x_d$ for discrete input, that is, $x_d \in \mathbb{X}^n$, where $\mathbb{X}$ is the set of all possible values of $x$.

The key to the Bayes algorithm is to find the probability distribution $P(y|x)$. By Bayes' theorem, we have,

$$P(y|x) = \frac{P(x|y)P(y)}{P(x)}$$

For classification problems, we usually use a uniform distribution as the prior, that is, $P(y) = \frac{1}{|\mathbb{Y}|}$.
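
As a quick numerical check of the formula, consider two hypothetical labels with a uniform prior (the likelihood values below are made up for illustration):

# Toy check of Bayes' theorem with a uniform prior over two labels.
likelihood = {"spam": 0.8, "ham": 0.1}        # hypothetical P(x|y) for one observed x
prior = {label: 0.5 for label in likelihood}  # uniform prior P(y) = 1/|Y|

evidence = sum(likelihood[label] * prior[label] for label in likelihood)  # P(x)
posterior = {label: likelihood[label] * prior[label] / evidence for label in likelihood}
print(posterior)  # {'spam': 0.888..., 'ham': 0.111...}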

Distribution of Discrete Input

We typically also use a uniform distribution as the prior for $x$, that is, $P(x) = \frac{1}{|\mathbb{X}|}$. Of course, if you have information about the distribution of $x$, you can use it to get a better result.

To fit $P(x|y)$ from the dataset, we can simply count,

$$P(x|y) = \frac{\mathrm{count}(x, y)}{\mathrm{count}(y)}$$

where $\mathrm{count}(x, y)$ is the number of times $x$ appears in the dataset with label $y$, and $\mathrm{count}(y)$ is the number of times label $y$ appears in the dataset.
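
A minimal sketch of this counting estimate, using a made-up discrete feature and labels:

import numpy as np

# Made-up discrete feature values and labels for illustration.
x = np.array(["red", "red", "blue", "red", "blue"])
y = np.array([1, 1, 1, 0, 0])

def conditional_prob(value, label):
    mask = (y == label)
    return np.sum(x[mask] == value) / np.sum(mask)  # count(x, y) / count(y)

print(conditional_prob("red", 1))  # 0.666... (2 of the 3 points with label 1 are "red")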

Distribution of Continuous Input

For continuous input, $P(x)$ at any exact value is zero, so we work with probability densities instead. We can safely drop the denominator, because it is the same for every class and we only care about the relative value.

So we only focus on $P(x|y)$. A Gaussian distribution is widely assumed, that is,

$$P(x|y) = \frac{1}{\sqrt{2\pi}\,\sigma_y} e^{-\frac{(x-\mu_y)^2}{2\sigma_y^2}}$$

We need to estimate $\mu_y$ and $\sigma_y$ from the dataset. The natural choices are the sample mean and the unbiased sample standard deviation,

$$\mu_y = \mathbb{E}[x \mid y] = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \sigma_y = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu_y)^2}{n - 1}}$$

where $x_i$ is the $i$-th data point among the $n$ points with label $y$.
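
A minimal sketch of fitting these parameters with NumPy, using made-up values; ddof=1 gives the divide-by-$(n-1)$ estimator used above:

import numpy as np

x = np.array([4.9, 5.1, 5.0, 6.3, 6.7])  # made-up feature values
y = np.array([0, 0, 0, 1, 1])            # made-up labels

mu_0 = x[y == 0].mean()                  # sample mean of the points with label 0
sigma_0 = x[y == 0].std(ddof=1)          # unbiased standard deviation (divide by n - 1)
print(mu_0, sigma_0)  # 5.0 0.1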

Implementation

The implementation is straightforward: we estimate the prior $P(y)$ and the class-conditional distributions $P(x|y)$ from the training data, and then apply the formula above to make predictions. Note that the code below estimates $P(y)$ from the class frequencies in the training set rather than assuming a uniform prior.

import numpy as np

class NaiveBayesClassifier:
    def __init__(self, feature_types):
        self.feature_types = feature_types  # List indicating type ('continuous'/'discrete') for each feature
        self.classes = None
        self.prior = {}
        self.params = []  # For storing feature parameters (mean/std for continuous, probabilities for discrete)

    def fit(self, X, y):
        self.classes = np.unique(y)
        n_features = X.shape[1]
        self.params = [{} for _ in range(n_features)]

        # Calculate prior probabilities
        for cls in self.classes:
            self.prior[cls] = np.mean(y == cls)

        # Calculate likelihood parameters for each feature and class
        for i in range(n_features):
            feature_type = self.feature_types[i]
            for cls in self.classes:
                X_cls = X[y == cls, i]
                if feature_type == 'discrete':
                    # Discrete feature: calculate probability of each value
                    values, counts = np.unique(X_cls, return_counts=True)
                    self.params[i][cls] = {v: c / len(X_cls) for v, c in zip(values, counts)}
                else:
                    # Continuous feature: calculate mean and standard deviation
                    mean = np.mean(X_cls)
                    std = np.std(X_cls, ddof=1)  # Unbiased estimator
                    self.params[i][cls] = (mean, std)

    def predict(self, X):
        predictions = []
        for sample in X:
            max_log_prob = -np.inf
            best_class = None
            for cls in self.classes:
                log_prob = np.log(self.prior[cls])
                valid = True
                for i in range(len(self.feature_types)):
                    feature_val = sample[i]
                    param = self.params[i][cls]
                    if self.feature_types[i] == 'discrete':
                        # Handle unseen discrete values with 0 probability
                        prob = param.get(feature_val, 0.0)
                        if prob == 0:
                            valid = False
                            break
                        log_prob += np.log(prob)
                    else:
                        # Gaussian log-density
                        mean, std = param
                        if std == 0:
                            if feature_val != mean:
                                valid = False
                                break
                        else:
                            log_prob -= 0.5 * (np.log(2 * np.pi) + 2 * np.log(std)) + ((feature_val - mean) ** 2) / (2 * std ** 2)
                if valid and log_prob > max_log_prob:
                    max_log_prob = log_prob
                    best_class = cls
            predictions.append(best_class if best_class is not None else self.classes[0])
        return np.array(predictions)

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

feature_types = ['continuous'] * X.shape[1]

nb_classifier = NaiveBayesClassifier(feature_types=feature_types)
nb_classifier.fit(X_train, y_train)

y_pred = nb_classifier.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")