ERM

Empirical Risk Minimization (ERM) is the principle of training a model by minimizing the empirical risk, i.e., the average loss over a given dataset:

$$R_{\text{emp}}(f) = \frac{1}{n} \sum_{i=1}^n \ell(f(x_i), y_i)$$

where:

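  • $n$ is the number of training samples, and $(x_i, y_i)$ is the $i$-th sample: the input $x_i$ paired with its true label $y_i$.
  • $f(x_i)$ is the model’s prediction for the $i$-th input.
  • $\ell(f(x_i), y_i)$ is a loss function that quantifies the error between the prediction $f(x_i)$ and the true label $y_i$ for the $i$-th sample (e.g., squared loss for regression, cross-entropy for classification).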
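
As a minimal sketch of the formula above, the empirical risk can be computed directly as an average of per-sample losses. The function names (`squared_loss`, `empirical_risk`) and the toy data are assumptions made for illustration; squared loss is just one possible choice of $\ell$.

```python
from typing import Callable, Sequence


def squared_loss(prediction: float, target: float) -> float:
    """Squared loss l(f(x), y) = (f(x) - y)^2, a common choice for regression."""
    return (prediction - target) ** 2


def empirical_risk(
    f: Callable[[float], float],
    xs: Sequence[float],
    ys: Sequence[float],
    loss: Callable[[float, float], float] = squared_loss,
) -> float:
    """Average loss of f over the n samples (x_i, y_i)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum(loss(f(x), y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy dataset generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# A model that matches the data exactly has zero empirical risk.
risk_exact = empirical_risk(lambda x: 2 * x + 1, xs, ys)

# A model that is off by 1 on every sample has squared loss 1 on each one.
risk_off = empirical_risk(lambda x: 2 * x, xs, ys)
```

Passing a different `loss` callable (for example, an absolute-error function) changes $\ell$ without touching the averaging step.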

In Empirical Risk Minimization, the goal is to find a function $f$ from a given hypothesis class $\mathcal{H}$ that minimizes the empirical risk:

$$
\hat{f} = \arg \min_{f \in \mathcal{H}} R_{\text{emp}}(f)
$$

  • $f$ is a generic function (or hypothesis) in the hypothesis class $\mathcal{H}$, the set of all functions the learning algorithm can choose from. It maps inputs $x$ (the features) to outputs $y$ (the predictions).
  • $\hat{f}$ (read as “f hat”) is the specific function that minimizes the empirical risk over the training dataset, i.e., the model learned by applying the ERM principle. Among all functions in the hypothesis class $\mathcal{H}$, it has the lowest average loss over the training samples.
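
The arg min above can be illustrated with a small sketch in which $\mathcal{H}$ is a finite set of candidate functions, so ERM reduces to evaluating the empirical risk of each candidate and keeping the best one. The hypothesis class (lines through the origin with a handful of slopes), the data, and all names below are assumptions chosen for illustration; practical hypothesis classes are usually infinite and are searched with an optimizer rather than by enumeration.

```python
def empirical_risk(f, xs, ys):
    """Average squared loss of f over the samples (x_i, y_i)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_linear(w):
    """Return the hypothesis f(x) = w * x for a fixed slope w."""
    return lambda x: w * x


# Hypothetical training data, roughly following y = 2x with small noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# A finite hypothesis class H: one linear function per candidate slope.
candidate_slopes = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
hypothesis_class = {w: make_linear(w) for w in candidate_slopes}

# ERM: pick the hypothesis in H with the smallest empirical risk.
best_slope = min(
    hypothesis_class,
    key=lambda w: empirical_risk(hypothesis_class[w], xs, ys),
)
f_hat = hypothesis_class[best_slope]

print(best_slope, empirical_risk(f_hat, xs, ys))
```

With this data the selected slope is 2.0. Note that $\hat{f}$ is only the best function within the chosen $\mathcal{H}$: a slope such as 1.99 would fit slightly better but is not in the candidate set.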