Metrics question

Minimizing squared error corresponds to maximum likelihood under what noise distribution, and why?

Short answer

Least squares is equivalent to maximum likelihood when the noise is independent, zero-mean Gaussian with constant variance. In that case the negative log-likelihood equals a constant plus a positive multiple of the sum of squared residuals, so both objectives have the same minimizer.

Full explanation

Assume y_i = f(x_i) + epsilon_i, where the epsilon_i are independent normal random variables with mean zero and variance sigma squared. The likelihood of the observed data is then the product of Gaussian densities evaluated at the residuals y_i - f(x_i).
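Written out, the likelihood described above takes the following form. The symbols theta (the model parameters, so that f becomes f_theta) and n (the number of observations) are notation assumed here for illustration and do not appear in the original text.

```latex
L(\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left( -\frac{\bigl(y_i - f_\theta(x_i)\bigr)^2}{2\sigma^2} \right)
```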

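Taking the negative log-likelihood gives a constant, (n / 2) log(2 pi sigma squared) for n observations, plus 1 / (2 sigma squared) times the sum of squared residuals. Since sigma does not depend on the model parameters, neither the additive constant nor the positive factor moves the optimum, so maximizing the likelihood is the same as minimizing the sum of squared errors or, equivalently, the mean squared error.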
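The equivalence can be checked numerically. The sketch below is only an illustration under assumed conditions: the synthetic data, the one-parameter model f(x) = w * x, the slope grid, and the helper names gaussian_nll and sse are all hypothetical choices, not part of the original answer.

```python
import math
import random


def gaussian_nll(y, pred, sigma):
    """Negative log-likelihood of y_i ~ Normal(pred_i, sigma^2), summed over i."""
    return sum(
        0.5 * math.log(2 * math.pi * sigma**2) + (yi - p) ** 2 / (2 * sigma**2)
        for yi, p in zip(y, pred)
    )


def sse(y, pred):
    """Sum of squared residuals."""
    return sum((yi - p) ** 2 for yi, p in zip(y, pred))


# Synthetic data (assumption): y = 2x + Gaussian noise with sigma = 0.5
rng = random.Random(0)
sigma = 0.5
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + rng.gauss(0.0, sigma) for x in xs]

# Evaluate both objectives on a grid of candidate slopes w for f(x) = w * x
slopes = [1.5 + 0.01 * k for k in range(101)]
nll_values = [gaussian_nll(ys, [w * x for x in xs], sigma) for w in slopes]
sse_values = [sse(ys, [w * x for x in xs]) for w in slopes]

# The additive constant n/2 * log(2 pi sigma^2) does not depend on w
const = len(ys) * 0.5 * math.log(2 * math.pi * sigma**2)

# Both objectives should select the same slope on the grid
best_by_nll = slopes[nll_values.index(min(nll_values))]
best_by_sse = slopes[sse_values.index(min(sse_values))]
print(best_by_nll, best_by_sse)
```

For every candidate slope, the NLL value should match const + SSE / (2 * sigma**2) up to floating-point error, which is why the two curves share a minimizer.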

This also explains why squared loss is sensitive to outliers: the Gaussian density decays exponentially in the squared residual, so a large residual is treated as extremely unlikely and incurs a quadratically growing penalty that pulls the fit toward it. If the noise is Laplace, the corresponding MLE loss is absolute error; if the noise is heavy-tailed, robust losses may be more appropriate.
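A small sketch of the outlier effect, assuming the simplest possible model: a constant prediction c. For that model the squared-error minimizer is the sample mean and the absolute-error minimizer is the sample median. The measurements and the grid search below are hypothetical, chosen only to make the contrast visible.

```python
import statistics


def sum_squared_error(values, c):
    """Squared-error loss of the constant prediction c (Gaussian-noise MLE)."""
    return sum((v - c) ** 2 for v in values)


def sum_absolute_error(values, c):
    """Absolute-error loss of the constant prediction c (Laplace-noise MLE)."""
    return sum(abs(v - c) for v in values)


# Hypothetical measurements with one gross outlier (50.0)
values = [9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 50.0]

# Brute-force search over constant predictions from 0.00 to 60.00
grid = [i / 100 for i in range(6001)]
best_l2 = min(grid, key=lambda c: sum_squared_error(values, c))
best_l1 = min(grid, key=lambda c: sum_absolute_error(values, c))

mean = statistics.mean(values)
median = statistics.median(values)

# best_l2 tracks the mean, which the outlier drags upward;
# best_l1 tracks the median, which stays with the bulk of the data
print(best_l2, mean)
print(best_l1, median)
```

The single outlier moves the squared-error estimate well away from the cluster near 10, while the absolute-error estimate stays inside it.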