Metrics question
Two players repeatedly toss a fair coin. One waits for HH, the other waits for HT. Who finishes faster on average and how would you reason about it?
Short answer
HT has a smaller expected waiting time than HH: 4 tosses versus 6. For HH, an H followed by T wipes out all progress and you start over. For HT, an H followed by another H still leaves you one toss from finishing.
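A quick Monte Carlo check of these numbers, as a minimal sketch. The helper names `waiting_time` and `average_waiting_time` and the default of 100,000 trials are illustrative choices, not part of the original answer. The fixed seed only makes the run repeatable. The averages should land close to 6 for HH and 4 for HT.

```python
import random


def waiting_time(pattern: str, rng: random.Random) -> int:
    """Toss a fair coin until the last len(pattern) tosses equal `pattern`."""
    window = ""
    tosses = 0
    while True:
        # Append one toss and keep only the last len(pattern) symbols.
        window = (window + rng.choice("HT"))[-len(pattern):]
        tosses += 1
        if window == pattern:
            return tosses


def average_waiting_time(pattern: str, trials: int = 100_000, seed: int = 0) -> float:
    """Empirical mean of the waiting time over `trials` independent runs."""
    rng = random.Random(seed)
    return sum(waiting_time(pattern, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for p in ("HH", "HT"):
        print(p, round(average_waiting_time(p), 2))
```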
Detailed walkthrough
The two patterns are not equally fast, even though each has probability 1/4 on any fixed pair of tosses. The reason is overlap: what happens after a near miss differs. HH overlaps with itself. After an H you are one step from HH, but if the next toss is T you lose all progress and start over. HT also starts with H, but once you have an H you never lose it. Another H leaves you still one step away, and a T completes the pattern.
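Use states. For HH, let E0 be the expected number of remaining tosses when there is no useful suffix, and let E1 be the expected number after seeing H. Then E0 = 1 + 0.5 E1 + 0.5 E0, because T keeps you at E0 and H moves you to E1. Also E1 = 1 + 0.5*0 + 0.5 E0, because H finishes and T resets. The first equation gives E0 = 2 + E1. Substituting into the second gives E1 = 4, so E0 = 6.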
For HT, E0 satisfies the same equation, E0 = 1 + 0.5 E1 + 0.5 E0. Now, however, E1 = 1 + 0.5 E1 + 0.5*0, because H keeps the suffix H and T finishes. Solving gives E1 = 2 and E0 = 2 + E1 = 4. Therefore the HT player finishes faster on average.
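The same state argument works for any pattern. The sketch below is one possible way to automate it. The helpers `next_state` and `expected_waiting_time` are hypothetical names introduced here.

- State k means the last k tosses match the first k symbols of the pattern.
- Each unfinished state gets an equation E_k = 1 + 0.5 E_next(H) + 0.5 E_next(T), with the finished state worth 0.
- The resulting linear system is solved exactly with `fractions.Fraction`.

```python
from fractions import Fraction


def next_state(pattern: str, k: int, c: str) -> int:
    """Length of the longest prefix of `pattern` that is a suffix of pattern[:k] + c."""
    s = pattern[:k] + c
    for length in range(min(len(s), len(pattern)), 0, -1):
        if s.endswith(pattern[:length]):
            return length
    return 0


def expected_waiting_time(pattern: str) -> Fraction:
    """Expected fair-coin tosses until `pattern` appears, via exact state equations."""
    m = len(pattern)
    half = Fraction(1, 2)
    # Row k encodes E_k - sum over c of 0.5 * E_next(k, c) = 1; E_m = 0 is dropped.
    a = [[Fraction(0)] * m for _ in range(m)]
    b = [Fraction(1)] * m
    for k in range(m):
        a[k][k] += 1
        for c in "HT":
            j = next_state(pattern, k, c)
            if j < m:
                a[k][j] -= half
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(m):
        pivot = next(r for r in range(col, m) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = 1 / a[col][col]
        a[col] = [x * inv for x in a[col]]
        b[col] *= inv
        for r in range(m):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return b[0]  # E_0: expected tosses starting from scratch


if __name__ == "__main__":
    for p in ("HH", "HT", "HHH", "HTH"):
        print(p, expected_waiting_time(p))
```

For HH and HT this returns 6 and 4, matching the hand calculation. For three-toss patterns it gives, for example, 14 for HHH and 10 for HTH. Both results again show that self-overlap slows a pattern down.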
Common mistakes
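- Assuming both patterns have the same expected time because each has probability 1/4 on a fixed pair of tosses.
- Forgetting that the state you fall back to after a failed attempt depends on the pattern. For HH, H then T sends you back to the start. For HT, H then H still leaves you one step away.
- Enumerating only the first two tosses, which shows that each pattern has probability 1/4 there but says nothing about the waiting time.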
Metrics question
Minimizing squared error corresponds to maximum likelihood under what noise distribution, and why?
Short answer
Least squares is equivalent to maximum likelihood when the residuals are independent, zero-mean Gaussian noise with constant variance. In that case the negative log-likelihood equals a constant plus a positive multiple of the sum of squared residuals.
Detailed walkthrough
Assume y_i = f(x_i) + epsilon_i, where epsilon_i are independent normal random variables with mean zero and variance sigma squared. The likelihood is the product of Gaussian densities for the residuals y_i - f(x_i).
Taking the negative log-likelihood gives (n/2) log(2 pi sigma^2) + (1 / (2 sigma^2)) * sum_i (y_i - f(x_i))^2. The first term and the factor 1 / (2 sigma^2) do not depend on the parameters of f. Maximizing the likelihood is therefore the same as minimizing the sum of squared errors, or equivalently the mean squared error, which differs only by the factor 1/n.
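A small numerical check of this identity, as a sketch under stated assumptions:

- The toy data `XS`/`YS` are made up.
- The one-parameter model f(x) = w * x is an illustrative choice.
- The noise level `SIGMA` is assumed known.

The code compares the negative log-likelihood summed directly from normal densities with the closed form above. It then checks that a grid search over w finds the same minimizer for the likelihood and for the sum of squared errors.

```python
import math

# Illustrative toy data (made up for this sketch); model f(x) = w * x.
XS = [1.0, 2.0, 3.0, 4.0]
YS = [1.1, 1.9, 3.2, 3.9]
SIGMA = 0.5  # assumed known noise standard deviation


def residuals(w):
    return [y - w * x for x, y in zip(XS, YS)]


def gaussian_nll(w, sigma=SIGMA):
    """Negative log-likelihood summed directly from the normal density of each residual."""
    return -sum(
        math.log(math.exp(-r * r / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2))
        for r in residuals(w)
    )


def sse(w):
    """Sum of squared residuals."""
    return sum(r * r for r in residuals(w))


def nll_closed_form(w, sigma=SIGMA):
    """(n/2) log(2 pi sigma^2) + SSE / (2 sigma^2)."""
    n = len(XS)
    return 0.5 * n * math.log(2 * math.pi * sigma**2) + sse(w) / (2 * sigma**2)


grid = [i / 1000 for i in range(2001)]  # candidate w from 0.000 to 2.000
w_mle = min(grid, key=gaussian_nll)  # maximum likelihood on the grid
w_ls = min(grid, key=sse)  # least squares on the grid

if __name__ == "__main__":
    print(w_mle, w_ls, sum(x * y for x, y in zip(XS, YS)) / sum(x * x for x in XS))
```

The last printed value is the closed-form least-squares slope sum(x*y) / sum(x^2) for this no-intercept model. The grid search is used only to keep the comparison transparent.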
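This also explains why squared loss is sensitive to outliers. The Gaussian density falls off like exp(-r^2 / (2 sigma^2)) in the residual r, so the model treats a large residual as nearly impossible and pays a quadratically growing penalty to shrink it. If the noise is Laplace with a fixed scale, the analogous MLE loss is absolute error. If the noise is heavy-tailed, robust losses may be more appropriate.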
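To see the outlier effect concretely, here is a minimal sketch that fits a single constant c to made-up data containing one outlier. It assumes fixed scales (sigma = 1 for the Gaussian, b = 1 for the Laplace) and uses a grid search for transparency. Under Gaussian noise the MLE is the mean, which the outlier drags far away. Under Laplace noise the MLE is the median, which ignores how extreme the outlier is.

```python
import math
import statistics

# Illustrative data with one outlier (made up for this sketch).
DATA = [1.0, 2.0, 3.0, 4.0, 100.0]


def gaussian_nll(c, sigma=1.0):
    """Negative log-likelihood under N(c, sigma^2): constant + sum (x - c)^2 / (2 sigma^2)."""
    n = len(DATA)
    return 0.5 * n * math.log(2 * math.pi * sigma**2) + sum((x - c) ** 2 for x in DATA) / (2 * sigma**2)


def laplace_nll(c, b=1.0):
    """Negative log-likelihood under Laplace(c, b): n log(2b) + sum |x - c| / b."""
    return len(DATA) * math.log(2 * b) + sum(abs(x - c) for x in DATA) / b


grid = [i / 100 for i in range(10001)]  # candidate c from 0.00 to 100.00
c_gauss = min(grid, key=gaussian_nll)  # squared-error fit
c_laplace = min(grid, key=laplace_nll)  # absolute-error fit

if __name__ == "__main__":
    print("Gaussian MLE:", c_gauss, "mean:", statistics.mean(DATA))
    print("Laplace MLE:", c_laplace, "median:", statistics.median(DATA))
```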