Why zero initialization breaks neural networks
Short answer
Initializing all weights to zero makes the hidden units in a layer identical: they compute the same activations, receive the same gradients, and learn the same features. Logistic regression has a single linear unit and no hidden layer, so this symmetry problem does not arise there.
Full explanation
In a multilayer neural network, if every neuron in a hidden layer starts with the same incoming and outgoing weights (all zeros is one such case), those neurons compute the same values. During backpropagation they also receive the same gradients, so they remain identical after every update. The layer behaves as if it had one neuron copied many times, and the extra capacity is wasted.
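The symmetry argument can be checked numerically. The following is a minimal sketch, assuming a one-hidden-layer tanh network with a sigmoid output and no biases; the function name `hidden_weight_grads`, the layer sizes, and the toy data are illustrative choices, not something taken from the text above. It compares the gradient on the hidden weights under a constant initialization and under a random one.

```python
import numpy as np

def hidden_weight_grads(W1, w2, X, y):
    """Gradient of the mean log-loss w.r.t. W1 for a tiny tanh network (no biases)."""
    H = np.tanh(X @ W1)                            # hidden activations, shape (n, hidden)
    p = 1.0 / (1.0 + np.exp(-(H @ w2)))            # predicted probabilities, shape (n,)
    d_out = (p - y) / len(y)                       # dLoss/dlogit for sigmoid + log-loss
    d_hidden = np.outer(d_out, w2) * (1.0 - H**2)  # backprop through tanh
    return X.T @ d_hidden                          # shape (features, hidden)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))                       # toy inputs (assumed data)
y = (X[:, 0] > 0).astype(float)                    # toy binary labels

# Constant init: every hidden unit has the same incoming and outgoing weights.
grad_const = hidden_weight_grads(np.full((3, 4), 0.5), np.full(4, 0.5), X, y)

# Random init: hidden units start out different from each other.
grad_rand = hidden_weight_grads(rng.normal(size=(3, 4)), rng.normal(size=4), X, y)

# Each column is the gradient for one hidden unit's incoming weights.
columns_identical_const = np.allclose(grad_const, grad_const[:, :1])
columns_identical_rand = np.allclose(grad_rand, grad_rand[:, :1])
print("constant init, identical per-unit gradients:", columns_identical_const)
print("random init, identical per-unit gradients:  ", columns_identical_rand)
```

With the constant initialization every column of the gradient matches, so one gradient step moves all hidden units by the same amount and they stay copies of each other.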
Random initialization breaks this symmetry. Xavier/Glorot, He, and related initializations also scale the weight variance to the layer width, so that activations and gradients do not explode or vanish too quickly across layers.
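As a rough illustration of the variance scaling, here is a sketch of the normal-distribution variants of the two schemes, assuming the commonly used formulas: variance 2 / (fan_in + fan_out) for Xavier/Glorot and 2 / fan_in for He. The helper names `glorot_normal` and `he_normal` and the layer sizes are hypothetical; deep learning frameworks ship their own versions.

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng):
    """Xavier/Glorot normal init: variance 2 / (fan_in + fan_out)."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    """He normal init: variance 2 / fan_in, commonly paired with ReLU."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W_glorot = glorot_normal(256, 128, rng)  # layer sizes are arbitrary examples
W_he = he_normal(256, 128, rng)

# The empirical spread shrinks as the layer gets wider.
print("Glorot std:", W_glorot.std())
print("He std:    ", W_he.std())
```

Both schemes draw different random values for each weight, which breaks the symmetry, and both tie the spread of those values to the number of connections per unit.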
Logistic regression is different because there is no group of interchangeable hidden neurons that need to specialize. Starting logistic regression weights at zero is usually fine because the loss is convex: the gradient at zero is generally nonzero and still points toward a useful solution.
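A small sketch of the logistic regression case, assuming plain batch gradient descent on a synthetic, linearly separable dataset; the learning rate, step count, and data are arbitrary illustrative choices. Training starts from all-zero weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                # toy inputs (assumed data)
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # linearly separable labels

w = np.zeros(2)                              # zero initialization
b = 0.0

# At w = 0 every prediction is 0.5, yet the gradient is not zero.
grad_w0 = X.T @ (sigmoid(X @ w + b) - y) / len(y)

for _ in range(200):                         # plain batch gradient descent
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print("gradient norm at zero init:", np.linalg.norm(grad_w0))
print("training accuracy:", accuracy)
```

There is only one unit, so nothing needs to be told apart from anything else, and the data-dependent gradient moves the weights away from zero on the first step.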
Theory
The issue is symmetry between hidden units, not simply the numeric value zero.
Common mistakes
- Claiming that zero initialization always prevents any gradient.
- Ignoring the distinction between logistic regression and multilayer networks.
- Mentioning random initialization without explaining symmetry breaking.
How to answer in an interview
- Use the phrase "same activation, same gradient, same update" for hidden neurons.
- Then contrast with convex logistic regression.