Question · Hard · ml-system-design · ML System Design in a technical interview · T-Bank

Targets, loss and negative sampling for a social feed ranker

Short answer

Use impression logs; define the target as weighted engagement or as multiple heads; sample negatives from shown-but-not-engaged posts; and choose a pointwise, pairwise or listwise loss depending on how mature the system is and which labels are available.

Full breakdown

The dataset should start from exposures: posts that were actually eligible and shown to the user. Labels can be a binary click/engagement flag, a weighted engagement score, a dwell-time regression target, or separate targets for click, like, comment, share and hide. A single weighted target is simple to train and serve; multi-task heads preserve the differences between signals, and the weights that combine the heads can be re-tuned at serving time.

Negatives should usually be shown-but-not-clicked or skipped posts. Random unseen posts make the task too easy (most of them are obviously irrelevant) and distort the ranking among the candidates that actually compete for a slot. Hard negatives from the same topic or with similar embeddings can be added to improve discrimination.

Pointwise losses are the easiest starting point: binary cross-entropy for engagement probability, or a regression loss for a weighted score. Pairwise and listwise losses match the ranking objective better, but they require careful construction of pairs or lists (typically grouped by request) and are harder to debug. For social feeds, also monitor calibration by segment and exposure/popularity bias.
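A minimal sketch of the three loss families, in plain Python for a single feed request: `scores` are model logits and `labels` are engagement labels for the same shown posts. The function names are hypothetical, and the RankNet-style pairwise loss and ListNet-style (top-1) listwise loss are illustrative choices, not the only variants.

```python
import math

def softplus(x):
    # log(1 + exp(x)) computed without overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pointwise_bce(scores, labels):
    # Binary cross-entropy on logits, averaged over impressions:
    # -[y*log(sigmoid(s)) + (1-y)*log(1-sigmoid(s))] == softplus(s) - y*s
    return sum(softplus(s) - y * s for s, y in zip(scores, labels)) / len(scores)

def pairwise_logistic(scores, labels):
    # RankNet-style loss inside one feed request: for every pair where item i
    # has a higher label than item j, penalise log(1 + exp(-(s_i - s_j))).
    n = len(scores)
    losses = [softplus(-(scores[i] - scores[j]))
              for i in range(n) for j in range(n) if labels[i] > labels[j]]
    return sum(losses) / len(losses) if losses else 0.0

def listwise_softmax(scores, labels):
    # Softmax cross-entropy over one request; the target distribution is the
    # (non-negative) label mass normalised to sum to 1, ListNet top-1 style.
    total = sum(labels)
    if total <= 0:
        return 0.0  # a request with no engagement carries no listwise signal
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -sum((y / total) * (s - log_z) for s, y in zip(scores, labels))
```

The pointwise loss treats every impression independently, so it can be computed over any shuffled batch and is easy to inspect row by row. The pairwise and listwise versions only make sense when rows are grouped by request, which is where the extra construction and debugging effort comes from.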

Theory

Feed rankers learn from biased logged decisions, so label definition and negative sampling matter as much as model class.

Common mistakes

Treating every unseen post as a negative.
Collapsing all actions into one score without business-driven weights.
Optimizing MSE on an arbitrary target without checking ranking metrics.
Ignoring exposure position and the bias inherited from the previous (baseline) ranker.
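One way to avoid the "MSE without ranking metrics" trap, and to act on the advice to monitor calibration by segment, is to check a model offline with a per-request ranking metric and a per-segment calibration ratio. This is a minimal sketch assuming NDCG@k with linear gains and a simple predicted/observed ratio; the function names and the segment labels used in the checks are hypothetical.

```python
import math
from collections import defaultdict

def dcg(gains):
    # Discounted cumulative gain: the gain at 0-based position p is divided by log2(p + 2).
    return sum(g / math.log2(p + 2) for p, g in enumerate(gains))

def ndcg_at_k(scores, gains, k):
    # Rank the posts of one feed request by model score and compare to the ideal order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    actual = dcg([gains[i] for i in order[:k]])
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

def calibration_by_segment(rows):
    # rows: (segment, predicted_probability, observed_label) per impression.
    # Returns sum(predicted) / sum(observed) per segment; 1.0 means calibrated on average.
    pred = defaultdict(float)
    obs = defaultdict(float)
    for segment, p, y in rows:
        pred[segment] += p
        obs[segment] += y
    return {s: pred[s] / obs[s] if obs[s] > 0 else float("inf") for s in pred}
```

A ratio well above 1.0 for a segment means the model over-predicts engagement there, which a single global loss value would hide.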

How to answer in the interview

Start from impression logs, not only positive interactions.
Say why shown-but-not-clicked negatives are useful.
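A minimal sketch of building training rows from impression logs. It assumes a hypothetical `Impression` record and hypothetical action weights (real weights would come from business goals). Every row is a post that was actually shown, so unseen posts never become negatives; hard negatives are picked from the same request by cosine similarity of assumed post embeddings.

```python
import math
from dataclasses import dataclass, field

# Hypothetical business weights; a negative weight makes hides count against a post.
ACTION_WEIGHTS = {"click": 1.0, "like": 2.0, "comment": 4.0, "share": 6.0, "hide": -5.0}

@dataclass
class Impression:
    request_id: str
    post_id: str
    position: int                               # slot in the feed when shown
    actions: set = field(default_factory=set)   # e.g. {"click", "like"}

def build_rows(impressions):
    """One row per shown post: a weighted target plus one binary label per head."""
    rows = []
    for imp in impressions:
        row = {"request_id": imp.request_id, "post_id": imp.post_id,
               "position": imp.position,       # kept for position-bias handling
               "weighted": sum(ACTION_WEIGHTS[a] for a in imp.actions)}
        for head in ACTION_WEIGHTS:             # multi-task labels
            row[f"y_{head}"] = int(head in imp.actions)
        rows.append(row)
    return rows

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hard_negative_pairs(rows, embeddings, per_positive=1):
    """Pair each engaged post with the most similar shown-but-not-engaged posts
    from the same request. A post whose weighted score is <= 0 (e.g. hidden)
    counts as a negative; unseen posts are never used."""
    by_request = {}
    for r in rows:
        by_request.setdefault(r["request_id"], []).append(r)
    pairs = []
    for req_rows in by_request.values():
        positives = [r for r in req_rows if r["weighted"] > 0]
        negatives = [r for r in req_rows if r["weighted"] <= 0]
        for p in positives:
            ranked = sorted(negatives,
                            key=lambda n: cosine(embeddings[p["post_id"]],
                                                 embeddings[n["post_id"]]),
                            reverse=True)
            pairs += [(p["post_id"], n["post_id"]) for n in ranked[:per_positive]]
    return pairs
```

For example, if post `a` was clicked and liked while posts `b`, `c` and `d` in the same request were skipped or hidden, the post among them with the embedding closest to `a` becomes its hard negative. The resulting pairs feed directly into a pairwise loss.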