ML System Design

You can find posts similar to a given post. How do you turn that into user-level candidate generation for a feed?

Short answer

Choose seed posts from the user history, retrieve similar posts per seed or per topic bucket, deduplicate and cap candidates, then send them to the ranker with user-context features.

Full walkthrough

Item-to-item retrieval needs user seeds. Pick recent or high-quality positive interactions from the user history: liked posts, long-dwell reads, follows or saved posts. To avoid one-topic collapse, bucket seeds by category or recency and cap how many seeds each bucket contributes.
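
One way to sketch the seed selection described above, under stated assumptions: the `Interaction` record, the `SIGNAL_WEIGHT` values and the default caps are all hypothetical and chosen only for illustration. Each post is assumed to belong to exactly one category, and a follow is treated as an interaction attached to a post.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    post_id: str
    category: str
    kind: str         # e.g. "like", "save", "long_dwell", "follow", "view"
    timestamp: float  # seconds since epoch


# Hypothetical signal strengths; real weights would be tuned on data.
SIGNAL_WEIGHT = {"save": 3.0, "like": 2.0, "follow": 2.0, "long_dwell": 1.0}


def select_seeds(history, per_bucket_cap=2, max_seeds=10):
    """Pick seed post ids: strong, recent positives, capped per category."""
    # Bucket positive interactions by category; non-positive kinds are dropped.
    buckets = defaultdict(list)
    for it in history:
        if it.kind in SIGNAL_WEIGHT:
            buckets[it.category].append(it)

    capped = []
    for items in buckets.values():
        # Within a bucket, prefer the stronger signal, then the newer one.
        items.sort(key=lambda it: (SIGNAL_WEIGHT[it.kind], it.timestamp), reverse=True)
        kept, used = [], set()
        for it in items:
            if it.post_id in used:
                continue  # same post liked and saved: count it once
            used.add(it.post_id)
            kept.append(it)
            if len(kept) == per_bucket_cap:
                break
        capped.extend(kept)

    # Across buckets, keep the most recent seeds up to the global budget.
    capped.sort(key=lambda it: it.timestamp, reverse=True)
    return [it.post_id for it in capped[:max_seeds]]
```

The per-bucket cap is what prevents one-topic collapse: a user with hundreds of likes in one category still contributes at most `per_bucket_cap` seeds from it.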

For each seed, query a text/image embedding index or a collaborative item-to-item model, then merge the candidates. Deduplicate, remove already-seen posts, apply freshness, safety and eligibility filters, and cap the merged pool at a size that still leaves the ranker enough candidates to choose from. The ranker then scores each candidate with the current user, post and context features.
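
The merge step can be sketched as follows. This is a minimal illustration, not a definitive implementation: `retrieve_similar` and `is_eligible` are hypothetical callables standing in for the real index lookup and the freshness/safety/eligibility filters, and keeping the maximum similarity per post is only one possible merge rule. It assumes scores from different seeds are comparable, which may not hold when several indexes are mixed.

```python
from typing import Callable, Iterable

Neighbors = list[tuple[str, float]]  # (post_id, similarity score)


def generate_candidates(
    seeds: Iterable[str],
    retrieve_similar: Callable[[str, int], Neighbors],  # hypothetical index lookup
    seen: set[str],
    is_eligible: Callable[[str], bool],  # hypothetical freshness/safety filter
    per_seed_k: int = 50,
    max_candidates: int = 500,
) -> list[str]:
    """Merge per-seed neighbors into one deduplicated, capped candidate list."""
    seeds = list(seeds)
    seed_set = set(seeds)
    best: dict[str, float] = {}
    for seed in seeds:
        for post_id, score in retrieve_similar(seed, per_seed_k):
            # Skip the seeds themselves and anything the user has already seen.
            if post_id in seed_set or post_id in seen:
                continue
            if not is_eligible(post_id):
                continue
            # Deduplicate across seeds by keeping the best score per post.
            if score > best.get(post_id, float("-inf")):
                best[post_id] = score
    # Highest similarity first; ties broken by post id for determinism.
    ranked = sorted(best, key=lambda p: (-best[p], p))
    return ranked[:max_candidates]
```

The output is only a candidate pool. The similarity score decides what enters the pool, while the final ordering is left to the ranker.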

This generator is strong for warm users and for fresh content with good embeddings. It needs fallbacks for cold users, new topics and sparse histories: popularity-based candidates, editorial or freshness pools, user-to-user similarity based on profile attributes, and exploration.
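
A fallback chain for cold or sparse users might look like the sketch below. The function name, the `min_candidates` threshold and the ordering of fallback sources are assumptions made for illustration. Each fallback is a hypothetical zero-argument callable returning post ids, for example a popularity pool followed by an editorial pool.

```python
from typing import Callable, Iterable, Sequence


def with_fallbacks(
    primary: Sequence[str],
    fallback_sources: Iterable[Callable[[], Sequence[str]]],
    seen: set[str],
    min_candidates: int = 100,
) -> list[str]:
    """Top up item-to-item candidates from fallbacks until the pool is big enough."""
    result = list(primary)
    taken = set(result)
    for fetch in fallback_sources:
        if len(result) >= min_candidates:
            break  # warm user: item-to-item retrieval already filled the pool
        for post_id in fetch():
            if post_id in taken or post_id in seen:
                continue  # keep the pool deduplicated and unseen
            result.append(post_id)
            taken.add(post_id)
            if len(result) >= min_candidates:
                break
    return result
```

For a brand-new user `primary` is empty, so the whole pool comes from the fallbacks in the order given. For a warm user the fallbacks are never called.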

Theory

Item-to-item retrieval becomes a user recommender only after seed selection, merging and ranking are defined.

Common mistakes

  • Saying “retrieve similar posts” without specifying similar to what.
  • Using all historical positives as seeds, so one topic dominates the candidates.
  • Forgetting deduplication and already-seen filtering.
  • Letting item-to-item retrieval replace the ranker.

How to answer in an interview

  • Describe seed selection from user history explicitly.
  • Mention category caps and deduplication.