ML System Design

Which offline and online metrics would you use for a similar-items recommender, and what pitfalls are easy to miss?

Short answer

Offline, use Recall@K, Precision@K, NDCG, coverage/diversity and latency, and evaluate them against realistic candidate sets that include hard negatives. Online, use user-level A/B metrics such as CTR, watch time, conversion and retention, together with guardrail metrics.

Full breakdown

Offline retrieval metrics include Recall@K and HitRate@K against labeled positives. Ranking metrics include Precision@K, NDCG@K and MRR when positions matter. Beyond accuracy, track coverage, novelty/diversity, catalog freshness, cold-start slice metrics, latency and index freshness.
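As a rough illustration of the metrics named above, here is a minimal sketch with binary relevance labels. The function names and the toy item IDs are assumptions made for this example, not part of any particular library.

```python
import math

def recall_at_k(ranked, positives, k):
    """Fraction of labeled positives that appear in the top-k results."""
    if not positives:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in positives)
    return hits / len(positives)

def precision_at_k(ranked, positives, k):
    """Fraction of the top-k slots filled by labeled positives."""
    hits = sum(1 for item in ranked[:k] if item in positives)
    return hits / k

def ndcg_at_k(ranked, positives, k):
    """Binary-relevance NDCG: DCG of this ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in positives
    )
    ideal_hits = min(len(positives), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def reciprocal_rank(ranked, positives):
    """1 / rank of the first positive; average over queries to get MRR."""
    for pos, item in enumerate(ranked, start=1):
        if item in positives:
            return 1.0 / pos
    return 0.0

def catalog_coverage(all_rankings, catalog_size, k):
    """Share of the catalog that shows up in at least one top-k list."""
    shown = {item for ranked in all_rankings for item in ranked[:k]}
    return len(shown) / catalog_size

# Toy query: the model returned b, x, a, y; the labeled similar items are a, b, c.
ranked = ["b", "x", "a", "y"]
positives = {"a", "b", "c"}
print(recall_at_k(ranked, positives, 3), ndcg_at_k(ranked, positives, 3))
```

In practice these would be averaged over queries and reported per slice (for example cold-start items separately), since a single global average can hide weak segments.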

The big pitfall is evaluation against easy random negatives. If the candidate set contains one positive and thousands of random unrelated items, almost any reasonable model can look excellent. Use hard-negative candidate pools, judged pairs and realistic retrieval/reranking sets.
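The effect can be reproduced on synthetic data. The sketch below is a toy setup with assumed names throughout: a hypothetical catalog where each item has a genre and a franchise partner, and a deliberately weak model that only checks genre. Ties are broken pessimistically, so the positive is ranked below every negative with an equal score.

```python
import random

NUM_GENRES, ITEMS_PER_GENRE = 100, 100
CATALOG = range(NUM_GENRES * ITEMS_PER_GENRE)

def genre(item):
    return item % NUM_GENRES

def true_positive(item):
    """Labeled similar item: the franchise partner inside the same genre."""
    slot = item // NUM_GENRES
    return genre(item) + NUM_GENRES * (slot ^ 1)

def weak_score(query, candidate):
    """Genre-only model: it cannot tell franchises apart within a genre."""
    return 1.0 if genre(query) == genre(candidate) else 0.0

def hit_at_k(query, positive, negatives, k):
    pos_score = weak_score(query, positive)
    # Pessimistic tie-breaking: every negative scoring >= the positive outranks it.
    rank = 1 + sum(weak_score(query, n) >= pos_score for n in negatives)
    return 1.0 if rank <= k else 0.0

def evaluate(pool_kind, queries, k=10, pool_size=98, seed=0):
    rng = random.Random(seed)
    hits = []
    for q in queries:
        pos = true_positive(q)
        if pool_kind == "hard":
            # Hard negatives: same genre as the query, but a different franchise.
            candidates = [i for i in CATALOG
                          if genre(i) == genre(q) and i not in (q, pos)]
        else:
            # Random negatives: anything else in the catalog.
            candidates = [i for i in CATALOG if i not in (q, pos)]
        negatives = rng.sample(candidates, pool_size)
        hits.append(hit_at_k(q, pos, negatives, k))
    return sum(hits) / len(hits)

queries = list(range(0, len(CATALOG), 503))
hit_random = evaluate("random", queries)
hit_hard = evaluate("hard", queries)
print(f"HitRate@10 random negatives: {hit_random:.2f}, hard negatives: {hit_hard:.2f}")
```

The same model is scored in both runs; only the candidate pool changes. Against random negatives it looks strong because most negatives come from other genres, while against same-genre negatives it has no signal left to separate the true positive.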

Online metrics should reflect product value: CTR on similar items, watch starts, watch time, purchases/subscriptions where relevant, add-to-list, downstream retention and revenue. In most consumer scenarios, randomize A/B tests by user. Define guardrails for bad recommendations, latency and content policy. Before launching, compute the minimum detectable effect (MDE) and statistical power so that you know how many users and how much time the test needs.
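A back-of-the-envelope MDE/power calculation can be sketched as follows. It assumes a per-user binary outcome (for example, whether the user clicked a similar item at least once), a two-sided two-proportion z-test with the normal approximation, and equal-sized arms. The function name and the example rates are illustrative assumptions.

```python
import math
from statistics import NormalDist

def users_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a relative lift in a rate."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical numbers: 10% baseline click rate, 5% relative lift to detect.
print(users_per_arm(baseline_rate=0.10, relative_mde=0.05))
```

Halving the MDE roughly quadruples the required sample, which is why the target effect size is worth agreeing on before the test starts. Metrics with heavy tails such as watch time or revenue usually need a variance estimate from historical data rather than this binomial formula.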

Theory

Offline metrics should approximate the real decision surface; online metrics decide product value.

Common mistakes

  • Reporting only Recall@K measured against random negatives.
  • Ignoring cold-start and catalog-coverage slices.
  • Splitting the A/B test by item when user-level interference is the main concern.
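For the last point, user-level randomization is commonly implemented with a deterministic hash of the user ID, so that a user sees the same variant on every request and every surface. This is a minimal sketch; the function name, the experiment key and the variant labels are assumptions for illustration.

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant for a given experiment."""
    # Salting with the experiment name keeps assignments independent across tests.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same user always lands in the same arm of the same experiment.
print(assign_variant("user-42", "similar-items-v2"))
```

Because assignment depends only on the user and the experiment, every item a user sees comes from one model, which avoids mixing control and treatment recommendations within a single session.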

How to answer in an interview

  • Mention NDCG and then immediately discuss the candidate-set pitfall.
  • Separate offline ranking quality from online business impact.