Production ML question

After launching a feed recommender, how do you decide when and how to retrain the models?

Short answer

Use scheduled retraining plus monitoring triggers: data freshness, distribution drift, offline quality, online KPIs and model health. Different components can have different refresh cadences.

Full breakdown

A feed recommender should not be trained once and left alone. User interests, post inventory, creators and product behavior drift constantly. The basic setup is a recurring training DAG that rebuilds features, trains models, validates metrics, publishes artifacts and keeps rollback versions.
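
The recurring pass described above can be sketched roughly as follows. This is a minimal illustration, not a real orchestrator: `ModelRegistry`, `run_training_dag` and the callables passed in are hypothetical names, and the registry is an in-memory stand-in for whatever artifact store is actually used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ModelRegistry:
    """Hypothetical in-memory artifact store that keeps old versions for rollback."""
    versions: List[Dict] = field(default_factory=list)
    live_index: Optional[int] = None

    def publish(self, artifact: Dict) -> None:
        # Append the new version and make it the one being served.
        self.versions.append(artifact)
        self.live_index = len(self.versions) - 1

    def rollback(self) -> None:
        # Step back to the previous version, if one exists.
        if self.live_index is not None and self.live_index > 0:
            self.live_index -= 1

    @property
    def live(self) -> Optional[Dict]:
        return None if self.live_index is None else self.versions[self.live_index]


def run_training_dag(
    build_features: Callable[[], Dict],
    train: Callable[[Dict], Dict],
    validate: Callable[[Dict], bool],
    registry: ModelRegistry,
) -> bool:
    """One pass: rebuild features, train, validate, publish. True if published."""
    features = build_features()
    artifact = train(features)
    if not validate(artifact):
        return False  # keep serving the current live version
    registry.publish(artifact)
    return True
```

Because a failed validation leaves the registry untouched, a bad run cannot replace the serving model, and `rollback()` covers the case where a regression only shows up online after publishing.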

Retraining cadence depends on component freshness. Popularity and freshness features may update hourly. Candidate indexes and embeddings may update daily. Heavier rankers may retrain weekly or when enough new data accumulates. Cold-start handling may require frequent incremental updates.
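
One way to express per-component cadences is a small schedule table checked by the scheduler. The component names and the `due_components` helper are illustrative assumptions; the intervals simply mirror the hourly, daily and weekly examples above and would need tuning for a real product.

```python
from datetime import datetime, timedelta

# Illustrative cadences taken from the text; real values depend on the product.
CADENCES = {
    "popularity_features": timedelta(hours=1),
    "candidate_embeddings": timedelta(days=1),
    "ranker": timedelta(weeks=1),
}


def due_components(last_run: dict, now: datetime) -> list:
    """Return the components whose refresh interval has elapsed."""
    return [
        name
        for name, interval in CADENCES.items()
        if now - last_run[name] >= interval
    ]
```

A data-volume condition ("enough new data accumulated") could be added as a second check next to the elapsed-time one.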

Monitor both ML and product signals: data freshness, row counts, feature distribution drift, missing values, offline recall/NDCG, online CTR/dwell/retention, hide/report rate, candidate coverage, latency and segment regressions. Metric-triggered retraining can help, but scheduled retraining is easier to reason about; combine both with validation gates.
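
As a sketch of how a drift signal and a schedule might be combined, the example below computes the population stability index (PSI) over pre-binned feature distributions and collects reasons to retrain. PSI is only one possible drift measure, and the function names and thresholds (`psi_threshold`, `max_ndcg_drop`) are assumptions chosen for illustration; they should be tuned per feature and metric.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between two binned distributions (bin fractions)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total


def retrain_reasons(
    scheduled_due: bool,
    feature_psi: float,
    offline_ndcg: float,
    baseline_ndcg: float,
    psi_threshold: float = 0.2,   # assumed threshold, not a universal constant
    max_ndcg_drop: float = 0.02,  # assumed tolerance for offline quality loss
) -> List[str]:
    """Collect reasons to retrain; an empty list means no retraining is needed."""
    reasons = []
    if scheduled_due:
        reasons.append("schedule")
    if feature_psi > psi_threshold:
        reasons.append("feature_drift")
    if baseline_ndcg - offline_ndcg > max_ndcg_drop:
        reasons.append("offline_quality")
    return reasons
```

Returning the reasons rather than a bare boolean makes each retraining run easier to audit. Whatever fired the trigger, the resulting model still goes through the same validation gates before publishing.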

Theory

Retraining is a production control loop: fresh data, validation, artifact publishing, monitoring and rollback.

Common mistakes

  • Retrain everything on one cadence without considering component freshness.
  • Trigger retraining only after online metrics have already degraded.
  • Publish new artifacts without validation gates.
  • Monitor only whether pipeline tasks succeed, not data or model quality.
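
To avoid the last two mistakes, a publish step can be guarded by a gate that checks data quality and per-segment model quality, not just that the job finished. The sketch below is a hypothetical example: the `validation_gate` name, the segment keys and every threshold are assumptions made for illustration.

```python
from typing import Dict, List


def validation_gate(
    candidate: Dict[str, float],   # offline metric per segment for the new model
    baseline: Dict[str, float],    # same metric per segment for the live model
    row_count: int,
    null_rate: float,
    min_rows: int = 100_000,       # assumed minimum training set size
    max_null_rate: float = 0.05,   # assumed tolerated share of missing values
    max_segment_drop: float = 0.01,
) -> List[str]:
    """Return a list of failures; an empty list means the artifact may be published."""
    failures = []
    if row_count < min_rows:
        failures.append("too_few_rows")
    if null_rate > max_null_rate:
        failures.append("too_many_missing_values")
    for segment, base_metric in baseline.items():
        # A segment missing from the candidate counts as a full regression.
        if base_metric - candidate.get(segment, 0.0) > max_segment_drop:
            failures.append(f"regression:{segment}")
    return failures
```

Checking each segment separately catches cases where the aggregate metric improves while, for example, new users get worse recommendations.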

How to answer in an interview

  • Mention separate cadences for features, embeddings and ranker.
  • Use “scheduled plus trigger-based” rather than only one approach.