Production ML question

Sketch the online architecture for query parsing, candidate generation, ranking and blending. How do services communicate and fail safely?

Short answer

Keep the online path synchronous and bounded: query parser, retrieval services, feature fetch, ranker and blending, each with a timeout and a fallback. Use async queues for offline index and model updates, not on the user-facing critical path.

Full breakdown

A simple online path: the API/search gateway receives the query; the query parser extracts a text embedding, attributes and geo; the retrieval services query lexical, vector and structured indexes; the feature service fetches a bounded set of item, user and context features; the ranker scores the candidates; the blender applies paid/organic mixing and business constraints; the response is returned and logged.
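The stages of the online path can be sketched as plain functions chained in one synchronous call. This is a minimal illustration, not a reference design: every name here (`ParsedQuery`, `Candidate`, `search`, the toy corpus and the toy features) is hypothetical, and each stage stands in for what would be a separate service or library call.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedQuery:
    text: str
    attributes: dict = field(default_factory=dict)
    geo: Optional[tuple] = None


@dataclass
class Candidate:
    item_id: str
    source: str  # "lexical", "vector" or "structured"
    paid: bool = False
    score: float = 0.0


def parse_query(raw):
    # Stand-in for the real parser (embedding, attribute and geo extraction).
    return ParsedQuery(text=raw.strip().lower())


def retrieve(query, limit=100):
    # Stand-in for lexical/vector/structured index lookups (toy corpus).
    corpus = [
        Candidate("a1", "lexical"),
        Candidate("b2", "vector", paid=True),
        Candidate("c3", "structured"),
        Candidate("d4", "vector", paid=True),
    ]
    return corpus[:limit]


def fetch_features(candidates):
    # Toy features per item: [is_vector_hit, paid_boost].
    return {
        c.item_id: [1.0 if c.source == "vector" else 0.0, 0.5 if c.paid else 0.0]
        for c in candidates
    }


def rank(candidates, features):
    # Toy scorer: sum of features; a real ranker would call a model.
    for c in candidates:
        c.score = sum(features[c.item_id])
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def blend(ranked, max_paid=1, page_size=10):
    # Business constraint: at most `max_paid` paid items per page.
    page, paid_used = [], 0
    for c in ranked:
        if c.paid and paid_used >= max_paid:
            continue
        paid_used += int(c.paid)
        page.append(c)
    return page[:page_size]


def search(raw, log):
    query = parse_query(raw)
    candidates = retrieve(query)
    features = fetch_features(candidates)
    page = blend(rank(candidates, features))
    result = [c.item_id for c in page]
    log.append({"query": query.text, "returned": result})
    return result
```

Calling `search("pizza near me", log=[])` walks all five stages in order. In a real system each call would carry a deadline and a fallback, which this sketch leaves out.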

Inter-service calls on the critical path are usually synchronous RPC (HTTP or gRPC) with tight timeouts, retries only where the call is safe to repeat, and circuit breakers. If the query parser or ranker fails, degrade to lexical/structured search, cached hot results, the previous stable model or business-rule ranking. Kafka or another broker fits offline index updates, feature refresh, training logs and model publish events, not a hop that every user query has to wait on.
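A timeout, a fallback and a circuit breaker can be combined in one small wrapper. The sketch below is an assumption-laden illustration using only the Python standard library: `CircuitBreaker`, `call_with_fallback` and the threshold values are hypothetical names and numbers, and a production service would more likely rely on its RPC framework's deadlines than on a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # After the cool-down, calls are let through again;
        # a single new failure re-opens the circuit.
        return time.monotonic() - self.opened_at >= self.reset_after_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(primary, fallback, breaker, timeout_s, *args):
    """Return (result, path), where path says which branch served the call."""
    if not breaker.allow():
        return fallback(*args), "fallback:circuit_open"
    future = _pool.submit(primary, *args)
    try:
        result = future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best effort; a running call is not interrupted
        breaker.record_failure()
        return fallback(*args), "fallback:timeout"
    except Exception:
        breaker.record_failure()
        return fallback(*args), "fallback:error"
    breaker.record_success()
    return result, "primary"
```

For example, `call_with_fallback(model_ranker, rule_ranker, breaker, 0.05, candidate_ids)` would serve business-rule ranking whenever the (hypothetical) `model_ranker` exceeds 50 ms or raises. The returned path label is what a fallback-rate metric would count.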

Reliability needs health checks that execute real inference, p95/p99 latency monitoring, coverage metrics, cache hit rate, fallback rate, empty-result rate and rollback for model/index versions. Avoid passing huge embeddings or candidate payloads between too many services if co-locating retrieval and ranking is simpler and faster.
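A health check that executes real inference, plus the tail-latency and fallback-rate numbers, might look like the sketch below. The function names, the canary input and the 50 ms budget are illustrative assumptions; the percentile uses the simple nearest-rank method, which is one of several common definitions.

```python
import math
import time


def percentile(samples, p):
    # Nearest-rank percentile; p is an integer such as 95 or 99.
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def deep_health_check(score_fn, canary_features, max_latency_s=0.05):
    # Healthy only if a real scoring call returns a finite float in time,
    # not merely if the process answers a ping.
    start = time.monotonic()
    try:
        score = score_fn(canary_features)
    except Exception:
        return False
    elapsed = time.monotonic() - start
    return (
        isinstance(score, float)
        and math.isfinite(score)
        and elapsed <= max_latency_s
    )


def summarize(latencies_ms, served_by):
    # served_by holds one label per request, e.g. "primary" or "fallback:timeout".
    fallbacks = sum(1 for label in served_by if label != "primary")
    return {
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "fallback_rate": fallbacks / len(served_by),
    }
```

A rising `fallback_rate` or a failing `deep_health_check` after a model or index publish is the kind of signal that would trigger a rollback to the previous version.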

Theory

The online search path is latency-sensitive; asynchronous queues are useful around it, not necessarily inside it.

Common mistakes

  • Putting Kafka between every online service without a latency reason.
  • Forgetting fallbacks for when the query model or ranker is down.
  • Passing large candidate payloads across many network hops.
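To see why payload size matters, a rough back-of-the-envelope estimate helps. The numbers below (1000 candidates, 768-dimensional float32 embeddings, 8-byte item IDs) are assumptions chosen for illustration, and `payload_bytes` is a hypothetical helper that ignores serialization overhead.

```python
def payload_bytes(num_candidates, embedding_dim, bytes_per_float=4, id_bytes=8):
    # Size of one hop when each candidate carries its embedding,
    # versus sending only item IDs and looking features up locally.
    with_embeddings = num_candidates * (id_bytes + embedding_dim * bytes_per_float)
    ids_only = num_candidates * id_bytes
    return with_embeddings, ids_only


with_embeddings, ids_only = payload_bytes(1000, 768)
print(with_embeddings, ids_only)  # 3080000 8000
```

Under these assumptions each hop moves about 3 MB per query with embeddings attached, against 8 KB for IDs alone, and the cost repeats on every extra service boundary.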

How to answer in an interview

  • Separate online synchronous path from offline async updates.
  • Name timeout, fallback, circuit breaker and rollback.