Назад к подготовке

Вопрос про production ML

How would you handle geography in free-form real-estate queries and keep retrieval fast for millions of listings and high QPS?

Ответить самому

Сначала сформулируйте ответ как на собеседовании, затем откройте разбор и оцените себя.

Загрузка

Короткий ответ

Use geocoding/entity resolution for explicit places, user/session context for ambiguity, hierarchical geo cells for retrieval, and shard indexes by geo while precomputing listing embeddings offline.

Полный разбор

Geo is not just another text token. Queries can mention city, district, metro, street, landmark or ambiguous names that exist in many regions. A robust system combines NER/geocoding, dictionaries, maps data, user/session location and selected transaction context.

Represent geo hierarchically: country, region, city, district, neighborhood, metro/landmark and a spatial cell such as geohash/H3. Retrieval can first choose eligible geo scopes, then query only relevant shards or partitions. If the query has no explicit geo, use product context and user defaults carefully.

For load, precompute listing embeddings and listing attribute indexes offline. Online work should mostly parse the query, fetch candidate lists from geo-sharded lexical/vector indexes, and run a bounded ranker. Use ANN parameters, cache hot query parses, batch neural query encoding when useful, and monitor p95/p99 latency by city and query type.

Теория

Geo constraints are high-precision retrieval constraints and a natural sharding key.

Типичные ошибки

  • Let dense retrieval override hard geo constraints.
  • Assume street names are globally unique.
  • Compute item embeddings online for each query.

Как отвечать на собеседовании

  • Mention hierarchical geo resolution.
  • Say item embeddings are precomputed offline.