Interview practice: T-Bank: ML System Design

Case 1 (10 min)

Goals and metrics for a recommendation feed in a banking app

Answer without hints

First, say your answer out loud or as bullet points.

Write a draft

Formulas, solution plan, risks and examples.

Compare with the analysis

Open the analysis only after your own attempt.

Short answer

Clarify the business objective first: retention, loyalty and time spent, not ad revenue. Then define engagement metrics plus guardrails for quality, safety, fatigue and downstream banking experience.

Detailed analysis

Start from product intent. If the feed is not monetized by ads, the recommender should support retention, app engagement, loyalty and cross-product trust rather than pure click maximization. This changes the metric set.

Primary metrics can include sessions with feed, dwell time, return rate, meaningful interactions, likes, comments, shares, follows and bookmarks. But each engagement metric is gameable. Use guardrails such as hide/report rate, low-quality content rate, repeated-topic fatigue, notification churn, creator concentration, latency and impact on core banking journeys.

Define the decision surface too: ranking posts in a feed for millions of users, with hundreds of thousands of historical posts and a steady inflow of new posts. That scope determines freshness, cold-start and serving constraints.
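
The guardrail idea can be turned into a launch check. The sketch below is minimal and assumes per-arm aggregates from an A/B test are already computed. The metric names, thresholds, `Guardrail` and `ship_decision` are all hypothetical. It compares point estimates only, so a real decision would also need significance testing.

```python
from dataclasses import dataclass


@dataclass
class Guardrail:
    metric: str
    max_relative_increase: float  # 0.05 means at most +5% vs control


def relative_change(treatment: float, control: float) -> float:
    """Relative change of the treatment arm vs control (control must be non-zero)."""
    return (treatment - control) / control


def ship_decision(control, treatment, primary, guardrails):
    """Ship only if the primary metric improves and no guardrail is breached.

    control / treatment: dicts metric_name -> aggregated value for each A/B arm.
    Point estimates only: a real launch decision also needs significance tests.
    """
    reasons = []
    if relative_change(treatment[primary], control[primary]) <= 0:
        reasons.append(f"{primary} did not improve")
    for g in guardrails:
        change = relative_change(treatment[g.metric], control[g.metric])
        if change > g.max_relative_increase:
            reasons.append(f"{g.metric} up {change:.1%}, limit {g.max_relative_increase:.0%}")
    return not reasons, reasons


# Hypothetical numbers: the return rate improves, but hides grow by 30%.
control = {"return_rate_7d": 0.40, "hide_rate": 0.010, "p95_latency_ms": 180.0}
treatment = {"return_rate_7d": 0.42, "hide_rate": 0.013, "p95_latency_ms": 185.0}
guardrails = [Guardrail("hide_rate", 0.05), Guardrail("p95_latency_ms", 0.10)]
ok, reasons = ship_decision(control, treatment, "return_rate_7d", guardrails)
```

In this hypothetical example `ok` is `False`. The engagement win does not ship because the hide-rate guardrail is breached.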

Common mistakes

Start with a model before defining why the feed exists.
Use only clicks for a social feed.
Forget safety and trust guardrails in a bank app.
Ignore baseline logs and current product behavior.

How to say it in the interview

Say what is not the goal, especially ad revenue in this prompt.
Pair every engagement metric with at least one guardrail.

Case 2 (12 min)

User and post representations for a multimodal feed

Short answer

Start with interaction history, user profile features and post content features: category, text embeddings, image embeddings and simple statistics. Combine collaborative and content baselines before training a heavier ranker.

Detailed analysis

Represent posts with both structured and unstructured features: topic/category, author, age, language, text length, text embedding, image embedding, moderation/safety flags and early engagement statistics that are available at serving time.

Represent users with recent interaction history, long-term topic preferences, followed authors, profile or segment features that are allowed for this product, and aggregated embeddings of posts they engaged with. For a bank app, treat sensitive features carefully and avoid using them without a clear policy and fairness review.
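
One simple way to build the "aggregated embeddings of posts they engaged with" is a time-decayed weighted sum of post embeddings, normalized to unit length. This is a minimal sketch, not the only option. The 7-day half-life, the per-action weights in `history` and the function name are assumptions.

```python
import math


def user_embedding(history, post_vectors, now_ts, half_life_days=7.0):
    """Time-decayed aggregate of embeddings of posts the user engaged with.

    history: list of (post_id, unix_ts, weight); weight encodes action strength.
    post_vectors: dict post_id -> list[float] (e.g. a text or image embedding).
    Returns an L2-normalized vector (same direction as the weighted mean),
    or None when the user has no usable history (cold start).
    """
    if not post_vectors:
        return None
    dim = len(next(iter(post_vectors.values())))
    acc = [0.0] * dim
    for post_id, ts, weight in history:
        vec = post_vectors.get(post_id)
        if vec is None:
            continue  # the post has no embedding yet
        age_days = max(0.0, (now_ts - ts) / 86400)
        decay = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
        for i, x in enumerate(vec):
            acc[i] += weight * decay * x
    norm = math.sqrt(sum(x * x for x in acc))
    return [x / norm for x in acc] if norm > 0 else None


now = 1_700_000_000
vectors = {"travel_1": [1.0, 0.0], "finance_1": [0.0, 1.0]}
history = [("travel_1", now, 1.0), ("finance_1", now - 7 * 86400, 1.0)]
emb = user_embedding(history, vectors, now)  # leans toward the more recent travel post
```

Returning `None` for an empty history makes the cold-start case explicit, so the caller can switch to segment or popularity candidates.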

A practical baseline is a hybrid candidate generator, for example LightFM with content features or ALS combined with a content-based generator, plus a popularity/freshness fallback. Then add a ranker that combines candidate score, user-post affinity, freshness, author features and engagement signals. The baseline should be simple enough to debug and strong enough to collect better logs.
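
The fallback part of this baseline can be sketched as a merge step. Personalized generators are interleaved first, and the popularity/freshness pool fills whatever is left. This is a minimal sketch under assumptions: the generators are opaque ranked lists of post ids, and the function name is hypothetical.

```python
def candidates_with_fallback(generator_lists, fallback, seen, limit):
    """Merge personalized generators, then top up from a popularity/freshness pool.

    generator_lists: lists of post_ids, best first, one per personalized
        generator (e.g. a LightFM or ALS retriever).
    fallback: post_ids from a popularity/freshness pool, best first.
    seen: post_ids already shown to the user; they are never returned.
    """
    result, used = [], set(seen)
    # Interleave generators rank by rank so no single source dominates.
    longest = max((len(g) for g in generator_lists), default=0)
    for rank in range(longest):
        for gen in generator_lists:
            if rank < len(gen) and gen[rank] not in used:
                result.append(gen[rank])
                used.add(gen[rank])
    # Cold or sparse users get few personalized candidates; the fallback fills the gap.
    for post in fallback:
        if len(result) >= limit:
            break
        if post not in used:
            result.append(post)
            used.add(post)
    return result[:limit]


warm = candidates_with_fallback([["a", "b"], ["b", "c"]], ["p1", "p2", "a"], seen={"c"}, limit=4)
# -> ['a', 'b', 'p1', 'p2']
cold = candidates_with_fallback([[]], ["p1", "p2"], seen=set(), limit=5)
# -> ['p1', 'p2']
```

The ranker then rescores this merged list, so the merge order matters only for truncation.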

Common mistakes

Use only post embeddings and ignore user history.
Use bank profile features without discussing policy constraints.
Train on engagement stats that are not available at serving time.
Skip a simple baseline and jump directly to a deep two-tower model.

How to say it in the interview

Separate user features, item features and interaction features.
Mention cold start for both users and posts.

Case 3 (12 min)

Targets, loss and negative sampling for a social-feed ranker

Short answer

Use impression logs, define a weighted engagement target or multiple heads, sample negatives from shown-but-not-engaged posts, and choose a pointwise, pairwise or listwise loss based on system maturity and label quality.

Detailed analysis

The dataset should start from exposures: posts that were actually eligible and shown. Labels can be binary click/engagement, weighted engagement, dwell-time regression, or multiple targets for click, like, comment, share and hide. A single weighted target is simple; multi-task heads preserve differences between signals.
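
Here is a minimal sketch of building exposure-based examples with both a single weighted target and per-action binary labels for multi-task heads. The action weights are hypothetical placeholders for business-agreed values. The sketch simplifies by joining actions to impressions on (user, post) only and ignores attribution windows.

```python
# Hypothetical business weights; real values are agreed with the product team.
ACTION_WEIGHTS = {"click": 1.0, "like": 2.0, "comment": 3.0, "share": 4.0, "hide": -5.0}


def build_examples(impressions, actions):
    """Label every exposure with a weighted target and per-action binary heads.

    impressions: list of dicts with user_id, post_id, position (one per shown post).
    actions: list of (user_id, post_id, action) tuples.
    Actions on posts that were never shown are ignored, so the dataset is
    built from exposures rather than from positives only.
    """
    done_by_pair = {}
    for user_id, post_id, action in actions:
        done_by_pair.setdefault((user_id, post_id), set()).add(action)
    examples = []
    for imp in impressions:
        done = done_by_pair.get((imp["user_id"], imp["post_id"]), set())
        example = dict(imp)
        example["weighted_target"] = sum(ACTION_WEIGHTS.get(a, 0.0) for a in done)
        for action in ACTION_WEIGHTS:
            example[f"label_{action}"] = int(action in done)  # multi-task heads
        examples.append(example)
    return examples


impressions = [
    {"user_id": 1, "post_id": "p", "position": 0},
    {"user_id": 1, "post_id": "q", "position": 1},
]
actions = [(1, "p", "click"), (1, "p", "like"), (1, "r", "like")]
examples = build_examples(impressions, actions)
```

Post `q` was shown and ignored, so it becomes an all-zero example. The like on post `r` is dropped because `r` never appears in the exposure log. The `position` field is kept so that position bias can be modeled later.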

Negatives should usually include shown-but-not-clicked or skipped posts, because random unseen posts make the task too easy and distort ranking. You can add hard negatives from the same topic or similar embeddings to improve discrimination.
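
A minimal sketch of this sampling scheme for pairwise training follows. It draws random shown-but-not-engaged negatives per positive and adds hard negatives that share the positive's topic. To keep the sketch consistent with the exposure-based dataset, hard negatives are taken only from shown posts; topic labels stand in for embedding similarity. The function name and the `engaged`/`topic_of` fields are assumptions.

```python
import random


def sample_negatives(examples, topic_of, n_shown=2, n_hard=1, seed=0):
    """Build (user_id, positive_post, negative_post) triples for pairwise training.

    examples: dicts with user_id, post_id and a boolean 'engaged' flag,
        one per impression (so every candidate negative was actually shown).
    topic_of: dict post_id -> topic.
    Negatives are random shown-but-not-engaged posts of the same user, plus
    'hard' ones that share the positive's topic.
    """
    rng = random.Random(seed)
    shown_not_engaged = {}
    for ex in examples:
        if not ex["engaged"]:
            shown_not_engaged.setdefault(ex["user_id"], []).append(ex["post_id"])
    triples = []
    for ex in examples:
        if not ex["engaged"]:
            continue
        pool = shown_not_engaged.get(ex["user_id"], [])
        topic = topic_of.get(ex["post_id"])
        hard = [p for p in pool if topic_of.get(p) == topic]
        picked = rng.sample(pool, min(n_shown, len(pool)))
        picked += rng.sample(hard, min(n_hard, len(hard)))
        for neg in dict.fromkeys(picked):  # drop duplicates, keep order
            triples.append((ex["user_id"], ex["post_id"], neg))
    return triples


examples = [
    {"user_id": 1, "post_id": "p1", "engaged": True},
    {"user_id": 1, "post_id": "n1", "engaged": False},
    {"user_id": 1, "post_id": "n2", "engaged": False},
    {"user_id": 1, "post_id": "n3", "engaged": False},
    {"user_id": 2, "post_id": "p2", "engaged": True},
]
topic_of = {"p1": "travel", "n1": "travel", "n2": "finance", "n3": "food", "p2": "food"}
triples = sample_negatives(examples, topic_of)
```

User 2 has no shown-but-not-engaged posts, so no triples are produced for that user. Padding with random unseen posts is exactly the shortcut the text warns against.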

Pointwise losses are easiest to start with: binary cross-entropy for engagement probability or regression losses for weighted score. Pairwise/listwise losses better match ranking but require careful construction and are harder to debug. For social feeds, also monitor calibration by segment and exposure/popularity bias.
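
The loss options can be compared on raw scores (logits). The sketch below gives a numerically stable pointwise binary cross-entropy, a BPR/RankNet-style pairwise logistic loss, and a per-segment calibration check. It is a pure-Python sketch for clarity; a training framework would provide vectorized equivalents.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_loss(logits, labels):
    """Pointwise binary cross-entropy from logits, averaged over examples."""
    # -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))] == softplus(z) - y*z
    return sum(softplus(z) - y * z for z, y in zip(logits, labels)) / len(logits)


def pairwise_logistic_loss(pos_logits, neg_logits):
    """BPR/RankNet-style loss: mean of -log sigmoid(s_pos - s_neg) over pairs."""
    return sum(softplus(-(sp - sn)) for sp, sn in zip(pos_logits, neg_logits)) / len(pos_logits)


def calibration_by_segment(probs, labels, segments):
    """Mean predicted probability vs observed engagement rate, per segment."""
    stats = {}
    for p, y, s in zip(probs, labels, segments):
        n, sum_p, sum_y = stats.get(s, (0, 0.0, 0.0))
        stats[s] = (n + 1, sum_p + p, sum_y + y)
    return {s: (sum_p / n, sum_y / n) for s, (n, sum_p, sum_y) in stats.items()}


cal = calibration_by_segment([0.2, 0.4, 0.9], [0, 1, 1], ["new", "new", "old"])
# "new": mean prediction 0.3 vs observed rate 0.5, so the model under-predicts new users
```

The pairwise loss depends only on score differences, so it can rank well while being badly calibrated. This is why the per-segment calibration check is worth running next to ranking metrics.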

Common mistakes

Treat every unseen post as a negative.
Collapse all actions into a score without business weights.
Optimize MSE on an arbitrary target without ranking metrics.
Ignore position bias and the bias inherited from the previous ranking policy that produced the logs.

How to say it in the interview

Start from impression logs, not only positive interactions.
Say why shown-but-not-clicked negatives are useful.

Case 4 (12 min)

ML System Design

How would you use vector search, user clustering and domain-specific text/image embeddings to improve a social-feed recommender?

Short answer

Use ANN over normalized post embeddings for scalable retrieval, cluster users only when personalization cost is too high, and introduce domain-specific encoders after a baseline proves where generic embeddings fail.

Detailed analysis

Vector search is useful when the item tower produces embeddings and the catalog is too large for brute-force scoring. Store text/image post embeddings in an ANN index, for example an HNSW index built with FAISS or served by a vector database, then measure recall@K, p95 latency, memory, update cost and downstream ranker quality. The similarity metric must match training: cosine for normalized embeddings, dot product if the model was trained with dot product.
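
ANN recall@K is measured against exact brute-force search on a sample of queries. The sketch below uses pure Python, and the "ANN" result is a hand-written list standing in for whatever the real index (FAISS, a vector database) returns. Normalizing vectors makes the dot product equal to cosine similarity, which is the training/serving consistency point above.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def exact_top_k(query, items, k):
    """Brute-force cosine top-k: ground truth for evaluating an ANN index.

    items: dict item_id -> L2-normalized vector, so cosine == dot product.
    """
    q = normalize(query)
    scores = {i: sum(a * b for a, b in zip(q, v)) for i, v in items.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def recall_at_k(ann_ids, exact_ids, k):
    """Share of the true top-k neighbours that the ANN index returned."""
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k


items = {"a": normalize([1.0, 0.0]), "b": normalize([0.6, 0.8]), "c": normalize([0.0, 1.0])}
exact = exact_top_k([1.0, 0.1], items, k=2)       # -> ['a', 'b']
recall = recall_at_k(["a", "c"], exact, k=2)       # -> 0.5 for this hypothetical ANN answer
```

Averaged over many sampled queries, this number belongs on the same dashboard as p95 latency and downstream ranker metrics, because a recall drop shows up there later.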

User clustering can reduce serving cost, but it trades personalization for efficiency. If you recommend at cluster level, define how the cluster vector or prototype user is built, how often clusters update, and how much quality drops for heterogeneous clusters. Often it is safer to use clusters for fallback pools or cache warmup, not as the only personalization mechanism.
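
One way to quantify "how much quality drops for heterogeneous clusters" is to cluster user embeddings and measure how close each user is to the prototype that would serve them. The sketch below runs spherical k-means in pure Python and reports mean user-to-centroid cosine per cluster as a rough personalization-loss proxy. Both the proxy and the function names are assumptions, not a standard metric.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def spherical_kmeans(users, k, iters=10, seed=0):
    """k-means on L2-normalized user vectors using cosine similarity.

    Returns (centroids, assignment); each centroid is the 'prototype user'
    whose recommendations would be served to the whole cluster.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(users, k)]

    def assign_all():
        return [max(range(k), key=lambda c: dot(u, centroids[c])) for u in users]

    for _ in range(iters):
        assignment = assign_all()
        for c in range(k):
            members = [u for u, a in zip(users, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return centroids, assign_all()


def cluster_fidelity(users, centroids, assignment):
    """Mean cosine between each user and its prototype, per cluster.

    Low values flag heterogeneous clusters where serving the prototype's
    recommendations loses much of the personalization.
    """
    sims = {}
    for u, c in zip(users, assignment):
        sims.setdefault(c, []).append(dot(u, centroids[c]))
    return {c: sum(s) / len(s) for c, s in sims.items()}


users = [normalize(v) for v in ([1.0, 0.0], [1.0, 0.15], [0.0, 1.0], [0.1, 1.0])]
centroids, assignment = spherical_kmeans(users, k=2)
fidelity = cluster_fidelity(users, centroids, assignment)
```

The same fidelity report can justify a narrower use of clusters, such as fallback pools or cache warmup, when some clusters score low.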

Domain-specific text/image encoders should be introduced after observing errors: travel photos, restaurant posts and finance/investment posts may need different features. Start with pretrained embeddings plus lightweight adaptation, then consider separate encoders or expert routing by topic. Keep diversity and exploration metrics because stronger embeddings can still create narrow feedback loops.

Common mistakes

Choose a vector database without measuring ANN recall and latency.
Use cluster-level recommendations while assuming they are fully personalized.
Train domain-specific embedders before proving generic embeddings fail.
Ignore diversity and exploration after improving semantic similarity.

How to say it in the interview

Explain when user clustering is acceptable and when it hurts quality.
Tie vector-search metrics to downstream ranking metrics.

Case 5 (10 min)

ML System Design

You can find posts similar to a given post. How do you turn that into user-level candidate generation for a feed?

Short answer

Choose seed posts from the user history, retrieve similar posts per seed or per topic bucket, deduplicate and cap candidates, then send them to the ranker with user-context features.

Detailed analysis

Item-to-item retrieval needs user seeds. Pick recent or high-quality positive interactions from the user history: liked posts, long-dwell reads, follows or saved posts. To avoid one-topic collapse, bucket seeds by category or recency and cap how many seeds each bucket contributes.
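
A minimal sketch of the seed-selection step follows. It keeps the newest strong positives and caps how many seeds any one topic contributes. The event schema and the set of signals in `SEED_SIGNALS` are assumptions chosen for illustration.

```python
# Assumed set of "strong" positive signals that qualify a post as a seed.
SEED_SIGNALS = {"like", "save", "long_dwell", "follow"}


def select_seeds(history, max_seeds=10, per_topic_cap=3):
    """Pick recent strong positives as item-to-item seeds, capped per topic.

    history: list of dicts with post_id, topic, ts (unix seconds), signal.
    Newest events win; a topic stops contributing once it hits per_topic_cap,
    which prevents one dominant topic from taking every seed.
    """
    seeds, per_topic, used = [], {}, set()
    for event in sorted(history, key=lambda e: e["ts"], reverse=True):
        if event["signal"] not in SEED_SIGNALS or event["post_id"] in used:
            continue
        if per_topic.get(event["topic"], 0) >= per_topic_cap:
            continue
        seeds.append(event["post_id"])
        used.add(event["post_id"])
        per_topic[event["topic"]] = per_topic.get(event["topic"], 0) + 1
        if len(seeds) >= max_seeds:
            break
    return seeds


history = [{"post_id": f"t{i}", "topic": "travel", "ts": 100 + i, "signal": "like"} for i in range(5)]
history += [
    {"post_id": "f1", "topic": "finance", "ts": 50, "signal": "save"},
    {"post_id": "c1", "topic": "food", "ts": 200, "signal": "click"},  # too weak to seed
]
seeds = select_seeds(history, per_topic_cap=2)  # -> ['t4', 't3', 'f1']
```

Without the cap, this user's five travel likes would crowd out the older finance save completely.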

For each seed, query a text/image embedding index or collaborative item-to-item model, then merge candidates. Deduplicate, remove already seen posts, apply freshness/safety/eligibility filters, and keep enough candidates for the ranker. The ranker then scores each candidate with the current user, post and context features.
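
The merge step can be sketched as follows. Each seed's neighbour list is capped, self-matches, already-seen and ineligible posts are dropped, and duplicates are collapsed. A candidate reached from several seeds keeps its best similarity and a count of supporting seeds. Passing both to the ranker as features is an assumption of this sketch, as are the function name and the `is_eligible` callback standing in for the freshness, safety and eligibility filters.

```python
def merge_neighbours(seed_results, seen, is_eligible, per_seed_cap=20, limit=200):
    """Merge item-to-item results from several seeds into one candidate list.

    seed_results: dict seed_post_id -> list of (post_id, similarity), best first.
    seen: post_ids already shown to the user.
    is_eligible: callable post_id -> bool (freshness/safety/eligibility filters).
    Returns (post_id, best_similarity, n_supporting_seeds) tuples.
    """
    best, support = {}, {}
    for seed, neighbours in seed_results.items():
        for post_id, sim in neighbours[:per_seed_cap]:
            if post_id == seed or post_id in seen or not is_eligible(post_id):
                continue
            best[post_id] = max(sim, best.get(post_id, float("-inf")))
            support[post_id] = support.get(post_id, 0) + 1
    # Pre-order only for truncation; the ranker rescores everything afterwards.
    ranked = sorted(best, key=lambda p: (support[p], best[p]), reverse=True)
    return [(p, best[p], support[p]) for p in ranked[:limit]]


seed_results = {
    "s1": [("a", 0.90), ("b", 0.80), ("s1", 1.00)],
    "s2": [("b", 0.85), ("c", 0.70), ("x", 0.95)],
}
cands = merge_neighbours(seed_results, seen={"c"}, is_eligible=lambda p: p != "x")
# -> [('b', 0.85, 2), ('a', 0.9, 1)]
```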

This generator is strong for warm users and fresh content with good embeddings. It needs fallbacks for cold users, new topics and sparse histories: popularity, editorial/freshness pools, user-to-user similarity by profile and exploration.

Common mistakes

Say “retrieve similar posts” without specifying similar to what.
Use all historical positives and overload one topic.
Forget deduplication and already-seen filtering.
Let item-to-item retrieval replace the ranker.

How to say it in the interview

Describe seed selection from user history explicitly.
Mention category caps and deduplication.

Case 6 (10 min)

ML System Design

How would you handle new users and new posts in a social-feed recommender with text and image content?

Short answer

For new posts, use content embeddings, metadata and exploration buckets. For new users, use onboarding/profile/context, popularity by segment and controlled exploration until enough interaction history exists.

Detailed analysis

New posts have no interaction history, so represent them through content: text embedding, image embedding, topic/category, author, language, freshness and safety signals. Route them through freshness and exploration pools so the system can collect initial feedback without flooding users.
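
Routing new posts through an exploration pool "without flooding users" can be sketched as bounded slot mixing. Each feed slot goes to a new post with some probability, up to a hard per-feed cap. The explore rate, the cap and the function name are hypothetical; a production system might use a bandit instead of a fixed rate.

```python
import random


def blend_feed(ranked, exploration_pool, n_slots=20, explore_rate=0.1,
               max_explore=2, seed=None):
    """Fill a feed from the main ranking, mixing in a bounded number of new posts.

    ranked: post_ids from the main ranker, best first.
    exploration_pool: new posts (ordered by a content-based score) that still
        need first impressions.
    While ranked posts remain, each slot goes to a new post with probability
    explore_rate; at most max_explore slots per feed are ever used for exploration.
    """
    rng = random.Random(seed)
    main = list(ranked)
    in_main = set(main)
    fresh = [p for p in exploration_pool if p not in in_main]
    feed, n_explored = [], 0
    while len(feed) < n_slots and (main or fresh):
        can_explore = bool(fresh) and n_explored < max_explore
        if can_explore and (not main or rng.random() < explore_rate):
            feed.append(fresh.pop(0))
            n_explored += 1
        elif main:
            feed.append(main.pop(0))
        else:
            break  # only new posts left, but the exploration budget is spent
    return feed


ranked = [f"r{i}" for i in range(30)]
pool = [f"n{i}" for i in range(10)]
greedy = blend_feed(ranked, pool, explore_rate=1.0, seed=1)   # n0, n1, then r0..r17
no_explore = blend_feed(ranked, pool, explore_rate=0.0, seed=1)
```

Each exploration slot produces an impression, so the system collects early feedback for new posts. The cap keeps any single feed from being dominated by unproven content.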

New users need fallbacks before history exists. Use onboarding interests if available, coarse profile/context segments that are allowed for the product, geo/time/device context, global or segment popularity, and user-to-user similarity from non-sensitive profile features. Then quickly update a short-term profile from early positive and negative interactions.
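
The "quickly update a short-term profile" step can be sketched as an online update. Positive actions pull a user vector toward the post embedding, and hides or skips push it away. The action strengths and learning rate below are hypothetical.

```python
import math

# Hypothetical pull/push strengths for early feedback signals.
ACTION_SIGN = {"like": 1.0, "save": 1.0, "long_dwell": 0.5, "skip": -0.2, "hide": -1.0}


def update_short_term_profile(profile, post_vec, action, lr=0.3):
    """One online update of a short-term user vector after an interaction.

    Positive actions pull the profile toward the post embedding, hides and
    skips push it away; the result is re-normalized so that dot-product
    retrieval scores stay comparable over time.
    """
    strength = ACTION_SIGN.get(action, 0.0)
    if profile is None:
        profile = [0.0] * len(post_vec)  # brand-new user: start from zero
    updated = [p + lr * strength * v for p, v in zip(profile, post_vec)]
    norm = math.sqrt(sum(x * x for x in updated))
    return [x / norm for x in updated] if norm > 0 else updated


first = update_short_term_profile(None, [1.0, 0.0], "like")          # -> [1.0, 0.0]
after_hide = update_short_term_profile(first, [0.8, 0.6], "hide")    # moves away from that post
```

Handling hides and skips explicitly reflects the point that negative early feedback should shape the profile, not only positive feedback.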

Cold start should be measured. Track coverage for new posts, time to first impressions, new-user engagement, hide/report rate and diversity. Keep exploration bounded so it learns without damaging user trust.

Common mistakes

Serve only global top-pop to every new user.
Wait for interaction history before showing new posts.
Use sensitive profile features casually.
Ignore negative early feedback such as hides and skips.

How to say it in the interview

Split the answer into new users and new items.
Mention bounded exploration and fast profile updates.

Case 7 (10 min)

Production ML question

After launching a feed recommender, how do you decide when and how to retrain the models?

Short answer

Use scheduled retraining plus monitoring triggers: data freshness, distribution drift, offline quality, online KPIs and model health. Different components can have different refresh cadences.

Detailed analysis

A feed recommender should not be trained once and left alone. User interests, post inventory, creators and product behavior drift constantly. The basic setup is a recurring training DAG that rebuilds features, trains models, validates metrics, publishes artifacts and keeps rollback versions.

Retraining cadence depends on component freshness. Popularity and freshness features may update hourly. Candidate indexes and embeddings may update daily. Heavier rankers may retrain weekly or when enough new data accumulates. Cold-start handling may require frequent incremental updates.
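
Per-component cadences can be expressed as a small schedule check that an orchestrator (for example, the training DAG) evaluates before launching jobs. The sketch below adds a data-volume trigger for the ranker. The component names, cadences and row threshold are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical refresh cadences per component.
CADENCES = {
    "popularity_features": timedelta(hours=1),
    "candidate_embeddings_index": timedelta(days=1),
    "ranker": timedelta(weeks=1),
}


def due_components(last_runs, now, new_labeled_rows=0, ranker_trigger_rows=None):
    """Return components whose scheduled refresh is due.

    last_runs: dict component -> datetime of the last successful run.
    The ranker may also retrain early once enough new labeled rows accumulate
    (a data-volume trigger on top of the schedule).
    """
    due = [name for name, every in CADENCES.items()
           if now - last_runs.get(name, datetime.min) >= every]
    if (ranker_trigger_rows is not None and "ranker" not in due
            and new_labeled_rows >= ranker_trigger_rows):
        due.append("ranker")
    return due


now = datetime(2024, 1, 8, 12, 0)
last_runs = {
    "popularity_features": now - timedelta(hours=2),
    "candidate_embeddings_index": now - timedelta(hours=3),
    "ranker": now - timedelta(days=2),
}
scheduled = due_components(last_runs, now)  # -> ['popularity_features']
triggered = due_components(last_runs, now, new_labeled_rows=1_000_000, ranker_trigger_rows=500_000)
```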

Monitor both ML and product signals: data freshness, row counts, feature distribution drift, missing values, offline recall/NDCG, online CTR/dwell/retention, hide/report rate, candidate coverage, latency and segment regressions. Metric-triggered retraining can help, but scheduled retraining is easier to reason about; combine both with validation gates.
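
The sketch below combines a drift trigger with a validation gate. It uses the Population Stability Index (PSI) as one common drift measure for numeric features, plus a gate that blocks publishing on offline or per-segment regressions and on a collapsed training set. The 0.25 PSI threshold follows a widely quoted rule of thumb. The other thresholds and the metric dictionary layout are assumptions.

```python
import bisect
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index of a numeric feature, reference vs current.

    Bin edges are quantiles of the reference sample; eps avoids log(0)
    for empty bins.
    """
    ref = sorted(reference)
    edges = [ref[len(ref) * q // bins] for q in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(reference), shares(current)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_triggers(psi_by_feature, max_psi=0.25):
    """Features whose drift should trigger an unscheduled retrain."""
    return [f for f, v in psi_by_feature.items() if v > max_psi]


def passes_gate(candidate, production, max_ndcg_drop=0.005, max_segment_drop=0.01):
    """Validation gate run before publishing a retrained model."""
    reasons = []
    if candidate["ndcg@10"] < production["ndcg@10"] - max_ndcg_drop:
        reasons.append("offline NDCG@10 regressed")
    for seg, value in candidate["segment_ndcg"].items():
        if value < production["segment_ndcg"].get(seg, 0.0) - max_segment_drop:
            reasons.append(f"segment {seg} regressed")
    if candidate["train_rows"] < 0.5 * production["train_rows"]:
        reasons.append("training data volume dropped by more than half")
    return not reasons, reasons


values = [i / 100 for i in range(1000)]
shifted = [v + 5 for v in values]
production = {"ndcg@10": 0.310, "train_rows": 1000, "segment_ndcg": {"new_users": 0.20}}
good = {"ndcg@10": 0.312, "train_rows": 1100, "segment_ndcg": {"new_users": 0.205}}
bad = {"ndcg@10": 0.300, "train_rows": 1100, "segment_ndcg": {"new_users": 0.205}}
```

Keeping the previous artifact around whenever the gate fails is what makes the rollback versions from the DAG description usable.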

Common mistakes

Retrain everything on one cadence without considering component freshness.
Trigger retraining only after online metrics have already degraded.
Publish new artifacts without validation gates.
Monitor task success but not data or model quality.

How to say it in the interview

Mention separate cadences for features, embeddings and ranker.
Use “scheduled plus trigger-based” rather than only one approach.

Case 8 (12 min)

ML System Design

How would you train a two-tower or CLIP-like text-image recommender using user-post interactions?

Short answer

Encode user/context and post text-image content into a shared space, treat engaged impressions as positives, contrast them with random, shown-but-not-engaged and hard negatives, and optimize a contrastive, triplet or sampled-softmax loss.

Detailed analysis

A two-tower recommender has a user tower and an item tower. The item tower can combine text and image encoders in a CLIP-like representation; the user tower can aggregate history, profile/context features and recent interactions. At serving time, user embeddings retrieve item embeddings via dot product or cosine similarity.

Positive pairs should come from meaningful interactions such as clicks with dwell, likes, saves, comments or follows. Negatives can include random items, shown-but-not-engaged impressions and hard negatives from the same topic or close embedding neighborhood. Hard negatives are useful because they force the model to distinguish plausible alternatives.

Loss choices include contrastive InfoNCE/sampled softmax, triplet loss, pairwise ranking losses or multi-task objectives that also predict engagement. If the same item encoder feeds both retrieval and ranking, make sure the training objective matches both uses or split the towers when objectives conflict.
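
Here is a minimal pure-Python sketch of the InfoNCE / sampled-softmax option with in-batch negatives and optional extra hard negatives. It assumes both towers already output L2-normalized vectors. The temperature of 0.1 is an illustrative value, not a recommendation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(user_vecs, item_vecs, hard_negatives=(), temperature=0.1):
    """In-batch sampled-softmax (InfoNCE) loss for a two-tower model.

    user_vecs[i] and item_vecs[i] are a positive pair (an engaged impression);
    the other items in the batch act as negatives for user i, and optional
    hard_negatives are appended as extra negatives for every user.
    Vectors are assumed L2-normalized, so dot products are cosines.
    """
    candidates = list(item_vecs) + list(hard_negatives)
    total = 0.0
    for i, u in enumerate(user_vecs):
        logits = [dot(u, v) / temperature for v in candidates]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_z - logits[i]  # -log softmax probability of the positive
    return total / len(user_vecs)


users = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(users, users)                       # near 0: positives win
swapped = info_nce_loss(users, [[0.0, 1.0], [1.0, 0.0]])    # about 10: positives lose
with_hard = info_nce_loss(users, users, hard_negatives=[[1.0, 0.0]])
```

Because scoring is a plain dot product, the trained user and item vectors can go straight into an ANN index. In-batch negatives over-represent popular posts, and a common refinement is to subtract the log of each item's sampling probability from its logit (a logQ correction).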

Common mistakes

Train only on random negatives and get weak discrimination.
Use CLIP pretraining but never adapt it to product interactions.
Forget that retrieval uses dot product/cosine while ranker may need richer cross-features.
Ignore exposure and position bias in logged interactions.

How to say it in the interview

Name at least two negative sources.
Explain why the embedding must support ANN retrieval.

Case 9 (8 min)

ML System Design

When would you use a pure collaborative ALS or matrix-factorization baseline for a social feed, and what are its limitations?

Short answer

Use ALS as a simple collaborative baseline when you have enough interactions. It is fast and easy to debug, but weak for cold users/posts and it cannot use post content on its own.

Detailed analysis

ALS is a useful early baseline when the feed already has exposure and engagement logs. Convert views, clicks, likes, comments and saves into an implicit-feedback confidence matrix, factorize users and posts, and retrieve high dot-product posts for each user.
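
The sketch below implements implicit-feedback ALS in the style of Hu, Koren and Volinsky (2008). It uses weighted action counts r, preference p = 1 if r > 0, confidence c = 1 + alpha * r, and alternating closed-form solves. The action weights, alpha, regularization and rank are hypothetical. The dense inner loop is for readability only; production libraries exploit sparsity and precompute YᵀY.

```python
import random

# Hypothetical action weights used to build implicit-feedback strengths r_ui.
ACTION_WEIGHTS = {"view": 0.2, "click": 1.0, "like": 2.0, "comment": 3.0, "save": 3.0}


def build_matrix(events, n_users):
    """events: (user_idx, item_idx, action) -> per-user dict {item_idx: r_ui}."""
    rows = [dict() for _ in range(n_users)]
    for u, i, action in events:
        rows[u][i] = rows[u].get(i, 0.0) + ACTION_WEIGHTS.get(action, 0.0)
    return rows


def solve(a, b):
    """Solve the small dense system a @ x = b with Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def als_half_step(rows, fixed, k, alpha, reg):
    """Recompute factors for every row while the other side stays fixed.

    Solves x = (Y^T C Y + reg*I)^-1 Y^T C p for each row, where
    p = 1 if r > 0 else 0 and c = 1 + alpha * r (unobserved cells get c = 1).
    """
    out = []
    for row in rows:
        a = [[reg if s == t else 0.0 for t in range(k)] for s in range(k)]
        b = [0.0] * k
        for j, y in enumerate(fixed):
            r = row.get(j, 0.0)
            c = 1.0 + alpha * r
            p = 1.0 if r > 0 else 0.0
            for s in range(k):
                b[s] += c * p * y[s]
                for t in range(k):
                    a[s][t] += c * y[s] * y[t]
        out.append(solve(a, b))
    return out


def implicit_als(user_rows, n_items, k=2, alpha=10.0, reg=0.1, iters=10, seed=0):
    """Alternate user and item solves starting from small random item factors."""
    rng = random.Random(seed)
    item_rows = [dict() for _ in range(n_items)]
    for u, row in enumerate(user_rows):
        for i, r in row.items():
            item_rows[i][u] = r
    items = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    users = []
    for _ in range(iters):
        users = als_half_step(user_rows, items, k, alpha, reg)
        items = als_half_step(item_rows, users, k, alpha, reg)
    return users, items


def score(users, items, u, i):
    return sum(x * y for x, y in zip(users[u], items[i]))


# Two taste groups: users 0-1 like posts 0-1, users 2-3 like posts 2-3.
events = [(u, i, "like") for u in (0, 1) for i in (0, 1)]
events += [(u, i, "like") for u in (2, 3) for i in (2, 3)]
U, V = implicit_als(build_matrix(events, n_users=4), n_items=4)
```

A post with no interactions never receives a meaningful factor here, which is the new-post cold-start limitation described next.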

It is attractive because it is simple, scalable and debuggable. It can also provide one candidate generator alongside content and popularity generators.

Its limitations matter in social feeds: new posts and new users arrive constantly, semantic content is ignored unless you use a hybrid variant, and popularity/exposure bias can dominate. Regular retraining is needed because the interaction matrix changes as interests and inventory drift.

Common mistakes

Expect pure ALS to solve new-post cold start.
Treat all engagement actions as equal without weights.
Use ALS as the only candidate generator.
Forget retraining as the interaction matrix changes.

How to say it in the interview

Position ALS as a baseline or one generator, not the whole system.
Call out cold start immediately.