Interview practice · Technical interview · Constructor · 2025-09-01

Constructor: Technical interview

Work from top to bottom: try to answer each question yourself first, then open the breakdown.

Steps: 5 · Questions: 5 · Coding tasks: 0
Question 1 (10 min)

Offline evaluation of complementary fashion recommendations

Short answer

Evaluate retrieval with recall@K against outfit/compatibility labels, evaluate reranking with list quality and business proxies, and add human/style review for visual compatibility and diversity.

Detailed breakdown

Evaluate retrieval and reranking separately. For candidate generation, take compatibility labels from outfit datasets, stylist annotations, VLM-prelabeled data reviewed by humans, or, where appropriate, historical co-engagement. Measure recall@K, category coverage and how often at least one compatible item appears in the candidate set.
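
A minimal sketch of the retrieval-side metrics for one anchor item, assuming candidates arrive as a ranked list of item IDs and compatibility labels as a set of IDs. The item IDs, categories and the set of categories an outfit "needs" are hypothetical; in practice these values would be averaged over all anchor items in the evaluation set.

```python
from typing import Dict, List, Set


def recall_at_k(candidates: List[str], compatible: Set[str], k: int) -> float:
    """Share of labeled compatible items that appear in the top-k candidates."""
    if not compatible:
        return 0.0
    return len(set(candidates[:k]) & compatible) / len(compatible)


def hit_at_k(candidates: List[str], compatible: Set[str], k: int) -> bool:
    """True if at least one compatible item appears in the top-k candidates."""
    return any(item in compatible for item in candidates[:k])


def category_coverage(candidates: List[str], item_category: Dict[str, str],
                      needed: Set[str], k: int) -> float:
    """Share of the categories an outfit needs that the top-k candidates cover."""
    if not needed:
        return 0.0
    covered = {item_category[i] for i in candidates[:k] if i in item_category}
    return len(covered & needed) / len(needed)


# Hypothetical anchor item "jeans_01" with stylist-labeled compatible items.
candidates = ["shirt_1", "shoes_3", "shirt_2", "belt_7", "bag_4"]
compatible = {"shirt_2", "shoes_3", "jacket_9"}
item_category = {"shirt_1": "tops", "shoes_3": "shoes", "shirt_2": "tops",
                 "belt_7": "accessories", "bag_4": "bags"}
needed = {"tops", "shoes", "outerwear"}

print(recall_at_k(candidates, compatible, k=3))  # 2 of 3 labeled items retrieved
print(hit_at_k(candidates, compatible, k=3))
print(category_coverage(candidates, item_category, needed, k=3))  # outerwear missing
```

Recall@K shows how much of the labeled compatible set survives retrieval, hit@K is the looser "at least one compatible item" view, and coverage flags candidate sets dominated by a single category.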

For reranking, pointwise metrics are not enough because the final list should look like a coherent outfit. Add list-level metrics such as category diversity, intra-list similarity, price/availability constraints, brand/category balance and business proxies such as expected conversion or revenue.
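
The list-level metrics can be sketched as below, assuming each recommended item has an embedding (for example from an image or text encoder), a category, a stock flag and a price. All names and numbers are illustrative; intra-list similarity is computed here as mean pairwise cosine similarity, which is one common definition, and the constraint check is a hypothetical stock-and-price rule.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def intra_list_similarity(items: List[str], emb: Dict[str, Sequence[float]]) -> float:
    """Mean pairwise cosine similarity of item embeddings; lower = more varied list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(cosine(emb[a], emb[b]) for a, b in pairs) / len(pairs)


def category_diversity(items: List[str], category: Dict[str, str]) -> float:
    """Distinct categories / list length; 1.0 means no category repeats."""
    return len({category[i] for i in items}) / len(items) if items else 0.0


def constraint_pass_rate(items: List[str], in_stock: Dict[str, bool],
                         price: Dict[str, float], max_price: float) -> float:
    """Share of items that are in stock and within the price cap."""
    if not items:
        return 0.0
    ok = [i for i in items if in_stock.get(i, False) and price[i] <= max_price]
    return len(ok) / len(items)


items = ["shirt_1", "shirt_2", "shoes_3"]
emb = {"shirt_1": [1.0, 0.0], "shirt_2": [1.0, 0.0], "shoes_3": [0.0, 1.0]}
category = {"shirt_1": "tops", "shirt_2": "tops", "shoes_3": "shoes"}
in_stock = {"shirt_1": True, "shirt_2": False, "shoes_3": True}
price = {"shirt_1": 30.0, "shirt_2": 25.0, "shoes_3": 120.0}

print(intra_list_similarity(items, emb))  # two near-identical shirts raise it
print(category_diversity(items, category))
print(constraint_pass_rate(items, in_stock, price, max_price=100.0))
```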

Finally, inspect examples with domain reviewers. Fashion compatibility has subjective and visual aspects, so offline numerical metrics should be paired with structured human review before online A/B testing.

Common mistakes

  • Use only co-clicks and call them ground truth for style compatibility.
  • Evaluate the candidate generator and the ranker with the same metric.
  • Ignore category diversity in outfit recommendations.
  • Skip human review for a visual-style product.

How to say it in the interview

  • State which metric belongs to retrieval and which belongs to reranking.
  • Mention stylist or human review as a calibration layer.
Question 2 (14 min)

Restoring punctuation and capitalization in ASR text

Short answer

Frame it as token-level sequence labeling: for each word predict capitalization and punctuation-after-word classes. This preserves ASR words and avoids generative hallucination.

Detailed breakdown

A simple baseline is prompting an LLM to rewrite the text, but that can change words, be expensive and make output harder to constrain. A better production framing is sequence labeling over the original ASR tokens.

For every word, predict two labels: capitalization class for the word and punctuation class after the word. Capitalization can be binary or richer if title case/acronyms matter. Punctuation can be none, comma, period, question mark, colon and other supported symbols. A shared Transformer encoder can produce contextual token representations and two classifier heads.
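
A small sketch of the label scheme, assuming a hypothetical inventory of three case classes and five punctuation classes, and assuming the two classifier heads have already produced one class id per word. Rendering the output from labels makes word preservation hold by construction: only the case of each word and the mark after it can change.

```python
from typing import List, Sequence

# Hypothetical label inventories; a real system would choose its own.
CASE_LABELS = ("lower", "capitalized", "upper")  # "upper" covers acronyms like PDF
PUNCT_LABELS = ("", ",", ".", "?", ":")          # "" = no punctuation after the word


def apply_labels(words: Sequence[str], case_ids: Sequence[int],
                 punct_ids: Sequence[int]) -> str:
    """Render per-word predictions back to text without touching the ASR words."""
    if not (len(words) == len(case_ids) == len(punct_ids)):
        raise ValueError("need one case label and one punctuation label per word")
    out: List[str] = []
    for word, c, p in zip(words, case_ids, punct_ids):
        case = CASE_LABELS[c]
        if case == "capitalized":
            word = word[:1].upper() + word[1:]
        elif case == "upper":
            word = word.upper()
        out.append(word + PUNCT_LABELS[p])
    return " ".join(out)


words = ["hi", "can", "you", "send", "the", "pdf"]
case_ids = [1, 0, 0, 0, 0, 2]   # stand-in for the capitalization head's argmax
punct_ids = [1, 0, 0, 0, 0, 3]  # stand-in for the punctuation head's argmax
print(apply_labels(words, case_ids, punct_ids))  # Hi, can you send the PDF?
```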

Training data can be created by taking clean punctuated text, normalizing it to lowercase words without punctuation as input, and using the original punctuation/case as labels. LLMs or human review can help bootstrap domain-specific data, but the model should be evaluated on real ASR-like text because ASR errors and spoken language differ from clean written corpora.
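
Building training pairs from clean text can be sketched as below. This illustration rests on simple assumptions: whitespace tokenization, a hypothetical supported punctuation set, and marks outside that set left untouched (a real pipeline would normalize quotes, dashes, "!" and similar symbols first). The three-way case label loses mixed-case words such as "iPhone", which is where a richer capitalization scheme would be needed.

```python
from typing import List, Tuple

PUNCT_LABELS = ("", ",", ".", "?", ":")  # hypothetical supported set; "" = none
MARKS = ",.?:"
CASE_LOWER, CASE_CAPITALIZED, CASE_UPPER = 0, 1, 2


def case_label(word: str) -> int:
    if len(word) > 1 and word.isupper():
        return CASE_UPPER        # acronym such as "PDF"
    if word[:1].isupper():
        return CASE_CAPITALIZED  # "Hello", and the single-letter "I"
    return CASE_LOWER            # mixed case like "iPhone" is lost here


def make_example(text: str) -> Tuple[List[str], List[int], List[int]]:
    """Turn clean punctuated text into (ASR-like input words, case labels, punct labels)."""
    words: List[str] = []
    case_ids: List[int] = []
    punct_ids: List[int] = []
    for token in text.split():
        core = token.rstrip(MARKS)
        mark = token[len(core):][-1:]  # last trailing mark, "" if none ("..." -> ".")
        if not core:                   # stray mark such as " ?": attach to previous word
            if words:
                punct_ids[-1] = PUNCT_LABELS.index(mark)
            continue
        words.append(core.lower())
        case_ids.append(case_label(core))
        punct_ids.append(PUNCT_LABELS.index(mark))
    return words, case_ids, punct_ids


print(make_example("Hi, can you send the PDF?"))
print(make_example("Wait... I said: OK."))
```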

Common mistakes

  • Let a generative model rewrite words when only punctuation/case should change.
  • Ignore ASR-specific errors and train only on clean written text.
  • Treat punctuation and capitalization as independent without shared context.
  • Forget acronyms, names and domain-specific terms.

How to say it in the interview

  • Say explicitly that the output must preserve original words.
  • Use two heads: punctuation and capitalization.
Question 3 (12 min)

Tokenization and BERT-style labeling vs. autoregressive rewriting

Short answer

BERT-style labeling sees both left and right context in one pass and predicts constrained labels, while autoregressive rewriting may change words. Tokenization should align labels to words or word starts.

Detailed breakdown

Autoregressive generation is natural for text rewriting, but this task is constrained: preserve the words and only add punctuation/case. Generation can hallucinate or "correct" ASR words that should remain unchanged, and token-by-token decoding is expensive.

A BERT-style encoder runs over the whole sequence once and predicts labels for each word or word boundary. This uses bidirectional context, which is important for punctuation, names and sentence boundaries. It also keeps the output constrained to a small class set.

Tokenization is the main detail. If using subword BPE, align labels to the first subtoken of each word and mask the rest, or use a tokenizer/preprocessing scheme that keeps word boundaries explicit. Avoid relying on tokens that merge spaces, punctuation and word pieces in ways that make capitalization labels ambiguous.
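
First-subtoken labeling with masking can be sketched as below. The splitter is a hypothetical stand-in for a real BPE/WordPiece tokenizer; with a Hugging Face fast tokenizer, the encoding's `word_ids()` provides the same subtoken-to-word mapping. The mask value -100 is used because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so masked positions do not contribute to the loss.

```python
from typing import Callable, List, Sequence, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def toy_subword_split(word: str, piece_len: int = 4) -> List[str]:
    """Hypothetical stand-in for a real BPE/WordPiece tokenizer."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]


def align_labels(words: Sequence[str], word_labels: Sequence[int],
                 split: Callable[[str], List[str]] = toy_subword_split,
                 ) -> Tuple[List[str], List[int], List[int]]:
    """Put each word's label on its first subtoken and mask the remaining pieces."""
    subtokens: List[str] = []
    labels: List[int] = []
    word_ids: List[int] = []
    for idx, (word, label) in enumerate(zip(words, word_labels)):
        for j, piece in enumerate(split(word)):
            subtokens.append(piece)
            word_ids.append(idx)
            labels.append(label if j == 0 else IGNORE_INDEX)
    return subtokens, labels, word_ids


def word_level_predictions(subtoken_preds: Sequence[int],
                           word_ids: Sequence[int]) -> List[int]:
    """At inference, read one prediction per word from its first subtoken."""
    out: List[int] = []
    prev = None
    for pred, wid in zip(subtoken_preds, word_ids):
        if wid != prev:
            out.append(pred)
            prev = wid
    return out


# Punctuation class ids per word (hypothetical): 0 = none, 3 = "?".
subtokens, labels, word_ids = align_labels(["please", "restart", "kubernetes"], [0, 0, 3])
print(subtokens)  # ['plea', 'se', 'rest', 'art', 'kube', 'rnet', 'es']
print(labels)     # [0, -100, 0, -100, 3, -100, -100]
```

Reading predictions only from first subtokens at inference keeps training and decoding consistent: the model is never asked to label the inner pieces of a word.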

Common mistakes

  • Put punctuation labels on arbitrary subword pieces.
  • Ignore how spaces are encoded in common BPE tokenizers.
  • Use autoregressive output without checking word preservation.
  • Assume capitalization is always sentence-initial and forget named entities.

How to say it in the interview

  • Mention label masking for non-first subtokens.
  • Frame hallucination as a product bug, not just a model detail.
Question 4 (10 min)

Quality metrics for punctuation and capitalization restoration

Short answer

Use token-level accuracy and per-class precision/recall/F1 for punctuation and capitalization, plus sentence-level readability checks and domain-slice evaluation.

Detailed breakdown

At the token level, punctuation restoration is a multiclass classification problem and capitalization is often binary or small multiclass classification. Report accuracy, macro/micro F1, and per-class precision/recall because comma, period and question mark errors have different frequencies and costs.
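
Per-class and averaged scores can be computed with a few counters, as in this sketch. It assumes one gold and one predicted punctuation label per word, with "O" as a hypothetical "no punctuation" class. Micro-F1 is pooled over the punctuation marks only: in a single-label task, micro-F1 over all classes including "O" equals plain accuracy and would hide the imbalance. Libraries such as scikit-learn provide equivalent functions; the hand-rolled version just makes the arithmetic explicit.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_prf(gold: Sequence[str], pred: Sequence[str],
                  classes: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Per-class precision/recall/F1 for single-label, per-token predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted class gets a false positive
            fn[g] += 1  # the gold class gets a false negative
    report = {}
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = {"precision": prec, "recall": rec, "f1": f1, "support": tp[c] + fn[c]}
    return report


def macro_f1(report: Dict[str, Dict[str, float]], classes: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so rare marks count as much as common ones."""
    return sum(report[c]["f1"] for c in classes) / len(classes)


def micro_f1(gold: Sequence[str], pred: Sequence[str], classes: Sequence[str]) -> float:
    """Pool TP/FP/FN over `classes` only: 2TP / (2TP + FP + FN)."""
    wanted = set(classes)
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g in wanted)
    fp = sum(1 for g, p in zip(gold, pred) if g != p and p in wanted)
    fn = sum(1 for g, p in zip(gold, pred) if g != p and g in wanted)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


gold = ["O", "O", ",", "O", ".", "O", "O", "?"]
pred = ["O", ",", ",", "O", ".", "O", "O", "."]
marks = [",", ".", "?"]
report = per_class_prf(gold, pred, marks)
for mark in marks:
    print(mark, report[mark])
print("macro-F1 over marks:", macro_f1(report, marks))
print("micro-F1 over marks:", micro_f1(gold, pred, marks))
```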

Token accuracy alone can be misleading if most positions have no punctuation. For rare punctuation classes, per-class recall matters. For user-facing quality, also evaluate sentence-level readability, over-punctuation rate, missing sentence breaks and whether named entities/acronyms are capitalized correctly.
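
The accuracy trap and two readability-oriented rates can be illustrated on toy data. The label names and the choice of "." and "?" as sentence-ending marks are assumptions of this sketch; the over-punctuation and missed-sentence-break rates are defined here in one simple way, and a team may define them differently.

```python
from typing import Sequence

NONE = "O"                # hypothetical label: no punctuation after this word
SENTENCE_END = {".", "?"}


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def recall_for(gold: Sequence[str], pred: Sequence[str], cls: str) -> float:
    hits = [p == cls for g, p in zip(gold, pred) if g == cls]
    return sum(hits) / len(hits) if hits else 0.0


def over_punctuation_rate(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Share of positions that need no mark where the model still inserted one."""
    flags = [p != NONE for g, p in zip(gold, pred) if g == NONE]
    return sum(flags) / len(flags) if flags else 0.0


def missed_sentence_break_rate(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Share of gold sentence ends where the model did not end the sentence."""
    flags = [p not in SENTENCE_END for g, p in zip(gold, pred) if g in SENTENCE_END]
    return sum(flags) / len(flags) if flags else 0.0


# A degenerate model that never punctuates, on data where 5% of positions need a comma.
gold = [NONE] * 95 + [","] * 5
pred = [NONE] * 100
print(accuracy(gold, pred))         # looks strong
print(recall_for(gold, pred, ","))  # yet no comma is ever restored

# A short utterance with one spurious comma and one missed sentence break.
gold2 = ["O", ".", "O", "?"]
pred2 = [",", ",", "O", "?"]
print(over_punctuation_rate(gold2, pred2))
print(missed_sentence_break_rate(gold2, pred2))
```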

Slice the evaluation by language, domain, speaker style, ASR confidence, text length and noisy terms. If users complain online while offline metrics look good, the issue may be domain shift or a metric that does not capture readability.
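
Slice-level scoring can be sketched by pooling token labels per slice value and computing a mark-only micro-F1 per slice. The example records, the `asr_conf` field and the 0.7 bucketing threshold are hypothetical. The toy data shows the point of slicing: the aggregate score looks acceptable while one slice fails completely.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

NONE = "O"  # hypothetical "no punctuation" label


def mark_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Micro-F1 over punctuation marks only: 2TP / (2TP + FP + FN)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != NONE)
    fp = sum(1 for g, p in zip(gold, pred) if g != p and p != NONE)
    fn = sum(1 for g, p in zip(gold, pred) if g != p and g != NONE)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_by_slice(examples: List[Dict], key: str) -> Dict[str, float]:
    """Pool token labels per slice value (domain, ASR-confidence bucket, ...) and score each."""
    gold_by: Dict[str, List[str]] = defaultdict(list)
    pred_by: Dict[str, List[str]] = defaultdict(list)
    for ex in examples:
        gold_by[ex[key]].extend(ex["gold"])
        pred_by[ex[key]].extend(ex["pred"])
    return {s: mark_f1(gold_by[s], pred_by[s]) for s in gold_by}


def asr_conf_bucket(conf: float) -> str:
    """Hypothetical bucketing of utterance-level ASR confidence."""
    return "low" if conf < 0.7 else "high"


examples = [
    {"domain": "support", "asr_conf": 0.93,
     "gold": ["O", ".", "O", "?"], "pred": ["O", ".", "O", "?"]},
    {"domain": "medical", "asr_conf": 0.55,
     "gold": ["O", ",", "O", "."], "pred": ["O", "O", "O", "O"]},
]
for ex in examples:
    ex["asr_bucket"] = asr_conf_bucket(ex["asr_conf"])

all_gold = [t for ex in examples for t in ex["gold"]]
all_pred = [t for ex in examples for t in ex["pred"]]
print(mark_f1(all_gold, all_pred))  # the aggregate hides the failing slice
print(f1_by_slice(examples, "domain"))
print(f1_by_slice(examples, "asr_bucket"))
```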

Common mistakes

  • Report only overall accuracy when most labels are “no punctuation”.
  • Treat comma and period errors as equally important without checking product impact.
  • Ignore ASR confidence and domain slices.
  • Skip human readability evaluation.

How to say it in the interview

  • Call out class imbalance immediately.
  • Add a human/readability metric beyond token labels.
Question 5 (10 min)

Debugging the gap between offline evaluation and in-product quality

Short answer

First verify serving parity, then inspect complaint examples, compare them to validation slices, and look for domain shift, ASR errors, unseen terminology or weak metrics.

Detailed breakdown

Start with engineering parity: the model version, tokenizer, preprocessing, normalization, thresholds and postprocessing must match validation. Reproduce user examples offline to see whether the served model and offline model give the same output.
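
A parity check can be sketched as a replay: take logged production requests, run the same inputs through the offline pipeline, and list every divergence. `LoggedRequest`, its fields and `offline_predict` are hypothetical; in a real system `offline_predict` would load the validated model, tokenizer, normalization and postprocessing, and the two tokenizer callables would wrap the serving and offline tokenizers.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class LoggedRequest:
    """Hypothetical production log record; field names are illustrative."""
    request_id: str
    asr_text: str
    served_output: str
    served_model_version: str


def parity_report(logs: Sequence[LoggedRequest],
                  offline_predict: Callable[[str], str],
                  offline_model_version: str) -> List[str]:
    """Replay logged inputs through the offline pipeline and list every divergence."""
    issues: List[str] = []
    for req in logs:
        if req.served_model_version != offline_model_version:
            issues.append(f"{req.request_id}: served {req.served_model_version}, "
                          f"offline {offline_model_version}")
            continue
        offline_output = offline_predict(req.asr_text)
        if offline_output != req.served_output:
            issues.append(f"{req.request_id}: offline {offline_output!r} "
                          f"vs served {req.served_output!r}")
    return issues


def tokenizer_mismatches(texts: Sequence[str],
                         serving_tokenize: Callable[[str], List[int]],
                         offline_tokenize: Callable[[str], List[int]]) -> List[str]:
    """Texts whose token ids differ between the serving and offline tokenizers."""
    return [t for t in texts if serving_tokenize(t) != offline_tokenize(t)]


# Toy stand-in for the offline pipeline: capitalize and append a period.
def offline_predict(text: str) -> str:
    return text.capitalize() + "."


logs = [
    LoggedRequest("r1", "hello there", "Hello there.", "v3"),
    LoggedRequest("r2", "ok thanks", "Ok thanks", "v3"),
    LoggedRequest("r3", "see you", "See you.", "v2"),
]
for issue in parity_report(logs, offline_predict, "v3"):
    print(issue)
```

Checking the model version before comparing outputs separates "wrong artifact deployed" from "same artifact, different preprocessing", which are fixed in different places.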

Then analyze the data. Complaints may come from domains missing from validation, new terms, named entities, acronyms, speaker disfluencies, ASR substitutions, long contexts or a non-standard punctuation style. Build a slice from the examples users complained about and compare its metrics to those of the aggregate validation set.

Finally, question the metric. A high token-level score can coexist with unreadable output when the few remaining errors are severe ones, such as missed or spurious sentence boundaries. Add production monitoring, sampled human review, complaint tagging and active-learning loops that feed hard cases into the next validation/training set.
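
One way to pick hard cases for review is uncertainty sampling, sketched here as a token-level variant of least-confidence sampling: an example's score is 1 minus the top-class probability of its least confident token. The probability format, the complaints-first policy and the review budget are assumptions for illustration, not a prescribed method.

```python
from typing import Dict, List, Sequence, Set


def least_confidence(token_probs: Sequence[Sequence[float]]) -> float:
    """1 minus the top-class probability of the least confident token in the example."""
    if not token_probs:
        return 0.0
    return 1.0 - min(max(p) for p in token_probs)


def select_for_review(pool: Dict[str, Sequence[Sequence[float]]],
                      complained: Set[str], budget: int) -> List[str]:
    """Send complaint-tagged examples first, then the most uncertain remaining ones."""
    ranked = sorted(pool, key=lambda ex_id: least_confidence(pool[ex_id]), reverse=True)
    picked = [ex_id for ex_id in ranked if ex_id in complained]
    picked += [ex_id for ex_id in ranked if ex_id not in complained]
    return picked[:budget]


# Per-token class probabilities from the punctuation head (toy numbers).
pool = {
    "a": [[0.90, 0.10], [0.95, 0.05]],
    "b": [[0.50, 0.50], [0.99, 0.01]],
    "c": [[0.60, 0.40]],
    "d": [[0.99, 0.01]],
}
print(select_for_review(pool, complained={"d"}, budget=3))  # ['d', 'b', 'c']
```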

Common mistakes

  • Assume complaints are noise because validation accuracy is high.
  • Skip serving-tokenizer parity checks.
  • Retrain blindly without labeling failed examples.
  • Ignore that user complaints may target rare but severe errors.

How to say it in the interview

  • Say “reproduce the exact online example offline” early.
  • Separate serving bugs from data/metric problems.