Interview practice: CIAN: Technical interview

Question 1 (10 min)

Transformer basics: tokens, positional encoding and cross-attention

Answer without hints

First say your answer out loud or outline it in bullet points.

Write a draft

Formulas, a solution plan, risks and examples.

Compare with the walkthrough

Open the walkthrough only after your own attempt.

Short answer

A Transformer maps tokens to embeddings, adds positional information, then applies attention and feed-forward blocks. In encoder-decoder cross-attention, decoder states produce queries, while encoder outputs produce keys and values.

Detailed walkthrough

A solid answer starts with the data flow. Text is split into tokens, token ids are mapped to embeddings, and positional information is added because self-attention itself is permutation-equivariant: reordering the input tokens only reorders the outputs, so without positions the model has no notion of token order. Positional information may be sinusoidal, learned, rotary or another relative-position scheme.
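As an illustration of one of the schemes named above, here is a minimal sketch of sinusoidal positional encoding in plain Python. The function names and the tiny sizes are assumptions made for the example; real models use tensor libraries and much larger dimensions.

```python
import math

def sinusoidal_positions(seq_len: int, d_model: int) -> list[list[float]]:
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency: 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings, table):
    """Element-wise sum of token embeddings and position encodings."""
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, table)]

pe = sinusoidal_positions(seq_len=4, d_model=4)
# The same token embedding repeated at four positions (hypothetical values).
same_token = [[0.5, 0.5, 0.5, 0.5]] * 4
with_positions = add_positions(same_token, pe)
```

After the addition, the four copies of the same token no longer have identical vectors, which is what gives attention access to order.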

Each Transformer block combines multi-head attention, residual connections, normalization and a position-wise feed-forward network. Encoder blocks use self-attention over the full input. Decoder blocks add causal self-attention, then cross-attention over encoder outputs.

For decoder cross-attention, the decoder hidden states are projected to queries. The encoder outputs are projected to keys and values. This is what lets the decoder ask which source positions matter for generating the current target token.
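The Q/K/V routing described above can be sketched as single-head cross-attention in plain Python. This is a toy under stated assumptions: one head, no masking, no batching, and projection matrices passed in as plain lists (identity matrices in the usage example stand in for learned weights).

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(decoder_states, encoder_outputs, w_q, w_k, w_v):
    """Single-head cross-attention: Q from the decoder, K and V from the encoder."""
    q = matmul(decoder_states, w_q)   # (target_len, d_k)
    k = matmul(encoder_outputs, w_k)  # (source_len, d_k)
    v = matmul(encoder_outputs, w_v)  # (source_len, d_v)
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)           # (target_len, source_len)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

identity = [[1.0, 0.0], [0.0, 1.0]]
decoder_states = [[1.0, 0.0], [0.0, 1.0]]               # 2 target positions
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 source positions
context, weights = cross_attention(
    decoder_states, encoder_outputs, identity, identity, identity
)
```

Each row of `weights` is a distribution over source positions for one target position, and `context` has one row per target position.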

Common mistakes

Saying that attention alone knows token order without positional information.
Mixing up self-attention and encoder-decoder cross-attention.
Describing Q/K/V as fixed inputs rather than learned projections of hidden states.

How to say it in the interview

Draw the encoder and decoder separately.
For cross-attention, say explicitly: Q from decoder, K/V from encoder.

Question 2 (10 min)

Dropout, BatchNorm and fine-tuning with small batches

Short answer

Dropout is stochastic during training and disabled or rescaled at inference. BatchNorm uses batch statistics during training and running statistics at inference; small batches make those statistics noisy, so freezing BN or using LayerNorm/GroupNorm can be safer.

Detailed walkthrough

Dropout randomly masks activations during training to reduce co-adaptation. At inference the model should use the full network with the convention used by the framework: either activations are scaled during training ("inverted dropout") or they are scaled at inference.
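A minimal sketch of the inverted-dropout convention mentioned above, in plain Python. The function name and the `rng` parameter are assumptions for the example; frameworks implement this on tensors.

```python
import random

def dropout(activations, p, training, rng=random):
    """Inverted dropout: scale kept units by 1 / (1 - p) during training."""
    if not training or p == 0.0:
        return list(activations)  # inference: identity, no rescaling needed
    keep = 1.0 - p
    # Each unit is kept with probability `keep`; kept units are scaled up so that
    # the expected activation matches what the next layer sees at inference.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
train_out = dropout([1.0] * 8, p=0.5, training=True, rng=rng)
eval_out = dropout([1.0] * 8, p=0.5, training=False)
```

With this convention the inference path needs no special handling, which is why switching the model to eval mode is enough.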

BatchNorm normalizes activations using batch mean and variance during training and running estimates during inference. Small batches make the estimates noisy, and distributed training can make per-device statistics inconsistent. This is why fine-tuning with tiny batches often freezes BatchNorm (keeping it in eval mode so that it uses the pre-trained running statistics) or replaces it with LayerNorm, GroupNorm or InstanceNorm where appropriate.

With many GPUs and a large effective batch, SyncBatchNorm can aggregate statistics across devices, but it adds communication cost. Gradient accumulation increases the effective batch size for gradients, but each forward pass still normalizes with the statistics of its own micro-batch, so it does not fix noisy BatchNorm statistics unless the implementation accounts for it.
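The train/eval difference can be seen in a toy batch norm over a single scalar feature. This is a sketch under assumptions: no learned scale and shift, a biased variance estimate, and a class name chosen for the example; it is not the implementation of any particular framework.

```python
class ToyBatchNorm:
    """Toy batch norm over one scalar feature, without learned scale and shift."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True

    def __call__(self, batch):
        if self.training:
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # Running estimates are updated on every training forward pass.
            self.running_mean += self.momentum * (mean - self.running_mean)
            self.running_var += self.momentum * (var - self.running_var)
        else:
            # Frozen / eval mode: use the stored statistics and leave them untouched.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]

bn = ToyBatchNorm()
train_out = bn([5.0])   # batch size 1: the sample is normalized against itself
bn.training = False
eval_out = bn([5.0])    # normalized against the running statistics instead
```

In training mode with batch size 1 the output is always zero regardless of the input, which is the degenerate case behind the advice to freeze BatchNorm on tiny batches.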

Common mistakes

Assuming gradient accumulation fixes BatchNorm statistics.
Forgetting that train and eval modes change BatchNorm and dropout behavior.
Using BatchNorm with batch size 1 and expecting stable statistics.

How to say it in the interview

Mention freezing BatchNorm during fine-tuning.
Separate optimization batch size from normalization statistics.

Question 3 (10 min)

Model compression and catastrophic forgetting

Short answer

Compression options include quantization, pruning and distillation; adaptation can use LoRA/adapters. Forgetting is detected on held-out general benchmarks and reduced with replay data, regularization to the base model and parameter-efficient fine-tuning.

Detailed walkthrough

Compression and adaptation are different levers. Quantization reduces numeric precision, pruning removes weights or structures, distillation trains a smaller student from a stronger teacher, and LoRA/adapters update a small number of parameters while keeping the base mostly fixed.
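To make "quantization reduces numeric precision" concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The function names and the example weights are assumptions; production schemes typically add per-channel scales, zero points and calibration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q_values, scale):
    """Map integer codes back to approximate float weights."""
    return [q * scale for q in q_values]

weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical values
q_values, scale = quantize_int8(weights)
restored = dequantize(q_values, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The round-trip error per weight is bounded by half the scale, so small weights such as 0.003 can collapse to zero; that loss is the accuracy cost to measure after quantization.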

Catastrophic forgetting shows up when fine-tuning improves the new domain but degrades general capabilities or old-domain benchmarks. You need an evaluation suite with both new-domain and old-domain tasks, plus slices for safety-critical or business-critical behavior.

Common mitigations are replaying a controlled mixture of old-domain data, using teacher logits or reference answers, regularizing the new model toward the base model, lowering learning rates, early stopping, and using LoRA or adapters instead of full fine-tuning. The trade-off should be measured: do not preserve old capabilities so aggressively that the model fails the target domain.
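A regression check over old-domain tasks can be as simple as the sketch below. Everything here is hypothetical and for illustration only: the task names, the metric values and the 0.02 tolerance are made up, not taken from any benchmark.

```python
def forgetting_report(base_metrics, tuned_metrics, old_tasks, max_drop=0.02):
    """List old-domain tasks whose metric dropped by more than max_drop."""
    regressions = {}
    for task in old_tasks:
        drop = base_metrics[task] - tuned_metrics[task]
        if drop > max_drop:
            regressions[task] = round(drop, 4)
    return regressions

# Hypothetical scores (higher is better) before and after fine-tuning.
base = {"general_qa": 0.71, "code": 0.55, "target_domain_qa": 0.40}
tuned = {"general_qa": 0.64, "code": 0.545, "target_domain_qa": 0.62}

report = forgetting_report(base, tuned, old_tasks=["general_qa", "code"])
```

Here the target-domain score improves while `general_qa` falls past the tolerance, so the report flags it; the release decision then weighs that drop against the target-domain gain.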

Common mistakes

Evaluating only on the new domain.
Treating LoRA as a guarantee against all forgetting.
Distilling without checking that the teacher is reliable on the target data.

How to say it in the interview

Say “new-domain metrics plus old-domain regression suite”.
Name replay data and parameter-efficient tuning as mitigations.

Question 4 (7 min)

When a single decision tree can beat Random Forest

Short answer

If the target depends almost entirely on one strong feature and Random Forest uses feature subsampling, many trees may miss that feature and vote noisily, while one unrestricted tree can split perfectly.

Detailed walkthrough

A concrete example is a small synthetic dataset where one feature perfectly separates the classes and all other features are noise. An unrestricted decision tree can choose the separating feature at the root and achieve near-perfect accuracy.

A Random Forest can underperform because its trees do not all get to use the decisive feature: each split (or, in random-subspace variants, each tree) considers only a random subset of features, so many trees do not split on it near the root. Those trees split on noise and add bad votes. The ensemble usually reduces variance, but if its randomization systematically hides the only useful signal, it can increase bias or dilute a clean rule.
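The dilution effect can be quantified in a deliberately simplified toy model. The assumptions are strong and stated up front: every tree is restricted to a random subset of features, a tree that sees the decisive feature is always right, a tree that does not is a coin flip, and votes are independent. This is not how a real Random Forest library behaves in detail; it only shows the mechanism.

```python
from math import comb

def majority_vote_accuracy(n_trees: int, n_features: int, features_per_tree: int) -> float:
    """Accuracy of a majority vote in the toy model (n_trees should be odd)."""
    # A tree is right for sure if it sees the decisive feature, else it guesses.
    p_sees = features_per_tree / n_features
    p_tree_correct = p_sees + (1 - p_sees) * 0.5
    # Probability that more than half of the independent votes are correct.
    need = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p_tree_correct**k * (1 - p_tree_correct) ** (n_trees - k)
        for k in range(need, n_trees + 1)
    )

single_tree = majority_vote_accuracy(n_trees=1, n_features=100, features_per_tree=100)
forest = majority_vote_accuracy(n_trees=101, n_features=100, features_per_tree=10)
```

In this toy setting the unrestricted single tree is perfect, while the 101-tree ensemble with 10 of 100 features per tree stays noticeably below it, because most votes come from trees that never saw the signal.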

Other practical cases include very small datasets, high interpretability constraints, temporal leakage where a forest overfits many unstable proxies, or monotonic/business-rule settings where one simple rule is closer to the deployment behavior.

Common mistakes

Saying “Random Forest is always better”.
Giving a vague small-data answer without explaining the mechanism.
Ignoring feature subsampling and noisy votes.

How to say it in the interview

Use the one-perfect-feature example.
Tie the answer to bias, variance and feature subsampling.

Question 5 (6 min)

ROC-AUC: construction and interpretation

Short answer

Sweep the classification threshold, plot TPR against FPR, and take the area under that curve. ROC-AUC is the probability that a random positive gets a higher score than a random negative.

Detailed walkthrough

For every threshold on the model score, compute true positive rate and false positive rate. Plot FPR on the x-axis and TPR on the y-axis. The area under this curve is ROC-AUC.

The ranking interpretation is usually the cleanest: ROC-AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative example, with ties handled according to the implementation.
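Both views of ROC-AUC can be checked against each other on a small example. The sketch below is plain Python with assumed function names; it expects 0/1 integer labels with both classes present, and it counts tied positive/negative pairs as 0.5, which is one common convention.

```python
def roc_points(labels, scores):
    """ROC points (FPR, TPR) from sweeping the threshold over distinct scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    ranked = sorted(zip(scores, labels), reverse=True)
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last example of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

def auc_pairwise(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
area = auc_trapezoid(roc_points(labels, scores))  # 0.75
pairwise = auc_pairwise(labels, scores)           # 0.75
```

Three of the four positive/negative pairs are ordered correctly, so both computations give 0.75.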

ROC-AUC is threshold-independent and useful for comparing ranking quality, but it can be misleading under heavy class imbalance or when the business cares about a narrow high-precision operating region. In those cases also inspect PR-AUC, precision/recall at a target threshold and calibration.

Common mistakes

Confusing ROC-AUC with accuracy.
Forgetting that FPR is FP / (FP + TN), that is, false positives over all negatives.
Using ROC-AUC alone for rare-event decisions.

How to say it in the interview

State the pairwise probability interpretation.
Mention PR-AUC for imbalanced problems.

Question 6 (9 min)

Spark Broadcast Join and Python UDF performance

Short answer

Broadcast Join sends a small table to executors so each partition joins locally and avoids expensive shuffle. Python UDFs are slow because Spark must cross the JVM-Python boundary, serialize data and lose many Catalyst/codegen optimizations.

Detailed walkthrough

Broadcast Join is fast when one side is small enough to fit in memory on the driver and on every executor. Spark ships that small side to all executors, and each partition of the large table joins locally. This avoids repartitioning both sides by join key and the network-heavy shuffle that comes with it.
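The "local join without shuffle" idea can be illustrated in plain Python, without Spark. This is a conceptual sketch, not Spark's API: partitions are modeled as lists of dicts, and the table and column names are hypothetical.

```python
def broadcast_hash_join(large_partitions, small_table, key):
    """Inner-join each partition of the large side against a copy of the small side."""
    # "Broadcast": build one hash map from the small table; every partition reuses it.
    lookup = {}
    for row in small_table:
        lookup.setdefault(row[key], []).append(row)

    joined = []
    for partition in large_partitions:   # in Spark, each partition is processed by an executor
        for row in partition:            # rows of the large side never change partition
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined

listings = [  # large side, already split into two partitions
    [{"listing_id": 1, "city_id": 10}, {"listing_id": 2, "city_id": 20}],
    [{"listing_id": 3, "city_id": 10}, {"listing_id": 4, "city_id": 99}],
]
cities = [{"city_id": 10, "city": "Moscow"}, {"city_id": 20, "city": "Kazan"}]

joined = broadcast_hash_join(listings, cities, key="city_id")
```

The large side is read where it already lives; only the small lookup table is copied, which is the trade that makes the join cheap as long as that table really is small.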

Python UDFs can be slow because Spark's execution engine is JVM-based, while the function runs in a Python worker. Rows or batches must be serialized between JVM and Python, Catalyst cannot freely optimize inside the UDF, and code generation/vectorized execution may not apply. Plain row-wise UDFs are especially expensive.

Prefer built-in Spark SQL functions, joins, expressions and window functions. If custom Python logic is unavoidable, consider pandas/vectorized UDFs, Arrow, batch processing, pushing logic upstream, or implementing performance-critical logic in Scala/Java.

Common mistakes

Broadcasting a table that does not fit in executor memory.
Using Python UDFs for logic expressible in Spark SQL.
Forgetting serialization and JVM-Python boundary costs.

How to say it in the interview

Say “local join without shuffle” for broadcast.
For UDFs, mention JVM-Python serialization and lost Catalyst optimization.

Question 7 (7 min)

Python dict lookup, decorators and generators

Short answer

Dict lookup is average O(1) through hashing, while list search is O(n). A decorator wraps a function or class to add behavior. A generator lazily produces values, saving memory for streams or large sequences.

Detailed walkthrough

A Python dict is a hash table. Looking up a key usually hashes the key and probes a small number of slots, so average complexity is O(1). Searching a list checks elements sequentially, so it is O(n).

A decorator is a callable that takes a function or class and returns a replacement for it, usually a wrapper; the @decorator syntax applies it at definition time. Common uses are caching, logging, timing, authorization, retries and context-like setup around a call. A function wrapper should preserve metadata with functools.wraps.
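A minimal timing decorator, as one concrete use case. The names `timed`, `last_duration` and `area` are made up for the example.

```python
import functools
import time

def timed(func):
    """Record the duration of the most recent call on the wrapper."""
    @functools.wraps(func)  # keep __name__, __doc__ and other metadata
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returned or raised.
            wrapper.last_duration = time.perf_counter() - start
    wrapper.last_duration = None
    return wrapper

@timed
def area(width: float, height: float) -> float:
    """Return the area of a rectangle."""
    return width * height

result = area(3, 4)  # 12
```

Without `functools.wraps`, `area.__name__` would be `"wrapper"` and the docstring would be lost, which hurts debugging and introspection.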

A generator is an iterator that yields values lazily. It is useful when the full collection is large, infinite, expensive to compute or naturally streamed. The trade-off is that generators are consumed once and do not provide random access unless materialized.
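A small sketch of lazy chunking with a generator, which also shows the consumed-once trade-off. The function name is an assumption for the example.

```python
from itertools import islice

def read_in_chunks(items, chunk_size):
    """Lazily yield lists of at most chunk_size items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

chunks = read_in_chunks(range(7), 3)
first_pass = list(chunks)   # [[0, 1, 2], [3, 4, 5], [6]]
second_pass = list(chunks)  # []: the generator is already exhausted
```

Only one chunk is held in memory at a time, but a second pass requires either calling `read_in_chunks` again or materializing the result.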

Common mistakes

Saying dict lookup is always O(1) without acknowledging collisions and the O(n) worst case.
Describing decorators as comments or annotations only.
Using a generator when the data must be reused many times.

How to say it in the interview

Answer with complexity first.
Give one concrete decorator use case and one generator use case.

Task 8 (Medium)

Contiguous subarray with a given sum

Problem

You are given an array of positive integers nums and a number target.

Return the start and end indices of a contiguous subarray whose elements sum to target.

Use 0-based indices and inclusive bounds. If several subarrays qualify, return any of them. If there is no such subarray, return an empty list.

Signature

def find_subarray_sum(nums: list[int], target: int) -> list[int]:

Example

find_subarray_sum([1, 2, 3, 7, 5], 12) -> [1, 3]

Hints

Numbers are positive
With the left boundary fixed, moving the right boundary forward never decreases the sum. This lets the left boundary move only forward.
Bounds are inclusive
Make sure you return the index of the last element in the window, not the position after it.

Solution idea

Because the numbers are positive, the sum only grows as the window expands. If the sum exceeds target, move the left boundary forward, subtracting the elements that leave the window, until the sum is at most target again.

If the current sum equals target, return the current inclusive bounds [left, right].

Prefix sums with a hash map also give O(n) time and work even with negative numbers, but for a positive array the sliding window is simpler and needs only O(1) memory.
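For comparison, here is a sketch of the prefix-sum alternative with a hash map. The function name is hypothetical, chosen to keep it apart from the reference solution; it uses O(n) extra memory but does not rely on the numbers being positive.

```python
def find_subarray_sum_any_sign(nums: list[int], target: int) -> list[int]:
    """Prefix-sum variant that also handles zero and negative numbers."""
    first_index = {0: -1}  # prefix sum -> earliest index where that prefix ends
    prefix = 0
    for right, value in enumerate(nums):
        prefix += value
        # sum(nums[left..right]) == target  <=>  prefix before left == prefix - target
        if prefix - target in first_index:
            return [first_index[prefix - target] + 1, right]
        # Store the prefix only after the check, so an empty window never matches.
        first_index.setdefault(prefix, right)
    return []

print(find_subarray_sum_any_sign([1, 2, 3, 7, 5], 12))  # [1, 3]
print(find_subarray_sum_any_sign([2, -3, 4], 1))        # [1, 2]
```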

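Reference code

def find_subarray_sum(nums: list[int], target: int) -> list[int]:
    left = 0      # left boundary of the window (inclusive)
    current = 0   # sum of nums[left..right]

    for right, value in enumerate(nums):
        current += value

        # Shrink from the left while the window sum is too large.
        while left <= right and current > target:
            current -= nums[left]
            left += 1

        # Require a non-empty window so that target <= 0 cannot match an empty one.
        if left <= right and current == target:
            return [left, right]

    return []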

Complexity

Time: O(n). Memory: O(1).

Because all numbers are positive, a sliding window is enough: both the right and the left boundary only move forward, so each element is added and removed at most once.

SQL task 9 (Medium)

Average number of baskets and items from events

Problem

There is a table of add-to-cart events.

For the period from 2025-09-01 inclusive to 2025-10-01 exclusive, compute two metrics:

avg_baskets_per_user - the average number of distinct baskets per user;
avg_items_per_basket - the average number of items per basket.

Count only events with event_type = 'add_to_cart'. The number of items in an event is stored in quantity.

Return one row with two columns. No rounding is required.

Schema

CREATE TABLE cart_events (
  user_id INTEGER NOT NULL,
  basket_id INTEGER NOT NULL,
  item_id INTEGER NOT NULL,
  quantity INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_time TEXT NOT NULL
);

Solution idea

First, keep only the required period and the add_to_cart events.

For the first metric, count the distinct baskets for each user and average those counts.

For the second metric, first sum quantity per basket, then average those sums across baskets. This avoids the mistake of averaging over event rows instead of baskets.

Reference code

WITH filtered AS (
  SELECT user_id, basket_id, quantity
  FROM cart_events
  WHERE event_type = 'add_to_cart'
    AND event_time >= '2025-09-01'
    AND event_time < '2025-10-01'
),
user_baskets AS (
  SELECT
    user_id,
    COUNT(DISTINCT basket_id) AS basket_count
  FROM filtered
  GROUP BY user_id
),
basket_items AS (
  SELECT
    basket_id,
    SUM(quantity) AS item_count
  FROM filtered
  GROUP BY basket_id
)
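-- Each average is computed over its own CTE. A CROSS JOIN of the two CTEs
-- returns the same values but materializes u * b intermediate rows.
SELECT
  (SELECT AVG(basket_count * 1.0) FROM user_baskets) AS avg_baskets_per_user,
  (SELECT AVG(item_count * 1.0) FROM basket_items) AS avg_items_per_basket;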
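The query logic can be sanity-checked on a few hand-made rows with Python's built-in sqlite3 module. The rows below are invented for the test, and the final SELECT uses scalar subqueries over the two CTEs, which yields the same averages as joining them; other SQL engines may need minor syntax changes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cart_events (
  user_id INTEGER NOT NULL, basket_id INTEGER NOT NULL, item_id INTEGER NOT NULL,
  quantity INTEGER NOT NULL, event_type TEXT NOT NULL, event_time TEXT NOT NULL
);
"""

QUERY = """
WITH filtered AS (
  SELECT user_id, basket_id, quantity FROM cart_events
  WHERE event_type = 'add_to_cart'
    AND event_time >= '2025-09-01' AND event_time < '2025-10-01'
),
user_baskets AS (
  SELECT user_id, COUNT(DISTINCT basket_id) AS basket_count
  FROM filtered GROUP BY user_id
),
basket_items AS (
  SELECT basket_id, SUM(quantity) AS item_count
  FROM filtered GROUP BY basket_id
)
SELECT
  (SELECT AVG(basket_count * 1.0) FROM user_baskets),
  (SELECT AVG(item_count * 1.0) FROM basket_items);
"""

ROWS = [  # invented test data
    (1, 10, 100, 2, "add_to_cart", "2025-09-05 10:00:00"),
    (1, 10, 101, 1, "add_to_cart", "2025-09-05 10:05:00"),
    (1, 11, 102, 3, "add_to_cart", "2025-09-10 12:00:00"),
    (2, 20, 100, 4, "add_to_cart", "2025-09-30 23:59:59"),
    (2, 21, 100, 5, "add_to_cart", "2025-10-01 00:00:00"),       # outside the period
    (3, 30, 100, 9, "remove_from_cart", "2025-09-15 09:00:00"),  # wrong event type
]

def run_metrics(rows):
    """Load rows into an in-memory database and return the two averages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO cart_events VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result

avg_baskets_per_user, avg_items_per_basket = run_metrics(ROWS)
```

On these rows user 1 has two baskets and user 2 has one, so the first metric is 1.5; the three baskets hold 3, 3 and 4 items, so the second is 10/3.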

Complexity

Time: O(n). Memory: O(u + b).

We filter the events for the period, then aggregate by user and by basket. u is the number of users, b is the number of baskets.

Question 10 (8 min)

Safe rollout of an ONNX model to production

How do you safely roll out a new version of an ONNX model to production: which checks do you run before the release, how do you ramp up traffic, what do you monitor, and how do you roll back quickly?

Short answer

You need versioned artifacts, schema compatibility, staging, shadow/canary traffic, a health check that runs real inference, monitoring of latency, errors and quality, and a fast rollback to the previous active version.

Detailed walkthrough

First, pin down what exactly constitutes a release: the ONNX file, preprocessing/postprocessing, the feature schema, dependency versions, the threshold config and the serving image. All of this should be versioned together with metadata: training date, data, offline metrics, expected input/output schema and runtime compatibility.

Before the release, run these checks: model loading, inference on control examples, comparison with reference predictions, type and shape checks, a representative load test, the latency budget and a smoke test on staging. If the model depends on tables or a feature store, do not overwrite data in place: publish a new version, validate counts/freshness/quality, and atomically switch the active pointer.

In production, roll out gradually: shadow traffic with no user impact, then a canary on a small share of traffic, then expansion while the guardrails stay within limits. Monitor error rate, p95/p99 latency, timeouts, memory/CPU, the share of empty or anomalous responses, input drift and business proxy metrics. Rollback should be a pre-tested operation: point back to the previous model or the previous service image without a manual rebuild.
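The active-pointer and canary ideas can be sketched in a few lines of plain Python. This is an illustration under assumptions: the class, the function and the version and file names are hypothetical, the registry lives in memory, and a real system would persist the pointer and gate each step on monitoring.

```python
import hashlib

class ModelRegistry:
    """Tracks published versions and an active pointer with one-step rollback."""

    def __init__(self):
        self.versions = {}
        self.active = None
        self.previous = None

    def publish(self, version, artifact):
        if version in self.versions:
            raise ValueError(f"version already published: {version}")  # no in-place overwrite
        self.versions[version] = artifact

    def activate(self, version):
        if version not in self.versions:
            raise KeyError(f"unknown version: {version}")
        self.previous, self.active = self.active, version

    def rollback(self):
        if self.previous is None:
            raise RuntimeError("no previous version to roll back to")
        self.active, self.previous = self.previous, None

def in_canary(request_id: str, canary_percent: int) -> bool:
    """Deterministically route a stable share of requests to the canary."""
    digest = hashlib.sha256(request_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100 < canary_percent

registry = ModelRegistry()
registry.publish("v1", "model_v1.onnx")
registry.publish("v2", "model_v2.onnx")
registry.activate("v1")
registry.activate("v2")   # full switch after a healthy canary
registry.rollback()       # pointer swap back to v1, no rebuild
```

Hashing the request id keeps a given request or user on the same side of the split across retries, which makes canary metrics easier to compare.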

Common mistakes

Telling a vague story with no detection or prevention.
Saying only “we fixed it” without a root cause.
Updating serving data in place without an atomic swap.

How to say it in the interview

Use the sequence: impact, detection, mitigation, root cause, prevention.
Mention canary, rollback and active-version pointers.