Пройти собеседование: Okko: Техническое собеседование

1Вопрос10 мин

Вопрос

Explain what the GIL is, why CPython has it, and what happens at a high level when you run a Python file.

Ответьте без подсказки

Сначала проговорите ответ вслух или тезисами.

Запишите черновик

Формулы, план решения, риски и примеры.

Сравните с разбором

Откройте разбор только после своей попытки.

Открыть отдельную страницу вопроса

Показать разбор

Короткий ответ

In CPython, the GIL is a process-level lock that lets only one thread execute Python bytecode at a time. Source is parsed and compiled to bytecode, then the VM interprets bytecode and calls C-implemented runtime functions where relevant.

Подробный разбор

The GIL is the Global Interpreter Lock in CPython. It protects interpreter internals, especially reference counting and object memory-management invariants, by allowing only one thread to execute Python bytecode at a time inside a process.

This does not mean threads are useless. Threads can help for I/O-bound work because the thread can release control while waiting on network, disk or other external resources. For CPU-bound Python bytecode, multiple threads do not run Python code in parallel in one CPython process; use multiprocessing, native extensions that release the GIL, vectorized libraries or another runtime when CPU parallelism is needed.

When a Python file is executed, CPython reads source, parses it, compiles it to bytecode, then the bytecode evaluation loop executes instructions. Built-ins and many runtime operations eventually call C implementations, but the user-level Python program is not compiled directly to native machine code in the usual CPython path.

Типичные ошибки

Say Python has no compilation at all.
Say the GIL exists only to prevent user-level race conditions.
Claim Python threads are never useful.

Как сказать на собеседовании

Separate CPU-bound and I/O-bound threading.
Mention bytecode compilation before interpretation.

2Вопрос8 мин

Вопрос

How are arguments passed to functions in Python? What happens if a function mutates a list argument versus reassigning an immutable value?

Ответьте без подсказки

Сначала проговорите ответ вслух или тезисами.

Запишите черновик

Формулы, план решения, риски и примеры.

Сравните с разбором

Откройте разбор только после своей попытки.

Открыть отдельную страницу вопроса

Показать разбор

Короткий ответ

Python passes object references by assignment. A function receives a new local name bound to the same object. Mutating a mutable object is visible outside; rebinding the local name is not.

Подробный разбор

A precise way to say it is "call by object sharing" or "pass by assignment". The function parameter becomes a local name bound to the same object as the argument expression.

If the object is mutable, such as a list, an in-place mutation like append changes that shared object, so the caller observes the change. If the function reassigns the parameter name, it only changes the local binding. If the object is immutable, such as an int or tuple, operations that look like modification actually create a new object or fail, so the caller's original object is not modified.

This is neither C-style pass-by-value of the full object nor pass-by-reference where assignment to the parameter changes the caller's variable binding.

Типичные ошибки

Say mutable objects are passed by reference and immutable objects by value.
Confuse mutating an object with rebinding a local variable.

Как сказать на собеседовании

Use a list append example and an integer reassignment example.
Avoid overloaded C++ terminology unless you define it carefully.

3ЗадачаEasy

FastStatistics: min, max и среднее за O(1)

Условие

Implement a streaming statistics helper with insert, get_min, get_max and get_avg.

Each operation should be as fast as possible. Handle queries before any insert explicitly.

Решение прямо на странице

Напишите код, запустите проверки и только потом открывайте разбор.

Проверка решения

Нажмите «Запустить проверки» или Ctrl+Enter.

Показать разбор

Подсказки

Не пересканируйте значения
Храните min, max, сумму и количество как состояние.
Пустое состояние
Нужно явно решить, что возвращают запросы до первого insert. Здесь это None.

Идея решения

Храним четыре значения: текущий минимум, текущий максимум, сумму и количество элементов. insert обновляет их за O(1). Запросы читают сохранённые значения за O(1).

Поведение пустого состояния нужно задать явно. На собеседовании это всплывало как граничный случай для get_min и get_max; в тренажёре все запросы до первого insert возвращают None.

Эталонный код

def fast_statistics(operations: list[list]) -> list[float | int | None]:
    current_min = None
    current_max = None
    total = 0
    count = 0
    result = []

    for operation in operations:
        name = operation[0]
        if name == "insert":
            value = operation[1]
            current_min = value if current_min is None else min(current_min, value)
            current_max = value if current_max is None else max(current_max, value)
            total += value
            count += 1
        elif name == "min":
            result.append(current_min)
        elif name == "max":
            result.append(current_max)
        elif name == "avg":
            result.append(None if count == 0 else total / count)

    return result

Сложность

Время: O(n). Память: O(1).

Каждая операция обрабатывается один раз. В состоянии хранятся только min, max, сумма и количество.

Открыть задачу в тренажере

4Вопрос10 мин

Вопрос про production ML

You review code that loops over texts, calls an embedding model one by one and appends outputs to a NumPy array. What would you improve?

Ответьте без подсказки

Сначала проговорите ответ вслух или тезисами.

Запишите черновик

Формулы, план решения, риски и примеры.

Сравните с разбором

Откройте разбор только после своей попытки.

Открыть отдельную страницу вопроса

Показать разбор

Короткий ответ

Avoid repeated np.append, preallocate or collect into a list, batch model inference, handle the last partial batch and derive output shape from the model or first batch when possible.

Подробный разбор

Repeated np.append in a loop is usually inefficient because it reallocates and copies arrays. Prefer preallocating the final array if shape is known, or collecting batch outputs in a list and concatenating once.

For model inference, one-by-one calls waste vectorization and accelerator utilization. Send texts in batches, write the returned batch embeddings into the correct slice, and handle the final batch where len(texts) is not divisible by batch_size.

Production review also checks shape assumptions. If embedding dimension is hardcoded, make sure the model contract guarantees it; otherwise infer it from model metadata or the first output. Keep tokenization/model API assumptions explicit and add tests for empty input, one item, non-divisible batch size and output dtype/shape.

Типичные ошибки

Focus only on syntax while missing repeated array reallocation.
Forget the final partial batch.
Hardcode embedding dimension without checking model contract.

Как сказать на собеседовании

Mention both algorithmic allocation and inference batching.
Call out the edge case where i + batch_size exceeds array length.

5SQL-задачаEasy

Предыдущий просмотренный фильм через SQL LAG

Условие

Given watch history, return each event with the previous movie watched by the same user.

Use a window function ordered by timestamp.

Решение прямо на странице

Напишите код, запустите проверки и только потом открывайте разбор.

Проверка решения

Нажмите «Запустить проверки» или Ctrl+Enter.

Показать разбор

Идея решения

Используем LAG(movie_id) по партиции user_id с сортировкой по watched_at. У первого события каждого пользователя предыдущего фильма нет, поэтому SQL возвращает NULL.

Эталонный код

SELECT
  user_id,
  movie_id,
  LAG(movie_id) OVER (
    PARTITION BY user_id
    ORDER BY watched_at
  ) AS previous_movie_id
FROM watch_history
ORDER BY user_id, watched_at;

Сложность

Время: O(n log n). Память: O(n).

База сортирует события по пользователю и времени для оконной функции.

Открыть задачу в тренажере

6SQL-задачаMedium

Precision@2 и Recall@2 для рекомендаций фильмов

Условие

Given watched movies and ranked recommendations, compute per-user Precision@K and Recall@K.

Handle duplicate watched movies and define the denominator for recall carefully.

Решение прямо на странице

Напишите код, запустите проверки и только потом открывайте разбор.

Проверка решения

Нажмите «Запустить проверки» или Ctrl+Enter.

Показать разбор

Идея решения

Сначала дедуплицируем релевантные просмотры. Затем ранжируем рекомендации по пользователю и оставляем только топ-2. LEFT JOIN топ-рекомендаций с уникальными просмотрами даёт флаг попадания.

Precision@2 — это число попаданий, делённое на 2. Recall@2 — число попаданий, делённое на количество уникальных релевантных фильмов пользователя. Пользователей без релевантных фильмов в тестовых данных нет; в продакшене для них нужно отдельно зафиксировать политику.

Эталонный код

WITH distinct_views AS (
  SELECT DISTINCT user_id, movie_id
  FROM views
),
ranked_recommendations AS (
  SELECT
    user_id,
    movie_id,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY rank ASC, movie_id ASC
    ) AS rn
  FROM recommendations
),
top_recommendations AS (
  SELECT user_id, movie_id
  FROM ranked_recommendations
  WHERE rn <= 2
),
hits AS (
  SELECT
    tr.user_id,
    SUM(CASE WHEN dv.movie_id IS NULL THEN 0 ELSE 1 END) AS hit_count
  FROM top_recommendations tr
  LEFT JOIN distinct_views dv
    ON dv.user_id = tr.user_id
   AND dv.movie_id = tr.movie_id
  GROUP BY tr.user_id
),
relevant_totals AS (
  SELECT user_id, COUNT(*) AS relevant_count
  FROM distinct_views
  GROUP BY user_id
)
SELECT
  h.user_id,
  1.0 * h.hit_count / 2 AS precision_at_2,
  1.0 * h.hit_count / rt.relevant_count AS recall_at_2
FROM hits h
JOIN relevant_totals rt
  ON rt.user_id = h.user_id
ORDER BY h.user_id;

Сложность

Время: O(r log r + v). Память: O(r + v).

Самый дорогой шаг — ранжирование рекомендаций внутри пользователя; join и агрегации сканируют релевантные строки.

Открыть задачу в тренажере