Pass the interview: Wheely: Technical interview

1. Problem (Hard)

Logistic regression from scratch on transposed features

Problem statement

Implement binary logistic regression from scratch for a ride-hailing trip-acceptance classifier.

The interviewer's matrix layout is transposed: features[j][i] is feature j for sample i, so samples are columns rather than rows. Train with gradient descent and return a 0/1 prediction for each column of query_features, which uses the same layout.

Hints

Samples are stored in columns
For sample i, feature j is features[j][i]. This is the main data-shape trap.
Use the logistic regression gradient
For binary cross-entropy with a sigmoid, the per-sample error is probability - target.
Keep the bias separate
The bias (intercept) term lets the decision boundary shift away from the origin.

Solution idea

The main trap in this problem is matrix orientation. Samples are stored in columns, not rows. Sample i is assembled as features[0][i], features[1][i], and so on.
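A minimal sketch of the layout trap, assuming the same features[j][i] convention: zip(*features) walks the columns, so each tuple it yields is one sample. The helper name samples_from_columns is hypothetical and not part of the task.

```python
def samples_from_columns(features: list[list[float]]) -> list[list[float]]:
    """Convert a feature-major matrix (features[j][i]) into a list of samples."""
    # zip(*features) yields column i as (features[0][i], features[1][i], ...).
    return [list(column) for column in zip(*features)]


# Two features (rows) for three samples (columns).
features = [
    [1.0, 2.0, 3.0],     # feature 0 for samples 0, 1, 2
    [10.0, 20.0, 30.0],  # feature 1 for samples 0, 1, 2
]
samples = samples_from_columns(features)
print(samples)  # [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
```

Iterating over features directly would give feature rows instead of samples, which is exactly the bug this problem is designed to catch.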

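We train weights w and bias b by minimizing binary cross-entropy. For logistic regression, the gradient on a single sample x with label y is (sigmoid(w·x + b) - y) * x_j for each feature j and sigmoid(w·x + b) - y for the bias, where w·x is the dot product of the weights with that sample's feature column. The reference solution averages these per-sample gradients over all n training samples and takes one full-batch step per iteration.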
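One way to convince yourself the per-sample formula is right is a finite-difference check. This is an illustrative sketch: the helper names, parameter values and the sample are assumptions, and the loss is plain single-sample binary cross-entropy.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(weights: list[float], bias: float, x: list[float], y: int) -> float:
    """Binary cross-entropy on one sample x (one feature column) with label y."""
    p = sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def analytic_grad(weights: list[float], bias: float, x: list[float], y: int) -> tuple[list[float], float]:
    """Closed-form gradient: (p - y) * x_j for each weight and (p - y) for the bias."""
    p = sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias)
    return [(p - y) * xj for xj in x], p - y


def numeric_grad(
    weights: list[float], bias: float, x: list[float], y: int, eps: float = 1e-6
) -> tuple[list[float], float]:
    """Central finite differences of bce_loss, one parameter at a time."""
    grad_w = []
    for j in range(len(weights)):
        plus = weights.copy()
        plus[j] += eps
        minus = weights.copy()
        minus[j] -= eps
        grad_w.append((bce_loss(plus, bias, x, y) - bce_loss(minus, bias, x, y)) / (2 * eps))
    grad_b = (bce_loss(weights, bias + eps, x, y) - bce_loss(weights, bias - eps, x, y)) / (2 * eps)
    return grad_w, grad_b


weights, bias = [0.3, -0.2], 0.1
x, y = [1.5, 2.0], 1  # one sample, i.e. one column of the feature-major matrix
analytic_w, analytic_b = analytic_grad(weights, bias, x, y)
numeric_w, numeric_b = numeric_grad(weights, bias, x, y)
print(analytic_w, numeric_w)
print(analytic_b, numeric_b)
```

The two gradients should agree to roughly six decimal places; a mismatch usually means a sign error or a transposed index.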

After training, we compute the probability for each query column and predict 1 when it is at least 0.5. The reference solution returns 0 or 1 directly when the sigmoid input falls outside [-40, 40], purely to avoid numerical overflow in math.exp; this does not change the algorithm.
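Clamping is one way to avoid overflow. A common alternative, shown here as a swapped-in technique rather than what the reference does, branches on the sign so that math.exp never receives a large positive argument. The helper names are hypothetical.

```python
import math


def stable_sigmoid(z: float) -> float:
    """Logistic function that never calls math.exp on a large positive argument."""
    if z >= 0:
        # exp(-z) lies in (0, 1], so large positive z cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite 1 / (1 + e^-z) as e^z / (1 + e^z); e^z lies in (0, 1).
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_label(z: float, threshold: float = 0.5) -> int:
    """Threshold the probability like the reference: >= 0.5 means class 1."""
    return 1 if stable_sigmoid(z) >= threshold else 0


print(stable_sigmoid(-1000.0), stable_sigmoid(1000.0), predict_label(0.0))  # 0.0 1.0 1
```

Because sigmoid is monotone and sigmoid(0) = 0.5, thresholding the probability at 0.5 is equivalent to checking whether the linear score is non-negative.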

Reference code

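import math


def predict_trip_acceptance(
    features: list[list[float]],
    targets: list[int],
    query_features: list[list[float]],
    learning_rate: float = 0.5,
    steps: int = 5000,
) -> list[int]:
    """Full-batch gradient descent for logistic regression on feature-major data.

    features[j][i] is feature j of training sample i; query_features uses the same
    layout. Returns a 0/1 prediction for each query column (threshold 0.5).
    """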

    if not features:
        return []

    feature_count = len(features)
    sample_count = len(targets)
    weights = [0.0] * feature_count
    bias = 0.0

    def sigmoid(value: float) -> float:
        if value < -40:
            return 0.0
        if value > 40:
            return 1.0
        return 1.0 / (1.0 + math.exp(-value))

    for _ in range(steps):
        grad_weights = [0.0] * feature_count
        grad_bias = 0.0

        for sample_index in range(sample_count):
            # Sample i is column i: read features[j][sample_index] for every feature j.
            linear = bias
            for feature_index in range(feature_count):
                linear += weights[feature_index] * features[feature_index][sample_index]

            error = sigmoid(linear) - targets[sample_index]
            grad_bias += error
            for feature_index in range(feature_count):
                grad_weights[feature_index] += error * features[feature_index][sample_index]

        # Full-batch update: average the summed gradients over all training samples.
        bias -= learning_rate * grad_bias / sample_count
        for feature_index in range(feature_count):
            weights[feature_index] -= learning_rate * grad_weights[feature_index] / sample_count

    query_count = len(query_features[0]) if query_features else 0
    predictions: list[int] = []
    for query_index in range(query_count):
        linear = bias
        for feature_index in range(feature_count):
            linear += weights[feature_index] * query_features[feature_index][query_index]
        predictions.append(1 if sigmoid(linear) >= 0.5 else 0)

    return predictions

Complexity

Time: O(steps * n * d + m * d). Memory: O(d) extra for the weights and the gradient buffer, plus O(m) for the returned predictions.

Each gradient descent step passes over all n training samples and d features. Prediction passes over m query samples and d features.

2. Question (8 min)

Why zero initialization breaks neural networks

Short answer

Zero initialization makes hidden units in the same layer identical: they receive the same gradients and learn the same features. Logistic regression has one linear unit, so this symmetry issue does not occur there.

Detailed answer

In a multilayer neural network, if all weights in a hidden layer start equal, then neurons in that layer compute the same values. During backpropagation they also receive the same gradients and remain identical after the update. The layer behaves as if it had one neuron copied many times, so model capacity is wasted.
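To make "same activation, same gradient, same update" concrete, here is a small sketch under illustrative assumptions: a 2-2-1 network with tanh hidden units, squared loss, no biases and plain SGD (all function names are hypothetical). Every weight starts at the same constant 0.5 rather than zero: in this bias-free tanh setup, all-zero weights would make every gradient exactly zero, so a nonzero constant shows that the copying persists even when gradients flow.

```python
import math


def forward_backward(
    w_hidden: list[list[float]], v_out: list[float], x: list[float], y: float
) -> tuple[list[list[float]], list[float]]:
    """One-sample forward and backward pass; loss is 0.5 * (prediction - y) ** 2.

    w_hidden[k][j] is the weight from input j to hidden unit k; v_out[k] connects
    hidden unit k to the single linear output.
    """
    hidden = [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in w_hidden]
    prediction = sum(v * h for v, h in zip(v_out, hidden))
    error = prediction - y
    grad_v = [error * h for h in hidden]
    grad_w = [
        [error * v_out[k] * (1 - hidden[k] ** 2) * xj for xj in x]
        for k in range(len(w_hidden))
    ]
    return grad_w, grad_v


def sgd_step(w_hidden, v_out, x, y, lr=0.1):
    grad_w, grad_v = forward_backward(w_hidden, v_out, x, y)
    new_w = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(w_hidden, grad_w)]
    new_v = [v - lr * g for v, g in zip(v_out, grad_v)]
    return new_w, new_v


# Constant initialization: both hidden units start as exact copies.
w_hidden = [[0.5, 0.5], [0.5, 0.5]]
v_out = [0.5, 0.5]
for x, y in [([1.0, 2.0], 1.0), ([-1.0, 0.5], 0.0), ([0.3, -2.0], 1.0)]:
    w_hidden, v_out = sgd_step(w_hidden, v_out, x, y)

# The weights moved, but the two hidden units are still identical.
print(w_hidden[0] == w_hidden[1], v_out[0] == v_out[1])  # True True
```

Starting the two rows of w_hidden from different random values is enough for their gradients, and therefore their features, to diverge.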

Random initialization breaks this symmetry. Xavier/Glorot, He and related initializations also scale variance so activations and gradients do not explode or vanish too quickly across layers.
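The two scaling rules named above can be sketched with the standard library. The formulas are the usual Xavier/Glorot uniform limit sqrt(6 / (fan_in + fan_out)) and He normal standard deviation sqrt(2 / fan_in); the helper names are hypothetical, and frameworks ship their own versions (for example torch.nn.init.xavier_uniform_ and torch.nn.init.kaiming_normal_ in PyTorch).

```python
import math
import random


def xavier_uniform(fan_in: int, fan_out: int, rng: random.Random) -> list[list[float]]:
    """Glorot/Xavier uniform: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


def he_normal(fan_in: int, fan_out: int, rng: random.Random) -> list[list[float]]:
    """He/Kaiming normal for ReLU layers: N(0, std^2) with std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(0)
layer = he_normal(fan_in=256, fan_out=128, rng=rng)  # shape: fan_out rows x fan_in columns
values = [w for row in layer for w in row]
empirical_std = math.sqrt(sum(v * v for v in values) / len(values))
print(round(empirical_std, 3), round(math.sqrt(2 / 256), 3))
```

The random draws break symmetry (no two rows are equal), while the fan-based scale keeps the variance of activations roughly stable from layer to layer.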

Logistic regression is different: there is no group of interchangeable hidden neurons that need to specialize. Its loss is convex in the weights, so starting from zero is usually fine; the gradient at zero still points toward a useful solution.
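A quick check on toy data, reusing the feature-major layout from the first problem: at w = 0 and b = 0 every prediction is sigmoid(0) = 0.5, so the gradient is nonzero whenever a feature correlates with the target. The helper name and data are illustrative assumptions.

```python
def logistic_grad_at_zero(features: list[list[float]], targets: list[int]) -> tuple[list[float], float]:
    """Mean BCE gradient at w = 0, b = 0 for feature-major data (features[j][i]).

    At zero every probability is 0.5, so the per-sample error is 0.5 - y.
    """
    n = len(targets)
    errors = [0.5 - y for y in targets]
    grad_w = [sum(e * row[i] for i, e in enumerate(errors)) / n for row in features]
    grad_b = sum(errors) / n
    return grad_w, grad_b


features = [[2.0, 1.0, -1.0, -2.0]]  # one feature, four samples
targets = [1, 1, 0, 0]
print(logistic_grad_at_zero(features, targets))  # ([-0.75], 0.0)
```

The negative weight gradient means gradient descent immediately pushes the weight upward, toward the positive class, with no symmetry to break.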

Common mistakes

Say zero initialization always prevents any gradient.
Ignore the distinction between logistic regression and multilayer networks.
Mention random initialization without explaining symmetry breaking.

How to say it in the interview

Use the phrase "same activation, same gradient, same update" for hidden neurons.
Then contrast with convex logistic regression.

3. Question (10 min)

Metrics for a fraud classifier with asymmetric errors

Short answer

Use ranking metrics such as PR-AUC/ROC-AUC for model comparison, but choose the operating threshold by business cost, capacity and required precision/recall. With rare fraud, PR-AUC and precision at review capacity are often more informative than accuracy.

Detailed answer

Fraud is usually imbalanced, so accuracy is a weak metric. A model can look accurate by predicting "not fraud" almost everywhere. For model comparison, use ROC-AUC if class balance is moderate, and PR-AUC, precision@k, recall@k or lift when positives are rare.
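A toy illustration of why accuracy misleads on rare fraud, with average precision as a PR-AUC summary (the same quantity scikit-learn's average_precision_score reports when there are no tied scores). The data and helper names are assumptions for the example; the sketch assumes distinct scores.

```python
def average_precision(scores: list[float], labels: list[int]) -> float:
    """Mean of precision@rank taken at the rank of each positive (assumes no ties)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("average precision is undefined without positives")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_positives


def accuracy_of_all_negative(labels: list[int]) -> float:
    """Accuracy of the trivial model that predicts 'not fraud' for everyone."""
    return labels.count(0) / len(labels)


labels = [0] * 100
labels[0] = labels[3] = 1  # 2% fraud
scores = [1.0 - i / 100 for i in range(100)]  # distinct, already in descending order
print(accuracy_of_all_negative(labels), average_precision(scores, labels))  # 0.98 0.75
```

The do-nothing model scores 0.98 accuracy, while average precision rewards ranking the two fraud cases near the top; a random ranking would score roughly the fraud rate.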

The production threshold should be chosen from the business trade-off. A false positive may block or review a good user; a false negative may let fraud through. If the approximate costs are known, choose the threshold that minimizes expected cost:

expected cost = FP * cost_fp + FN * cost_fn, where FP and FN are the numbers of false positives and false negatives at the candidate threshold.
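A minimal threshold sweep for this formula, assuming per-case costs are known and that scores and labels come from a validation set; the helper names and numbers are hypothetical. Each observed score is tried as a threshold, plus "flag nothing".

```python
def expected_cost(
    scores: list[float], labels: list[int], threshold: float, cost_fp: float, cost_fn: float
) -> float:
    """Total cost when every score >= threshold is flagged as fraud."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * cost_fp + fn * cost_fn


def best_threshold(scores: list[float], labels: list[int], cost_fp: float, cost_fn: float) -> float:
    """Try each observed score (and infinity, i.e. flag nothing); keep the cheapest."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: expected_cost(scores, labels, t, cost_fp, cost_fn))


scores = [0.95, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0, 0]
print(best_threshold(scores, labels, cost_fp=1.0, cost_fn=10.0))  # 0.6
print(best_threshold(scores, labels, cost_fp=1.0, cost_fn=0.5))   # 0.95
```

The same scores lead to different operating points once the cost ratio changes. This sweep is quadratic for clarity; sorting once and updating FP/FN counts incrementally makes it O(n log n).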

If there is a manual review team, use capacity constraints such as top-k alerts per day and track precision at that capacity. If regulation or user experience imposes limits, add guardrails such as maximum false-positive rate for trusted users.
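When capacity is the constraint, the threshold is implied by k, the number of alerts reviewers can handle (for example per day). A sketch, assuming k is given and scores are one batch of cases; the helper name is hypothetical.

```python
def precision_recall_at_k(scores: list[float], labels: list[int], k: int) -> tuple[float, float]:
    """Precision and recall when reviewers can only look at the k highest-scoring cases."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    reviewed = ranked[:k]
    caught = sum(label for _, label in reviewed)
    total_positives = sum(labels)
    precision = caught / len(reviewed) if reviewed else 0.0
    recall = caught / total_positives if total_positives else 0.0
    return precision, recall


scores = [0.95, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0, 0]
print(precision_recall_at_k(scores, labels, k=2))  # (0.5, 0.5)
print(precision_recall_at_k(scores, labels, k=3))
```

Tracking these two numbers at the actual review capacity tells the team how much of its effort lands on real fraud and how much fraud escapes review.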

Always validate on a split that matches deployment time and population. Fraud patterns shift, so threshold and calibration should be monitored after launch.

Common mistakes

Optimize accuracy on an imbalanced fraud dataset.
Pick ROC-AUC only and never discuss the operating threshold.
Ignore the cost difference between blocking good users and missing fraud.

How to say it in the interview

Ask what action follows the score: manual review, block, or soft friction.
Mention PR-AUC or precision@k when fraud is rare.

4. Question (10 min)

Data splitting and leakage in a fraud model

Short answer

Prefer a time-based split that mimics deployment. Check leakage from future aggregates, labels or actions created after the prediction time, duplicate users/entities across splits, and identifiers that proxy the target too directly.

Detailed answer

For fraud and other time-dependent systems, a random split often overestimates quality. The model will be used on future events, so validation should usually train on earlier time periods and validate on later periods. If the product has repeated users, merchants or devices, consider grouped or entity-aware splits as an additional stress test.
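A minimal time-based split with an entity-overlap check, assuming each event is a dict with hypothetical "timestamp" and "user_id" fields. The overlap report is one simple way to see where a grouped split would change the picture.

```python
from datetime import datetime


def time_based_split(events: list[dict], cutoff: datetime) -> tuple[list[dict], list[dict]]:
    """Train on events strictly before the cutoff, validate on events at or after it."""
    train = [e for e in events if e["timestamp"] < cutoff]
    valid = [e for e in events if e["timestamp"] >= cutoff]
    return train, valid


def shared_entities(train: list[dict], valid: list[dict], key: str) -> set:
    """Entities (users, devices, merchants...) that appear on both sides of the split."""
    return {e[key] for e in train} & {e[key] for e in valid}


events = [
    {"timestamp": datetime(2024, 1, 5), "user_id": "u1", "is_fraud": 0},
    {"timestamp": datetime(2024, 1, 20), "user_id": "u2", "is_fraud": 1},
    {"timestamp": datetime(2024, 2, 3), "user_id": "u2", "is_fraud": 1},
    {"timestamp": datetime(2024, 2, 10), "user_id": "u3", "is_fraud": 0},
]
train, valid = time_based_split(events, cutoff=datetime(2024, 2, 1))
print(len(train), len(valid), shared_entities(train, valid, "user_id"))  # 2 2 {'u2'}
```

A repeat offender like u2 can legitimately appear on both sides of a temporal split; a grouped split that holds out whole users shows how the model does on entities it has never seen.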

Leakage checks should follow the prediction timestamp. Any feature must be available at decision time. Common leaks include aggregates computed over the full dataset, chargeback labels or moderation actions that happen after the event, future user behavior, target-derived flags, and entity IDs that memorize repeat offenders across random splits.
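A sketch of a point-in-time feature, assuming a chargeback history where each record has hypothetical "user_id", "payment_at" and "reported_at" fields. The leak-free version only counts chargebacks that were already reported at prediction time; the leaky version filters on the payment date and silently uses labels that arrived later.

```python
from datetime import datetime


def prior_chargebacks(history: list[dict], user_id: str, as_of: datetime) -> int:
    """Count a user's chargebacks that were already known at prediction time."""
    return sum(
        1 for r in history if r["user_id"] == user_id and r["reported_at"] < as_of
    )


def leaky_prior_chargebacks(history: list[dict], user_id: str, as_of: datetime) -> int:
    """Leaky variant: filters on payment time, so it counts chargebacks reported later."""
    return sum(
        1 for r in history if r["user_id"] == user_id and r["payment_at"] < as_of
    )


history = [
    {"user_id": "u1", "payment_at": datetime(2024, 1, 1), "reported_at": datetime(2024, 1, 25)},
    {"user_id": "u1", "payment_at": datetime(2024, 1, 10), "reported_at": datetime(2024, 3, 1)},
]
decision_time = datetime(2024, 2, 1)
print(prior_chargebacks(history, "u1", decision_time))        # 1
print(leaky_prior_chargebacks(history, "u1", decision_time))  # 2
```

Both functions look reasonable in isolation; only anchoring the filter to when the label became available exposes the leak.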

A good validation report includes temporal holdout performance, calibration/threshold behavior, segment metrics and drift monitoring. If fraud patterns change quickly, keep a recent validation window and backtest across several time periods.
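Backtesting across several periods can be sketched as a list of temporal folds. This assumes periods are "YYYY-MM" strings (which sort chronologically); the helper name is hypothetical, and max_train switches from an expanding window to a recent sliding window.

```python
from typing import Optional


def expanding_window_folds(
    periods: list[str], min_train: int = 2, max_train: Optional[int] = None
) -> list[tuple[list[str], str]]:
    """Temporal folds: train on earlier periods, validate on the next one.

    With max_train set, only the most recent max_train periods are used for
    training (a sliding window), which can help when fraud patterns drift.
    """
    ordered = sorted(periods)  # "YYYY-MM" strings sort chronologically
    folds = []
    for i in range(min_train, len(ordered)):
        start = 0 if max_train is None else max(0, i - max_train)
        folds.append((ordered[start:i], ordered[i]))
    return folds


months = ["2024-01", "2024-02", "2024-03", "2024-04"]
for train_months, valid_month in expanding_window_folds(months):
    print(train_months, "->", valid_month)
# ['2024-01', '2024-02'] -> 2024-03
# ['2024-01', '2024-02', '2024-03'] -> 2024-04
```

Reporting metrics per fold, rather than one pooled number, shows whether quality degrades as the validation period moves further from older training data.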

Common mistakes

Use a random split for a temporal fraud problem without checking leakage.
Build aggregate features before splitting.
Let the same fraudulent entity appear in both train and validation in a way that cannot happen at launch.

How to say it in the interview

Anchor every feature to "available at prediction time".
Give concrete leakage examples, not only the word leakage.