Take the interview: Quantum One: Technical interview

Question 1 · 10 min

ML System Design

You need to forecast how long a ship will wait at a port. How would you define the ML target, time granularity and prediction horizon so the result is useful for operations?

Answer without hints

First say your answer out loud or as bullet points.

Write a draft

Formulas, solution plan, risks and examples.

Compare with the analysis

Open the analysis only after your own attempt.

Short answer

Define a target that matches the operational decision: expected waiting time for a given ship, port and arrival window. The prediction horizon should match the window in which routing or speed decisions can still be changed, and the time granularity should not hide actionable changes.

Detailed analysis

Start from the decision. If the business wants to slow down, speed up, reroute or plan port arrival, the target should be expected waiting time or delay at the relevant port for a specific arrival window. A daily aggregate may be stable for modeling, but it can be too slow for operations if the ship can react within hours.

A good framing separates prediction time, planned arrival time, actual arrival time and actual service start time. The label can be waiting_time = service_start - actual_arrival, clipped or transformed (for example with a log) if there are extreme tails. The model should only use features known at prediction time, such as current queue state, historical seasonality, port attributes, ship type, schedule and recent congestion indicators.
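
A minimal sketch of this framing, using a hypothetical `PortCall` record. The field names, the 240-hour cap and the `point_in_time` helper are illustrative assumptions, not part of any specific system. The label uses the actual arrival and service start. The horizon is measured from prediction time to planned arrival, and features are filtered to those observed no later than the prediction time.

```python
from dataclasses import dataclass
from datetime import datetime
import math


@dataclass
class PortCall:
    ship_id: str
    port_id: str
    prediction_time: datetime  # when the forecast is issued
    planned_arrival: datetime  # schedule/ETA, known at prediction time
    actual_arrival: datetime   # observed later; used only to build the label
    service_start: datetime    # observed later; used only to build the label


def waiting_hours(call: PortCall, cap_hours: float = 240.0) -> float:
    """Label: waiting_time = service_start - actual_arrival, in hours.

    A negative wait is a data error, so it raises instead of being clipped;
    the long upper tail is clipped at cap_hours (an assumed value).
    """
    hours = (call.service_start - call.actual_arrival).total_seconds() / 3600.0
    if hours < 0:
        raise ValueError(f"negative wait for ship {call.ship_id}: {hours:.2f} h")
    return min(hours, cap_hours)


def log_target(hours: float) -> float:
    """Optional tail-compressing transform; invert predictions with math.expm1."""
    return math.log1p(hours)


def horizon_hours(call: PortCall) -> float:
    """Prediction horizon: lead time from issuing the forecast to planned arrival."""
    return (call.planned_arrival - call.prediction_time).total_seconds() / 3600.0


def point_in_time(features: dict, t: datetime) -> dict:
    """Keep only features whose observation timestamp is <= prediction time t.

    `features` maps name -> (observed_at, value).
    """
    return {name: value for name, (observed_at, value) in features.items() if observed_at <= t}
```

For a forecast issued at 00:00 for a ship planned at 06:00 that actually arrives at 07:00 and starts service at 19:30, the label is 12.5 hours and the horizon is 6 hours. Any feature stamped after 00:00 is dropped from the training row.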

Choose granularity as a trade-off: hourly or multi-hour windows support actionable decisions but produce noisier labels; daily windows are easier to model but may hide changes that matter operationally. In an interview, state this trade-off explicitly and propose validating several horizons against an operations metric, not only offline RMSE.
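
The sketch below shows how arrivals can be bucketed into windows of different sizes. It then compares RMSE with a hypothetical operations metric, the share of forecasts within a tolerance that operations can act on. The tolerance and the metric itself are assumptions chosen for illustration. The real metric should come from the operations team.

```python
import math
from datetime import datetime, timedelta


def floor_to_window(ts: datetime, window_hours: int) -> datetime:
    """Map a timestamp to the start of its arrival window (window_hours must divide 24)."""
    if 24 % window_hours != 0:
        raise ValueError("window_hours must divide 24")
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return day_start + timedelta(hours=(ts.hour // window_hours) * window_hours)


def rmse(y_true, y_pred) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def within_tolerance(y_true, y_pred, tolerance_hours: float) -> float:
    """Hypothetical ops metric: share of forecasts close enough to plan around."""
    hits = sum(abs(t - p) <= tolerance_hours for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

The two metrics can disagree. For true waits `[10, 10, 10, 10]`, the forecasts `[12, 12, 12, 12]` and `[10, 10, 10, 14]` both have an RMSE of 2.0. However, 100% of the first set and only 75% of the second fall within a 3-hour tolerance.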

Common mistakes

Using a daily average only because it is easier to model.
Mixing features observed after arrival into the training row.
Optimizing RMSE without connecting the prediction horizon and timing to the decisions ship operations can actually make.

Question 2 · 10 min

ML System Design

For a port waiting-time model, what features would you build beyond timestamp features, and how would you detect anomalies or broken tracking data?

Short answer

Use port, ship and queue-state features plus historical congestion aggregates computed without leakage. Data-quality checks should catch impossible timestamps, inconsistent event order, extreme waits and distribution shifts.

Detailed analysis

Useful features include port identity, berth/terminal type, ship class, cargo type if available, planned arrival slot, day-of-week, seasonality, weather, recent queue length, recent average service time, number of ships currently waiting and historical congestion for the same port and time bucket.
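
This is a minimal sketch of two of these queue-state features. It assumes a hypothetical `Visit` record with arrival, service start and service end timestamps. Both features are reconstructed as of the prediction time `t`, even though the historical table already contains events that happened after `t`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Visit:
    port_id: str
    arrival: datetime
    service_start: Optional[datetime]  # None if service has not started yet
    service_end: Optional[datetime]    # None if service has not finished yet


def ships_waiting(visits: List[Visit], port_id: str, t: datetime) -> int:
    """Ships that had arrived at port_id but had not started service at time t.

    A historical service_start later than t still counts as "waiting at t".
    """
    return sum(
        1
        for v in visits
        if v.port_id == port_id
        and v.arrival <= t
        and (v.service_start is None or v.service_start > t)
    )


def recent_mean_service_hours(
    visits: List[Visit], port_id: str, t: datetime, lookback: timedelta = timedelta(days=7)
) -> Optional[float]:
    """Mean service duration of visits that finished in [t - lookback, t)."""
    durations = [
        (v.service_end - v.service_start).total_seconds() / 3600.0
        for v in visits
        if v.port_id == port_id
        and v.service_start is not None
        and v.service_end is not None
        and t - lookback <= v.service_end < t
    ]
    return mean(durations) if durations else None
```

The 7-day lookback is an arbitrary default. In practice it would be tuned per port or replaced by several windows.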

For anomaly detection, first add rule-based checks: negative waiting time, impossible event order, duplicated events, huge jumps, missing departure/arrival events and inconsistent timezone handling. Then add statistical checks over target and feature distributions: robust z-scores, percentile caps, isolation-style outlier detection or per-port control charts.
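
A minimal sketch of both layers, assuming events arrive as dicts with timezone-aware datetimes. The dict keys and reason strings are hypothetical. The statistical check is the modified z-score based on the median and the median absolute deviation (MAD). It is one common robust choice among those listed above.

```python
import math
from collections import Counter
from statistics import median


def rule_violations(event: dict) -> list:
    """Rule-based checks on one port-call event (tz-aware datetimes or None)."""
    issues = []
    arrival = event.get("arrival")
    start = event.get("service_start")
    end = event.get("service_end")
    if arrival is None:
        issues.append("missing_arrival")
    stamps = [ts for ts in (arrival, start, end) if ts is not None]
    if any(ts.tzinfo is None for ts in stamps):
        # Naive timestamps hide timezone bugs; comparing naive with aware would also raise.
        issues.append("naive_timestamp")
        return issues
    if arrival is not None and start is not None and start < arrival:
        issues.append("negative_wait")
    if start is not None and end is not None and end < start:
        issues.append("end_before_start")
    return issues


def duplicated_keys(events: list, keys=("ship_id", "port_id", "arrival")) -> set:
    """Keys that appear more than once (likely duplicated tracking events)."""
    counts = Counter(tuple(e.get(k) for k in keys) for e in events)
    return {key for key, c in counts.items() if c > 1}


def robust_z(values: list) -> list:
    """Modified z-score 0.6745 * (x - median) / MAD; |z| > 3.5 is a common cut-off."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Typical values are identical, so any differing value is infinitely far out.
        return [0.0 if v == med else math.copysign(math.inf, v - med) for v in values]
    return [0.6745 * (v - med) / mad for v in values]
```

To get a per-port control chart, `robust_z` can be applied separately to each port's waiting times. This way a busy port's normal queue is not flagged against a quiet port's baseline.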

In a production system, anomaly handling should be explicit. Some anomalies are data errors and should be fixed or removed; some are rare but real disruptions and should be modeled or flagged. Keep an audit table so that the model does not silently learn from corrupted event tracking.
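
One way to make the handling explicit is sketched below. The split between "data error" and "possible disruption" reasons is an assumption for illustration, as are the `triage` and `AuditRecord` names. Rule failures are dropped and logged. Statistical outliers are kept with a flag the model can learn from, and also logged.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Assumed split: these rule failures are treated as data errors, not disruptions.
DATA_ERRORS = {"missing_arrival", "naive_timestamp", "negative_wait", "end_before_start"}


@dataclass
class AuditRecord:
    row_id: str
    reasons: List[str]
    action: str  # "drop" (data error) or "flag" (kept, possibly a real disruption)


def triage(
    rows: List[dict], rule_issues: Dict[str, List[str]], outlier_ids: Set[str]
) -> Tuple[List[dict], List[AuditRecord]]:
    """Split rows into training data and an audit table instead of silently filtering."""
    kept, audit = [], []
    for row in rows:
        rid = row["id"]
        reasons = rule_issues.get(rid, [])
        if any(r in DATA_ERRORS for r in reasons):
            audit.append(AuditRecord(rid, reasons, "drop"))
            continue
        is_outlier = rid in outlier_ids
        if is_outlier:
            audit.append(AuditRecord(rid, ["statistical_outlier"], "flag"))
        kept.append({**row, "disruption_flag": int(is_outlier)})
    return kept, audit
```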

Common mistakes

Using only timestamp features and ignoring port and ship context.
Treating every outlier as an error and deleting rare but real disruptions.
Computing historical aggregates with future data.

Question 3 · 10 min

ML System Design

You have a categorical feature such as port_id. Compare one-hot encoding with historical target aggregates for tree models, and explain the leakage risks.

Short answer

One-hot encoding is leakage-safe but can be sparse. Target/statistical aggregates can be powerful, but they must be computed using only past or out-of-fold data; otherwise they leak label information.

Detailed analysis

One-hot encoding port_id gives the model a binary split per port. For tree models it can work, but high cardinality makes splits sparse and may not capture port similarity or historical congestion strength well. It is still a safe baseline because the encoding itself does not use the label.
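
A minimal sketch of a label-free one-hot encoding with an explicit slot for unknown or rare ports. The helper names and the `min_count` threshold are hypothetical. The vocabulary is built from training ports only and never looks at the target.

```python
from collections import Counter


def build_vocabulary(train_ports, min_count: int = 1) -> dict:
    """Map each port seen at least min_count times in training to a column index."""
    counts = Counter(train_ports)
    vocab = sorted(p for p, c in counts.items() if c >= min_count)
    return {p: i for i, p in enumerate(vocab)}


def one_hot(port, vocab: dict) -> list:
    """Binary vector: one slot per known port plus a final 'unknown/rare' slot."""
    vec = [0] * (len(vocab) + 1)
    vec[vocab.get(port, len(vocab))] = 1
    return vec
```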

Historical aggregates such as mean waiting time per port, recent queue length, rolling median delay or per-port seasonality are often stronger. The danger is leakage: if you compute the aggregate over the whole dataset, the row’s own target and future rows influence the feature. Offline validation will look better than real production.
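
A toy illustration of the leak. It assumes rows of `(port_id, waiting_hours)` already sorted by time, with hypothetical data. The full-dataset mean gives the earliest row information about a disruption that only happens later. The past-only mean does not.

```python
from collections import defaultdict


def leaky_port_mean(rows):
    """LEAKY: per-port mean over the full dataset (own label and future rows included)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for port, y in rows:
        sums[port] += y
        counts[port] += 1
    return [sums[port] / counts[port] for port, _ in rows]


def past_only_port_mean(rows):
    """Leakage-safe: per-port mean over strictly earlier rows (rows sorted by time)."""
    sums, counts = defaultdict(float), defaultdict(int)
    out = []
    for port, y in rows:
        out.append(sums[port] / counts[port] if counts[port] else None)
        sums[port] += y  # update only after this row's feature has been emitted
        counts[port] += 1
    return out
```

With rows `[("P1", 2.0), ("P1", 4.0), ("P1", 30.0), ("P2", 5.0)]`, the leaky feature is 12.0 for every P1 row, including the first. The past-only feature is `[None, 2.0, 3.0, None]`.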

Use time-aware aggregation or out-of-fold target encoding. For a prediction at time t, the feature must be computed from records before t. For cross-validation, build folds in time order or calculate encodings inside each training fold only. Add smoothing for rare ports so the feature does not overfit low-count categories.

Common mistakes

Computing the target mean per category on the full dataset.
Assuming tree models always ignore one-hot columns.
Forgetting rare-category smoothing and unknown-port handling.

Question 4 · 12 min

ML System Design

How can a forecasting system support multiple prediction horizons, and what does it mean that SHAP is model-agnostic?

Short answer

Multiple horizons can be handled by separate models, a multi-head model, direct multi-output prediction, or recursive forecasting. SHAP estimates feature contributions from changes in predictions under feature coalitions, so it can wrap many model classes.

Detailed analysis

For multiple horizons, avoid forcing one fixed label if the business needs predictions for 1 hour, 6 hours and several days. Options include training one model per horizon, training a multi-output model with one head per horizon, or recursively feeding shorter-horizon predictions into longer-horizon forecasts. Direct multi-horizon models are usually easier to validate than recursive approaches because errors do not compound as silently.
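
A sketch contrasting the two main strategies on a plain numeric series, with hypothetical helper names. The direct approach builds one training set per horizon, where the lags at time `t` predict `series[t + h]`. The recursive approach rolls a one-step model forward and feeds its own predictions back in as lags, which is where errors can compound.

```python
def make_direct_datasets(series, n_lags: int, horizons) -> dict:
    """One list of (lag_features, target) pairs per horizon h >= 1."""
    datasets = {h: [] for h in horizons}
    for t in range(n_lags - 1, len(series)):
        lags = list(series[t - n_lags + 1 : t + 1])  # values observed up to time t
        for h in horizons:
            if t + h < len(series):
                datasets[h].append((lags, series[t + h]))
    return datasets


def recursive_forecast(history, one_step_model, steps: int, n_lags: int) -> list:
    """Roll a one-step model forward, feeding its own predictions back as lags."""
    window = list(history[-n_lags:])
    preds = []
    for _ in range(steps):
        y_hat = one_step_model(window)
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # any error in y_hat propagates to later steps
    return preds
```

With `series = [1, 2, 3, 4, 5, 6]` and two lags, the 1-step dataset starts with `([1, 2], 3)` and the 3-step dataset with `([1, 2], 5)`. A multi-head model would share one feature pipeline across these per-horizon targets.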

Choose the design based on data volume, latency and consistency needs. Separate models are simple and debuggable; multi-head models share representations and can exploit related horizons; recursive models are flexible but vulnerable to accumulated error. Metrics should be reported per horizon because short-term and long-term accuracy have different product value.
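
A minimal sketch of per-horizon reporting. It assumes evaluation records of `(horizon_hours, y_true, y_pred)`, a hypothetical format. Errors are grouped by horizon before MAE and RMSE are computed, instead of being pooled.

```python
import math
from collections import defaultdict


def metrics_per_horizon(records) -> dict:
    """records: (horizon, y_true, y_pred) -> {horizon: {"n", "mae", "rmse"}}."""
    errors = defaultdict(list)
    for h, y, p in records:
        errors[h].append(p - y)
    return {
        h: {
            "n": len(e),
            "mae": sum(abs(x) for x in e) / len(e),
            "rmse": math.sqrt(sum(x * x for x in e) / len(e)),
        }
        for h, e in sorted(errors.items())
    }
```

For records `[(1, 10, 11), (1, 10, 9), (24, 10, 16), (24, 10, 2)]`, the 1-hour MAE is 1.0 and the 24-hour MAE is 7.0. A single pooled MAE of 4.0 would hide that gap.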

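SHAP is called model-agnostic in the KernelSHAP sense: it treats the model as a black box and only queries its predictions. For each coalition of features, the features outside the coalition are replaced by values from a background dataset. Each feature's SHAP value is its marginal contribution to the prediction, averaged with Shapley weights over coalitions of the other features. TreeSHAP is a model-specific algorithm that computes SHAP values for tree ensembles exactly and much faster, but the conceptual idea is the same: attribution from prediction changes.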
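
To show what "model-agnostic" means, the sketch below computes exact Shapley values by enumerating every coalition. It only ever calls `predict`. This is not the `shap` library: it is a swapped-in brute-force version with a single baseline point standing in for the background dataset. It costs O(2^n) model calls, which is why KernelSHAP samples coalitions and fits a weighted regression instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline) -> list:
    """Exact Shapley values for a black-box `predict` (list of floats -> float).

    Features outside a coalition take their `baseline` value (a single reference
    point; SHAP implementations usually average over a background dataset).
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear model `2*z0 + 3*z1 - z2` with `x = [1, 2, 3]` and a zero baseline, the values come out as approximately `[2, 6, -3]`, the weight times the feature's deviation. For the interaction `z0 * z1`, the credit is split evenly. In both cases the values sum to `predict(x) - predict(baseline)`.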

Common mistakes

Using one horizon because it is convenient while ignoring downstream decisions.
Describing SHAP as reading model weights directly.
Reporting one aggregate metric across all horizons.