Назад к подготовке

ML System Design

You need to forecast how long a ship will wait at a port. How would you define the ML target, time granularity and prediction horizon so the result is useful for operations?

Ответить самому

Сначала сформулируйте ответ как на собеседовании, затем откройте разбор и оцените себя.

Загрузка

Короткий ответ

Define a target that matches the operational decision: expected waiting time for a ship/port/time window, with a horizon short enough to affect routing or speed decisions and a granularity that does not hide actionable changes.

Полный разбор

Start from the decision. If the business wants to slow down, speed up, reroute or plan port arrival, the target should be expected waiting time or delay at the relevant port for a specific arrival window. A daily aggregate may be stable for modeling, but it can be too slow for operations if the ship can react within hours.

A good framing separates prediction time, planned arrival time and actual service start time. The label can be waiting_time = service_start - arrival, clipped or transformed if there are extreme tails. The model should only use features known at prediction time, such as current queue state, historical seasonality, port attributes, ship type, schedule and recent congestion indicators.

Choose granularity by trade-off: hourly or multi-hour windows provide actionable control but noisier labels; daily windows are easier but may miss operational value. In an interview, explicitly state this trade-off and propose validating several horizons against an operations metric, not only offline RMSE.

Теория

Target design is part of system design. The right label is the one that helps the downstream decision, not necessarily the one with the smoothest historical data.

Типичные ошибки

  • Use a daily average only because it is easier to model.
  • Mix features from after arrival into the training row.
  • Optimize RMSE without connecting prediction latency to ship operations.