Monitoring data drift and responding with retraining
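Short answer
Data drift is a shift in the distribution of the inputs, the target, or the relationship between them, relative to the training data. Monitor feature distributions, prediction distributions and business metrics; respond by investigating the cause first, then retraining, adjusting thresholds or switching to fallback rules as appropriate.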
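Full breakdown
Data drift means the production data no longer follows the same distribution as the data used to train or validate the model. The main types are feature drift (the input distribution changes), label drift (the target distribution changes) and concept drift (the relationship between inputs and target changes). Common causes include seasonality, new user segments and pipeline bugs; a pipeline bug shows up in the same monitors as real drift but is a data quality problem, not a change in the world.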
Detection should combine several layers. Track feature distributions, missing-value rates, categorical cardinalities, prediction-score distributions, calibration, latency and business metrics. If labels arrive with a delay, compare quality metrics computed on the matured labels against historical baselines. Statistical tests can help, but on large samples they flag even tiny, harmless shifts, so trend dashboards with explicit alert thresholds are often more actionable.
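As a rough illustration of the feature-distribution layer, the sketch below computes a population stability index (PSI) for one numeric feature with NumPy, plus a missing-value rate. PSI is one common choice, not something this card prescribes. The function names, the ten quantile bins and the 0.1 / 0.25 cut-offs (a widely quoted rule of thumb) are assumptions, and the samples are synthetic.

```python
import numpy as np


def missing_rate(values):
    """Share of NaN entries in a numeric feature."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(np.isnan(values)))


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI of one numeric feature: reference (training) vs. current (production)."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Missing values are tracked separately by missing_rate, so drop them here.
    reference = reference[~np.isnan(reference)]
    current = current[~np.isnan(current)]

    # Quantile edges from the reference sample give bins of roughly equal mass.
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1)))
    inner = edges[1:-1]  # values outside the reference range land in the outer bins
    n_cells = len(inner) + 1

    ref_idx = np.searchsorted(inner, reference, side="right")
    cur_idx = np.searchsorted(inner, current, side="right")
    ref_frac = np.bincount(ref_idx, minlength=n_cells) / len(reference)
    cur_frac = np.bincount(cur_idx, minlength=n_cells) / len(current)

    # Clip to avoid log(0) when a bin is empty in one of the samples.
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


# Synthetic data: one production sample like training, one shifted by one sigma.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 5000)
same = rng.normal(0.0, 1.0, 5000)
shifted = rng.normal(1.0, 1.0, 5000)

psi_same = population_stability_index(train, same)
psi_shifted = population_stability_index(train, shifted)

# Assumed rule of thumb: < 0.1 stable, 0.1 to 0.25 watch, > 0.25 investigate.
for name, psi in [("same", psi_same), ("shifted", psi_shifted)]:
    status = "investigate" if psi > 0.25 else "watch" if psi > 0.1 else "stable"
    print(f"{name}: PSI={psi:.3f} -> {status}")
```

A per-feature score like this is meant to sit on a dashboard next to prediction-score and business metrics; a high PSI alone does not show that model quality has dropped.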
The response depends on cause and severity. You may retrain on fresh data, refresh calibration or thresholds, fix a broken data pipeline, add monitoring for a new segment, roll back a model or use a simpler fallback while collecting labels.
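The mapping from cause and severity to a response can be sketched as a small triage function. Everything here is hypothetical: the `DriftReport` fields, the `Action` names, the order of the checks and the 10% severity threshold are assumptions chosen for illustration, and a real playbook would be agreed with the team that owns the model.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FIX_PIPELINE = "fix the broken data pipeline"
    RETRAIN = "retrain on fresh data"
    ROLLBACK_OR_FALLBACK = "roll back or use a simpler fallback while collecting labels"
    RECALIBRATE = "refresh calibration or thresholds"
    ADD_SEGMENT_MONITORING = "add monitoring for the new segment"
    INVESTIGATE = "investigate the shifted features"
    KEEP_MONITORING = "no action, keep monitoring"


@dataclass
class DriftReport:
    """Hypothetical summary of what the monitors found."""
    pipeline_bug: bool = False            # schema change, unit mix-up, broken join, ...
    quality_drop: float = 0.0             # relative drop in the delayed-label metric
    fresh_labels_available: bool = False  # enough recent labels to retrain on
    calibration_shift_only: bool = False  # ranking intact, probabilities off
    new_segment: bool = False             # traffic from a previously unseen segment
    feature_drift: bool = False           # some feature monitor fired


def choose_action(report: DriftReport, severe_drop: float = 0.10) -> Action:
    # Rule out data quality bugs before treating the signal as real drift.
    if report.pipeline_bug:
        return Action.FIX_PIPELINE
    # Severe quality loss: retrain if labels exist, otherwise fall back.
    if report.quality_drop >= severe_drop:
        if report.fresh_labels_available:
            return Action.RETRAIN
        return Action.ROLLBACK_OR_FALLBACK
    if report.calibration_shift_only:
        return Action.RECALIBRATE
    if report.new_segment:
        return Action.ADD_SEGMENT_MONITORING
    if report.feature_drift:
        return Action.INVESTIGATE
    return Action.KEEP_MONITORING


print(choose_action(DriftReport(pipeline_bug=True, quality_drop=0.3)).value)
print(choose_action(DriftReport(quality_drop=0.2, fresh_labels_available=True)).value)
```

Checking for a pipeline bug first is deliberate: retraining on corrupted inputs would bake the bug into the next model.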
Theory
Drift is a production feedback problem: detect distribution changes, connect them to model quality and choose a safe response.
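One simple way to connect a drift signal to model quality when labels arrive late is to compare the metric on the latest matured batch with its historical baseline. The sketch below is a minimal version of that idea, assuming weekly AUC values are already computed elsewhere; the function name, the three-sigma band and all numbers are made up for illustration.

```python
from statistics import mean, stdev


def quality_alert(baseline_scores, recent_score, n_sigmas=3.0):
    """Return (alert, threshold): alert is True if recent_score is well below baseline."""
    mu = mean(baseline_scores)
    sigma = stdev(baseline_scores)  # sample standard deviation, needs >= 2 points
    threshold = mu - n_sigmas * sigma
    return recent_score < threshold, threshold


# Illustrative weekly AUC values on matured labels (not real measurements).
baseline_auc = [0.81, 0.80, 0.82, 0.81, 0.79, 0.80]

alert_normal, threshold = quality_alert(baseline_auc, 0.80)
alert_degraded, _ = quality_alert(baseline_auc, 0.70)

print(f"threshold={threshold:.3f}")
print(f"AUC 0.80 -> alert={alert_normal}")
print(f"AUC 0.70 -> alert={alert_degraded}")
```

A fixed sigma band is a crude baseline; with strong seasonality, comparing against the same period in earlier cycles is usually a fairer reference.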
Common mistakes
- Assuming that scheduled retraining alone solves drift.
- Monitoring only model loss while ignoring feature pipeline changes.
- Reacting to drift without first checking for data quality bugs.
How to answer in an interview
- Separate feature drift, concept drift and pipeline breakage.
- Mention delayed labels and business metrics.