High-precision moderation of car photos under rare fraud
Listings must be rejected automatically when attributes inferred from the car photos contradict the attributes the user entered. Fraud is rare, and false rejections hurt legitimate users. How would you train the model, validate its quality, and choose thresholds?
Short answer
Treat this as a high-precision production classifier: train and evaluate on reliable clean slices, use proxy evidence of fraud, review model triggers manually, work within business-owned FPR limits, monitor unblock and appeal rates online, and set conservative thresholds per class.
Full walkthrough
Start from the decision cost. An auto-reject should happen only when the model is very confident, because false positives block legitimate sellers. Agree on an acceptable false positive rate with the business, then maximize recall under that constraint rather than optimizing generic accuracy.
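The "maximize recall under an FPR constraint" step can be sketched as a threshold search on a labeled validation set. This is a minimal illustration, not a production implementation: the function name `pick_threshold`, the label convention (1 = confirmed mismatch, 0 = legitimate listing), and the rule "auto-reject when score >= threshold" are all assumptions made for the example.

```python
from typing import Optional, Sequence


def pick_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    max_fpr: float,
) -> Optional[float]:
    """Return the lowest score threshold whose validation FPR is <= max_fpr.

    Assumed convention: auto-reject when score >= threshold;
    label 1 = confirmed mismatch, label 0 = legitimate listing.
    Returns None when no threshold satisfies the constraint.
    """
    negatives = sum(1 for y in labels if y == 0)
    if negatives == 0:
        return None  # FPR cannot be estimated without legitimate examples

    best = None
    # Lowering the threshold can only add rejections, so recall and FPR both
    # grow monotonically. The lowest feasible threshold maximizes recall.
    for t in sorted(set(scores), reverse=True):
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if false_pos / negatives > max_fpr:
            break
        best = t
    return best
```

With a zero tolerance for false rejects the search stops just above the highest-scoring legitimate listing; if that listing outranks every fraud example, the function returns None and auto-reject should stay off.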
For training, use historical listings with user-entered attributes and photo-derived labels, but do not assume all history is clean. Build a high-confidence clean slice from ownership documents or other trusted signals, compare it with random traffic, and manually inspect model triggers. Rare classes should have separate thresholds or be excluded from auto-reject until there is enough evidence.
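One way to express "separate thresholds or exclusion for rare classes" is a per-class policy table that only enables auto-reject once a class has enough trusted examples. The sketch below is hypothetical: `build_policy`, `may_auto_reject`, the `min_examples` cutoff, and the sample class names are illustrative and not taken from any real system.

```python
from typing import Dict, Mapping, Optional


def build_policy(
    clean_counts: Mapping[str, int],
    thresholds: Mapping[str, float],
    min_examples: int,
) -> Dict[str, Optional[float]]:
    """Map each class to its auto-reject threshold, or None if excluded.

    clean_counts: number of trusted (for example document-verified) examples
    per class. A class with fewer than min_examples, or with no calibrated
    threshold, is excluded from auto-reject.
    """
    policy: Dict[str, Optional[float]] = {}
    for cls, n in clean_counts.items():
        policy[cls] = thresholds.get(cls) if n >= min_examples else None
    return policy


def may_auto_reject(
    cls: str, score: float, policy: Mapping[str, Optional[float]]
) -> bool:
    """True only if the class is enabled and the score clears its threshold."""
    threshold = policy.get(cls)  # unknown classes are treated as excluded
    return threshold is not None and score >= threshold
```

Listings that fail this check are not accepted automatically by the sketch; they simply fall through to whatever non-automatic path exists, such as manual review.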
For validation and launch, track precision on reviewed triggers, unblock or appeal rate, trigger volume by brand/model/color, segment drift and business outcomes. Roll out gradually, keep manual-review fallback for low-confidence cases, and monitor whether the trigger distribution changes after launch.
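Precision on reviewed triggers is usually estimated from a small manual sample, so it helps to track a conservative lower bound rather than the raw ratio. The sketch below uses the standard Wilson score interval; the function names and the idea of pausing auto-reject when the bound drops below a target are assumptions for illustration.

```python
import math


def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion.

    successes: reviewed triggers confirmed as true mismatches.
    total: all reviewed triggers. z = 1.96 corresponds to about 95% coverage.
    """
    if total == 0:
        return 0.0  # no evidence yet, so assume the worst
    p = successes / total
    denom = 1 + z * z / total
    center = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (center - margin) / denom


def should_pause_auto_reject(
    confirmed: int, reviewed: int, min_precision: float
) -> bool:
    """Hypothetical guard: pause when the precision lower bound is too low."""
    return wilson_lower_bound(confirmed, reviewed) < min_precision
```

The same bound can be computed per brand, model, or color segment, which makes thin segments look appropriately uncertain instead of falsely perfect.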
Theory
Rare-positive moderation systems should be optimized as constrained decision systems, not as ordinary balanced classification tasks.
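Written out, the constrained view looks roughly as follows. The symbols are notation introduced here for illustration: $t_c$ is the auto-reject threshold for class $c$, $\alpha_c$ is the FPR limit agreed with the business, and $\pi$ is the fraud prevalence.

```latex
\max_{t_c} \; \mathrm{TPR}_c(t_c)
\quad \text{subject to} \quad
\mathrm{FPR}_c(t_c) \le \alpha_c

\mathrm{Precision}
  = \frac{\pi \,\mathrm{TPR}}{\pi \,\mathrm{TPR} + (1 - \pi)\,\mathrm{FPR}}
```

The second identity follows from Bayes' rule and shows why rarity matters. With illustrative numbers $\pi = 0.001$, $\mathrm{TPR} = 0.5$, and $\mathrm{FPR} = 0.001$, precision is $0.0005 / (0.0005 + 0.000999) \approx 0.33$, so roughly two out of three auto-rejects would hit legitimate sellers even though the FPR looks small.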
Common mistakes
- Optimizing ROC-AUC or accuracy while ignoring the cost of false rejects.
- Assuming historical user-provided attributes are perfect labels.
- Using one global threshold for all classes, including rare ones.
- Launching without appeal or unblock feedback as a precision proxy.
How to answer in an interview
- State the business-owned FPR constraint before discussing the model.
- Name one offline proxy and one online monitoring signal.