Basic anomaly checks in a sales DataFrame
Answer it yourself
First formulate your answer as you would in an interview, then read the full breakdown and assess yourself.
Short answer
Check nulls, invalid dates, negative or implausible prices/quantities, duplicate rows, category spelling issues and distribution outliers.
Full breakdown
Start with schema and null checks: required columns, date parsing, missing category, missing price or quantity. Then validate numeric ranges: price and quantity should usually be non-negative, quantities should be plausible, and extreme values should be inspected for unit or extra-zero mistakes.
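The schema, null, and range checks described above could look roughly like the following pandas sketch. The column names (`date`, `category`, `price`, `quantity`), the sample rows, and the `MAX_PLAUSIBLE_QTY` threshold are all assumptions made for illustration; a real task would define them in its own data contract.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumptions for illustration.
sales = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-06", "not a date", "2024-01-08"],
    "category": ["toys", None, "books", "toys"],
    "price": [9.99, 15.0, -4.0, 12.5],
    "quantity": [2, 1, 3, 5000],
})

REQUIRED = ["date", "category", "price", "quantity"]

# Schema check: every required column must be present.
missing_columns = [c for c in REQUIRED if c not in sales.columns]

# Null check: count missing values per required column.
null_counts = sales[REQUIRED].isna().sum()

# Date parsing: with errors="coerce", unparseable strings become NaT
# instead of raising, so they can be listed and inspected.
parsed_dates = pd.to_datetime(sales["date"], format="%Y-%m-%d", errors="coerce")
bad_dates = sales[parsed_dates.isna()]

# Range checks: negative values are invalid; very large quantities are
# only suspicious and should be inspected, not silently dropped.
MAX_PLAUSIBLE_QTY = 1000  # assumed business threshold
negative = sales[(sales["price"] < 0) | (sales["quantity"] < 0)]
implausible_qty = sales[sales["quantity"] > MAX_PLAUSIBLE_QTY]

print(missing_columns, null_counts.to_dict(), sep="\n")
print(bad_dates, negative, implausible_qty, sep="\n\n")
```

Each check returns the offending rows rather than a single boolean, which makes it easier to explain in an interview what is wrong with each record.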
For time and categorical fields, check unexpected dates, duplicate events, inconsistent category names and very rare categories. For a revenue task, also inspect the distribution of row-level price * quantity, because one bad row can dominate the answer.
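A minimal sketch of these date, duplicate, category, and revenue checks is shown below. The sample data, the expected date window, the rarity cutoff of two rows, and the 50% revenue-share threshold are hypothetical choices, not part of the original task.

```python
import pandas as pd

# Hypothetical sample; all names, values, and thresholds are assumptions.
sales = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4, 5],
    "date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-06",
                            "2024-01-07", "2031-01-01", "2024-01-09"]),
    "category": ["Toys", "toys ", "toys ", "Books", "books", "Garden"],
    "price": [10.0, 12.0, 12.0, 8.0, 9.0, 11000.0],
    "quantity": [1, 2, 2, 1, 3, 10],
})

# Unexpected dates: anything outside the assumed reporting window.
window_start, window_end = pd.Timestamp("2024-01-01"), pd.Timestamp("2024-12-31")
out_of_window = sales[~sales["date"].between(window_start, window_end)]

# Duplicate events: keep=False marks every copy, not only the later ones.
duplicates = sales[sales.duplicated(keep=False)]

# Inconsistent category names: normalize case and whitespace, then count
# how many distinct raw spellings map to each normalized name.
normalized = sales["category"].str.strip().str.lower()
raw_variants = sales["category"].groupby(normalized).nunique()
inconsistent = raw_variants[raw_variants > 1]

# Very rare categories: fewer than 2 rows (assumed cutoff).
category_counts = normalized.value_counts()
rare = category_counts[category_counts < 2]

# Row-level revenue: flag rows that alone exceed half of total revenue.
revenue = sales["price"] * sales["quantity"]
dominant = sales[revenue / revenue.sum() > 0.5]

print(out_of_window, duplicates, inconsistent, rare, dominant, sep="\n\n")
```

In this sample the last row, with a price of 11000, accounts for almost all revenue, which is the kind of record worth checking for a unit or extra-zero mistake before reporting a total.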
The point in an interview is not to list every possible test, but to show that you understand the data contract and the business meaning of each column before trusting aggregate metrics.
Theory
Data analysis tasks need lightweight validation before aggregation because grouped metrics hide row-level errors.
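To illustrate how a grouped metric can hide a row-level error, the sketch below compares a per-category total with each row's share of that total. The data is invented, and the 90% share threshold is an arbitrary assumption chosen for the example.

```python
import pandas as pd

# Hypothetical data: the last price plausibly has two extra zeros.
sales = pd.DataFrame({
    "category": ["toys", "toys", "toys", "books", "books", "books"],
    "price": [10.0, 12.0, 11.0, 8.0, 9.0, 900.0],
    "quantity": [1, 2, 1, 1, 3, 2],
})
sales["revenue"] = sales["price"] * sales["quantity"]

# Aggregate view: "books" simply looks like the stronger category.
totals = sales.groupby("category")["revenue"].sum()

# Row-level view: share of each row within its own category total.
category_total = sales.groupby("category")["revenue"].transform("sum")
sales["share_in_category"] = sales["revenue"] / category_total

# Assumed threshold: one row carrying over 90% of a category is suspicious.
suspicious = sales[sales["share_in_category"] > 0.9]

print(totals)
print(suspicious)
```

The totals alone give no hint that a single record drives the "books" figure; the row-level share makes that visible before the aggregate is trusted.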
Common mistakes
- Aggregating first and only then looking for bad records.
- Checking only nulls and ignoring impossible numeric ranges.
- Treating dates as strings without validating the format.
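The last mistake in the list can be shown with a short sketch: string dates compare lexicographically, so a stray format can silently win a `max()`. The sample values and the expected `%Y-%m-%d` format are assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw column: one value in another format, one impossible date.
raw = pd.Series(["2024-01-10", "2024-03-05", "31.12.2023", "2024-02-30"])

# Compared as strings, "31.12.2023" sorts last because "3" > "2".
latest_as_string = raw.max()

# Validating against the assumed format turns bad values into NaT.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
invalid = raw[parsed.isna()]

# max() skips NaT, so the result is the latest valid date.
latest_as_date = parsed.max()

print(latest_as_string)
print(invalid.tolist())
print(latest_as_date)
```

Listing the invalid values explicitly, rather than dropping them, keeps the decision about how to handle them visible.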
How to answer in an interview
- Tie every check to a concrete failure mode.
- Mention both technical validity and business plausibility.