ML System Design
A bank asks a suspicious legal entity for PDF statements from other banks. Design how ML can extract compliance value from those statements.
Answer on your own
First formulate your answer as you would in an interview, then open the walkthrough and grade yourself.
Short answer
Frame the task as document-to-risk evidence: parse statements, extract counterparties and payments, compare against blacklists and observed activity, then produce interpretable signals for compliance decisions.
Full walkthrough
Start by clarifying the decision. The bank is not merely classifying a PDF; it needs evidence for whether a legal entity should remain serviced, whether more documents are needed, or whether the case should escalate to compliance.
Useful outputs include counterparties, INNs (Russian taxpayer identification numbers), payment purposes, transaction amounts, dates, turnover shares, suspicious counterparties, activity categories, and differences between the activity seen at the external bank and the activity observed internally. The system should produce structured evidence with confidence values, not only a binary verdict.
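As a minimal sketch of what "structured evidence" could look like, the Python below aggregates extracted payments into per-counterparty turnover shares and flags blacklisted INNs. The `Payment` and `Signal` types, the `build_signals` function, the signal names and the 0.5 concentration threshold are all hypothetical illustrations and not part of the original case.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    counterparty_inn: str  # taxpayer ID extracted from the statement
    amount: float          # absolute payment amount
    purpose: str           # free-text payment purpose
    confidence: float      # extraction confidence in [0, 1] (assumed parser output)


@dataclass(frozen=True)
class Signal:
    name: str
    counterparty_inn: str
    turnover_share: float
    evidence: tuple        # the payments that triggered the signal
    confidence: float      # weakest extraction confidence among the evidence


def build_signals(payments, blacklist, share_threshold=0.5):
    """Return interpretable signals instead of a single opaque verdict."""
    total = sum(p.amount for p in payments)
    if total <= 0:
        return []

    # Group payments by counterparty so each signal can cite its evidence.
    by_inn = defaultdict(list)
    for p in payments:
        by_inn[p.counterparty_inn].append(p)

    signals = []
    for inn, items in by_inn.items():
        share = sum(p.amount for p in items) / total
        conf = min(p.confidence for p in items)
        if inn in blacklist:
            signals.append(Signal("blacklisted_counterparty", inn, share, tuple(items), conf))
        if share >= share_threshold:
            signals.append(Signal("turnover_concentration", inn, share, tuple(items), conf))
    return signals
```

Each signal carries the payments that triggered it, so a compliance manager can trace a flag back to specific statement rows rather than trusting a bare score.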
Constrain the first version. Assume text PDFs rather than scanned images, legal entities rather than individuals, and authentic (unforged) documents, if that is the scope the interviewer sets. Then design a pipeline with extraction, validation, risk aggregation, human review and monitoring, because compliance needs traceability and a low false-negative rate.
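The validation and human-review stages might be sketched as below. The check-digit rule for 10-digit legal-entity INNs is the standard one. Everything else (the `ExtractedRow` type, the `route` function, the 0.85 confidence cutoff and the queue names) is a hypothetical illustration, not a prescribed design.

```python
from dataclasses import dataclass

# Weights for the check digit of a 10-digit (legal-entity) INN.
_INN10_WEIGHTS = (2, 4, 10, 3, 5, 9, 4, 6, 8)


def is_valid_legal_entity_inn(inn: str) -> bool:
    """Check length, digits and the control digit of a 10-digit INN."""
    if len(inn) != 10 or not (inn.isascii() and inn.isdigit()):
        return False
    # zip stops after nine weights, so only the first nine digits are summed.
    checksum = sum(w * int(d) for w, d in zip(_INN10_WEIGHTS, inn)) % 11 % 10
    return checksum == int(inn[9])


@dataclass(frozen=True)
class ExtractedRow:
    counterparty_inn: str
    amount: float
    confidence: float  # extraction confidence in [0, 1] (assumed parser output)


def route(row: ExtractedRow, min_confidence: float = 0.85) -> str:
    """Decide whether a row can feed risk aggregation or needs a human."""
    if not is_valid_legal_entity_inn(row.counterparty_inn):
        return "human_review"  # mandatory field failed validation
    if row.amount <= 0 or row.confidence < min_confidence:
        return "human_review"  # implausible or uncertain extraction
    return "risk_aggregation"
```

Sending uncertain rows to review instead of silently dropping them keeps the trail auditable and lowers the chance of a missed case, at the cost of extra reviewer load.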
Theory
MLSD cases in regulated domains should begin with the decision, evidence and constraints before model choice.
Common mistakes
- Jumping straight to an LLM classifier over the whole PDF.
- Ignoring legal-entity specifics and mandatory fields.
- Returning a black-box risk score without evidence.
- Forgetting that the regulator and the bank can both make errors.
How to answer in an interview
- Clarify whether the PDF is text or scanned.
- Say that the output must be interpretable for compliance managers.