Production ML question
You have about 10,000 statement pages per night, 100 banks, one CPU server and sensitive data that cannot leave the bank. How do you allocate expensive local LLM usage?
Answer it yourself
First formulate your answer as you would in an interview, then read the full breakdown and rate yourself.
Short answer
Make the default path cheap and deterministic, estimate throughput, then spend the local LLM budget only on candidate fragments, unknown formats and validation failures.
Full breakdown
First do the arithmetic. A 7B local model on CPU can take minutes per page: even at one minute per page, 10,000 pages would need roughly 167 hours of sequential inference, so the model cannot process every page overnight. The pipeline needs a cheap first pass: PDF text extraction, layout-aware heuristics, regex candidates, known-template parsers and lightweight ML.
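A back-of-the-envelope version of that arithmetic, as a minimal Python sketch. The 8-hour night window, 120 s per LLM page, 0.5 s per page for the cheap pass and the 0.8 safety margin are illustrative assumptions, not measurements, and `Throughput`, `full_llm_hours` and `llm_page_budget` are hypothetical names. It also assumes the LLM handles one page at a time on the single CPU server.

```python
from dataclasses import dataclass

@dataclass
class Throughput:
    pages_per_night: int
    window_hours: float            # assumed length of the overnight batch window
    llm_seconds_per_page: float    # assumed cost of the local 7B model on CPU
    cheap_seconds_per_page: float  # assumed cost of extraction + heuristics per page

def full_llm_hours(t: Throughput) -> float:
    """Hours needed if every page went through the LLM, one page at a time."""
    return t.pages_per_night * t.llm_seconds_per_page / 3600

def llm_page_budget(t: Throughput, safety_margin: float = 0.8) -> int:
    """Pages the LLM can afford after the cheap pass has run over every page."""
    usable_s = t.window_hours * 3600 * safety_margin  # headroom for retries and spikes
    cheap_total_s = t.pages_per_night * t.cheap_seconds_per_page
    remaining_s = max(0.0, usable_s - cheap_total_s)
    return int(remaining_s // t.llm_seconds_per_page)

nightly = Throughput(pages_per_night=10_000, window_hours=8.0,
                     llm_seconds_per_page=120.0, cheap_seconds_per_page=0.5)
print(round(full_llm_hours(nightly)), llm_page_budget(nightly))  # → 333 150
```

Under these assumptions, running the LLM over all pages would take about 333 hours, while the time left after the cheap pass covers roughly 150 pages, about 1.5% of the night's volume. The exact numbers matter less than the conclusion: the LLM can only be a targeted tool.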
Use routing. Pages with known templates and passing validations stay on the cheap path. Unknown templates, failed total checks, suspicious blacklist candidates, or ambiguous numeric runs are routed to the local LLM or human review. The LLM should receive small fragments, not whole thousand-page statements.
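One way to express that routing rule, as a sketch that assumes the cheap pass already emits per-page signals. `PageSignals`, `route_page` and the `high_risk` flag are hypothetical, and sending high-risk pages with blacklist candidates straight to human review is an assumed policy, not the only reasonable split between the LLM and review.

```python
from dataclasses import dataclass
from enum import Enum

class Route(Enum):
    CHEAP = "cheap"    # known template, all checks pass
    LLM = "llm"        # send small fragments to the local model
    REVIEW = "review"  # human review queue

@dataclass
class PageSignals:
    template_known: bool
    totals_check_passed: bool
    blacklist_candidates: int    # suspicious matches from the regex/fuzzy pass
    ambiguous_numeric_runs: int  # number runs the heuristics could not assign to a column
    high_risk: bool              # hypothetical risk flag, e.g. from the client's risk tier

def route_page(s: PageSignals) -> Route:
    clean = (s.template_known and s.totals_check_passed
             and s.blacklist_candidates == 0 and s.ambiguous_numeric_runs == 0)
    if clean:
        return Route.CHEAP
    # Assumed policy: high-risk pages with blacklist hits go straight to a human
    if s.high_risk and s.blacklist_candidates > 0:
        return Route.REVIEW
    return Route.LLM

print(route_page(PageSignals(True, True, 0, 0, False)).value)   # → cheap
print(route_page(PageSignals(False, True, 0, 1, False)).value)  # → llm
```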
Track throughput and backlog as product constraints. If the daily batch must finish before the next banking day, define per-page budgets and graceful degradation. For low-risk statements, return a conservative no-hit result only when cheap checks are strong enough; for high-risk or ambiguous cases, produce a review queue rather than forcing a low-confidence model answer.
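A minimal sketch of budgeted scheduling with graceful degradation: fragments get LLM time in order of risk until the budget runs out, and whatever does not fit either receives a conservative no-hit (low risk and strong cheap checks) or goes to the review queue. `Fragment`, `Plan`, `plan_night`, the 0..1 `risk_score` scale and the 0.5 threshold are assumptions for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Fragment:
    page_id: str
    risk_score: float          # hypothetical: 0..1, higher = more important tonight
    est_llm_seconds: float     # hypothetical per-fragment cost estimate
    cheap_checks_strong: bool  # all cheap validations passed with margin

@dataclass
class Plan:
    llm: list[str] = field(default_factory=list)
    no_hit: list[str] = field(default_factory=list)
    review: list[str] = field(default_factory=list)

def plan_night(fragments: list[Fragment], llm_seconds_budget: float,
               high_risk_threshold: float = 0.5) -> Plan:
    plan = Plan()
    remaining = llm_seconds_budget
    # Highest-risk fragments get LLM time first
    for f in sorted(fragments, key=lambda x: x.risk_score, reverse=True):
        if f.est_llm_seconds <= remaining:
            plan.llm.append(f.page_id)
            remaining -= f.est_llm_seconds
        elif f.risk_score < high_risk_threshold and f.cheap_checks_strong:
            # Out of budget, low risk, strong cheap checks: conservative no-hit
            plan.no_hit.append(f.page_id)
        else:
            # Out of budget and risky or ambiguous: queue for a human instead of guessing
            plan.review.append(f.page_id)
    return plan

frags = [Fragment("a", 0.9, 100, False), Fragment("b", 0.8, 100, False),
         Fragment("c", 0.2, 100, True), Fragment("d", 0.3, 100, False)]
print(plan_night(frags, llm_seconds_budget=150))
# → Plan(llm=['a'], no_hit=['c'], review=['b', 'd'])
```

The greedy loop keeps scanning after an item does not fit, so a cheaper lower-risk fragment can still use leftover budget; a real system might instead reserve a fixed share of the budget for high-risk items.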
Theory
Expensive models should be used as targeted tools inside a budgeted pipeline, not as the default parser.
Common mistakes
- Proposing a local LLM for every page without doing the throughput math.
- Ignoring overnight batch deadlines.
- Sending too much context to the LLM.
- Returning confident verdicts when extraction failed or has low confidence.
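The third mistake, sending too much context, can be avoided by passing the model only a few lines around each candidate. A minimal sketch, assuming page text is already extracted into lines and the cheap pass reports candidate line indices; `candidate_windows` and `radius` are hypothetical names.

```python
from __future__ import annotations

def candidate_windows(lines: list[str], hits: list[int], radius: int = 2) -> list[str]:
    """Return small fragments of +/- radius lines around each candidate line.

    Overlapping or touching windows are merged so the same text is not sent twice.
    """
    spans: list[tuple[int, int]] = []
    for i in sorted(set(hits)):
        start, end = max(0, i - radius), min(len(lines), i + radius + 1)
        if spans and start <= spans[-1][1]:
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return ["\n".join(lines[s:e]) for s, e in spans]

page = [f"row {i}" for i in range(20)]
print(candidate_windows(page, hits=[5, 7, 15], radius=1))
# → ['row 4\nrow 5\nrow 6\nrow 7\nrow 8', 'row 14\nrow 15\nrow 16']
```

A production version would probably also cap each fragment in characters or tokens; that is left out of this sketch.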
How to answer in the interview
- Estimate the page budget out loud.
- Use “route only hard fragments to the LLM” as the core design.