Production ML question
You are reviewing code that loops over texts, calls an embedding model on each text individually, and appends each output to a NumPy array. What would you improve?
Answer it yourself
First formulate your answer as you would in an interview, then open the walkthrough and grade yourself.
Short answer
Avoid repeated np.append: preallocate the output array, or collect batch outputs in a list and concatenate once. Batch the model inference, handle the final partial batch, and derive the output shape from the model or the first batch when possible.
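Full walkthrough
Repeated np.append in a loop is inefficient because np.append never grows an array in place: each call allocates a new array and copies all existing data into it, so building N rows costs on the order of N^2 element copies. Prefer preallocating the final array if its shape is known, or collecting batch outputs in a list and concatenating once.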
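The three allocation strategies above can be compared side by side. This is a minimal sketch; the function names are illustrative and do not come from the code under review, and the rows are assumed to be 1-D float32 vectors of equal length.

```python
import numpy as np


def build_with_append(rows, dim):
    """Anti-pattern: np.append returns a brand-new array on every call."""
    out = np.empty((0, dim), dtype=np.float32)
    for row in rows:
        # Allocates a new (k + 1, dim) array and copies the k existing rows.
        out = np.append(out, row[None, :], axis=0)
    return out


def build_with_list(rows):
    """Collect the rows first, then copy them into one array exactly once."""
    collected = []
    for row in rows:
        collected.append(row)  # amortized O(1), no array copy
    return np.stack(collected)  # assumes at least one row


def build_preallocated(rows, n, dim):
    """Allocate the final array up front and write each row in place."""
    out = np.empty((n, dim), dtype=np.float32)
    for i, row in enumerate(rows):
        out[i] = row
    return out
```

All three return the same array; only the first one copies the accumulated data on every iteration.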
For model inference, one-by-one calls forgo vectorization and leave the accelerator underutilized. Send texts in batches, write the returned batch embeddings into the correct slice of the output, and handle the final batch when len(texts) is not divisible by batch_size.
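A minimal sketch of batched inference with slice writes, assuming a hypothetical callable embed_batch(texts) that returns an array of shape (len(texts), dim). The name toy_embed_batch and the fixed dim argument are illustrative stand-ins, not a real model API.

```python
import numpy as np


def embed_in_batches(texts, embed_batch, dim, batch_size=32):
    """Embed texts batch by batch into a preallocated (n, dim) array."""
    n = len(texts)
    out = np.empty((n, dim), dtype=np.float32)
    for start in range(0, n, batch_size):
        # Clamp the end index so the final batch may be shorter than batch_size.
        stop = min(start + batch_size, n)
        out[start:stop] = embed_batch(texts[start:stop])
    return out


def toy_embed_batch(batch):
    """Hypothetical stand-in for a model call: one 2-dim vector per text."""
    return np.array([[len(t), 0.0] for t in batch], dtype=np.float32)


# Five texts with batch_size=2 produce batches of sizes 2, 2 and 1.
vectors = embed_in_batches(
    ["a", "bb", "ccc", "dddd", "eeeee"], toy_embed_batch, dim=2, batch_size=2
)
```

Writing into out[start:stop] rather than out[start:start + batch_size] keeps the target slice and the returned batch the same length on the last iteration.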
A production review also checks shape assumptions. If the embedding dimension is hardcoded, make sure the model contract guarantees it; otherwise infer it from model metadata or from the first output. Keep tokenization and model API assumptions explicit, and add tests for empty input, a single item, a length not divisible by the batch size, and the output dtype and shape.
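One way to sketch shape inference and the edge-case tests listed above. The helper names are hypothetical, the toy model stands in for a real one, and returning zero columns for empty input when the dimension is unknown is just one possible convention.

```python
import numpy as np


def toy_embed_batch(texts):
    """Hypothetical stand-in for a model: one 3-dim float32 vector per text."""
    rows = [[len(t), t.count("a"), 1.0] for t in texts]
    return np.array(rows, dtype=np.float32).reshape(len(texts), 3)


def embed_texts(texts, embed_batch, batch_size=2, dim=None):
    """Embed in batches; infer the embedding dimension when dim is None."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if len(texts) == 0:
        # Assumption: with no output to inspect, fall back to dim or 0 columns.
        return np.empty((0, dim or 0), dtype=np.float32)
    out = None
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]  # slicing clamps at the end
        emb = np.asarray(embed_batch(batch))
        if emb.ndim != 2 or emb.shape[0] != len(batch):
            raise ValueError(f"unexpected batch output shape {emb.shape}")
        if out is None:
            if dim is not None and emb.shape[1] != dim:
                raise ValueError(f"expected dim {dim}, got {emb.shape[1]}")
            # Derive width and dtype from the first batch the model returns.
            out = np.empty((len(texts), emb.shape[1]), dtype=emb.dtype)
        out[start:start + len(batch)] = emb
    return out


def check_edge_cases():
    assert embed_texts([], toy_embed_batch).shape == (0, 0)       # empty input
    assert embed_texts(["a"], toy_embed_batch).shape == (1, 3)    # one item
    texts = ["a", "bb", "ccc"]                                    # 3 % 2 != 0
    out = embed_texts(texts, toy_embed_batch, batch_size=2)
    assert out.shape == (3, 3) and out.dtype == np.float32
    assert np.array_equal(out, toy_embed_batch(texts))


check_edge_cases()
```

Passing dim explicitly turns a silent shape mismatch into an early error, which is useful when downstream code depends on a fixed embedding width.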
Theory
The core production issues are memory allocation and inference efficiency, not just code style.
Common mistakes
- Focusing only on syntax while missing the repeated array reallocation.
- Forgetting the final partial batch.
- Hardcoding the embedding dimension without checking the model contract.
How to answer in an interview
- Mention both the array allocation cost and inference batching.
- Call out the edge case where i + batch_size exceeds the array length.