Take the interview: Diagnocat: Technical interview

Question 1 (15 min)

Review of a PyTorch training loop for multiclass classification

Answer without hints

First say your answer out loud or outline it in bullet points.

Write a draft

Formulas, solution plan, risks, and examples.

Compare with the walkthrough

Open the walkthrough only after your own attempt.

Short answer

Check architecture, batching, labels, optimizer step/zero_grad, train/eval loops, loss inputs, device transfer, validation, batch size, epochs and metrics.

Detailed walkthrough

A code-review answer should separate correctness from style. Correctness issues include whether Dataset returns objects that the default collate function can batch, whether the model receives tensors rather than custom records, whether labels are passed to the loss, and whether the optimizer is created, zeroed, stepped and tied to model.parameters().

For multiclass classification with CrossEntropyLoss, the model should normally return raw logits, not softmax probabilities, because the loss applies log-softmax internally. Labels should be class indices of dtype torch.long in the range 0 to num_classes - 1. The training loop should run for multiple epochs, set model.train(), move each batch to the target device inside the loop, and be paired with a separate validation loop that uses model.eval() and torch.no_grad().
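The points above can be sketched as a minimal loop. This is an illustration under stated assumptions, not the code from the interview task: the synthetic data, the layer sizes, the learning rate and the helper names train_one_epoch and evaluate are all hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, loss_fn, optimizer, device):
    model.train()  # enable training behaviour (dropout, batch-norm updates)
    total_loss = 0.0
    for inputs, labels in loader:
        # Device transfer happens here, in the loop, not in the Dataset.
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()           # clear gradients from the previous step
        logits = model(inputs)          # raw scores, no softmax
        loss = loss_fn(logits, labels)  # labels are int64 class indices
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * inputs.size(0)
    return total_loss / len(loader.dataset)


@torch.no_grad()  # no gradient tracking during validation
def evaluate(model, loader, device):
    model.eval()
    correct = 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        preds = model(inputs).argmax(dim=1)
        correct += (preds == labels).sum().item()
    return correct / len(loader.dataset)


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic data (assumption): 64 samples, 10 features, 3 classes.
features = torch.randn(64, 10)
targets = torch.randint(0, 3, (64,))  # dtype is int64 by default
train_loader = DataLoader(TensorDataset(features[:48], targets[:48]),
                          batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(features[48:], targets[48:]),
                        batch_size=16)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3)).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(3):
    train_loss = train_one_epoch(model, train_loader, loss_fn, optimizer, device)
    val_acc = evaluate(model, val_loader, device)
    print(f"epoch {epoch}: train_loss={train_loss:.4f} val_acc={val_acc:.2f}")
```

Keeping training and validation in separate functions makes the train()/eval() switch hard to forget and keeps the optimizer out of the validation path.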

Design issues include a weak linear-only image architecture, missing batch_size, no metrics/logging, no validation split, no seed/reproducibility strategy and unclear input-shape assumptions.

Common mistakes

Apply softmax before CrossEntropyLoss.
Forget labels or pass images as targets.
Move data to GPU inside Dataset workers.
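The first mistake in this list can be checked numerically. The sketch below uses made-up logits and labels; it compares cross-entropy on raw logits with the manual log-softmax plus NLL computation, and then with the loss obtained when softmax is applied first.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3) * 5        # hypothetical raw model outputs
labels = torch.tensor([0, 2, 1, 2])   # class indices, int64

# Correct: cross_entropy expects raw logits.
loss_correct = F.cross_entropy(logits, labels)

# Equivalent by definition: log-softmax followed by negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

# Mistake: softmax applied first, so log-softmax effectively runs on probabilities.
loss_double = F.cross_entropy(F.softmax(logits, dim=1), labels)

print(loss_correct.item(), loss_manual.item(), loss_double.item())
```

The double-softmax version still runs without an error, which is why it tends to go unnoticed: the inputs to the loss are squeezed into [0, 1], so the loss is confined to a narrow range and the gradients are weaker than intended.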

How to say it in the interview

Start with runtime-breaking bugs before architecture opinions.
Mention the raw-logits contract for CrossEntropyLoss.

Question 2 (10 min)

Production ML question

In PyTorch, what should Dataset do, what should collate_fn do, how do num_workers affect this, and where should .to(device) usually happen?

Short answer

Dataset maps indices to CPU examples; collate_fn batches fetched examples; num_workers sets how many worker processes parallelize fetching and collation. Device transfer usually belongs in the training loop, after batching.

Detailed walkthrough

A Dataset should expose __len__ and __getitem__, and __getitem__ should return one example. It can read files and apply CPU transforms, but it should not usually know about training state or GPU device placement.
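A minimal map-style Dataset along these lines might look as follows. The class name ToyImageDataset, the in-memory storage and the optional transform argument are assumptions made for illustration; a real dataset would typically read files in __getitem__.

```python
import torch
from torch.utils.data import Dataset


class ToyImageDataset(Dataset):
    """Hypothetical map-style dataset over in-memory CPU samples."""

    def __init__(self, images, labels, transform=None):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels
        self.transform = transform  # optional CPU-side transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        # Return exactly one example; no batching and no .to(device) here.
        image = self.images[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, int(self.labels[index])


dataset = ToyImageDataset(torch.zeros(5, 1, 4, 4), [0, 1, 2, 1, 0])
image, label = dataset[1]
print(len(dataset), image.shape, label)
```

Because each item is a plain (tensor, int) pair, the default collate function can stack such items into a batch without a custom collate_fn.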

collate_fn receives a list of already-fetched examples and turns them into a batch. It is useful when examples are custom classes, variable-length sequences or nested structures that the default collate cannot stack. It should not fetch indices itself; that would mix responsibilities.
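For variable-length sequences, a custom collate_fn could be sketched like this. The function name pad_collate, the padding value 0 and the toy token ids are assumptions; the only library helper used is torch.nn.utils.rnn.pad_sequence.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader


def pad_collate(samples):
    """Batch a list of (1-D tensor of token ids, int label) pairs."""
    sequences, labels = zip(*samples)
    lengths = torch.tensor([len(seq) for seq in sequences])
    # Pad to the longest sequence in this batch; 0 is an assumed padding id.
    padded = pad_sequence(list(sequences), batch_first=True, padding_value=0)
    return padded, lengths, torch.tensor(labels, dtype=torch.long)


# A plain list works as a tiny map-style dataset for this illustration.
data = [
    (torch.tensor([1, 2, 3]), 0),
    (torch.tensor([4]), 1),
    (torch.tensor([5, 6]), 2),
]
loader = DataLoader(data, batch_size=3, collate_fn=pad_collate)
padded, lengths, labels = next(iter(loader))
print(padded.shape, lengths.tolist(), labels.tolist())
```

Note that pad_collate only reshapes examples it was handed; it never indexes into the dataset, which keeps fetching and batching as separate responsibilities.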

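num_workers sets how many worker processes the DataLoader uses for data loading; with 0, loading runs in the main process. If __getitem__ moves tensors to GPU, multiple workers can compete for GPU memory, prefetch GPU batches and make device ownership messy. The common pattern is to keep Dataset and collate_fn on the CPU, then move each tensor of the batch to the device in the training loop, for example inputs = inputs.to(device, non_blocking=True), optionally with pin_memory=True on the DataLoader so the copy can run asynchronously.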
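A sketch of this pattern, assuming a synthetic TensorDataset and a hypothetical helper to_device that walks tuple or list batches. num_workers is left at 0 so the snippet runs anywhere; a real I/O-bound pipeline would usually raise it.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic CPU data (assumption): 32 samples, 8 features, 4 classes.
dataset = TensorDataset(torch.randn(32, 8), torch.randint(0, 4, (32,)))

loader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=0,                         # >0 spawns worker processes
    pin_memory=torch.cuda.is_available(),  # pinned host memory helps async copies
)


def to_device(batch, device):
    """Move every tensor in a (possibly nested) list/tuple batch to device."""
    if isinstance(batch, torch.Tensor):
        return batch.to(device, non_blocking=True)
    if isinstance(batch, (list, tuple)):
        return type(batch)(to_device(item, device) for item in batch)
    return batch  # leave non-tensor items untouched


seen = 0
for batch in loader:
    inputs, labels = to_device(batch, device)  # transfer after batching
    seen += inputs.size(0)
print(seen, inputs.device.type)
```

The Dataset and the DataLoader workers only ever touch CPU memory here; the single place that knows about the device is the loop body.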

Common mistakes

Make collate_fn fetch from the dataset by index.
Move samples to CUDA inside Dataset.__getitem__.
Forget __len__ for map-style datasets.

How to say it in the interview

Use separation of responsibilities as the organizing principle.
Mention num_workers and prefetching when explaining device placement.

Question 3 (10 min)

Question

Why does a custom nn.Module need super().__init__()? Separately, why is tags=[] as a default argument in Python dangerous?

Short answer

nn.Module.__init__ initializes internal registries for parameters, buffers and submodules. A mutable default list is created once and shared across calls, so mutations leak between calls and between instances.

Detailed walkthrough

A custom PyTorch module should call super().__init__() before assigning submodules. nn.Module.__init__ creates the internal dictionaries and hook registries used to track parameters, buffers and child modules. Without it, the methods still exist through inheritance, but the registries do not: assigning a Linear layer typically fails right away with an AttributeError, and parameters(), state_dict(), train(), eval() or to(device) cannot work correctly on a module whose registries were never created.
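The difference can be observed directly. In the sketch below, BrokenNet and GoodNet are hypothetical classes; the assumption is that the installed PyTorch raises AttributeError when a submodule is assigned before nn.Module.__init__ has run, which is the behaviour of the versions this sketch was written against.

```python
import torch
from torch import nn


class BrokenNet(nn.Module):
    def __init__(self):
        # super().__init__() is missing, so the registries do not exist yet.
        self.fc = nn.Linear(4, 2)


class GoodNet(nn.Module):
    def __init__(self):
        super().__init__()         # creates parameter/buffer/module registries
        self.fc = nn.Linear(4, 2)  # now registered as a child module

    def forward(self, x):
        return self.fc(x)


try:
    BrokenNet()
    broken_error = None
except AttributeError as exc:
    broken_error = str(exc)

good = GoodNet()
param_names = [name for name, _ in good.named_parameters()]

print("BrokenNet error:", broken_error)
print("GoodNet parameters:", param_names)
```

Because fc is registered in GoodNet, its weights show up in parameters() and state_dict(), so the optimizer and checkpointing can see them.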

The Python default argument issue is separate. Default values are evaluated once when the function is defined, not each time it is called. If tags=[] is used and one call mutates that list, later calls without tags see the same mutated object. The usual pattern is tags: list[str] | None = None, then inside the function create a new empty list when tags is None.
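A tiny example of two instances sharing one default list, followed by the None-based fix. The class names SharedPost and SafePost are made up for illustration; Optional[list[str]] is used here as the equivalent of list[str] | None so that the snippet also runs on Python 3.9.

```python
from typing import Optional


class SharedPost:
    def __init__(self, tags=[]):  # the list is created once, at definition time
        self.tags = tags


class SafePost:
    def __init__(self, tags: Optional[list[str]] = None):
        # A fresh list per instance; an explicitly passed list is kept as-is.
        self.tags = [] if tags is None else tags


a, b = SharedPost(), SharedPost()
a.tags.append("urgent")
print(b.tags)             # the mutation made through `a` is visible through `b`
print(a.tags is b.tags)   # both attributes point at the same list object

c, d = SafePost(), SafePost()
c.tags.append("urgent")
print(d.tags)             # unaffected, because each instance got its own list
```

Checking `tags is None` rather than the truthiness of tags means a caller who passes their own empty list keeps that exact object.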

The common theme is object lifecycle: initialization matters for framework objects, and defaults must not hide shared mutable state.

Common mistakes

Think inheritance alone initializes nn.Module internals.
Use [] or {} as default arguments.
Use `tags or []` when an explicitly empty list has semantic meaning; it swaps the caller's empty list for a new one, so check `tags is None` instead.

How to say it in the interview

Mention parameter registration for super().__init__.
Give a tiny example of two instances sharing the same default list.