Question
What are the main generation/inference hyperparameters of an LLM and how do they affect output?
First say your answer aloud or jot it down as bullet points.
Cover formulas, a solution plan, risks and examples.
Open the walkthrough only after your own attempt.
Short answer
The main generation knobs are temperature, top-p and top-k sampling, max tokens, stop sequences, repetition penalties, and sometimes beam-search settings. Together they trade off determinism, diversity, latency, cost and safety.
Detailed walkthrough
Temperature rescales the token distribution by dividing the logits by T before the softmax. Low temperature (T < 1) sharpens the distribution and makes outputs more deterministic and conservative; high temperature (T > 1) flattens it, increasing diversity and the risk of nonsense. Top-p (nucleus) sampling restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches p; top-k restricts it to the k most likely tokens.
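The three mechanisms above can be sketched in plain Python. This is a minimal illustration on a toy 4-token vocabulary, not how any particular inference library implements sampling; the function names and the example logits are made up for the sketch.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}


def sample(filtered, rng=random):
    """Draw one token id from a {token_id: probability} mapping."""
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
sharp = softmax_with_temperature(logits, temperature=0.5)  # mass concentrates on token 0
flat = softmax_with_temperature(logits, temperature=2.0)   # mass spreads across tokens
next_token = sample(top_p_filter(softmax_with_temperature(logits), p=0.9))
```

In this sketch the filters run after temperature scaling, which is a common ordering but an assumption here; real systems may combine or order the steps differently.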
Max tokens caps output length, and with it cost and latency. Stop sequences define strings at which generation terminates. Frequency and presence penalties both reduce repetition: a frequency penalty grows with how often a token has already appeared, while a presence penalty applies once a token has appeared at all. For chat and tool-calling systems, additional parameters may control JSON/schema mode, tool-choice behavior, and a seed for reproducibility, where supported.
In product systems, the right settings depend on the task. Extraction and coding assistants usually need low temperature and schema constraints, while brainstorming can tolerate higher diversity. Evaluate settings on task-specific metrics rather than copying defaults.
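As a rough sketch of how penalties and stop sequences act, the snippet below uses one common additive formulation: each already-generated token's logit is reduced by its count times the frequency penalty, plus a flat presence penalty. The exact formula varies by provider, and the function names here are hypothetical.

```python
from collections import Counter


def apply_penalties(logits, generated_ids, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that already appear in the generated output.

    Assumed formulation: logit -= count * frequency_penalty + presence_penalty,
    applied only to tokens with count >= 1.
    """
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        adjusted[token_id] -= count * frequency_penalty + presence_penalty
    return adjusted


def truncate_at_stop(text, stop_sequences):
    """Cut the text at the earliest occurrence of any non-empty stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop) if stop else -1
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Token 0 was generated twice, token 2 once, token 1 never.
penalized = apply_penalties([1.0, 1.0, 1.0], [0, 0, 2],
                            frequency_penalty=0.5, presence_penalty=0.2)
answer = truncate_at_stop("Answer: 42\nUser: next question", ["\nUser:", "###"])
```

Token 0 is penalized most because it was repeated, token 2 receives a smaller penalty, and token 1 is untouched. Production servers typically stop generating when a stop sequence appears rather than trimming afterwards; the post-hoc truncation here only illustrates the effect on the returned text.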
Common mistakes
- Claiming that temperature is the only parameter that matters.
- Using high temperature for extraction tasks that need deterministic output.
- Ignoring the cost and latency impact of max tokens.
Question
What main architecture families are used for generative models, and where are they commonly applied?
Short answer
Common families are autoregressive Transformers for text/code and sequential generation, diffusion models for images/audio/video, GANs for adversarial image generation, and VAEs/flows for latent-variable density modeling.
Detailed walkthrough
Autoregressive models generate one token or unit at a time conditioned on previous context. Modern LLMs are typically decoder-only Transformers trained with next-token prediction, so this family dominates text, code and many agent workflows.
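The generation loop itself is simple and can be sketched with a toy stand-in for the model. The bigram table below is invented for illustration and conditions only on the last token, whereas a real Transformer conditions on the whole context; the loop structure (predict, pick, append, repeat until an end token or the max-token cap) is the part that carries over.

```python
import random

# Hypothetical toy "model": next-token probabilities given only the last token.
BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 0.7, "dog": 0.3},
    "cat": {"sleeps": 0.8, "</s>": 0.2},
    "dog": {"sleeps": 0.6, "</s>": 0.4},
    "sleeps": {"</s>": 1.0},
}


def next_token_distribution(context):
    """Stand-in for a model forward pass: returns {token: probability}."""
    return BIGRAM[context[-1]]


def generate(max_tokens=10, greedy=True, rng=random):
    """Generate tokens one at a time, feeding each choice back in as context."""
    context = ["<s>"]
    for _ in range(max_tokens):
        dist = next_token_distribution(context)
        if greedy:
            token = max(dist, key=dist.get)  # always take the most likely token
        else:
            tokens = list(dist)
            token = rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
        if token == "</s>":  # end-of-sequence token stops generation
            break
        context.append(token)
    return context[1:]  # drop the start marker


print(generate())              # greedy decoding
print(generate(greedy=False))  # sampled decoding, varies between runs
```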
Diffusion models learn to reverse a gradual noising process: they are trained to denoise corrupted samples and generate by iteratively denoising from pure noise. They are widely used for image, video and audio generation. They usually offer strong sample quality but require multiple denoising steps, although distillation and faster samplers reduce this cost.
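To make "denoising from noisy samples" concrete, here is a minimal sketch of the forward (noising) side in the DDPM style, where a noisy sample is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The linear schedule and its default values are assumptions chosen for illustration, and no network is trained here; a real model would learn to predict the noise that this code simply remembers.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, growing linearly (assumed schedule, num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng=random):
    """Sample a noisy x_t from clean x0 and return it with the noise that was used."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    xt = [signal * x + sigma * e for x, e in zip(x0, noise)]
    return xt, noise


def recover_x0(xt, predicted_noise, alpha_bar):
    """Invert the noising formula given a noise estimate (a trained model's job)."""
    signal, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - sigma * e) / signal for x, e in zip(xt, predicted_noise)]


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [1.0, -2.0, 0.5]                         # toy "image" with three pixels
xt, noise = add_noise(x0, alpha_bars[500])    # partially noised sample
restored = recover_x0(xt, noise, alpha_bars[500])  # exact only because the noise is known
```

Because alpha_bar shrinks toward zero as t grows, late steps are almost pure noise, which is why sampling has to walk back through many steps unless a faster sampler or a distilled model shortens the path.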
GANs train a generator against a discriminator and were historically strong for images, but training can be unstable and the outputs harder to control. VAEs learn a latent space by maximizing a lower bound on the likelihood, while normalizing flows use invertible transforms to compute exact likelihoods; both are useful when representation learning or density estimation matters. A good interview answer compares trade-offs rather than only naming architectures.