Question
What main architecture families are used for generative models, and where are they commonly applied?
Answer it yourself
First formulate your answer as you would in an interview, then open the explanation and assess yourself.
Short answer
Common families are autoregressive Transformers for text, code and other sequential generation; diffusion models for images, audio and video; GANs for adversarially trained image generation; and VAEs and normalizing flows for latent-variable modeling and density estimation.
Full explanation
Autoregressive models generate one token (or other unit) at a time, each conditioned on everything generated so far. Modern LLMs are typically decoder-only Transformers trained with next-token prediction, which is why this family dominates text, code and many agent workflows.
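The generation loop described above can be sketched in a few lines. This is a minimal illustration, not how a real LLM is implemented: the `BIGRAM` table, `next_token_probs` and `generate` are hypothetical names, and the toy "model" looks only at the last token, whereas a Transformer conditions on the whole prefix.

```python
import random

# Hypothetical toy "model": next-token probabilities given only the last token.
# A real LLM would compute these from the full prefix with a Transformer.
BIGRAM = {
    "<s>": {"the": 1.0},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"</s>": 1.0},
}


def next_token_probs(context):
    """Return p(next token | context) for the toy model."""
    return BIGRAM[context[-1]]


def generate(prompt, max_new_tokens=10, seed=0):
    """Sample one token at a time, feeding each sample back in as context."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        choices, weights = zip(*probs.items())
        token = rng.choices(choices, weights=weights)[0]
        tokens.append(token)
        if token == "</s>":  # stop at the end-of-sequence marker
            break
    return tokens


print(generate(["<s>"]))
```

The key property is the feedback loop: every sampled token becomes part of the context for the next step, so generation cost grows with output length.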
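Diffusion models are trained to remove noise from samples that have been corrupted to varying degrees; at generation time they start from pure noise and denoise it step by step. They are widely used for image, video and audio generation. They usually offer strong sample quality, but sampling requires multiple denoising steps, although distillation and faster samplers reduce this cost.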
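To make the noising and denoising relationship concrete, here is a small sketch of the forward process commonly used in DDPM-style models, assuming a linear beta schedule. The function names and schedule values are illustrative assumptions, and the true noise stands in for a trained network's prediction, so this shows only the arithmetic, not a working sampler.

```python
import math
import random


def alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, eps, a_bar_t):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    return [math.sqrt(a_bar_t) * x + math.sqrt(1.0 - a_bar_t) * e
            for x, e in zip(x0, eps)]


def predict_x0(xt, eps_hat, a_bar_t):
    """Invert the forward formula given a noise estimate eps_hat."""
    return [(x - math.sqrt(1.0 - a_bar_t) * e) / math.sqrt(a_bar_t)
            for x, e in zip(xt, eps_hat)]


rng = random.Random(0)
a_bars = alpha_bar(1000)
x0 = [1.0, -0.5, 0.25]                      # toy "clean" sample
eps = [rng.gauss(0.0, 1.0) for _ in x0]     # Gaussian noise
xt = add_noise(x0, eps, a_bars[500])        # noised sample at step 500
# Assumption: the true eps plays the role of a trained denoiser's output.
x0_hat = predict_x0(xt, eps, a_bars[500])
```

In a trained model the network only approximates `eps`, which is one reason sampling is spread over many small denoising steps rather than done in a single jump.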
GANs train a generator against a discriminator. Historically they were strong for images, but training can be unstable and the outputs are harder to control. VAEs learn a latent space by maximizing a lower bound on the likelihood, while normalizing flows use invertible transforms to compute exact likelihoods; both are useful when representation learning or density estimation matters. A good interview answer compares these trade-offs rather than only naming architectures.
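The exact-likelihood property of normalizing flows can be illustrated with the simplest possible case: a one-dimensional affine flow x = scale * z + shift with z drawn from a standard normal. This is a sketch under that assumption; the function names are hypothetical, and practical flows stack many learned invertible layers.

```python
import math


def standard_normal_logpdf(z):
    """Log density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x, scale, shift):
    """Exact log p(x) for x = scale * z + shift with z ~ N(0, 1).

    Change of variables: log p(x) = log p_z(z) + log |dz/dx|.
    """
    z = (x - shift) / scale            # inverse transform
    log_det = -math.log(abs(scale))    # log |dz/dx| = -log |scale|
    return standard_normal_logpdf(z) + log_det


print(affine_flow_logpdf(1.0, scale=2.0, shift=1.0))
```

Because the transform is invertible and its Jacobian is cheap to compute, the density is exact, whereas a VAE optimizes only a lower bound and a GAN gives no likelihood at all.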