Question · Medium · llm-inference · Screening · AgentPlace

Question

What are the main generation/inference hyperparameters of an LLM and how do they affect output?

Answer it yourself

First formulate your answer as you would in an interview, then open the walkthrough and grade yourself.

Important generation knobs include temperature, top-p/top-k sampling, max tokens, stop sequences, repetition penalties and sometimes beam search settings. They trade off determinism, diversity, latency and safety.

Full walkthrough

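Temperature divides the logits by a constant T before the softmax, which reshapes the token probability distribution. Low temperature (T < 1) sharpens the distribution and makes outputs more deterministic and conservative; as T approaches 0, sampling approaches greedy decoding. High temperature (T > 1) flattens the distribution, which increases diversity and the risk of nonsense.

Top-p (nucleus) sampling restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches p; top-k restricts it to the k most likely tokens. In both cases the surviving probabilities are renormalized before a token is drawn.

Max tokens caps output length, and with it cost and latency. Stop sequences define strings at which generation terminates. Frequency and presence penalties reduce repetition: the frequency penalty grows with how often a token has already appeared, while the presence penalty applies once to any token that has appeared at all. For chat and tool-using systems, additional parameters may control JSON/schema mode, tool-choice behavior and, where supported, a seed for reproducibility.

In product systems, the right settings depend on the task. Extraction and coding assistants usually need low temperature and schema constraints. Brainstorming can tolerate higher diversity. Evaluate settings on task-specific metrics rather than copying defaults.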
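To make the sampling knobs concrete, here is a minimal sketch in plain Python of how temperature, top-k, top-p and additive repetition penalties could be applied to a vector of logits. All function names and the example logits are hypothetical and chosen for illustration; real inference stacks operate on tensors and may order or define these steps differently. The penalty formula is one common additive formulation, not necessarily what any particular provider uses.

```python
import math
import random
from collections import Counter


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _keep_and_renormalize(probs, kept):
    """Zero out every token not in `kept` and rescale the rest to sum to 1."""
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]


def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _keep_and_renormalize(probs, set(order[:k]))


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _keep_and_renormalize(probs, kept)


def apply_penalties(logits, generated_ids, frequency_penalty=0.0, presence_penalty=0.0):
    """Assumed additive form: subtract count * frequency_penalty, plus
    presence_penalty once, from the logit of every token already generated."""
    counts = Counter(generated_ids)
    return [
        x - counts[i] * frequency_penalty - (presence_penalty if counts[i] > 0 else 0.0)
        for i, x in enumerate(logits)
    ]


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Draw one token id after applying temperature, then top-k, then top-p."""
    probs = softmax_with_temperature(logits, temperature)
    if top_k is not None:
        probs = top_k_filter(probs, top_k)
    if top_p is not None:
        probs = top_p_filter(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical 4-token vocabulary
    for t in (0.5, 1.0, 2.0):
        rounded = [round(p, 3) for p in softmax_with_temperature(example_logits, t)]
        print(f"T={t}: {rounded}")
    rng = random.Random(0)  # fixed seed so repeated runs draw the same tokens
    print([sample_next_token(example_logits, 0.8, top_p=0.9, rng=rng) for _ in range(10)])
```

Running the script prints the distribution at three temperatures, which shows the top token's share shrinking as T grows, followed by ten sampled token ids. Passing a seeded `random.Random` mirrors the role of a seed parameter: it makes sampling repeatable without changing the distribution itself.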