A/B test design, sample size and p-value

Short answer

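Define the hypothesis, primary metric, experiment unit, randomization, guardrails, alpha, power and MDE before the test starts. Sample size depends on the baseline rate or variance, the desired MDE, alpha and power; available traffic then determines how long the test must run. A p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one you got.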

Full breakdown

A good A/B setup starts before data is collected. Define the product hypothesis, primary metric, guardrail metrics, experiment unit, randomization scheme, target population, exclusion rules, alpha/significance level, desired power, minimum detectable effect and stopping rule.
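
One common way to implement the randomization scheme is to hash the experiment unit's ID together with the experiment name, so that assignment is deterministic and roughly uniform. The sketch below is only an illustration under that assumption: the function name `assign_variant`, the `experiment:unit_id` key format and the variant labels are hypothetical, not something the text above prescribes.

```python
import hashlib


def assign_variant(unit_id, experiment, variants=("control", "treatment")):
    """Deterministically map an experiment unit to a variant.

    Hashing the experiment name together with the unit ID keeps the
    assignment stable across sessions and independent between experiments.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Interpret the hash as an integer and take it modulo the variant count.
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # The same user always lands in the same arm of a given experiment.
    print(assign_variant("user-42", "checkout-button-color"))
```

Because the assignment depends only on the ID and the experiment name, it can be recomputed at analysis time to check that the observed split matches the intended one.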

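The required sample size depends on the baseline rate or metric variance, the minimum effect you care about, alpha and desired power; available traffic then determines the duration. Smaller effects and noisier metrics require more users (the required sample size grows roughly with the inverse square of the MDE), and lower traffic stretches the experiment out.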
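
To make the dependence concrete, here is a minimal sketch of the standard normal-approximation formula for a two-sided two-proportion test. It assumes equal-sized arms and an absolute MDE; the function name `sample_size_per_arm` and the default alpha and power values are illustrative choices, not part of the text above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate users per arm for a two-sided two-proportion z-test.

    baseline_rate: conversion rate in control (e.g. 0.10)
    mde_abs: minimum detectable effect as an absolute difference (e.g. 0.01)
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / mde_abs ** 2)


if __name__ == "__main__":
    # Halving the MDE roughly quadruples the required sample size.
    print(sample_size_per_arm(0.10, 0.01))
    print(sample_size_per_arm(0.10, 0.005))
```

With a 10% baseline and a one-percentage-point MDE this comes out at roughly 15 thousand users per arm; dividing the total by the eligible daily traffic gives a first estimate of the duration.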

For a conversion metric, a two-proportion z-test or equivalent confidence interval is common when sample sizes are large enough. For continuous metrics, a t-test may be appropriate if the unit-level metric and independence assumptions are reasonable; heavy tails or user-level aggregation may require bootstrap, winsorization or a different metric design.
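
For the conversion case, a two-proportion z-test with a pooled standard error can be sketched as follows. This is a minimal stdlib-only illustration that assumes large samples and independent users; the function name `two_proportion_ztest` and the example counts are made up for demonstration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). Uses the pooled rate under the null hypothesis
    that both arms share the same true conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: probability of |Z| at least this large under the null.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 1000/10000 conversions in A, 1100/10000 in B.
    z, p = two_proportion_ztest(1000, 10_000, 1100, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```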

A p-value is not the probability that the null hypothesis is true. It is the probability of seeing data as extreme or more extreme than what you observed, assuming the null is true.
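
A permutation test makes this definition tangible: shuffling the group labels simulates a world where the null is true, and the p-value is the share of shuffles whose difference is at least as extreme as the observed one. The sketch below is illustrative only; the function name `permutation_p_value`, the add-one correction and the toy data are assumptions chosen for the example.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_b) / len(group_b) - sum(group_a) / n_a)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null, labels are exchangeable, so shuffle and re-split.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_b - mean_a) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    control = [0] * 90 + [1] * 10    # toy data: 10% conversion
    treatment = [0] * 80 + [1] * 20  # toy data: 20% conversion
    print(permutation_p_value(control, treatment))
```

Nothing in this computation refers to the probability that the null hypothesis itself is true; it only describes how unusual the data would be if it were.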

Theory

Experiment design is about controlling false positives, false negatives and product interpretation before looking at results.

Common mistakes

  • Choosing the sample size after peeking at the result.
  • Defining the primary metric after seeing which metric moved.
  • Interpreting p=0.03 as a 97% probability that the treatment is better.
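
The first mistake can be demonstrated with an A/A simulation, where both arms share the same true rate and every "significant" result is a false positive. The sketch below compares stopping at the first significant interim look with testing once at the planned end. All parameters (number of looks, users per look, the 10% rate) and the function names are arbitrary assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled two-proportion z-test p-value."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed, nothing to detect
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_peeking(n_experiments=500, looks=10, users_per_look=100,
                     rate=0.10, alpha=0.05, seed=0):
    """Return (false positive rate with peeking, rate with one final test)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        p = 1.0
        for _look in range(looks):
            # Both arms draw from the same rate, so the null is true.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            p = z_test_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        fixed_hits += p < alpha  # p from the last, pre-planned look
    return peeking_hits / n_experiments, fixed_hits / n_experiments


if __name__ == "__main__":
    peeking, fixed = simulate_peeking()
    print(f"false positives with peeking: {peeking:.3f}")
    print(f"false positives with a fixed horizon: {fixed:.3f}")
```

The fixed-horizon rate should stay near alpha, while the peeking rate climbs well above it, which is why the stopping rule belongs in the design rather than in the analysis.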

How to answer in an interview

  • Say MDE, alpha and power explicitly. Interviewers often wait for those terms.
  • Separate statistical significance from business significance.