A/B Test Sample Size Calculator

    Compute sample size + days-to-significance for conversion rate experiments

    Your current rate (e.g. 5% = 5 conversions per 100 visitors).
    Smallest improvement worth detecting (10% means 5% → 5.5%).
    Probability of a false positive (typically 5%).
    Probability of detecting a real effect (typically 80%).
    Sample per variant: 31,234
    Total sample: 62,468
    Days to significance: 63 (at 1,000 daily visitors)
    Scenario

    At 5% baseline conversion, detecting a 10% relative improvement (i.e. 5.00% → 5.50%) at 95% significance with 80% power needs 31,234 visitors in each variant, or 62,468 total.

    About the A/B Test Sample Size Calculator

    Running an A/B test without computing sample size first is how teams end up with 'inconclusive' results that waste months. This calculator uses the exact formula for two-proportion z-tests to tell you upfront: how many visitors per variant, and how long you'll wait. Fix your MDE first, plan the test, then ship.
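A minimal sketch of that computation in Python, assuming the standard unpooled-variance form of the two-proportion z-test (the calculator may apply a pooled or continuity-corrected variant, so its figures can differ by a few visitors):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_rel, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)                 # e.g. 5% baseline, 10% MDE -> 5.5%
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 5%
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # unpooled variance sum
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

n = sample_size_per_variant(0.05, 0.10)  # ~31k per variant for the scenario above
```

Doubling `n` gives the total sample across both variants.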

    How it works

    1. Enter your current (baseline) conversion rate.
    2. Enter the minimum lift you'd care about (MDE).
    3. Keep default significance (5%) and power (80%) or adjust.
    4. Enter your daily visitors to see days-to-significance.
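The steps above can be sketched end to end; the function and parameter names here are illustrative, not the calculator's internals:

```python
from math import ceil
from statistics import NormalDist

def days_to_significance(baseline, mde_rel, daily_visitors,
                         alpha=0.05, power=0.80):
    """Days to collect the required total sample across both variants."""
    p2 = baseline * (1 + mde_rel)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = baseline * (1 - baseline) + p2 * (1 - p2)
    n_per_variant = ceil(z ** 2 * variance / (p2 - baseline) ** 2)
    if daily_visitors <= 0:
        return float("inf")  # matches the 'infinity days' FAQ below
    return ceil(2 * n_per_variant / daily_visitors)

days = days_to_significance(0.05, 0.10, daily_visitors=1000)  # 63 days
```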

    Frequently asked questions

    Why is my sample size so huge?

    Because detecting small improvements at low baselines is statistically hard. At a 1% baseline, detecting a 5% relative lift (1.00% → 1.05%) needs roughly 640,000 visitors per variant. Required sample grows with the inverse square of the effect size: halve your MDE and you roughly quadruple the data you need.
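To see that scaling concretely, a sketch using the standard unpooled two-proportion z-test formula (exact figures depend on which variance form a given calculator uses):

```python
from math import ceil
from statistics import NormalDist

def n_per_variant(p1, mde_rel, alpha=0.05, power=0.80):
    """Per-variant sample for a two-sided two-proportion z-test."""
    p2 = p1 * (1 + mde_rel)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z ** 2 * var / (p2 - p1) ** 2)

# Halving the relative MDE at a 5% baseline roughly quadruples the sample:
n_10 = n_per_variant(0.05, 0.10)  # ~31k
n_05 = n_per_variant(0.05, 0.05)  # ~122k, about 4x larger
```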

    What values should I use for α and power?

    Industry standard: α = 5% (accept a 5% false-positive rate) and power = 80% (detect 80% of real effects). More rigorous teams use α = 1% and power = 90%, but sample sizes balloon.
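For a rough sense of that cost (a sketch, not the calculator's exact code): sample size scales linearly with the squared sum of the two critical z-values, so tightening to α = 1% and power = 90% nearly doubles the required sample.

```python
from statistics import NormalDist

def z_budget(alpha, power):
    """Squared sum of critical values; sample size scales linearly with this."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2

ratio = z_budget(0.01, 0.90) / z_budget(0.05, 0.80)  # ~1.9x more sample
```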

    What's a 'minimum detectable effect'?

    The smallest improvement you'd care about. If a 2% lift isn't worth shipping, set MDE=5% and you'll stop the test much sooner. Honest teams set this BEFORE running, not after seeing results.

    Should I use a Bayesian approach instead?

    Fixed-sample frequentist testing (what this tool computes) is still the industry norm: it's easy to explain to stakeholders and works with most A/B testing tools. Bayesian methods handle early stopping ('peeking') more gracefully, but require specialized tooling.

    Why does the calculator say 'infinity' days?

    Your daily traffic is 0, or the required sample exceeds your traffic multiplied by any reasonable run length. Increase traffic, relax your MDE, or test a higher-traffic page.