A/B Testing — Landing Page Experiment
Rigorous statistical analysis of 290K users across hypothesis testing, simulation, and logistic regression
Overview
Analyzed a landing page A/B experiment across 290K+ e-commerce users to determine whether a new page significantly improved conversion. Used three complementary methods — bootstrap simulation, z-test, and logistic regression with country interaction effects — to build a statistically rigorous decision framework.
Methods
Under H₀, both pages share the same conversion rate (p = 0.11960). Each iteration draws sample proportions for the new page (n = 145,310) and old page (n = 145,274) from the null distribution and records their difference. The p-value is the fraction of simulated differences that exceed the actual observed difference (-0.00158).
Sample proportions are drawn from Normal(p_null, √(p_null·(1−p_null)/n)), the CLT approximation for binomial proportions. At these sample sizes it is effectively equivalent to the original Python simulation, but it runs instantly in the browser.
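The null simulation described above can be sketched in a few lines of Python. This is a minimal illustration built only from the figures quoted in the text, not the project's original code; the function name `simulate_null_p_value` and the seed are assumptions made for the example.

```python
import math
import random

# Figures quoted in the text above.
P_NULL = 0.11960      # shared conversion rate under H0
N_NEW = 145_310       # users shown the new page
N_OLD = 145_274       # users shown the old page
OBS_DIFF = -0.00158   # observed (new - old) difference in conversion rate


def simulate_null_p_value(n_iter=10_000, seed=42):
    """Fraction of null-simulated differences that exceed the observed one."""
    rng = random.Random(seed)  # arbitrary seed, used only for reproducibility
    se_new = math.sqrt(P_NULL * (1 - P_NULL) / N_NEW)
    se_old = math.sqrt(P_NULL * (1 - P_NULL) / N_OLD)
    exceed = 0
    for _ in range(n_iter):
        # CLT approximation: each sample proportion is roughly normal around P_NULL.
        diff = rng.gauss(P_NULL, se_new) - rng.gauss(P_NULL, se_old)
        if diff > OBS_DIFF:
            exceed += 1
    return exceed / n_iter


p_value = simulate_null_p_value()
print(f"simulated one-sided p-value: {p_value:.3f}")
```

Because the observed difference is negative, most null draws land above it, so the printed value should sit close to the reported 0.91, with small run-to-run variation depending on the seed.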
Control rate: 12.04%
Treatment rate: 11.88%
Observed diff: -0.00158
Bootstrap p-value: ≈ 0.91
The logistic regression is fitted in-browser by Newton-Raphson maximum likelihood on the pre-aggregated 6-cell dataset (3 countries × 2 pages, 290,584 users). UK is the reference country.
Model: converted ~ intercept + ab_page
| Variable | Coef | Std Err | z | P > \|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | -1.9888 | 0.0081 | -246.669 | <0.001✦ | -2.0046 | -1.9730 |
| ab_page | -0.0150 | 0.0114 | -1.311 | 0.190 | -0.0374 | 0.0074 |
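As a rough illustration of how a Newton-Raphson fit on aggregated cells produces a table like this one, the sketch below fits converted ~ intercept + ab_page on two cells (old page, new page). The conversion counts are an assumption: they are back-calculated from the rounded rates and group sizes quoted on this page, so the output should land close to the table but not match it digit for digit. The helper name `fit_logit` is hypothetical.

```python
import math
from statistics import NormalDist

# ASSUMPTION: conversion counts are reconstructed from the rounded rates
# (12.04% and 11.88%) and group sizes quoted on this page; not the raw data.
cells = [
    # (ab_page, users, conversions)
    (0, 145_274, round(0.1204 * 145_274)),  # control / old page
    (1, 145_310, round(0.1188 * 145_310)),  # treatment / new page
]


def fit_logit(cells, n_iter=25, tol=1e-10):
    """Newton-Raphson MLE for converted ~ intercept + ab_page on grouped data."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, y in cells:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = y - n * p        # score (gradient) contribution
            w = n * p * (1.0 - p)    # information-matrix weight
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    # Standard errors from the inverse information matrix, evaluated at the
    # iterate just before the final (negligible) step.
    se0 = math.sqrt(h11 / det)
    se1 = math.sqrt(h00 / det)
    return (b0, b1), (se0, se1)


(b0, b1), (se0, se1) = fit_logit(cells)
z_ab = b1 / se1
p_ab = 2 * NormalDist().cdf(-abs(z_ab))  # two-sided p-value, as in P > |z|
print(f"intercept: {b0:.4f} (se {se0:.4f})")
print(f"ab_page:   {b1:.4f} (se {se1:.4f}), z = {z_ab:.3f}, p = {p_ab:.3f}")
```

With only two cells the model is saturated, so the intercept is the log-odds of the control rate and the ab_page coefficient is the difference in log-odds between the two pages. Adding country terms would require the full 6-cell counts, which are not listed on this page.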
Key Findings
- Bootstrap p-value: 0.91. About 91% of simulated differences exceed the observed difference, so we fail to reject the null
- Z-test p-value: 0.905 (one-sided), consistent with the bootstrap result
- Logistic regression: ab_page coefficient p-value 0.19 (two-sided), not statistically significant at any conventional threshold. This agrees with the one-sided results, since 1 − 0.19/2 ≈ 0.905
- Country had no significant effect on conversion at the 5% level: CA p=0.074, US p=0.457
- Country × page interaction terms were also insignificant, so there is no evidence that the page effect differs across markets
- Given the large sample size (290K users), continued testing is unlikely to flip the result: the 95% interval for the ab_page coefficient (-0.0374 to 0.0074) rules out anything beyond a very small improvement
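The z-test and regression p-values in the list above can be cross-checked with a short pooled two-proportion calculation. This sketch uses only the figures quoted on this page; the one-sided alternative (the new page converts better) is inferred from how the bootstrap p-value is defined, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Figures quoted on this page.
P_POOLED = 0.11960    # pooled conversion rate under H0
N_NEW = 145_310       # users shown the new page
N_OLD = 145_274       # users shown the old page
OBS_DIFF = -0.00158   # observed (new - old) difference in conversion rate

# Standard error of the difference in proportions under the pooled null.
se = math.sqrt(P_POOLED * (1 - P_POOLED) * (1 / N_NEW + 1 / N_OLD))
z = OBS_DIFF / se

# One-sided test, assuming H1: the new page converts better than the old one.
p_one_sided = 1 - NormalDist().cdf(z)
# Two-sided test, the convention used for regression coefficient p-values.
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.3f}")
print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
```

Because z is negative, the one-sided p-value equals 1 minus half the two-sided value, which is why 0.905 from the z-test and 0.19 from the regression describe the same evidence.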
Results
All three methods agree: there is no evidence that the new page improves the conversion rate
Observed difference: −0.16pp (treatment underperforms control marginally)
Decision: do not implement the new page; redirect resources to a new design iteration
290K users, 10,000 bootstrap iterations, one-sided p-value ≈ 0.91 from both the bootstrap and the z-test