Python · Statsmodels · SciPy · Hypothesis Testing · Logistic Regression
A/B Testing — Landing Page Experiment
Rigorous statistical analysis of 290K users using simulation, hypothesis testing, and logistic regression
Overview
Analyzed a landing page A/B experiment across 290K+ e-commerce users to determine whether a new page significantly improved conversion. Used three complementary methods — bootstrap simulation, z-test, and logistic regression with country interaction effects — to build a statistically rigorous decision framework.
Methods
- Data cleaning: removed misaligned group/page assignments and duplicate user IDs
- Baseline probability analysis: computed group-level conversion rates (control: 12.04%, treatment: 11.88%)
- Bootstrap simulation: 10,000 simulated values of p_new − p_old drawn under the null hypothesis (both pages share the same conversion rate) to build an empirical sampling distribution
- Frequentist z-test via statsmodels proportions_ztest (one-tailed, H1: p_new > p_old, α=0.05)
- Logistic regression (statsmodels Logit) to model conversion as a function of page and country dummies
- Interaction analysis: tested whether page effect differed by country (US × page, CA × page interaction terms)
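The simulation and z-test steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the group sizes and conversion counts are hypothetical values chosen to approximate the reported rates (12.04% and 11.88%) under an assumed even split of about 290K users, and the z-test is written out by hand as a pooled two-proportion test rather than calling statsmodels. Calling proportions_ztest with alternative="larger" on the same counts should give a closely matching p-value.

```python
import math
import numpy as np

# Hypothetical counts (assumption): an even 145,000 / 145,000 split with
# conversions chosen to approximate the reported 12.04% and 11.88% rates.
n_old = n_new = 145_000
conv_old = 17_458
conv_new = 17_226

p_old = conv_old / n_old
p_new = conv_new / n_new
obs_diff = p_new - p_old  # roughly -0.0016, i.e. -0.16pp

# Under the null, both pages share one pooled conversion rate.
p_null = (conv_old + conv_new) / (n_old + n_new)

# Simulate 10,000 differences in conversion rate under the null.
rng = np.random.default_rng(42)
sim_new = rng.binomial(n_new, p_null, size=10_000) / n_new
sim_old = rng.binomial(n_old, p_null, size=10_000) / n_old
null_diffs = sim_new - sim_old

# One-sided p-value: share of simulated differences above the observed one.
p_sim = float((null_diffs > obs_diff).mean())

# Pooled two-proportion z-test for H1: p_new > p_old, written out by hand.
se = math.sqrt(p_null * (1 - p_null) * (1 / n_old + 1 / n_new))
z = obs_diff / se
p_z = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))  # upper-tail normal probability

print(f"observed diff = {obs_diff:.4f}")
print(f"simulation p-value = {p_sim:.3f}")
print(f"z = {z:.3f}, z-test p-value = {p_z:.3f}")
```

Because the alternative hypothesis is that the new page converts better, a negative observed difference puts most of the null distribution above it, which is why both p-values land close to 1 rather than close to 0.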
Key Findings
- Bootstrap p-value: 0.91. About 91% of the differences simulated under the null exceed the observed difference, so we fail to reject the null hypothesis
- Z-test p-value: 0.905, consistent with the bootstrap result
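- Logistic regression: ab_page coefficient p-value 0.19 (two-sided), not statistically significant at any conventional threshold. Because the observed difference is negative, this corresponds to a one-sided p-value of about 1 − 0.19/2 ≈ 0.905, in line with the z-test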
- Country had no significant effect on conversion at α=0.05: CA p=0.074, US p=0.457
- Country × page interaction terms were also insignificant, so there is no evidence that the page effect differs across markets
- Given the large sample size (290K users), continued testing is unlikely to flip the result: the data show no evidence of an improvement, and any true lift from the new page would have to be very small
Results
All three methods converge: the new page does NOT improve conversion rate
Observed difference: −0.16pp (treatment underperforms control marginally)
Decision: do not implement the new page; redirect resources to a new design iteration
290K users, 10,000 bootstrap iterations, one-sided p-value ≈ 0.91 from both the bootstrap simulation and the z-test
Eric Jin