Eric Jin
Python · Statsmodels · SciPy · Hypothesis Testing · Logistic Regression

A/B Testing — Landing Page Experiment

Rigorous statistical analysis of 290K users combining hypothesis testing, simulation, and logistic regression

Overview

Analyzed a landing page A/B experiment across 290K+ e-commerce users to determine whether a new page significantly improved conversion. Used three complementary methods — bootstrap simulation, z-test, and logistic regression with country interaction effects — to build a statistically rigorous decision framework.

Methods

  • Data cleaning: removed misaligned group/page assignments and duplicate user IDs
  • Baseline probability analysis: computed group-level conversion rates (control: 12.04%, treatment: 11.88%)
  • Bootstrap simulation: 10,000 permutations of p_new − p_old under the null hypothesis to build an empirical sampling distribution
  • Frequentist z-test via statsmodels proportions_ztest (one-tailed, α=0.05)
  • Logistic regression (statsmodels Logit) to model conversion as a function of page and country dummies
  • Interaction analysis: tested whether page effect differed by country (US × page, CA × page interaction terms)
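The simulation and z-test steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data here are synthetic draws at the reported baseline rates (12.04% control, 11.88% treatment), and the group sizes are an assumption based on the ~290K total.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)

# Synthetic stand-in for the cleaned data: two groups of ~145K users each,
# with conversion rates near the reported baselines (12.04% vs 11.88%).
n_old = n_new = 145_000
conv_old = rng.binomial(1, 0.1204, n_old)
conv_new = rng.binomial(1, 0.1188, n_new)

obs_diff = conv_new.mean() - conv_old.mean()

# Simulate p_new - p_old under H0 (both groups share the pooled rate),
# building the empirical null distribution with 10,000 draws.
p_pool = np.concatenate([conv_old, conv_new]).mean()
sim_new = rng.binomial(n_new, p_pool, 10_000) / n_new
sim_old = rng.binomial(n_old, p_pool, 10_000) / n_old
null_diffs = sim_new - sim_old

# One-tailed empirical p-value: share of null diffs exceeding the observed one.
p_boot = (null_diffs > obs_diff).mean()

# Frequentist cross-check: one-tailed two-proportion z-test (H1: p_new > p_old).
_, p_z = proportions_ztest(
    count=[conv_new.sum(), conv_old.sum()],
    nobs=[n_new, n_old],
    alternative="larger",
)
print(f"observed diff: {obs_diff:+.4f}  bootstrap p: {p_boot:.3f}  z-test p: {p_z:.3f}")
```

Because the z-test and the simulation target the same one-tailed null, their p-values should agree closely at this sample size, which is exactly the convergence the findings below report.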

Key Findings

  • Bootstrap p-value: 0.91, i.e. 91% of simulated null differences exceed the observed difference, so we clearly fail to reject the null
  • Z-test p-value: 0.905 — consistent with bootstrap result across both methods
  • Logistic regression: ab_page coefficient p-value 0.19 — not statistically significant at any conventional threshold
  • Country had no significant effect on conversion: CA p=0.074, US p=0.457
  • Country × page interaction terms were also insignificant, giving no evidence that the page effect differs by market
  • Given the large sample size (290K users), continued testing is unlikely to flip the result; there is simply no evidence of an effect

Results

All three methods converge: the new page does NOT improve conversion rate

Observed difference: −0.16pp (treatment underperforms control marginally)
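The gap follows directly from the baseline rates reported in Methods:

```python
# Treatment minus control, in percentage points, from the reported rates.
p_old, p_new = 0.1204, 0.1188
diff_pp = round((p_new - p_old) * 100, 2)
print(diff_pp)  # -0.16
```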

Decision: do not implement the new page; redirect resources to a new design iteration

290K users, 10,000 bootstrap iterations, consistent p-value ≈ 0.91 across methods