Eric Jin
Python · Statsmodels · SciPy · Hypothesis Testing · Logistic Regression

A/B Testing — Landing Page Experiment

Rigorous statistical analysis of 290K users across hypothesis testing, simulation, and logistic regression

Overview

Analyzed a landing page A/B experiment across 290K+ e-commerce users to determine whether a new page significantly improved conversion. Used three complementary methods — bootstrap simulation, z-test, and logistic regression with country interaction effects — to build a statistically rigorous decision framework.

Methods

Data cleaning: removed misaligned group/page assignments and duplicate user IDs
Baseline probability analysis: computed group-level conversion rates (control: 12.04%, treatment: 11.88%)
Bootstrap simulation: 10,000 simulated draws of p_new − p_old under the null hypothesis to build an empirical sampling distribution

Under H₀, both pages share the same conversion rate (p = 0.11960). Each iteration draws sample proportions for the new page (n = 145,310) and old page (n = 145,274) from the null distribution and records their difference. The p-value is the fraction of simulated differences that exceed the actual observed difference (-0.00158).

Sample proportions are drawn from Normal(p_null, √(p_null·(1−p_null)/n)), the CLT approximation for binomial proportions. This is not exactly equivalent to the original Python simulation, but with samples this large the results are practically indistinguishable, and it runs instantly in the browser.
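The simulation described above can be sketched in a few lines of NumPy. This is a minimal illustration of the normal approximation stated in the text, not the project's original code; the seed and variable names are assumptions, while the rates and sample sizes are the figures quoted above.

```python
import numpy as np

# Figures quoted in the text above.
p_null = 0.11960                  # shared conversion rate under H0
n_new, n_old = 145_310, 145_274   # users per page
obs_diff = -0.00158               # observed p_new - p_old
n_iter = 10_000

rng = np.random.default_rng(42)   # arbitrary seed, only for repeatability

# Standard error of each sample proportion under H0 (CLT approximation).
se_new = np.sqrt(p_null * (1 - p_null) / n_new)
se_old = np.sqrt(p_null * (1 - p_null) / n_old)

# One simulated proportion per page per iteration, then their difference.
p_new_sim = rng.normal(p_null, se_new, size=n_iter)
p_old_sim = rng.normal(p_null, se_old, size=n_iter)
diffs = p_new_sim - p_old_sim

# One-tailed p-value: share of null differences above the observed one.
p_value = float(np.mean(diffs > obs_diff))
print(f"simulated p-value: {p_value:.3f}")
```

With these inputs the printed value should land near 0.9, shifting slightly with the seed and the number of iterations.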

Control rate: 12.04%
Treatment rate: 11.88%
Observed diff: -0.00158

Frequentist z-test via statsmodels proportions_ztest (one-tailed, α=0.05)
Logistic regression (statsmodels Logit) to model conversion as a function of page and country dummies
Interaction analysis: tested whether page effect differed by country (US × page, CA × page interaction terms)
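The arithmetic behind the one-tailed z-test can be spelled out with the standard library alone. The sketch below is a hand calculation from the pooled rate, sample sizes, and observed difference quoted earlier, assuming the alternative hypothesis is p_new > p_old. It is not a call to proportions_ztest, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Figures quoted earlier in the text.
p_pool = 0.11960                  # pooled conversion rate under H0
n_new, n_old = 145_310, 145_274
obs_diff = -0.00158               # observed p_new - p_old

# Pooled standard error of the difference in proportions.
se_diff = sqrt(p_pool * (1 - p_pool) * (1 / n_new + 1 / n_old))
z = obs_diff / se_diff

std_normal = NormalDist()
# One-tailed test of H1: p_new > p_old (upper tail).
p_one_sided = 1 - std_normal.cdf(z)
# Two-sided p-value for the same z statistic.
p_two_sided = 2 * std_normal.cdf(-abs(z))

print(f"z = {z:.3f}")
print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
```

The two-sided value is what a regression summary reports for the same z statistic, which is why the logistic regression shows a p-value of about 0.19 for ab_page while the one-tailed test gives about 0.905.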

The model below is fitted in-browser via Newton-Raphson MLE on the pre-aggregated 6-cell dataset (290,584 users). UK is the reference country.

Model: converted ~ intercept + ab_page

Variable     Coef      Std Err   z          P > |z|   [0.025    0.975]
intercept    -1.9888   0.0081    -246.669   <0.001    -2.0046   -1.9730
ab_page      -0.0150   0.0114    -1.311     0.190     -0.0374   0.0074

Logit coefficients, log-odds scale.
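For the page-only model in the table above, a Newton-Raphson fit can be sketched as follows. Per-country counts are not listed on this page, so the illustration collapses the data to two cells (control and treatment) and rebuilds approximate conversion counts from the rounded rates and sample sizes quoted earlier. Those counts and all variable names are assumptions, so the estimates should agree with the table only to about three decimal places.

```python
import numpy as np

# One row per aggregated cell: [intercept, ab_page].
X = np.array([[1.0, 0.0],    # control (old page)
              [1.0, 1.0]])   # treatment (new page)
n = np.array([145_274.0, 145_310.0])   # users per cell
# Approximate conversions rebuilt from the rounded rates (an assumption).
y = np.round(n * np.array([0.1204, 0.1188]))


def fisher_information(beta):
    """Return X' W X and the fitted probabilities for grouped binomial data."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = n * p * (1.0 - p)
    return X.T @ (X * w[:, None]), p


beta = np.zeros(2)
for _ in range(25):
    info, p = fisher_information(beta)
    score = X.T @ (y - n * p)             # gradient of the log-likelihood
    step = np.linalg.solve(info, score)   # Newton-Raphson update
    beta += step
    if np.max(np.abs(step)) < 1e-10:
        break

# Standard errors from the inverse information at the final estimate.
info, _ = fisher_information(beta)
std_err = np.sqrt(np.diag(np.linalg.inv(info)))
z_scores = beta / std_err
odds_ratio = float(np.exp(beta[1]))

for name, b, se, z in zip(["intercept", "ab_page"], beta, std_err, z_scores):
    print(f"{name:10s} coef={b:.4f} se={se:.4f} z={z:.3f}")
print(f"odds ratio for ab_page: {odds_ratio:.4f}")
```

Exponentiating the ab_page coefficient gives an odds ratio of about 0.985, meaning the odds of conversion on the new page are roughly 1.5% lower than on the old page, a gap the model cannot distinguish from zero.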

Key Findings

  • Bootstrap p-value: 0.91. About 91% of simulated differences exceed the observed difference, so we fail to reject the null by a wide margin
  • Z-test p-value: 0.905 (one-tailed), consistent with the bootstrap result
  • Logistic regression: ab_page coefficient p-value 0.19 (two-sided), not statistically significant at any conventional threshold; it matches the one-tailed z-test, since 1 − 0.19/2 ≈ 0.905
  • Country had no significant effect on conversion at α = 0.05: CA p=0.074, US p=0.457
  • Country × page interaction terms were also not significant, so there is no evidence that the page effect differs across markets
  • Given the large sample size (290K users), continued testing is unlikely to flip the result: the 95% interval for the ab_page coefficient (−0.0374 to 0.0074) leaves room for at most a very small positive effect
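One way to see why more data is unlikely to change the conclusion is to look at a confidence interval for the difference in conversion rates. The sketch below computes a 95% Wald interval from the rounded rates and sample sizes quoted earlier. The choice of interval and the variable names are assumptions for illustration, not part of the original analysis.

```python
from math import sqrt
from statistics import NormalDist

# Rounded figures quoted earlier in the text.
p_old, n_old = 0.1204, 145_274    # control
p_new, n_new = 0.1188, 145_310    # treatment
obs_diff = -0.00158               # observed p_new - p_old

# Unpooled standard error of the difference in proportions.
se = sqrt(p_old * (1 - p_old) / n_old + p_new * (1 - p_new) / n_new)

# Two-sided 95% critical value of the standard normal (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)

ci_low = obs_diff - z_crit * se
ci_high = obs_diff + z_crit * se
print(f"95% CI for p_new - p_old: [{ci_low:.5f}, {ci_high:.5f}]")
```

The upper bound is below 0.1 percentage points, so even the most optimistic difference consistent with the data would be a negligible lift.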

Results

All three methods converge: the new page does NOT improve conversion rate

Observed difference: −0.16pp (treatment underperforms control marginally)

Decision: do not implement the new page; redirect resources to a new design iteration

290K users, 10,000 bootstrap iterations, one-tailed p-value ≈ 0.91 from both the simulation and the z-test (two-sided logit p-value: 0.19)
