Eric Jin
Python · Statsmodels · SciPy · Hypothesis Testing · Logistic Regression

A/B Testing — Landing Page Experiment

Rigorous statistical analysis of 290K users combining hypothesis testing, simulation, and logistic regression

Overview

Analyzed a landing page A/B experiment across 290K+ e-commerce users to determine whether a new page significantly improved conversion. Used three complementary methods — bootstrap simulation, z-test, and logistic regression with country interaction effects — to build a statistically rigorous decision framework.

Methods

  • Data cleaning: removed misaligned group/page assignments and duplicate user IDs
  • Baseline probability analysis: computed group-level conversion rates (control: 12.04%, treatment: 11.88%)
  • Bootstrap simulation: 10,000 permutations of p_new − p_old under the null hypothesis to build an empirical sampling distribution
  • Frequentist z-test via statsmodels proportions_ztest (one-tailed, α=0.05)
  • Logistic regression (statsmodels Logit) to model conversion as a function of page and country dummies
  • Interaction analysis: tested whether page effect differed by country (US × page, CA × page interaction terms)
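The simulation and z-test steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data here are synthetic draws at the reported baseline rates (12.04% control, 11.88% treatment), and the group sizes are an assumption based on the ~290K total.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)

# Synthetic stand-in for the cleaned data: two groups of ~145K users each,
# with conversion rates near the reported baselines (12.04% vs 11.88%).
n_old = n_new = 145_000
conv_old = rng.binomial(1, 0.1204, n_old)
conv_new = rng.binomial(1, 0.1188, n_new)

obs_diff = conv_new.mean() - conv_old.mean()

# Simulate p_new - p_old under H0 (both groups share the pooled rate),
# building the empirical null distribution with 10,000 draws.
p_pool = np.concatenate([conv_old, conv_new]).mean()
sim_new = rng.binomial(n_new, p_pool, 10_000) / n_new
sim_old = rng.binomial(n_old, p_pool, 10_000) / n_old
null_diffs = sim_new - sim_old

# One-tailed empirical p-value: share of null diffs exceeding the observed one.
p_boot = (null_diffs > obs_diff).mean()

# Frequentist cross-check: one-tailed two-proportion z-test (H1: p_new > p_old).
_, p_z = proportions_ztest(
    count=[conv_new.sum(), conv_old.sum()],
    nobs=[n_new, n_old],
    alternative="larger",
)
print(f"observed diff: {obs_diff:+.4f}  bootstrap p: {p_boot:.3f}  z-test p: {p_z:.3f}")
```

Because the z-test and the simulation target the same one-tailed null, their p-values should agree closely at this sample size, which is exactly the convergence the findings below report.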

Key Findings

  • Bootstrap p-value: 0.91, i.e. 91% of simulated null differences exceed the observed difference, so we clearly fail to reject the null
  • Z-test p-value: 0.905 — consistent with bootstrap result across both methods
  • Logistic regression: ab_page coefficient p-value 0.19 — not statistically significant at any conventional threshold
  • Country had no significant effect on conversion: CA p=0.074, US p=0.457
  • Country × page interaction terms were also insignificant, giving no evidence that the page effect differs by market
  • Given the large sample size (290K users), continued testing is unlikely to flip the result; there is simply no evidence of an effect

Results

All three methods converge: the new page does NOT improve conversion rate

Observed difference: −0.16pp (treatment underperforms control marginally)
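The gap follows directly from the baseline rates reported in Methods:

```python
# Treatment minus control, in percentage points, from the reported rates.
p_old, p_new = 0.1204, 0.1188
diff_pp = round((p_new - p_old) * 100, 2)
print(diff_pp)  # -0.16
```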

Decision: do not implement the new page; redirect resources to a new design iteration

290K users, 10,000 bootstrap iterations, consistent p-value ≈ 0.91 across methods