This post walks through an end-to-end A/B test using a real mobile game dataset. The goal is simple: decide whether placing a progression gate at level 5 or at level 10 leads to better player retention. The analysis is implemented in a Jupyter notebook, and this article translates that notebook into a readable narrative while keeping the technical substance intact.
Problem context
In many mobile games, progression gates slow players down unless they invest time or make an in-app purchase. The placement of these gates can meaningfully affect player experience and long-term retention.
Players were randomly assigned to one of two versions.
- Gate at level 5
- Gate at level 10
Success is evaluated using one-day retention and seven-day retention.
Dataset overview
The dataset contains 90,189 players. Each row represents a single user and includes the following fields:

- `userid`
- `version`, indicating gate placement
- `sum_gamerounds`, the total rounds played
- `retention_1` and `retention_7`, binary retention indicators
Before analyzing outcomes, we validate the experiment split.
```python
import pandas as pd

df = pd.read_csv("game_app_ab_testing.csv")
df.groupby("version")["userid"].count()
```

Both variants have comparable sample sizes, which allows a fair comparison.
Understanding player activity
Before focusing on retention, it helps to understand overall engagement. The distribution of total game rounds is heavily right-skewed: most players churn early, while a small fraction play for a long time.
```python
# Count how many players finished each number of total rounds,
# then plot the first 100 round counts to show the long tail
plot_df = df.groupby("sum_gamerounds")["userid"].count()
plot_df.head(100).plot()
```

This pattern is common in free-to-play games and reinforces why retention is a more stable metric than total playtime.
Retention metrics
Retention is measured at two horizons.
- One-day retention captures immediate engagement
- Seven-day retention reflects short-term value
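Because both indicators are binary, a retention rate is simply the mean of the column. The toy frame below is a stand-in for the real dataset, used only to make the computation concrete:

```python
import pandas as pd

# Toy stand-in rows; in the notebook, df is loaded from the CSV
df = pd.DataFrame({
    "retention_1": [1, 0, 1, 0, 0],
    "retention_7": [1, 0, 0, 0, 0],
})

# The mean of a 0/1 indicator is the retention rate
print(df["retention_1"].mean())  # 0.4 on this toy frame
print(df["retention_7"].mean())  # 0.2 on this toy frame
```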
A small number of players record zero game rounds yet still return. This edge case is rare but worth keeping in mind when interpreting results.
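This edge case is easy to quantify with a boolean filter. The sketch below runs on a toy stand-in frame; in the notebook, the same filter would run on the loaded `df`:

```python
import pandas as pd

# Toy stand-in rows; the real df has 90,189 players
df = pd.DataFrame({
    "sum_gamerounds": [0, 0, 3, 12],
    "retention_1": [1, 0, 1, 0],
})

# Players who logged zero rounds yet still returned the next day
ghosts = df[(df["sum_gamerounds"] == 0) & (df["retention_1"] == 1)]
print(len(ghosts))  # 1 on this toy frame
```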
Comparing gate placement
We begin by computing average retention for both variants.
```python
df.groupby("version")[["retention_1", "retention_7"]].mean()
```

At first glance, the level 5 gate shows slightly higher retention at both horizons. To understand whether this difference is meaningful, we estimate uncertainty using bootstrapping.
Bootstrapping the difference
Bootstrapping allows us to approximate the sampling distribution without relying on parametric assumptions. We repeatedly resample users with replacement and recompute retention for each group.
```python
boot_1d = []
for _ in range(5000):
    sample = df.sample(frac=1, replace=True)
    stats = sample.groupby("version")["retention_1"].mean()
    boot_1d.append(stats)
```

The same approach is applied to seven-day retention. From these samples, we compute the percentage difference between gate placements.
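The percentage-difference step can be sketched as follows. This is a self-contained toy version: the synthetic data, the variant labels `gate_5` and `gate_10`, and the reduced iteration count are all assumptions for illustration, not the notebook's actual values:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (labels are assumptions)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "version": rng.choice(["gate_5", "gate_10"], size=2000),
    "retention_1": rng.random(2000) < 0.45,
})

# Bootstrap: resample players with replacement, recompute group means
boot_1d = []
for _ in range(500):
    sample = df.sample(frac=1, replace=True)
    boot_1d.append(sample.groupby("version")["retention_1"].mean())

boot_df = pd.DataFrame(boot_1d)  # one row per bootstrap iteration
# Percentage difference of level-5 retention relative to level-10
boot_1d_diff = (boot_df["gate_5"] - boot_df["gate_10"]) / boot_df["gate_10"] * 100
print(boot_1d_diff.describe())
```

The same transformation applied to the seven-day samples yields the seven-day difference series.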
Visualizing the results
Kernel density plots make the comparison intuitive.
```python
boot_1d_diff.plot(kind="kde")
boot_7d_diff.plot(kind="kde")
```

In both cases, most of the distribution lies above zero, indicating higher retention when the gate is placed at level 5.
Probability interpretation
Rather than focusing on a single p-value, we compute the probability that level 5 outperforms level 10.

```python
(boot_1d_diff > 0).mean()
(boot_7d_diff > 0).mean()
```

These probabilities are high for both metrics, providing strong evidence that earlier gating improves retention.
Final takeaway
Moving the progression gate from level 10 to level 5 increases both one-day and seven-day retention. The effect size is modest but consistent and statistically robust.
From a product perspective, this suggests that earlier friction does not necessarily harm engagement. From a data perspective, it highlights how bootstrapping offers a clear and interpretable framework for A/B testing decisions.
The accompanying notebook contains the full implementation and can be reused as a template for similar experiments.