20 statistics resources I wish I knew when starting A/B testing
Your ultimate guide to mastering A/B testing statistics and overcoming the overwhelm
Ah, statistics - the subject that can either make you feel like a genius or leave you feeling like you just got hit by a freight train. As a beginner in A/B testing, I vividly remember the feeling of overwhelm that comes with trying to wrap your head around all the statistical concepts and jargon.
When it comes to finding the best statistics resources for A/B testing, one can quickly become overwhelmed by the vast amount of information scattered across the internet. It can take hours, if not days, of dedicated research to compile a list of reputable sources that cover all the essential concepts and techniques needed to become proficient. Even then, it can be challenging to know which resources to trust, as misinformation and biased opinions run rampant online and, regrettably, even in textbooks. As a result, many beginners in the field find themselves struggling to sort through the noise and gain a comprehensive understanding of A/B testing statistics.
After many years of trial and error, and some serious hair-pulling moments, I have collected my personal treasure trove of statistics resources that I wish I knew when I was starting out. In this post, I'll share some of these resources with you so that you can hopefully avoid some of the headaches and frustrations.
Here are some resources I wish I had started with:
Introduction to Statistics: An Intuitive Guide for Analyzing Data and Unlocking Discoveries.
Consider this resource your "Hello World" introduction to statistics. While some readers may be well-versed in the basics and feel comfortable skipping this section, it is crucial for those new to the subject to develop a firm grasp of statistical vocabulary and principles. Without this foundation, effectively navigating hypothesis testing can be a daunting task. So take the time to absorb these fundamental concepts.
Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions
This is another fantastic book by Jim Frost. Hypothesis testing is the foundation of A/B testing, and Jim Frost strikes the perfect balance between rigor and intuition: he provides a deep understanding of the subject matter, and his clear, concise examples make the material accessible to a wide audience.
If you're seeking a comprehensive guide to A/B testing statistics, look no further than this book, which could be considered the bible of the field. Out of all the resources I'm recommending, this one stands out for its precision and practicality. Georgi Georgiev expertly curates the concepts in statistics that are most relevant to online experimentation, making this book an invaluable bridge between academic theory and your actual practice. Whether you're a newcomer or a seasoned pro in A/B testing, this resource is a must-have in your toolkit. He also has a blog which I highly recommend. If you prefer video over text, check out his course on CXL.
The knowledge in those books alone should easily put you above the 95th percentile of beginners in the CRO space regarding statistics. For most, this level of understanding is enough to get started with trustworthy experimentation and avoid getting lost in the depths of statistical theory. However, for the few lost souls who somehow start enjoying this, let’s not stop here. Let’s talk p-values, confidence intervals and the like. You should know what a p-value is by now. But are you really sure you understand it?
I have to admit that I had to study this topic from multiple angles before truly feeling confident in my comprehension. Don't be discouraged if you don't immediately grasp a concept after reading one source; even textbooks and scientists can struggle with these ideas.
The advice I would give my beginner self: variance and reps. Variance means that sometimes an explanation for a statistical concept will just “click” for some people in ways that other explanations don’t. So try different stuff.
Daniel Lakens puts it nicely for reps:
“Probabilities are confusing, and the interpretation of a p-value is not intuitive. Grammar is also confusing, and not intuitive. But where we practice grammar in our education again and again and again until you get it, we don’t practice the interpretation of p-values again and again and again until you get it.”
Here are the resources that I found most helpful:
Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
This paper is the ultimate test. It provides an explanatory list of 25 misinterpretations of p-values, confidence intervals, and power. Try to read the stated misconceptions and refute them by yourself before reading the answer. If you have trouble doing it, you can be sure that you still have some work to do. Still struggling? Don’t worry, you’ll get there. Try the next one.
Understanding common misconceptions about p-values
This blog post by Daniel Lakens uses very intuitive examples and simulations to get to the core of the most common misconceptions. This whole blog is a goldmine.
Improving Your Statistical Inferences
Daniel Lakens also has this wonderful free ebook. Can’t recommend it enough.
Interpreting A/B test results: false positives and statistical significance
The Netflix Tech Blog provides an excellent overview of experimentation across Netflix and the basics of A/B testing statistics.
AB Testing: Ruling Out To Conclude
Matt Gershoff tackles this from a different perspective: “Seemingly simple ideas underpinning AB Testing are confusing. Rather than getting into the weeds around the definitions of p-values and significance, perhaps AB Testing might be easier to understand if we reframe it as a simple ruling out procedure.”
P-values and Confidence Intervals Explained
I talked about Georgi's blog earlier, and I want to highlight this article that I really like because of its distinctive viewpoint.
The Relationship Between Bayes' Theorem and P-Values
One of the most common misconceptions about p-values is the assumption that they tell you something about the probability of the hypothesis P(H0 | Data). This blog post provides an intuitive example of connecting NHST to Bayes' theorem.
How to Intuit the Prosecutor’s Fallacy (and Run Better Hypothesis Tests)
The prosecutor’s fallacy is another great example to illustrate that distinction.
How to systematically approach truth - Bayes' rule
This is my favorite video regarding Bayes' rule.
The p value and the base rate fallacy
A helpful article if you struggle to differentiate between a given alpha and the false discovery rate. The base rate fallacy shows us that, among significant results, false positives can be much more common than the 5% you’d expect from a p < 0.05 criterion for significance, especially when only a small share of the tested ideas have a real effect.
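To make the difference between alpha and the false discovery rate concrete, here is a minimal sketch of the arithmetic. The function name and the input numbers (80% power, 10% of tested ideas having a real effect) are my own illustrative assumptions, not figures from the article.

```python
def false_discovery_rate(alpha: float, power: float, prior_true: float) -> float:
    """Share of significant results that are false positives.

    alpha:      significance threshold (false positive rate when H0 is true)
    power:      probability of detecting an effect that really exists
    prior_true: assumed share of tested ideas that have a real effect
    """
    false_positives = alpha * (1 - prior_true)  # true nulls that reach significance
    true_positives = power * prior_true         # real effects that reach significance
    return false_positives / (false_positives + true_positives)


# Hypothetical inputs: alpha = 0.05, 80% power, 1 in 10 ideas actually works.
fdr = false_discovery_rate(alpha=0.05, power=0.8, prior_true=0.1)
print(f"{fdr:.1%}")  # 36.0%
```

Under these assumed inputs, roughly a third of the "winners" are false positives, even though alpha is only 5%. Lower the base rate or the power and the share grows.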
Sometimes the best way to learn something is to understand how not to do it. This short and rather beginner-friendly book covers common statistical errors and how to avoid them. It covers important topics such as statistical power, confidence intervals, p-value interpretation, and multiple comparisons.
A/B Testing Intuition Busters: Common Misunderstandings in Online Controlled Experiments
This paper walks through erroneous applications and misunderstandings of statistics in books, papers, and software.
The Impact of Statistical Power on P-value Distribution and Lift Estimates – a Simulation
Ronny Kohavi created this simulation to show that if you run a low-powered experiment, your p-values are basically uniform, and if you do get lucky and reach significance, you will suffer from the winner’s curse: the estimated lift is exaggerated. Sometimes, trying different settings in a simulation can be more informative than just reading about it.
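If you want to play with the winner’s curse yourself, here is a rough sketch. It is not Ronny Kohavi’s simulation, just a simplified stand-in that assumes each experiment’s estimated lift is normally distributed around the true lift with a known standard error. The function name, the true lift, and the standard error are hypothetical values chosen to give a low-powered setup.

```python
import random
from statistics import NormalDist, mean


def simulate_winners(true_lift, std_error, n_experiments=20000, alpha=0.05, seed=1):
    """Simulate many identical experiments and look only at the significant winners."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Each experiment's estimated lift is drawn around the true lift.
    estimates = [rng.gauss(true_lift, std_error) for _ in range(n_experiments)]
    # "Winners" are positive estimates that pass the two-sided significance test.
    winners = [e for e in estimates if e / std_error > z_crit]
    win_rate = len(winners) / n_experiments
    return win_rate, mean(winners)


# Hypothetical low-powered setup: the true lift is only one standard error wide.
true_lift = 0.01
win_rate, avg_winner_lift = simulate_winners(true_lift=true_lift, std_error=0.01)
print(f"share of significant winners: {win_rate:.1%}")
print(f"true lift: {true_lift:.2%}, average lift among winners: {avg_winner_lift:.2%}")
```

Because an estimate has to clear roughly 1.96 standard errors to count as significant, every winner in this setup reports a lift of almost twice the true value or more. Try shrinking `std_error` (that is, raising the power) and watch the exaggeration fade.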
Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing
Did you know that NHST (Null Hypothesis Significance Testing) is actually an amalgamation of two different philosophies in the frequentist field (Fisher vs. Neyman-Pearson)? I didn’t. It suddenly made a lot of things clearer.
Common statistical tests are linear models
Did you know that you can view common statistical tests through the lens of a linear model?
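As a small taste of that idea, here is a sketch showing that the classic two-sample Student’s t-test (equal variances) gives the same t-statistic as the slope of a simple linear regression on a 0/1 group indicator. The data are made up for illustration, and the helper names are my own.

```python
from math import sqrt
from statistics import mean


def student_t(a, b):
    """Classic two-sample t-statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss / (na + nb - 2)
    return (mb - ma) / sqrt(pooled_var * (1 / na + 1 / nb))


def ols_slope_t(x, y):
    """Slope of y = intercept + slope * x and the t-statistic of that slope."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_var = sse / (n - 2)
    return slope, slope / sqrt(residual_var / sxx)


# Made-up metric values for a control and a variant group.
control = [10.1, 9.8, 11.2, 10.5, 9.9]
variant = [11.0, 10.7, 12.1, 11.4, 10.2]

x = [0] * len(control) + [1] * len(variant)  # 0 = control, 1 = variant
y = control + variant

t_classic = student_t(control, variant)
slope, t_lm = ols_slope_t(x, y)
print(t_classic, t_lm)  # the two t-statistics agree up to rounding
print(slope)            # the slope is the difference in group means
```

The regression slope is just the difference in means, and its t-statistic is the t-test. Once that clicks, many other tests start to look like variations of the same model.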
A Leader's Guide to Causal Inference
A nice little intro to causal inference. I often used the examples to illustrate the concept of causality to other stakeholders.
Statistical Inference as Severe Testing
Mayo provides a look at the main philosophical approaches to statistical inference: frequentist, Bayesian and likelihoodist. She discusses the concept of error probabilities, explains the significance of error statistics in scientific research, and provides practical guidelines for designing and conducting experiments that employ severe tests.
And there you have it, folks - a selection of statistics resources that I wish I had when I was starting out in A/B testing. Don't get me wrong, learning statistics is still no walk in the park, but having access to helpful resources can make a world of difference. I hope they will be as helpful to you as they were to me when I discovered them. Let me know if I missed any resources that helped you on your way.
If you enjoyed this content and found it useful, I encourage you to subscribe to this newsletter to receive regular updates. What’s the newsletter about? A delightful mishmash of everything and anything that touches experimentation, science, innovation and a pinch of randomness.
Best,
David