You're probably AB testing wrong.
Long-time lurker, figured I'd start contributing. I've seen a fair bit of stuff crop up lately about AB testing, and it kinda annoys me every time I see people screwing it up.
So let's have a chat about Bayesian statistics, multi-armed bandits and AB testing.
If you understand this joke, you can probably skip this or call me out on my mistakes.


Source: xkcd: Frequentists vs. Bayesians
The ELI5 of this is that frequentist statistics only looks at the data in the experiment. Bayesian statistics can be informed by prior knowledge.
What else does Bayesian statistics offer?
Firstly, you need to understand that you're answering a fundamentally different question.
With a frequentist approach you generally test a null hypothesis. The null hypothesis is that there is no difference in the conversion rate of variation A and variation B. To test this you look at a p-value, which is the chance of seeing a difference at least as big as the one you observed if the null hypothesis were true, i.e. if it was all down to random chance. If the p-value is less than 0.05 you typically say that you reject the null hypothesis and there is a winning variation.
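For anyone who wants to see the frequentist side in code, here's a minimal sketch of one common way to get that p-value: a two-proportion z-test. The function name and the visitor counts are made up for illustration, and it assumes samples big enough for the normal approximation to hold.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: A and B share one conversion rate."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under H0 both arms share a single pooled conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # P(|Z| >= |z|) for a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: 100/1000 conversions on A, 120/1000 on B.
p = two_proportion_p_value(100, 1000, 120, 1000)
print(f"p-value: {p:.3f}")
```

With those made-up numbers the p-value comes out around 0.15, so you wouldn't call a winner at the usual 0.05 cutoff even though B looks 20% better.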
With Bayesian statistics you generally phrase questions along the lines of "What's the probability that variation A is better?" or "What's the probability that variation A is the best, or within x% of variation B?". These are far more valuable when making business decisions - you can start to weigh the opportunity cost against the probability of a negative outcome. This allows you to start valuing the speed of iterating quickly versus waiting for absolute certainty. You do have to worry about priors, but you can simply use a uniform distribution, or base them on the previous week or so of conversion data. An informed prior helps in the early stages of the AB test when you're waiting for significant data to come in.
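Here's a rough sketch of how you might answer "what's the probability B is better?" in practice. It assumes a Beta prior on each conversion rate (Beta(1, 1) is the uniform one) and estimates the answer by sampling from the two posteriors. The function name, the counts and the number of draws are all just illustrative.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, prior=(1.0, 1.0), draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta posteriors."""
    rng = random.Random(seed)
    alpha0, beta0 = prior  # (1, 1) is the uniform prior
    wins = 0
    for _ in range(draws):
        # Posterior for each arm: Beta(prior + conversions, prior + non-conversions).
        rate_a = rng.betavariate(alpha0 + conv_a, beta0 + n_a - conv_a)
        rate_b = rng.betavariate(alpha0 + conv_b, beta0 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical data: 100/1000 conversions on A, 120/1000 on B.
print(prob_b_beats_a(100, 1000, 120, 1000))
```

On that hypothetical data the answer lands a bit over 90%. To use last week's data instead of a uniform prior, you could pass something like `prior=(10, 90)`, which acts roughly like 100 earlier visitors converting at 10% (again, a made-up figure).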
Now let's go through the two main ways of AB testing.
There's multi-armed bandits and what I call vanilla splits.
In a vanilla split you take all the traffic going to each variation and you split it at a set ratio. You then look at the results using Bayesian statistics and decide who the winner was.
In a multi-armed bandit (GOOGLE DOES THIS BY DEFAULT) you start out with a split and then the split varies depending on which variation is winning. This aims to send more traffic to the likely 'winner' and reach significance quicker. The one issue is that the assumptions underlying this are pretty munted when it comes to the real world: the standard algorithms assume the quality of the traffic stays the same over time. Anyone who's played the internet marketing game seriously knows that traffic quality varies significantly. Some is good, some is bad.
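To give a feel for how the split shifts, here's a minimal sketch of one common bandit rule, Thompson sampling. This is not necessarily what Google or any other tool actually runs; the function name and the counts are hypothetical. Each variation gets a share of the next batch of traffic equal to the estimated probability that it's the best one.

```python
import random

def thompson_shares(stats, draws=10_000, seed=0):
    """stats: {name: (conversions, visitors)} -> {name: share of next traffic}."""
    rng = random.Random(seed)
    wins = {name: 0 for name in stats}
    for _ in range(draws):
        # Draw one plausible conversion rate per arm from its Beta posterior
        # (uniform prior), then credit whichever arm drew the highest rate.
        sampled = {
            name: rng.betavariate(1 + conv, 1 + n - conv)
            for name, (conv, n) in stats.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    return {name: w / draws for name, w in wins.items()}

# Hypothetical first day: A converts 100/1000, B converts 200/1000.
shares = thompson_shares({"A": (100, 1000), "B": (200, 1000)})
print(shares)
```

After a day like that, this rule hands virtually all of the next day's traffic to B, which is great if tomorrow's traffic looks like today's and a problem if it doesn't.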
Let's take a look at what happens when you have a multi-armed bandit with varying traffic quality. Think of a set as a day of traffic.

Say the true conversion rate of Variant A is 0.1 and Variant B is 0.2.
For ease of explanation we'll update the allocation once per set, and it'll be proportional to the conversion rate at the time. Of course the allocation algorithm could be different, but this problem still exists for all the allocation algos I'm aware of.
Set 1: 2k traffic comes in, it converts as expected.
Set 2: 2k traffic comes in, we allocate it according to previous conversions. It converts as expected.
Set 3: A spammer had a field day. We get 20k of traffic but it converts terribly. Because B was getting the bigger share by then, Variant B ends up with a higher proportion of shit traffic to good traffic than Variant A.
When we look at the results we see the conversion rate of B overall is now worse than the conversion rate of A, even though we know that the true conversion rate of B is double that of A.
Apply some Bayesian statistics to the above and you'll see that Variation A is the winner with close to 100% probability.
This is a simple example with a very obvious anomaly, which arguably we could detect and potentially account for.
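To make the three sets concrete, here's a rough sketch using expected values (no random noise in the conversions). The true rates and traffic volumes come from the example above; everything else is an assumption for illustration: the spam traffic is assumed to never convert, and the bandit is assumed to send a fixed share `share_b` of sets 2 and 3 to B.

```python
import random

TRUE_RATE = {"A": 0.10, "B": 0.20}  # true conversion on genuine traffic
SPAM_RATE = 0.0                     # assumption: spam traffic never converts

def run_sets(share_b):
    """Return {arm: (conversions, visitors)} after the three sets."""
    # (visitors in the set, share sent to B, is this set spam?)
    sets = [(2_000, 0.5, False), (2_000, share_b, False), (20_000, share_b, True)]
    totals = {"A": [0.0, 0.0], "B": [0.0, 0.0]}
    for visitors, to_b, spam in sets:
        for arm, share in (("A", 1 - to_b), ("B", to_b)):
            n = visitors * share
            rate = SPAM_RATE if spam else TRUE_RATE[arm]
            totals[arm][0] += n * rate  # expected conversions
            totals[arm][1] += n
    return {arm: (round(c), round(n)) for arm, (c, n) in totals.items()}

def prob_a_beats_b(results, draws=20_000, seed=0):
    """Monte Carlo P(rate_A > rate_B) with uniform Beta(1, 1) priors."""
    rng = random.Random(seed)
    (ca, na), (cb, nb) = results["A"], results["B"]
    wins = sum(
        rng.betavariate(1 + ca, 1 + na - ca) > rng.betavariate(1 + cb, 1 + nb - cb)
        for _ in range(draws)
    )
    return wins / draws

aggressive = run_sets(share_b=0.9)  # bandit leans hard on B
for arm, (conv, n) in aggressive.items():
    print(arm, conv, n, f"{conv / n:.4f}")
print("P(A beats B):", prob_a_beats_b(aggressive))
```

With the 90/10 split, A ends up at 120 conversions from 3,200 visitors (3.75%) and B at 560 from 20,800 (about 2.7%), so the pooled numbers say A wins with a probability above 99%. Worth noting: in this toy version a gentle, strictly proportional split (`share_b=2/3`) isn't enough to flip the result, so how hard the bandit leans on the leader matters a lot.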
The question I pose to all my fellow warriors is: can we confidently account for more subtle variations, and is it worth it for reaching significance a little bit quicker? If you don't think we can, everyone needs to quit using multi-armed bandits in Google and wherever else they're running tests.
Hope you enjoyed this, guys. Hit me up with any questions and I'll try and get around to answering them when I find time.