I recently had a discussion with a professional in the field of outbound lead generation, whom I’ll call John. John was explaining how he is always running A/B tests on small sample sizes to improve his results. John is a legitimate expert in the field, but I suspected that his ability to generate meetings had nothing to do with his A/B testing (see the previous blog post on the A/B testing fallacy).
To illustrate the problem, let’s say we want to see the effect of flipping a coin with our left vs. right hand. If I did 5 tests of each, I might see a result like 3 heads for left and 1 head for right. However, we know that the coin is actually 50/50 regardless of hand, and even if there were an effect, it could never be that big. The ‘data’ from this experiment is pure noise. If I do 5000 flips, on the other hand, I will clearly see there is no difference. Now, let’s put a piece of tape on one side of the coin to see if that makes the tape side more likely to land up or down. After 5 tests, unless the effect was very, very strong, we would never see the real effect and would instead see more statistical noise. The smaller the true underlying statistical effect, the more samples you need to detect it. Note, however, that you will always see some difference between the two groups. The question is: is it a mirage or real?
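The coin example is easy to try out. Here is a minimal sketch that simulates a fair coin flipped with each "hand"; the function names and the fixed seed are my own choices for illustration, and the hand has no influence on the coin by construction.

```python
import random

def heads_count(n_flips, rng, p_heads=0.5):
    """Count heads in n_flips flips of a coin with P(heads) = p_heads."""
    return sum(rng.random() < p_heads for _ in range(n_flips))

def compare_hands(n_flips, rng):
    """Flip the same fair coin n_flips times per hand; return each heads rate."""
    left = heads_count(n_flips, rng) / n_flips
    right = heads_count(n_flips, rng) / n_flips
    return left, right

rng = random.Random(42)  # fixed seed so reruns print the same numbers
for n in (5, 5000):
    left, right = compare_hands(n, rng)
    gap = abs(left - right)
    print(f"{n:>5} flips per hand: left={left:.3f} right={right:.3f} gap={gap:.3f}")
```

Rerun it with a few different seeds: the gap at 5 flips jumps around a lot, while the gap at 5000 flips stays close to zero.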
In the case of John, I suspected he was mistaking mirages for real data. So I sat down with a fellow lead generation mathematics nerd and ran the numbers again. This time we looked at Z-scores and frequentist statistics…
How many leads do I need for A/B testing my sales message?
The basic question is how many samples in my B group would I need to risk in order to detect an improvement of 30%?
If we assume:
- a very high-performing campaign (2% meeting rate)
- we want to be 80% sure we would detect the effect if it were real (80% statistical power)
- we accept only a 10% chance of measuring an effect that is not real (a 10% false positive rate)
The math looks like this:
Doing the math, the answer is 806 samples in the B group, compared against the same number in the A group. That means:
Roughly 800 prospects at 2% conversion is about 16 meetings put at risk in order to improve meeting yield. If the true improvement were anything less than 30% (i.e., a B conversion rate below 2.6%), we would probably not detect the effect.
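If you want to run this kind of calculation yourself, here is a minimal sketch using the textbook normal-approximation formula for comparing two proportions. It is not necessarily the calculation behind the 806 figure. The function name, the one-sided default, and the unpooled variance are my assumptions. Under these assumptions the result comes out in the thousands of prospects per group, considerably more than 806. Exact numbers vary with the formula and with whether the test is one- or two-sided, but the conclusion only gets stronger.

```python
from math import ceil
from statistics import NormalDist

def samples_per_group(base_rate, relative_lift, power=0.80, alpha=0.10, two_sided=False):
    """Approximate prospects needed in EACH group (A and B) to detect a lift.

    Assumes a Z-test on the difference of two proportions (normal approximation).
    """
    p_a = base_rate
    p_b = base_rate * (1 + relative_lift)
    std_normal = NormalDist()
    # Z-score for the false positive rate (alpha) and for the desired power.
    z_alpha = std_normal.inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    z_power = std_normal.inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_b - p_a) ** 2)

# The scenario above: 2% meeting rate, 30% relative improvement.
n = samples_per_group(base_rate=0.02, relative_lift=0.30)
print(f"prospects per group: {n}")
print(f"meetings expected from that many prospects at the base rate: {n * 0.02:.0f}")
```

Changing `relative_lift` to 0.10 or `base_rate` to 0.01 shows how quickly the requirement grows as the effect or the baseline gets smaller.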
How many leads and meetings can I put at risk?
If you have 10,000+ prospects remaining, that might be a good risk. You might even be tempted to see how many prospects you would need to detect a 10% improvement (answer: 6,600!). If it were a more modest campaign with a 1% conversion rate, it would take 13,000 prospects to detect a 10% difference.
The math, again, says that nearly all B2B campaigns don’t have enough leads to risk real data-driven A/B testing. (One exception is non-performing campaigns. If the yield is 0.3% and we need roughly a threefold improvement to get it to 1%, or else we shut it down, then the number of prospects needed is a manageable few hundred.)
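To get a feel for what an undersized test looks like in practice, here is a rough simulation sketch. It assumes message B really is 30% better (2.6% vs. 2%) and counts how often B actually ends up with more meetings than A. The group sizes, trial count, seed, and function names are hypothetical choices for illustration only.

```python
import random

def meetings(n_prospects, rate, rng):
    """Simulate how many of n_prospects book a meeting at the given rate."""
    return sum(rng.random() < rate for _ in range(n_prospects))

def share_of_tests_b_wins(n_per_group, rate_a, rate_b, trials, rng):
    """Fraction of simulated A/B tests in which B books strictly more meetings."""
    wins = 0
    for _ in range(trials):
        if meetings(n_per_group, rate_b, rng) > meetings(n_per_group, rate_a, rng):
            wins += 1
    return wins / trials

rng = random.Random(7)  # fixed seed so reruns print the same numbers
for n in (100, 1000):
    share = share_of_tests_b_wins(n, rate_a=0.02, rate_b=0.026, trials=2000, rng=rng)
    print(f"{n:>5} prospects per group: B came out ahead in {share:.0%} of tests")
```

Whenever B does not come out ahead, a tie or a win for A, the test either tells you nothing or points you at the worse message, even though B is better by construction.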
Ok, and what about John?
If both A and B are good messages, then it does not matter that the test has no mathematical accuracy. If John believes he is learning, he keeps working on making the message better and better. John is talented. If he puts his energy into writing a great message, he will write one. His A/B testing technique, however, is likely just a placebo.
Should I tell him? Definitely not. A/B testing in B2B may not be real, but talent is real and very, very important.