The A/B Delusion in B2B

Sorry, I can’t resist a provocative headline.

For the past six months I have been deep in a rabbit hole of sales statistics.

My latest project is measuring the effect of warming up a cold contact before the first reach-out. If we skip the warmup, how much lower is conversion?

Another test was to see if our at-risk email domain performed as well as our gold reference client domain. Sounds like a simple A/B test, doesn’t it?

Actually, it would be if we were doing high-volume, B2C-style performance marketing.

Sample sizes in B2B

In our case, however, we are in high-ticket B2B. That means risking 200 names in the B sample (without warmup) is already ‘expensive’ in terms of opportunity cost. When we first did this, the result was shocking: no warmup (B) outperformed warmup (A). We added 200 more names to B and the results reversed, so much so that we needed to pull the remaining names out of B and put them back in A.

Obviously, we needed a completely independent test. This time we used 300 names in A and 300 in B. Again, the results were the reverse of what logic would predict: not warming up a cold contact should never improve conversion. We added 200 more names to each group, and the picture changed yet again. Similarly, the A/B test for a domain used the same randomized names, and within the first 150 names each, the illogical one (the new domain) was 5x better than the ‘gold benchmark’. What is going on?
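To see how easily reversals like these can arise from chance alone, here is a minimal simulation sketch. It is not our actual campaign data: the function name, the 1% and 0.5% conversion rates, the number of simulated runs and the seed are all assumptions chosen for illustration.

```python
import random


def share_of_b_wins(n_per_arm, rate_a, rate_b, trials=2000, seed=42):
    """Fraction of simulated A/B tests in which arm B books strictly more
    meetings than arm A, given the (hypothetical) true rates of each arm."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    b_wins = 0
    for _ in range(trials):
        # Each name independently converts to a meeting with the arm's true rate.
        meetings_a = sum(rng.random() < rate_a for _ in range(n_per_arm))
        meetings_b = sum(rng.random() < rate_b for _ in range(n_per_arm))
        if meetings_b > meetings_a:
            b_wins += 1
    return b_wins / trials


if __name__ == "__main__":
    # Assumed rates: A (warmup) converts at 1%, B (no warmup) at 1% or 0.5%.
    for n in (200, 300, 500):
        same = share_of_b_wins(n, 0.01, 0.01)
        worse = share_of_b_wins(n, 0.01, 0.005)
        print(f"n={n}: B ahead when equal: {same:.0%}, when truly half as good: {worse:.0%}")
```

Under these assumed rates, B comes out ahead in a meaningful share of runs at 200 names per arm, even when it is truly the weaker variant. That is consistent with the flip-flopping we saw, although it does not prove that chance was the only cause.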

As with the question of how soon we can evaluate campaign success, the law of small numbers is what makes this hard. Meetings are rare events compared to emails sent. One more meeting in one group and one fewer in the other, and the statistics change significantly. Looking at confidence intervals, we need to risk at least 500 names in a B group to get any kind of quality in an A/B test.
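The confidence-interval point can be sketched with a Wilson score interval, one common choice for rare events. This is an illustration, not necessarily the method we used: the helper name and the meeting counts (1 versus 5 meetings out of 150 names, matching a '5x better' result) are hypothetical.

```python
import math


def wilson_interval(meetings, names, z=1.96):
    """Wilson score interval for a conversion rate (z=1.96 is about 95%)."""
    p = meetings / names
    denom = 1 + z * z / names
    center = (p + z * z / (2 * names)) / denom
    half_width = z * math.sqrt(p * (1 - p) / names + z * z / (4 * names * names)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: the gold domain books 1 meeting, the new domain 5.
    for label, meetings, names in (("gold", 1, 150), ("new", 5, 150), ("larger", 5, 500)):
        low, high = wilson_interval(meetings, names)
        print(f"{label}: {meetings}/{names} -> {low:.2%} to {high:.2%}")
```

With these assumed counts the two 150-name intervals overlap, so a 5x difference at that sample size is still compatible with both domains performing the same. Even 5 meetings from 500 names leaves an interval of roughly 0.4% to 2.3%, which is why 500 is a floor rather than a comfortable number.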

Oh, and this is only valid if the A group is already proven productive, with at least 1% conversion. At that rate, 500 names represent about 5 meetings. Who would risk 5+ meetings to maybe make it better?


My conclusions?

  1. Statistics are hard. Your emotions and intuition are not your friend as you watch a campaign or a series of campaigns play out. It is very hard to resist ‘seeing trends’ or getting an ‘early insight’.
  2. While B2C lead generation is highly measurable (high volumes), the A/B testing features found in B2B outbound tools (Outreach, SalesLoft, etc.) are essentially useless. As an agency with many clients, we can only ‘afford’ two tests of any quality per year. A single business development rep optimizing their own campaigns will never have enough ‘at risk’ targets unless they are totally failing.
  3. Intuition and experience are critical in B2B, unlike in B2C performance marketing. Newcomers cannot A/B test their way to success like they can in B2C performance marketing or conversational cold calling.
  4. Good is the enemy of Great: as you get good, the cost of testing to find something better rises steeply. There is nothing to do about this reality. Accept it!


