A/B testing

Split decision – what to consider when A/B testing

[Illustration: a person interacting with an email pop-up]

Which of these statements about A/B testing is correct?

A. It’s a fun little experiment.
B. It’s a powerful tool which must be handled correctly.

The correct answer is B. Split testing is a powerful tool. And, like all tools, it has the potential to do more harm than good if it’s not operated with care.

It’s therefore unfortunate that split testing is often treated more like option A. “Hey, let’s try this” says someone… and suddenly you’re conducting an experiment based on an arbitrary, last-minute decision.

Testing in a haphazard, disjointed manner may garner useful insights into your customers’ behaviour, but mostly by luck. Unplanned testing is just as likely to waste time on inconclusive, insignificant or even misleading results.

There’s a better way.

Devise an A/B testing strategy

A good quality A/B test deserves as much consideration as the content, design, segmentation or any other aspect of your campaign. Draw up a solid plan and set out with a real purpose.

Modern email marketing tools have made it easy to set up a split test. It’s tempting to jump right in and get started, but forethought and preparation pave the way to the best results. Decide who, why, what, how long and what next. Prove a theory. Discover an unknown. Learn about your customers.

A well-planned, one-off A/B test is great. But what’s even better is an ongoing series of interrelated tests. Maybe you’d like to conduct the same test under different circumstances, or the results spur your next hypothesis and then the one after that. Think of A/B testing as a programme rather than a standalone exercise and you’re on the right track for real customer insight.

Don’t waste time proving the obvious

Some years ago, I was working on a campaign with a three-way subject line test. I don’t quite remember all of the gory details but I can recall enough to illustrate my point. The email in question was a monthly ‘what’s on’ newsletter. It was going to the full UK mailing list. There was no targeting based on personal interests or any other such criteria.

Two of the subject lines focused on specific events. One of these was a Formula One race. The other escapes me, so let’s say that it was the World Bog Snorkelling Championships.

The third subject line kept things general. Rather than highlighting individual events, it indicated that a broad range of interests were catered for. I said: “this one will win”. And it did – by some margin.

As handy as it would be to possess some kind of marketing clairvoyance, my prediction wasn’t based on any special intuition. It was common sense. The events featured in subject lines #1 and #2 may be passions for some but they were surely of limited interest to the customer base at large.

Split testing can provide unique insight into what makes your customers tick – don’t squander it on predictable outcomes!

Beware of false positives, use deep pots and don’t jump the gun

Remember when I mentioned A/B testing’s potential for harm? This is where false positives come into play. Poor execution of a test or misanalysis of the results can lead to inaccurate conclusions, point your subsequent marketing efforts in the wrong direction and ultimately reduce engagement. Disaster!

False positives are tied to the concept of statistical significance. Without wanting to pretend that I fully understand the mathematics behind statistical significance, what it comes down to is this: the result of your split test could be the product of chance. There are a few key points to consider:

  • Have enough people been allocated to the test?
  • Has it been allowed to run for long enough?
  • By what margin was the winner decided?

The smaller the sample size, the less representative it is of the average customer and the more skewed the outcome will be. Likewise, a hasty one-hour testing window doesn’t give enough people sufficient time to engage; the balance could very easily tip the other way after the winner has been decided. I’ve seen a few prematurely crowned ‘winning’ subject lines go out to the remaining database. MailChimp delves into actual numbers with this handy guide to A/B testing durations.

Once you’re satisfied that ample people and time have been allocated, your test is no longer doomed from the start. There is, however, still a chance that you’ll encounter a false positive. A marginal win could be the result of natural variance between your test groups. It’s worth running the test a few more times to see what happens. Neil Patel’s statistical significance calculator is a great tool for analysing real numbers.
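To put some numbers behind those questions, a two-proportion z-test is one common way to gauge whether a gap in open rates is likely to be real or just chance. The figures below (1,000 sends per pot, 220 vs 185 opens) are illustrative, not from any real campaign — a minimal sketch in Python:

```python
import math

def ab_significance(opens_a, sent_a, opens_b, sent_b):
    """Two-proportion z-test: is the gap in open rates likely
    to be real, or just natural variance between the pots?"""
    p_a = opens_a / sent_a
    p_b = opens_b / sent_b
    # Pooled open rate under the null hypothesis (no real difference)
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (p_a - p_b) / se
    # Two-tailed p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Subject line A: 220 opens from 1,000 sends; B: 185 opens from 1,000
z, p = ab_significance(220, 1000, 185, 1000)
# Compare p against your chosen threshold (0.05 is conventional)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Notice that a 22% vs 18.5% result on 1,000 sends per pot hovers right around the conventional 0.05 threshold — exactly the kind of marginal win that deserves a re-test.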

To witness natural variance in action, you may wish to consider running a few A/A tests before moving onto A/B. That is to say, you’ll send identical content to two groups and record how much the performance metrics vary. Do that and you’ll find yourself in a more informed position when it comes to analysing future split test results.
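The A/A idea can even be simulated before you touch a live list: give two random pots identical content and see how far the open rates drift apart by chance alone. The pot size and true open rate below are made up for illustration:

```python
import random

def simulate_aa_test(n_per_group, true_open_rate, runs=1000, seed=42):
    """Simulate repeated A/A tests: both pots get identical content,
    so any gap in observed open rates is pure chance."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(runs):
        opens_a = sum(rng.random() < true_open_rate for _ in range(n_per_group))
        opens_b = sum(rng.random() < true_open_rate for _ in range(n_per_group))
        gaps.append(abs(opens_a - opens_b) / n_per_group)
    return max(gaps)

# With 500 recipients per pot and a true 20% open rate, identical
# emails can still differ by several percentage points by luck alone.
worst_gap = simulate_aa_test(n_per_group=500, true_open_rate=0.20)
print(f"Largest chance gap over 1,000 runs: {worst_gap:.1%}")
```

If pure chance can open a gap that wide between identical emails, a similar-sized gap in a real A/B test shouldn’t be declared a winner on its own.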

Test one thing at a time… except when you don’t

That subject line test is all well and good, but why not throw in some alternative imagery, rewrite some copy and switch up the colour palette while you’re at it?

Because you won’t know what people are responding to. Maybe your customers prefer the word “today” to “now”, or maybe they found the daytime picture more compelling than night-time. You can only guess.

This is, of course, both common sense and common knowledge (although I have seen such a mistake made on a number of occasions). What is less obvious is that it’s possible to develop your testing programme to a level at which multiple campaign elements can and should be tested simultaneously. There’s even a phrase for it – multivariate testing.

After you’ve conducted A/B testing on a single element – be it the subject line, offer, call-to-action or anything else – you’ll know how that one component impacts the email. But your customer probably isn’t mentally breaking your mailing down into its constituent parts. Once you’ve measured the performance of individual items, it’s time to move onto the next step – testing their combinative effect.
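As a rough sketch of what that next step involves, a full-factorial multivariate design pairs every variant of every element so their combined effect can be measured. The element names and copy below are entirely hypothetical:

```python
from itertools import product

# Hypothetical elements and their variants for a multivariate test
elements = {
    "subject":  ["Ends today!", "Don't miss out"],
    "hero_img": ["daytime", "night-time"],
    "cta":      ["Shop now", "See offers"],
}

# Full-factorial design: every combination becomes one test variant,
# so you measure how the elements perform together, not in isolation.
variants = [dict(zip(elements, combo)) for combo in product(*elements.values())]
print(len(variants))  # 2 x 2 x 2 = 8 variants to split the audience across
```

Note the trade-off: each extra element doubles (or more) the variant count, shrinking each pot and bringing the sample-size warnings from earlier straight back into play.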

I once worked on an email programme in which dynamic content blocks were pulled in via a specialist email agency’s platform. To determine the value of this partnership, we ran a few A/B tests comparing the performance of emails with and without dynamic blocks. Following conventional wisdom, we tested these blocks one at a time. The results were disappointing – it seemed that these dynamic blocks had no significant impact on email performance. We shared this news with the people at the agency. They were unsurprised and unconcerned. Their advice: test multiple blocks at the same time.

They were right. Clicks went through the roof and we learned a valuable lesson about how email components work collectively.

Remember to segment

We know that statistically significant results are more likely when working with a large sample size. So there’s a solid rationale behind the decision to run A/B tests on emails going to the full mailing list.

But there’s more to it than that. Mixing inactive subscribers into your A and B pots is going to dilute your test’s effectiveness. Make sure your sample groups consist of people who actually open your emails.
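In practice that filtering step might look something like this — a sketch assuming you can export each subscriber’s last-open date (the addresses, dates and 90-day cutoff are all illustrative):

```python
from datetime import date, timedelta

# Hypothetical subscriber records: (email, date of last open)
subscribers = [
    ("anna@example.com",  date(2015, 9, 1)),
    ("bob@example.com",   date(2014, 1, 15)),
    ("carol@example.com", date(2015, 8, 20)),
]

# Keep only people who opened something in the last 90 days, so
# inactive addresses don't dilute the A and B pots. A fixed 'today'
# is used here purely to keep the example reproducible.
today = date(2015, 9, 30)
cutoff = today - timedelta(days=90)
active = [email for email, last_open in subscribers if last_open >= cutoff]
print(active)
```

The exact cutoff is a judgment call — 90 days suits a monthly newsletter, while a daily sender might tighten it considerably.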

If your mailing list numbers make it a realistic option*, it’s worth considering running tests according to demographic group. A regular buyer and a new customer may be tempted by different products. A subject line which grabs the attention of a 60-year-old subscriber may not be so compelling to a 30-year-old.

*And if you need to grow your mailing list, here are some tips on that.

Act on A/B testing data

The A/B test is complete. The outcome is enthusiastically discussed with colleagues. Everyone agrees that it’s very interesting. And then the information is recorded in a report, filed away and never thought of again. Don’t do that.

Let’s consider a typical subject line test. Two pots of 10% have been split out and the remaining 80% will automatically receive the most-opened subject line three hours later. That’ll boost Tuesday’s open rate, which is nice… but there are lessons to be learned beyond the scope of that one email.
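That 10/10/80 split can be sketched in a few lines: shuffle the list, carve off two equal pots, and hold the remainder back for the winning subject line. The addresses and percentages here are illustrative:

```python
import random

def split_for_test(recipients, test_fraction=0.10, seed=7):
    """Carve two equal test pots out of the list; the remainder
    receives the winning variant after the test window closes."""
    shuffled = recipients[:]
    random.Random(seed).shuffle(shuffled)  # randomise to avoid ordering bias
    pot_size = int(len(shuffled) * test_fraction)
    pot_a = shuffled[:pot_size]
    pot_b = shuffled[pot_size:2 * pot_size]
    remainder = shuffled[2 * pot_size:]
    return pot_a, pot_b, remainder

a, b, rest = split_for_test([f"user{i}@example.com" for i in range(1000)])
print(len(a), len(b), len(rest))  # 100 100 800
```

Shuffling before slicing matters: lists exported in sign-up or alphabetical order would otherwise bake a systematic bias into the pots.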

It’s clear that customers prefer this particular set of words to that set of words but what does that suggest about how customers respond to tone of voice? How can that idea be factored into subsequent marketing activities? Could the information provide a basis for new theories on colour or imagery?

It’s easy to see A/B testing results through tunnel vision. Consider the big picture instead and we can truly harness the power of information.

Be aware of change

Some findings are so fundamental that you can consider them set in stone. It’s safe to say that people won’t ever prefer non-personalised emails to personalised ones, for instance.

But it’s important to remember that trends come and go, exciting new products become yesterday’s news and not all A/B test results stand forever.

If A beats B by a landslide, then you’ve learned a valuable piece of information about the current market. But will the results still reflect your audience’s tastes six months or a year from now? It’s prudent to plan for re-testing.

It’s time to split

Find inspiration in A/B success stories such as Campaign Monitor’s 127% uplift in clicks. Earn valuable data when your own tests deliver concrete results. And reap the rewards by learning from that information and setting your email marketing programme on an upward trajectory.

Make the right split decisions and get ready for knockout results.