To the outside observer, A/B testing can appear to be a web designer's Utopia. "Never make another decision!" the adverts for such a place might trumpet; "Let your users show you what works best!"

And it's certainly true that A/B testing changes to a website -- running two different versions and then analysing the differences in user behaviour between the two -- can be an extremely powerful tool in a designer's arsenal. But anyone who has run as many A/B tests as I have will tell you that it can be joy and frustration in equal measure. Not only because many tests will inevitably fail, and what you thought would help proves only to hinder; at times even the most successful tests are just as frustrating as the failures.
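(For the mechanically curious: the "running two different versions" part usually comes down to deterministic bucketing. Here's a minimal sketch in Python -- the function name and hashing scheme are my own illustration, not the API of any particular testing tool:)

```python
import hashlib

def assign_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically bucket a user into variant 'A' or 'B'.

    Hashing the user id together with the experiment name means the same
    user sees the same variant on every visit, without storing any state,
    and different experiments bucket the same user independently.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "A" if bucket < split else "B"

# The same user always lands in the same bucket for a given experiment:
assert assign_variant("user-42", "new-homepage") == assign_variant("user-42", "new-homepage")
```

Once every visitor is consistently assigned, "analysing the differences" is a matter of comparing the metrics of the two buckets.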

I've come to realise the reason for this. And it involves Dungeons & Dragons.

A brief D&D alignments refresher

For those unfamiliar with the concept of alignment as it exists within the D&D universe, a brief summary. To play the genre-defining roleplaying game, players must generate (or "roll") a character, writing down their details on a Character Sheet. Along with numbers representing their Strength, Dexterity, Wisdom and so on, players must also choose an Alignment, representing their character's ethical and moral outlook on life.

The original edition of D&D only permitted a choice from three possible alignments: Lawful, Neutral, or Chaotic. Lawful characters respected society's rules; Chaotic characters didn't. Neutrals lay somewhere in the middle because they were boring bastards.

The D&D Basic Set, released in 1977, spiced things up with the addition of a second axis, representing the character's position on a Good-to-Evil scale. Thus, characters could be played with one of a possible nine alignments:

D&D Alignments chart

This new system allowed for characters that were Lawful but Evil (such as a tyrannical overlord), or perhaps Good while still being Chaotic (Robin Hood). The Joker is Chaotic-Evil. Superman is Lawful-Good. You get the general idea. Players were expected to roleplay their character in a manner befitting his or her alignment, and make decisions that were not at odds with it.

Still with me? Good. Now, you might be asking yourself what in the name of Gygax this has to do with A/B testing. Well, I'll tell you; I have discovered that all A/B tests, the successes and the failures, fall squarely into one or another of D&D's alignments.

Lawful-Good A/B tests

The Lawful-Good tests are the ones that got you into A/B testing in the first place, lured in with the promise of easy wins and fat profit margins. They are the blindingly obvious, stupidly successful experiments that anyone, even the CEO, should have realised needed to be done; the ones that Jared Spool or Luke Wroblewski talk about in every presentation they've ever given. Remove the twelve-pane animated carousel on the homepage. Let users buy your product without completing a five-page registration form. They are the most obvious things to do, the low-hanging fruit, and you should be doing them before you do anything else.

Neutral-Good A/B tests

Sometimes there are changes that just need to be made to a website. Minor rebrands, technical refactoring, commercial obligations -- all of these and more can prompt changes that aren't in the service of customer satisfaction. You argue for wrapping the change in an A/B test, "just to be sure it isn't hurting us and we haven't introduced any bugs." Your logic is sound, the change is made ... and somehow the results, which nobody expected to be anything other than dead neutral, come out positive. Hey, we'll take the win.

Chaotic-Good A/B tests

And then there are the mistakes. Maybe your QA tester gives you a call to let you know that there's something broken on the page, and it looks like your test could be causing it. An unclosed tag, an orphaned file you somehow forgot to push to the git repository, a typo that turned an <h1> tag into an <hq>. It doesn't matter; somehow it made it live and in front of your users. You scramble to switch off the test, but the results give you pause. Those numbers can't be right, can they? Green across the board; engagement is up, conversion is up, hell, even NPS is up! Somehow you have stumbled upon a winning strategy without even trying, although in retrospect it all seems so obvious -- of course that area works better without a background colour; actually that icon was confusingly ambiguous, no wonder the design works better without it.

Lawful-Neutral A/B tests

Lawful-Neutral tests are the ideas that seem sound but never achieve the positive results you were expecting. All of your user testing might have pointed to making that change; when five out of seven participants complain about the size of your prices, it's hard to argue that's not a clear signal to make them bigger. So you do -- it's going to be an easy win, I can't believe we didn't try this sooner -- and then ... nothing. It's not that the test fails; all of the numbers are simply inconclusive. Everyone agrees it's a good idea, but without proof you reluctantly (and sensibly) pull the plug.
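The line between a Lawful-Good win and a Lawful-Neutral shrug is usually drawn by a significance test. A minimal sketch of a pooled two-proportion z-test -- the conversion numbers below are invented for illustration, not from any real experiment:

```python
from math import sqrt, erf

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates,
    using a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Convert |z| to a two-sided p-value via the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# A "bigger prices" test where variant B looks slightly better,
# but the difference is well within the noise:
p = two_proportion_z(conv_a=200, n_a=10_000, conv_b=215, n_b=10_000)
print(f"p-value: {p:.3f}")  # comfortably above 0.05 -- no proof either way
```

A p-value far above your chosen threshold is exactly the "everyone agrees, but the numbers won't confirm it" situation described above.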

Neutral-Neutral (or 'True Neutral') A/B tests

D&D characters in the middle of both axes are referred to as "True Neutral"; they have no strong feelings in either direction, neither towards the laws and rules of society nor towards their moral obligations. Animals in the D&D universe were by and large True Neutral (at least until the 5th Edition, released in 2014), and it's probably the easiest (or laziest, if you prefer) alignment to roleplay.

Neutral-Neutral A/B tests are those you never really cared about. Someone, somewhere thought it was a good idea, it wound up on your team backlog, and eventually you got around to doing it ... and, surprise surprise, it made no difference to anything at all. What. A waste. Of time.

Chaotic-Neutral A/B tests

Sometimes an idea will come to you that is entirely divorced from the history and data surrounding your project. It doesn't fit into the habits of any of your carefully constructed personas, and nothing indicates that it is something your site or app either needs or wants. But you do it anyway -- hell, every idea deserves a chance, even if it is a little off-the-wall. You come in early or work late, since it's not officially on the team backlog, and you kinda-sorta fudge the test description a little to give it a reason for existing at all. Maybe it succeeds, maybe not; without a solid hypothesis, it's going to be hard to justify similar changes on other parts of the site.

A segment of the front of the Red Box D&D Basic Set

Lawful-Evil A/B tests

Now we're entering the realm of anti-UX and Dark Patterns: designers who intentionally mislead their users, obscuring information or bait-and-switching their way to increased conversion. Lawful-Evil tests are run by designers who no longer use their powers for good. Misleadingly ordered options, or primary and secondary actions that switch from page to page. Colour contrast and layout used to obscure important information rather than draw attention to it. Full-page advertising takeovers. Kill 'em all.

Neutral-Evil A/B tests

The Neutral-Evil test is something of an oddity on this list, as it is the only type of test not generally run by a web designer. These tests come from on high -- the Product Manager, the VP of Sales -- and their only aim is higher conversion. They have no commitment to user satisfaction, no comprehension of user delight; these tests are designed to elicit one thing and one thing only: more clicks on that "Checkout" button. Also falling into this category are the "my wife's favourite colour is purple; let's try that for our logo" type of suggestions that every freelance designer loves to hate.

Chaotic-Evil A/B tests

Finally, we reach the bottom-right square, the Chaotic-Evil class of A/B test. This is the home of both the truly randomised trial and the multi-armed bandit approach to A/B testing. This approach says: let's just get rid of all the designers and let the computers figure out how to make us the most money. Why pick one shade of blue when you can test 41 different shades? What? Users? They're voting with their wallets -- what could possibly go wrong?
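For the curious, the "let the computers figure it out" approach can be sketched in a few lines. This is a minimal epsilon-greedy multi-armed bandit; the variant names and conversion rates are invented for illustration, and real bandit tools are considerably more sophisticated:

```python
import random

def epsilon_greedy(rates: dict[str, float], trials: int = 10_000,
                   epsilon: float = 0.1) -> dict[str, int]:
    """Simulate an epsilon-greedy bandit: usually exploit the variant with
    the best observed rate, occasionally explore a random one. `rates`
    maps each variant to its (hidden) true conversion rate."""
    pulls = {v: 0 for v in rates}
    wins = {v: 0 for v in rates}
    for _ in range(trials):
        if random.random() < epsilon or not any(pulls.values()):
            variant = random.choice(list(rates))  # explore a random variant
        else:
            # exploit the variant with the best empirical conversion rate
            variant = max(pulls, key=lambda v: wins[v] / pulls[v] if pulls[v] else 0.0)
        pulls[variant] += 1
        if random.random() < rates[variant]:
            wins[variant] += 1
    return pulls

# 41 shades of blue, one of which happens to convert better than the rest
shades = {f"blue-{i}": 0.02 for i in range(40)}
shades["blue-40"] = 0.05
traffic = epsilon_greedy(shades)
# The bandit concentrates traffic on whichever shade looks best so far --
# with no model of *why* that shade works, which is rather the point.
```

Note what's missing: there is no hypothesis anywhere in that loop, only a reward signal -- which is precisely why it belongs in the Chaotic-Evil square.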

The final battle

Facing the final boss

In life, as in D&D, you get to decide what sort of person you're going to be, and the decisions and actions you are going to take. Whether you choose the path of the light or a darker hue will influence how others view you and whether they will want to help or hinder you in your questing.

A/B testing has become a powerful option for designers wanting to reduce uncertainty in their workflow, and it can be seductively tempting to simply try changing All The Things and let statistics sort out the mess. But though tests can tell you whether something worked, they can't tell you why, or suggest alternative approaches or iterative follow-ups. For that, you need designers motivated by a desire to make things better for their users, by making Better Things. Good designers, in both senses of the word.

The Dungeon Master for Dummies book, by Wizards of the Coast's own James Wyatt, Bill Slavicsek and Richard Baker, has this to say about alignments:

Frankly, we've found that evil alignments are better left to the monsters and villains; player character parties work out better when the characters take on good alignments or stay unaligned. Motivations for adventures come together easier, character interaction goes more smoothly, and the heroic aspects of D&D shine through in ways that just don't happen when players play evil characters.

As in D&D, so in life; choose the Good side and your motivations will be purer, interactions with your colleagues easier, and as a designer you get to be the hero rather than the villain.

And who doesn't want to be a hero?