#025: A/B Testing with Kelly Wortham from EY

We had a hypothesis that our listeners might be interested in hearing an expert on digital optimization. In this episode we test that hypothesis. Listen and learn as Kelly Wortham from EY runs circles around the lads, and brings them to an understanding of what digital testing means in 2015. In an hour so optimized it only takes 45 minutes, it’s 2015’s penultimate episode of the Digital Analytics Power Hour.

People, places, and things mentioned in this episode include:

  • Taguchi vs. Full Factorial test design
  • kelly dot wortham at ey dot com (to get added to Kelly’s twice-monthly testing teleconference)

Episode Transcript

The following is a straight-up machine transcription. It has not been human-reviewed or human-corrected. However, we did replace the original transcription, produced in 2017, with an updated one produced using OpenAI’s WhisperX in 2025, which, trust us, is much, much better than the original. Still, we apologize on behalf of the machines for any text that winds up being incorrect, nonsensical, or offensive. We have asked the machine to do better, but it simply responds with, “I’m sorry, Dave. I’m afraid I can’t do that.”

00:00:03.71 [Announcer]: Welcome to the Digital Analytics Power Hour. Three analytics pros and the occasional guest discussing digital analytics issues of the day. Find them on Facebook at facebook.com forward slash analytics hour. And now, the Digital Analytics Power Hour.

00:00:25.51 [Michael Helbling]: Hi everyone, welcome to the Digital Analytics Power Hour. This is episode 25. You know, in a slight paraphrase of the REM classic, everybody tests sometimes. Have you ever asked yourself the question, why not all the time? I know we have. That’s what our episode is about today. So we’re heading for testing excellence straight as the CRO flies. I’m always joined by my intrepid co-host. That would be Tim Wilson, senior partner at Analytics Demystified.

00:01:00.19 [Tim Wilson]: Is this standardized testing? Standardized, that’s right. Greatest standardized testing. No analysts left behind.

00:01:06.89 [Michael Helbling]: And of course, his sworn enemy in Ottawa, Canada, the double CEO of Napkyn and Babbage Systems, Jim Cain.

00:01:15.53 [Jim Cain]: Hi. Can you sing the whole episode, please? That was adorable.

00:01:20.85 [Tim Wilson]: I think we have you singing Jim on past ones that will never be topped. Okay.

00:01:29.48 [Michael Helbling]: To help us in this conversation, who better to lend us a hand than our guest? The Vizier of Variations, the Sultan of Significance, and the Nostradamus of Null Hypotheses. That’s right. I’m talking about the one, the only, Kelly Wortham. Welcome, Kelly.

00:01:46.82 [Kelly Wortham]: Thank you.

00:01:48.54 [Michael Helbling]: I know that I made those titles up, so those were not Kelly’s idea. Kelly didn’t tell us to read that. Other people, you know, other people.

00:01:57.38 [Kelly Wortham]: The money’s on the way.

00:01:58.36 [Michael Helbling]: Yeah, thank you. Let me give you some background about Kelly. Currently, she is the conversion rate optimization evangelist at EY in their analytics and strategic advisory practice, and she’s leading the charge on optimization there. Prior to that, she led the AB testing program at Dell, and she’s been out in the industry and in this space for 15 years. We’re very excited to get such a seasoned professional in our field and in this field specifically around AB testing and conversion rate optimization. So it’s a real pleasure to have you here. So let’s dive right into this. You know, we can go down either question A or question B, so which one do you guys want to start with? Now, can we do both? Can we ask one person A, one person B, and have them answer simultaneously? I think that’s a good first question for Kelly: how would that test work?

00:02:51.86 [Kelly Wortham]: I think tonight we’re going to make it so easy.

00:02:55.05 [Michael Helbling]: We’re going to be worried a lot about interaction effects because we’re going to be all over the place. But let’s get in there. I’d love to start by talking about the process of setting up and running a testing program. I think the world is sold on the concept of testing, right? Everyone believes in it. Everybody wants to do it. And whoever’s doing a little wants to do more. And whoever’s not doing any wants to do some. But it’s pretty rare to find organizations that are doing a really good job. So I’d like to throw that out first: what goes into building out a really good testing program, and what would be the first steps you might take?

00:03:31.10 [Kelly Wortham]: Oh, good. Start with the easy questions.

00:03:33.34 [Michael Helbling]: Yeah, of course. Right over the middle.

00:03:36.35 [Kelly Wortham]: Right there, yeah, low-hanging fruit. So there is no simple answer for how you do it right. There’s a lot of answers for what you shouldn’t do. But I would say that the number one thing that I see organizations that want to get into testing do wrong is that they fail to do the upfront planning and creation of just the simple questions that you need to ask. What are the business problems that we’re trying to solve? Just like you would do with any analytics problem. You want to do the same thing with your testing program. And you need to determine what is the goal of the program. What are we trying to fix? What are we trying to address? What are we trying to improve? And you need to ask all those questions to get all those answers before you determine what your program is going to do. Before you even choose what tool you’re going to use to get there. And often I see those questions being asked one, two years out, once the program has stopped delivering these massive returns on those initial low-hanging fruits. That’s the first thing I think is crucial, just like it is in analytics. I think it’s crucial in testing. The second thing I think is really important is to not just focus on that, on the low-hanging fruit, on fixing the things that your analytics team or maybe the business identifies, hey, this is broken. And you can see in the data it’s broken. Just fix it. You don’t necessarily need to expend your limited resources on potentially expensive testing when you can just go fix something. So you have to determine when you should test and when you should just go and fix. And a lot of times people come out the door... I mean, there was even the book written, you know, Always Be Testing, ABT, right? And I just, I think that that sets you up for failure because, you know, you do have limited resources and people and traffic, as well as of course finances. And if you can target those testing efforts to the areas that make sense, where you actually need to get an answer because the analytics can’t give you that answer, then you have a better result out of the gate and you have a longer tail for success with your program.

00:05:49.90 [Tim Wilson]: So have you seen cases where basically analytics is clicking along and they’re just kind of keeping a running log of these are the things that can’t be answered with analytics or can’t meaningfully be answered with analytics and kind of using that to build up a case to say this is not just a case but some level of focus to testing? Is that one of the approaches or no?

00:06:12.26 [Kelly Wortham]: That’s utopia, right? And I have seen it. It is rare. But it requires close alignment between analytics and testing or the possibility that even the analytics and testing are within the same organization. But there has to be this organizational consensus that analytics is to answer questions and that analytics won’t be able to answer all the questions. And that requires, honestly, for the analytics folks to be kind of honest as well, I guess, because nobody wants to say, hey, I can’t answer that question. But when they work really closely with the testing team, it’s not, I can’t answer that question, it’s, hey, I need help, right? I’m going to phone a friend. I’m going to phone the testing program and we’re going to get that answer to you together and they work at it together. So you have this beautiful backlog of, here’s all the questions that we’re asking and the analytics team, anything that they can answer, we can answer, we can go fix or go do. Anything that they can’t answer, they make a recommendation to go test. And based on the analytics, they make a recommendation of how to test it and what the hypotheses might be. and what those recipes might look like. And then you have a much stronger test design and a much better chance of getting successful results.

00:07:24.97 [Tim Wilson]: So heading right off on a tangent there, we’re talking about the analytics and the testing team. So getting started, for teams that... certainly analysts want to be doing testing, especially if they feel like they’re in a rut. Do you see it as, typically you really need to say, we’re going to do testing, we need to have a testing organization that is parallel to the analytics organization, we need to staff that independently? Or can there be a dipping the toes in the water by saying, we’re going to get a designer, you know, a little bit of support from our creative team or our agency, and we’re going to get a developer who’s kind of interested, and we’re going to have an analyst kind of dip their toe in? So do you see that as kind of a, typically you have to say, look, this is a separate group that needs to be established?

00:08:15.09 [Kelly Wortham]: Yes. Uh, I mean, how many times do we see organizations willing to throw money at tools, uh, software, uh, but not for people, right? Your people are your most important resource. Well, that’s a good thing, that’s all you need, right? Always be testing, right? Optimizely is like, just put a little snippet on the site and start testing, you’re done. That’s right, it’s all good, anybody can do it, just drag and drop away, it’s amazing. Yeah, I get that a lot. I get a lot of, of course, the testing tools do their sales to, you know, the HiPPOs, right? They go to the executives and they show them how easy this is to use, and they convince the executives that this is the right tool and that we don’t need to hire really talented people, because even I could do it, right? And then what happens is, well, yes, that’s true, the out-of-box functionality is super easy to use, and, you know, my fifth grader could probably figure it out, but it doesn’t work on your site, right? Because, you know, you have some custom implementation requirements that end up making you do DOM manipulation JavaScript, and okay, now we need to hire a developer. And oh, by the way, you know, the test designs that we start throwing at you are a little bit more complex than just, you know, moving things around the page, and now we need a designer, because, you know, we want to change more than just the location of this banner. And as it gets more and more complex, you need more and more resources, and you start borrowing throughout the org, and it’s very disjointed, and people get jealous because you’re taking their people, and it just goes crazy. But if you start the whole thing and you say, look, I need one developer, I need one designer, I need an analyst, and I need a project manager, and I’m starting with those people, and it should be led by somebody who’s familiar with AB testing. Start there. You know, you can even use 0.5 FTE resources if you need to, so that you can build a dotted-line team. But don’t think that you can start testing just by saying, hey, I’m going to take this analyst as part of my digital analytics team and I’m going to give this person testing and I’m going to

00:10:30.59 [Jim Cain]: put them over testing and everything is going to be great, because that’s, that’s just not enough. It’s not just one thing. It’s not just, you know, being an expert in, you know, one piece of the puzzle here won’t help you solve the puzzle. So, is a testing expert or a conversion rate optimization professional, is that one of those things where you hit a certain level as a digital analyst and you’re just kind of wired that way and that’s something you graduate to? Or is that really, in your estimation, a totally separate discipline that bumps into measurement?

00:11:02.08 [Kelly Wortham]: That’s a fantastic question. Don’t encourage it. And that’s not me, that’s not me just hesitating. That’s a really good question. I think it’s a little bit of both. I think it is possible for any digital analyst, and I am making a distinction there, because there is a difference, as we all know, in the type of data that you have in analytics and digital analytics. So I think it’s possible for any digital analyst to learn and change their mindset in a way to really fully embrace what’s possible in AB testing, multivariate testing, conversion rate optimization, whatever you want to call it. I think it’s possible to get there. It’s also possible to start there, whereas I don’t think in digital analytics that you could say that you could have an expert in digital analytics that’s also an expert in testing that’s never touched testing. In testing, you could be an expert in testing and not have a digital analytics background, which, I think that makes it unique, because it’s really more about understanding a lot of technical stuff, the implementation stuff, the backend stuff. But as long as you can read a report, a readout, a deep-dive analysis, and you can understand the very tiny basic elements of statistics, um, you know, whatever you took in, you know, Stats 101 or whatever, if you got that level of understanding, you’re okay and you can really become an expert in your field. I don’t think that’s the case in digital analytics. It’s just so much broader. You have to do so much to become an expert. Does that make sense?

00:12:41.03 [Jim Cain]: I’m surprised though at the direction because again, testing to me isn’t harder than analysis, but it does seem like a level up, you know? And so I was surprised to hear you say that, It’s possible to get into a career in testing without ever becoming or having to start off as being good at working with data.

00:13:01.53 [Kelly Wortham]: I will tell you that at Dell, I had a team of absolute rock stars. I still miss every single one of them, and I want to work with them again. And I’m talking about the testing side, not the analytics that supported the testing. And on the testing side, I had three that came to us from the analytics side, that were analytics folks that just really loved supporting the testing team and decided that they wanted to start managing testing programs. And I had two who had absolutely no analytics background. One of them was a certified scrum master. And the other one actually came to us from IT. And, you know, he was just fascinated with the stuff that we used to ask him to do to help us make tests happen. And he, you know, came over and ended up managing some of the most important work that we had. So, you know, two people with no background, not only in analytics, but no background in digital analytics, who were fantastic, and three who were. So, and all five of them, I would say, are now experts in the field of AB testing. So, it doesn’t hurt, sure, but I also think sometimes it can get in the way, because sometimes, you know, the folks that don’t come from digital analytics, their minds are a little bit broader. They ask bigger questions sometimes because they’re not limited by what they think they can answer with data.

00:14:29.22 [Michael Helbling]: That’s interesting. Yeah, I’ve always said testing capability is kind of part of an analyst toolkit. But I like your point, Kelly, that someone could go into testing and optimization specifically without necessarily building five years of competency in digital analytics. And I think that’s something a lot of organizations probably don’t think about or might miss when thinking about how to build the programs. And a lot of times, you know, you’ve been kind of hitting some of the skill requirements and things like that. People with the real affinity and understanding of user experience design and those kinds of things can be phenomenal at proposing and ideating around really great testing opportunities. Whereas to your point, I’m just fascinated. Your last point was really good, which was we do sometimes get stuck in the data, and we know what data we can or can’t trust out of our tools, and so we tend to work from there and use that as our box to live in. So that’s really cool.

00:15:29.01 [Tim Wilson]: I’m all, I’m all excited about the scrum master thing. You know, there’s so much that, if you’re coming in with an agile mindset, that does lend itself to testing more. It does seem like that’s kind of congruent with the concepts. Whereas sometimes with testing, it’s like, oh, I need to run this until I have... overthinking, I guess, overthinking it, like, yeah, yeah.

00:15:54.65 [Kelly Wortham]: Right, because the analyst is looking at it from a statistical perspective, and they’re saying, look, you know, you’ve only reached 85% confidence, and you need to run this test this much longer, and therefore we don’t get any answers. The person with no analytics background is going, dude, we’re not saving lives. You know, before we ran this test, it was 50-50, now it’s 85%. You know, woo-hoo, let’s make a decision and move. And, you know, it allows you to move a lot faster. It can be a little bit more risky too, but, you know, again, we’re not saving lives here. We’re talking about very, very minimal risk. And sometimes I think analysts can... you know, we’re kind of used to, somebody asks you a question and they give you, you know, six weeks to give them an answer. And that’s not always the case with testing. They’re like, okay, you’ve got a three-week window, you get this much traffic, give me an answer. And the folks in the testing industry kind of get used to that. And, you know, you can call it a lower standard, you can call it whatever you want. But at the end of the day, you know, we’re both giving answers. They’re just, you know, different types of answers, maybe with different levels of confidence. And maybe you don’t want to put as much weight on the answers with the lower level of statistical confidence. But at least it’s better than that coin toss.
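
For readers curious where a figure like the 85% confidence Kelly mentions comes from, here is a minimal sketch of a two-proportion z-test for an A/B comparison. The visitor and conversion counts are invented for illustration; the scipy usage is standard.

```python
# Minimal sketch: computing the "confidence" that B differs from A in an A/B test.
# The visitor and conversion counts below are invented for illustration.
from scipy.stats import norm

def ab_confidence(conv_a, n_a, conv_b, n_b):
    """Two-sided confidence (1 - p-value) from a two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                  # pooled rate under the null
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 1 - 2 * norm.sf(abs(z))                            # e.g. 0.85 reads as "85% confidence"

print(ab_confidence(conv_a=200, n_a=10_000, conv_b=230, n_b=10_000))
```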

00:17:06.81 [Tim Wilson]: I was actually talking to somebody within the last week where she was saying, no, you need 95% confidence. I’m like, I’m not a testing expert, but I think I’ve talked to people who are like, these are people in marketing and it’s messy. And that’s setting a really high bar where you’re going to have to collect a lot of data for a long time on a fairly substantial change. I had a professor years ago who was like, come on, we’re in the real world. We’re not just in academia. We can drop that a little bit.

00:17:39.16 [Kelly Wortham]: Right. Well, you know, this is a really weird analogy, but, you know, forgive me, whatever, my kids are learning about it in school. You know, they all said that with the advent of, like, the washing machine and the dishwasher and all these things, that people wouldn’t have to do as much housework. Uh, but statistically that’s not what happened. People actually did more. You wash your dishes more often, you wash your clothes more often, because it was easier to do. So it actually increased the amount of work. Ironically, that’s what we do with data. The moment we put those statistics out there and we tell people, hey, you know, you want to reach ninety-five percent confidence, we think, oh, good, we can just leave it and go. That’s not what happens, right? People are like, oh, if I let it run a little bit longer, I’ll get to that level of confidence. And it’s ridiculous. There’s no need for it. There was an organization that I had the chance to work with since joining EY that created an algorithm to determine what their statistical confidence should be for each and every test they launched, based on the potential risk and the potential impact. And they said, look, if it’s low risk and the impact is low, then confidence can be low. If the risk is low and the impact is high, positive or negative, then we can also keep the statistical confidence kind of low, because we’re not going to hurt ourselves, and there’s a chance that we can get a good answer quick and go. And there’s a cost of continuing to run it, exactly, exactly, other tests that you could run, right? And if I have something where the risk is high, now I’m going to jack up my statistical confidence and the requirements, you know, with it, and make that test run longer, because, you know, there’s a reason, uh, maybe it’s in your checkout funnel or something and the risk of being wrong is higher. And I just thought that was the first time that I talked with an organization that was a little bit more laissez-faire.
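
The risk- and impact-based threshold idea Kelly describes can be pictured as a simple lookup from a test’s risk and impact to the confidence it must reach before a call is made. This is a hypothetical sketch; the threshold numbers are invented, not the organization’s actual model.

```python
# Hypothetical sketch of risk/impact-based confidence thresholds, in the spirit of the
# approach described above. The specific numbers are invented for illustration only.
REQUIRED_CONFIDENCE = {
    ("low", "low"):   0.80,  # low risk, low impact: decide quickly and move on
    ("low", "high"):  0.85,  # low risk, high impact: being wrong is still cheap
    ("high", "low"):  0.90,  # high risk, low impact: be more careful
    ("high", "high"): 0.99,  # high risk, high impact: e.g. changes in the checkout funnel
}

def can_call_it(risk, impact, observed_confidence):
    """True if the observed confidence clears the bar set for this test's risk and impact."""
    return observed_confidence >= REQUIRED_CONFIDENCE[(risk, impact)]

print(can_call_it("low", "high", observed_confidence=0.85))   # True: call it and move on
print(can_call_it("high", "high", observed_confidence=0.85))  # False: keep the test running
```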

00:19:42.23 [Tim Wilson]: Were they a retailer or were they a retailer?

00:19:45.78 [Jim Cain]: We can neither confirm nor deny that the company we’re discussing is Netflix.

00:19:53.88 [Michael Helbling]: Nor deny... This is, this is classic Jim. He always throws us in there, and then we have to put, like, Monty Python intermission music over it. So you are not required to answer the question. I’ll retract the question.

00:20:11.45 [Tim Wilson]: That’s right. It seems like a really mature... that doesn’t seem like a concept that a junior analyst proposed. That seems like a really sharp executive that said, well, shit, let’s not treat all things the same.

00:20:27.37 [Kelly Wortham]: Yeah, it was an MIT graduate statistician who said, we don’t need to do this. And they were able to build out the model and actually prove it.

00:20:39.31 [Michael Helbling]: Tim, just don’t even... Tim, of course, went to MIT.

00:20:44.66 [Tim Wilson]: Go beavers, go beavers.

00:20:49.61 [Michael Helbling]: He’s taking that as validation of all of his testing effort over the years. I can do it. You know, okay, so this is great. I don’t know to what extent, Kelly, you want to dive into the weeds of testing, but I think our listeners probably would get a lot out of just things like, hey, what steps go into setting up a good test? Or, hey, you know, we’ve been talking about this confidence and those kinds of things. How do you decide which tests to run? I mean, those are more tactical concerns, but I think would be interesting. And again, I don’t want to sort of ask you to, like, spill all the goods, but at the same time I think everybody would love to hear, and we can compare notes. ’Cause, you know, I only test to 95% confidence and that’s it, so I just learned something. Anyway, no, back to, back to business.

00:21:40.77 [Tim Wilson]: Well, growing on top of that, to me, one of the things is, I got into testing and I’m not heavily into it now. Like, the thing that I feel like people miss the most often is the number of roles, like we touched on it. But if we talk through kind of what the pieces are and why. You know, analytics is pretty low risk from a, you’re not going to break the user experience by botching a data warehouse export from Adobe Analytics. Whereas testing seems like that does add complexity. So when it comes to kind of the nuts and bolts and what goes into a test, like who the people are and the skills and the roles, like you’d mentioned in one of your blog posts, the, you know, all of a sudden you need creative approval because, yes, some of your potential consumers are seeing a different experience. You know, analytics doesn’t have to worry about that.

00:22:29.06 [Kelly Wortham]: Right, yeah. I mean, it of course all depends on the complexity of the test you’re doing, but, you know, on the most complex tests we’ve run, we’ve had to get approvals and insights and ideas from, you know, the pricing team, the legal team, uh, the brand team for sure, business owners, uh, the design team, UX, because they’re not always the same thing, and then of course you’ve got the analytics, uh, the developers, and the QA team, and all of those people. In my experience, the best test managers are cat herders. And they really are spending all of their time getting all of those different pieces and all of those different people aligned about the goals of the test and, you know, keeping everybody in lockstep to move things forward. And again, it totally depends on the complexity of the test, but, you know, if you do have to get all of those different people to touch the test at one stage or the other, it makes for a much more complex upfront development before you can even get the test launched. But it also, on the back end, as you’re analyzing it and you want to give a readout, you have a whole bunch of egos in the room that have opinions, that have very strong opinions probably, and some of them you’re going to have to tell their baby’s ugly, and it’s not easy to do. So not only do you have to be a cat herder, you have to be a diplomat. So it’s a very unique position, in my experience anyway, in the industry.

00:24:06.01 [Jim Cain]: So a bar manager would probably make a good testing team.

00:24:09.60 [Kelly Wortham]: Yes.

00:24:10.72 [Tim Wilson]: Yes. That is interesting. He scores. Harkening back to episode two. So do you run into, do you actually shy away from tests that are copy changes that would require legal? Legal’s one I haven’t run into, because I’ve never kind of done offer or pricing or that sort of testing. Is that something you regularly see on teams, that there is a legal review for any tests or for certain types of tests?

00:24:40.37 [Kelly Wortham]: So, um, some of my clients are in, you know, the, um, financial services, healthcare, pharmaceutical, you know, those types of organizations, and everything they do goes through legal, which makes it very, very challenging and dramatically limits the amount of testing that they actually are able to do. You asked an interesting question though, which was, would you avoid, you know, if you had the opportunity to run a test... Say you’re, you know, like I was back at Dell, obviously not everything had to run through legal, but some of the most important tests we did, did have to run through legal. And so you ask, would I avoid them? And the answer is an emphatic no. You know, I believe you need to divide your traffic into sort of a strategic lane that is saved for those big important tests that may have a longer upfront development time and run time. And then in another, you know, the majority of your traffic, depending on how much traffic you have, maybe 75, 80% of your traffic, you’re running those continuous optimization, operational type tests. But, you know, pricing elasticity is an amazing test to run. Offer type can be incredibly powerful. But so can, you know, subscription rate offers, not just offers, but, you know, different ways of illustrating the same offers. And all of that is going to require, you know, pricing and legal and just have a lot more hoops to jump through.

00:26:10.78 [Jim Cain]: I’d like to explicitly call out something that’s happening right now for, unless I’m wrong and then tell me I’m full of shit, but.

00:26:16.55 [Kelly Wortham]: Sure.

00:26:17.63 [Jim Cain]: Cool. You’ll be, you’ll not be the first person to do it to me, probably, today. So a lot of people who are more junior, like newer members of my team, when we talk about testing, to them it’s red button versus green button, copy A versus copy B. And the conversation that you and Tim are having is much more sophisticated than that. So every aspect of a user’s web experience can be tested. And so you’re talking about price and you’re talking about copy and you’re talking about offer, like it’s not red button versus blue button. Could you make a list of, you know, I want to get into testing and wrap my head around it, what are the key types of things that are going to be tested? And hopefully, uh, touch on behavioral targeting as well, ’cause that’s my favorite.

00:27:00.32 [Kelly Wortham]: Sure. So, um, great, I’m going to get there, and bring me back if I get too far off. But I wanted to say one thing, because before we got off onto the last discussion, you asked one of the original questions of what do you need to do to have a successful test, which I think fits nicely with this question that you just asked. And I wanted to answer that I, that I really feel passionately that the number one thing you need to do when you’re thinking of your test, regardless of whether it’s pricing elasticity or even button color, is to determine what it will mean if your results are X, Y, or Z. So you’ve already, you’ve gone through the effort of saying this is the question, this is the goal, you know, whatever, and this is our hypothesis, and here’s our different recipes. Now, you know, hopefully you have a KPI and you’ve said, you know, this is the main thing we’re going to focus on, and you may even have some secondary, uh, metrics that you want to make sure don’t go down dramatically, right? But you’re just going to focus on that KPI, and you’re going to say, if B is higher in, uh, maybe conversion rate than A, it means this, and we will take this action. If C is greater than A, but not B, it means this, and we will do this. So you create this huge long matrix that says all the different possible results of your test. And in the process of doing that, you will learn very quickly that you don’t need A, B, C, and D, or maybe you need E and F as well. Because you haven’t been able to, you haven’t set up your test in a way so that each recipe gives you a different answer. And if you don’t do that, then you’ll have a result, but you won’t know why. And if you don’t know why, you can’t act. And then you’ve just wasted everyone’s time and resources. Did I explain that okay? Does it make sense?
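
Kelly’s results matrix amounts to a pre-committed mapping from every plausible outcome to an agreed action, written down before the test launches. Here is a minimal sketch with hypothetical recipes and actions; any outcome without a row is a sign the test design is incomplete.

```python
# Minimal sketch of the upfront "results matrix" described above. The recipes (A = control,
# B and C = variants) and the actions are hypothetical placeholders.
RESULTS_MATRIX = {
    "B beats A, C does not": "Roll out B's higher-contrast call to action site-wide",
    "C beats A, B does not": "Roll out C's shorter form; keep the current call to action",
    "B and C both beat A":   "Combine both changes and run a follow-up validation test",
    "nothing beats A":       "Keep the control; revisit the hypothesis with UX research",
}

def planned_action(outcome):
    """Look up the action agreed before launch; a missing row means the design needs rework."""
    return RESULTS_MATRIX.get(outcome, "No row for this outcome: redesign the test before launching")

print(planned_action("B beats A, C does not"))
```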

00:28:59.78 [Tim Wilson]: I am speechless. That’s just music to my ears. Well, because what you’re saying is, hey, before you go and do a bunch of work, why don’t you think through exactly how it could play out, instead of saying, hey, let’s do the work and we assume at the end of the day there’s going to be a magic answer and a clear way forward. It’s saying it’s a hell of a lot cheaper to sit with a whiteboard and kind of play out the reasonable set of scenarios and say, is this going to make us behave differently in a meaningful way tomorrow than we do today? Right. So I am all for it, because it does... it’s the evil cousin of technology being just the solution: oh, just do stuff and then expect, just envision that at the end of the day, the answer is going to roll out, and it’s going to be crystal clear, and it’s going to point a clear way forward, as opposed to saying, how about if we just try to draw a map and see where this might go?

00:29:57.64 [Kelly Wortham]: I think it was two or three Exchanges ago, Lynn Lanphere from Best Buy said, is the juice worth the squeeze? And I will never forget that phrase. It has totally changed the way I respond to every business request that comes my way. It’s not just, you know, is it going to be worth our time and effort. Are you going to make a decision based on this information? And if you’re not, we don’t need this recipe.

00:30:28.60 [Michael Helbling]: Yeah. And depending on the organization, that answer could be different. So it’s, it’s not always the most obvious thing. And it also puts to bed kind of that concept that you hear sometimes like, there’s no such thing as a failed test, just learnings. And it’s like, sort of like, just get out there and run tests willy nilly. And it’s better to have a plan. Definitely see the end.

00:30:52.39 [Kelly Wortham]: It is, it is, and you know, there are a lot of purists out there that do believe passionately that the purpose of testing is learning.

00:31:00.04 [Tim Wilson]: Can you give us some names specifically?

00:31:03.29 [Kelly Wortham]: Yeah, yeah, let me just start wrapping up with you. I think that’s the purpose of analytics, right? Analytics helps you learn. Optimization lets you do. And if you already have analytics to learn, you don’t need your optimization program to learn too. You just need your optimization program to do. And sometimes your optimization program can provide evidence to support something you want to do, to fight the good battle, to stand up against team X or Y at the organization. And in that case, it really is a learning, but it’s a learning with an intended action. Whether or not you achieve it is, you know, up to the organization. But I think all tests should be looked at through the lens of what action are we going to take based on the test result.

00:32:01.57 [Michael Helbling]: I love that. But let’s get to the question that I think everyone listening has been waiting for up until now. Taguchi or full factorial?

00:32:10.20 [Kelly Wortham]: No, I’m just kidding.

00:32:15.12 [Kelly Wortham]: I am deeply, deeply sorry for all of my Adobe friends and families, but I am not a fan of Taguchi. I am MVT only, full factorial. And in fact, if you think back to my description of what I think you should do with every test as you’re designing your A, B, C, D, E’s, I’m actually describing a full factorial AB test.
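
For the distinction being joked about here: a full factorial design runs every combination of every factor level as its own recipe, while a Taguchi-style (fractional) design runs only a structured subset. A minimal sketch of enumerating a full factorial grid, with hypothetical factors:

```python
# Minimal sketch of a full factorial design: every combination of every factor is a recipe.
# A Taguchi-style fractional design would test only a structured subset of these rows.
# The factors and levels are hypothetical examples.
from itertools import product

factors = {
    "cta_color":   ["blue", "green"],
    "cta_copy":    ["Buy now", "Add to cart"],
    "banner_slot": ["top", "sidebar"],
}

recipes = [dict(zip(factors, combo)) for combo in product(*factors.values())]

print(len(recipes))  # 2 x 2 x 2 = 8 recipes in the full factorial
for recipe in recipes:
    print(recipe)
```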

00:32:38.32 [Michael Helbling]: What I love about that right there is that that was actually a joke, but you actually, that was a very insightful answer. I was like, wow, that’s pretty good. Usually that question comes up when somebody in the room wants to seem smart and has no idea. That’s typically what I see brought to bear.

00:33:08.16 [Kelly Wortham]: Let me answer your other question about tests, about button tests, or what tests, or how do you know it’s a good test, or what test should you test?

00:33:15.69 [Tim Wilson]: That was one of Jim’s questions. We just kind of humor him. We just kind of try to move on after he’s asked a question. But if you want to be that kind of guest, then go for it.

00:33:23.44 [Kelly Wortham]: No, I think it’s a good question. Because in my experience, organizations that are new to testing, that’s the first thing they do. They’re like, oh, well, let’s check our calls to action. Or let’s try this banner or that banner. Or let’s use behavioral targeting to determine what we’re going to put on the banner. And it’s not that those are bad. It’s that you’re not asking the right question. Why do you want to change the button color? What evidence do you have? What data do you have? What voice of customer research have you done that says that the color you have is not good? So if you can come back to me and you can say, look, we’ve looked at template A, B, and C on our site, and they use different calls to action. And we noticed that in template C, people click on the call to action more. And when we look at the difference, version C has a different color button. But it’s also just higher contrast versus the rest of the site. So we think it’s drawing more attention, more eyeballs to the call to action, and people are more likely to click. So it’s not that we want to test changing the colors of our calls to action buttons. We want to jack up the contrast in some way, and we think the best way to do that is to change the color. And by doing this, we are testing the theory that higher contrast calls to action will drive more click-through. Now, if you run that test, it’s the same as a button color test, but on the other side of the test, you haven’t learned that blue is better than green, which is a bunch of BS as we all know. Instead, you’ve learned why it’s better. You’ve learned that high contrast is better than low contrast, or maybe you’ve learned that, you know, all caps is better than not all caps, or whatever the situation is, to draw the eyes. Because the truth of the matter is, and I will tell you this every single time, every organization that I have ever done any testing with has run a button test, and it has won. Because change draws eyes. It’s simple, right? So now, instead of doing that, you need to have that business case upfront to design a better hypothesis and better recipes. And then, you know, you need to know upfront, what is your results matrix? What does it mean if X, Y, or Z happen?

00:35:45.13 [Michael Helbling]: And that’s why you get teams saying, we’ve used A-B testing to improve our web results by 4,000%. And then you’ll look at their main metrics, and they’re the exact same as they were before, is because those kinds of changes regress right away to the mean. It’s just that initial look and feel. And that’s really key. That’s great. All right, so as much as I really am not liking that we’re getting to this moment, we really have to start to wrap up. And before we kind of do a round robin and just sort of talk about, you know, interesting things we heard or parting shots, whatever, Kelly, I am so glad you came on to the show. I think this is such a great topic. And, you know, I think I speak for all of us, you covered it with such aplomb. And, you know, your deep expertise is very evident. So I think our listeners are going to have probably lots of great questions. So obviously you should probably follow Kelly Wortham on her Twitter account at Kelly Wortham, which will probably then force her to tweet more. And of course, you know, you’ll get that big Digital Analytics Power Hour Twitter bump, you know, how that happens, you know, or no, I’m just kidding. No, anyways, let’s, let’s do a round robin and do some closing thoughts on this show. I feel like we’ve just scratched the surface though. So we need to do this, we need to do probably two more hours together. And I will bring some examples next time and we can pick them apart.

00:37:16.27 [Tim Wilson]: I feel like we’ve been saying we need to talk about testing from, like, episode five onward. And I think we’ve proved that it could almost be every other episode. So I think scratch the surface is a good way to put it. I’ve also learned that maybe if I shut up and let Jim ask a question once or twice, that they’ll actually be labeled good questions or at least get insightful responses, even if the questions are still horrible.

00:37:44.10 [Jim Cain]: Kelly and I have been practicing that question all week. I’m just glad I got it.

00:37:50.95 [Tim Wilson]: Yeah, this has been... I feel like we bounced around a lot, and every time Kelly was talking, I had, like, wow, that’s a really good point. That’s a really good point. That’s a really good point. And I have seven other questions. So yeah, I feel like, on the one hand, you could walk away from this being really intimidated by testing, which is probably not our intention. On the other hand, it’s going in with eyes wide open that, you know, don’t buy, you know, if the tool says, just get the tool and the rest will come. Well, that’s a load of crap. A lot of thought needs to go into it. And yeah, my biggest takeaway, I say takeaway just because it’s such music to my ears, is the think about what you’re doing before you, you know, just do it. It’s not Nike, from a just-go-execute-and-good-things-will-come standpoint. It actually makes sense to put some thought and planning into what you’re doing, why you’re doing it, you know, who you need to help you do it. And don’t shy away from stuff just because it’s going to require legal approval. It could be high impact and be worth the investment.

00:38:56.10 [Kelly Wortham]: which is not to say that Nike doesn’t have an excellent testing program.

00:39:03.95 [Tim Wilson]: They do. It was, it was, it was more a reference to their outbound consumer slogan, not to the company.

00:39:13.98 [Jim Cain]: So this is a fun chat. And I, to me, that’s really all it was this time. And I think we need, like Tim was saying, we need another two hours. We did a really good job building a baseline today for people who are new to testing. I think the only particular takeaway that I’d share, because I’d never thought of it before today, is that we talk often about how hard it is to find an analytics professional. You know, like I’m a recruiter or I’m a company that wants an analytics person. Sure. I wonder how much harder it is to turn over a rock and find a testing person.

00:39:46.32 [Kelly Wortham]: Lightning in a jar.

00:39:47.31 [Michael Helbling]: Yeah. You either build them or... there’s not a lot. Yeah, definitely not a lot.

00:39:53.89 [Jim Cain]: Yeah, very challenging. Those are my big takeaways. This was fun. Let’s do some more.

00:39:59.20 [Michael Helbling]: Yeah, and for me, you know, I think definitely one of the things I’m taking away is thinking a little bit harder about how I hew to kind of the doctrine of having to have this confidence level. You know, I kind of joked about it, but typically we’ll always look for a specific confidence, and I haven’t historically kind of looked at it from the perspective of impact or complexity or, you know, those kinds of things, in terms of how you might be able to, like, raise or lower confidence. So I think that’s really, I thought that was something that was kind of new for me. So that was, that was awesome. And yeah, I reiterate a third time, you know, just how little I feel like we were able to cover, but that just means we need to do more of the same. So that’s good. And I don’t know, Kelly, any parting thoughts from you?

00:40:49.25 [Kelly Wortham]: Well, it’s, it’s a little ironic. I feel almost like you guys all set me up for this and you don’t even know it, but I have a twice-monthly teleconference group that discusses, you know, just one piece of the topics that we covered tonight each time, in an hour-long conversation. And all of you are, of course, invited to join us, as are any of your listeners. Just shoot me an email at kelly.wortham at ey.com and I’ll add you to the invite. For those of you who have been to Exchange, the whole idea was to take the Exchange huddle session and pull it into a teleconference. So you have people from all over the industry, from all different areas, different verticals and different levels of sophistication. Some people just building programs, some people I would call best in class, and they all get on the phone together and we just, you know, pick a topic and kind of do a deep dive on it and learn from each other. And it’s really powerful. And as you, as you mentioned, you know, it feels like you just scratched the surface in this 30-minute conversation. It’s really nice to have a full hour to just pick one topic and dive deep.

00:42:05.15 [Jim Cain]: Did you actually invite the three of us to crash your nice conference call?

00:42:09.24 [Kelly Wortham]: I absolutely did. The three of you and all your listeners.

00:42:12.98 [Michael Helbling]: That is awesome. Oh, that is such a great resource. And I think you will probably get a few emails from folks and probably from me also wanting to get a chance to listen in on the next one of those. And so if you’re listening, that is something for definitely rewind that part, get Kelly’s email. And I think you said it was kelly.wortham at ey.com.

00:42:38.55 [Tim Wilson]: That’s right. Now I’ll be who we wanted. Michael just restated it for you, so now you don’t need to rewind.

00:42:45.74 [Michael Helbling]: Just one more time. So crazy how fast the time went tonight. I don’t remember recently one of these going so quickly. Anyway, this is great. Again, if you’re listening and you want to get more information or follow up or have questions, I think we all know who you need to talk to or whose twice-monthly teleconference you need to join. Definitely come to our Facebook page and tell us how that went. Yeah, exactly. No, definitely hit us up on our Facebook page or Twitter or on the Measure Slack, and we’ll do our best to get Kelly involved with Measure Slack and stuff like that. Although she seems pretty busy, based on just even how long it took us to get her on the show. Part of that was our fault. Man, was it worth the wait. So Kelly, thank you once again. For everyone listening, we’d love to hear from you, and get out there, formulate a great plan, and then go test against that plan and really make things happen. Don’t just learn stuff. All right. Well, obviously for my co-hosts, Tim Wilson and Jim Cain, this is Michael Helbling. Keep analyzing.

00:44:10.41 [Announcer]: Thanks for listening and don’t forget to join the conversation on Facebook or Twitter. We welcome your comments and questions. Facebook.com forward slash analytics hour or at analytics hour on Twitter.

00:44:24.98 [Michael Helbling]: And what you’ll need to do is just interrupt him whenever you need to talk. Jim Cain is the person who then goes after Tim Wilson for talking too much, so I don’t have to. That’s kind of his role and function on the show. If you get a loonie and a toonie, you got some money.

00:44:49.22 [Jim Cain]: Holy job, eh?

00:44:50.64 [Tim Wilson]: Hey. I’m getting like the massive head shake from Helbling. It’s like Tim would cut my whole section out. What’d I do?

00:45:01.45 [Jim Cain]: Search Discovery is a great place where I feel free to express myself. I neither confirmed nor denied it. I think we are in a defensible position, man.

00:45:13.18 [Tim Wilson]: That was a good one. Rock flag and Taguchi.
