Listen. Really. That’s what you can do. You can listen to this episode and find out what you learn. Or you can NOT listen to the show and NOT find out what you learn. You can’t do both, which means that, one way or the other, you WILL be creating your very own counterfactual! That, dear listener, is a fundamental concept when it comes to causal inference. Smart analysts and data scientists the world over are excited about the subject, because it provides a means of thinking and application techniques for actually getting to causality. Bradley Fay from DraftKings is one of those smart data scientists, so the gang sat down with him to discuss the subject!
00:04 Announcer: Welcome to Digital Analytics Power Hour. Tim, Michael, Moe, and the occasional guest discussing digital analytics issues of the day. Find them on Facebook at Facebook.com/AnalyticsHour, and their website analyticshour.io. And now the Digital Analytics Power Hour.
00:27 Michael Helbling: Hi, everyone. Welcome to the Digital Analytics Power Hour. This is episode 120. Warning: This show is going to get really fucking scientific. If you were looking for our regular humor and banter, do not listen. They tell me we’re going to talk about something called casual interference. What is casual interference, you might ask? Well, apparently, you might be just minding your own business, running an AB test, which is totally scientific of you, and you could be confounded by some sort of statistical L’s or something, so your test may be invalidated by the casual interference of these L’s. They might find your test in what we call the statistical realm and just mess with it just to ruin your day. Hey Tim, you’re something of an expert on these L’s, are you not?
01:17 Tim Wilson: Well, Michael, I think it’s actually…
01:19 MH: Okay.
01:19 TW: Causal…
01:20 MH: Just before you get going, okay, yep. Moe, have you ever had a casual interference mess with your tests?
01:27 Moe Kiss: I kinda think Tim’s right on this one.
01:29 MH: Okay, hold on. Hold on, Moe, I’m getting something. What’s that? No. They’re not L’s. Oh, oh, it’s not… It’s causal inference, not casual interference. Well, what am I supposed to do? We’re not prepared for that. Hey, Bradley, do you think you could cover causal inference instead?
01:49 Bradley Fay: Yeah, I think I can cover that.
01:50 MH: Okay, okay, cool. Okay. Bradley Fay is the Senior Manager of Analytics at DraftKings. He’s also held data science roles at other companies, like Wayfair. He has his PhD in marketing from Arizona State University, and he is our guest to talk about causal inference. Welcome to the show, Bradley.
02:10 BF: Thanks, everyone.
02:11 MH: Awesome. Okay, woo. We dodged a bullet there. I’m so glad you cover both of those topics equally.
02:17 BF: A man of many talents.
02:21 MH: Anyway, so, maybe it could be confusing off the top, but I think that’s why you’re here, so maybe just give us a really quick overview of causal inference as sort of… Let’s get started in the conversation, and then Tim, I’m sure, you’ll help take it from there.
02:39 BF: Yeah, so causal inference is, it’s basically the process of going through and trying to establish some sort of causal relationship, right? So we wanna know if we move lever A, like how much does that actually change outcome B? And there’s a lot of techniques in logic and reasoning that goes behind it, and causal inference is basically what captures all of that and helps us really kind of establish that these relationships exist.
03:02 TW: So AB testing, in the case of…
03:06 BF: Yeah.
03:06 TW: A website, definitely that’s kind of the case of… Very, very hard to AB test in media, but it sounds like that even just doing a straight up, simple split test on a website, causal inference kinda comes into play, or can come into play there, as well?
03:23 BF: Yeah. I mean, that’s the whole reason why you run an AB test, right, is ’cause you’re trying to establish that like if by changing some aspect of a website, right? So if you change your button color from red to blue, that you can establish like a meaningful and persistent change in your outcome, whether it be like conversions or clicks or visits, or whatever it is, right? So AB test is just like a very narrow type of causal inference.
03:44 MK: But so why the term inference on the end? I mean, like…
03:48 BF: Yeah, no, no, no, that’s a great question. I think it’s one of those… The real challenge is you can never establish true causality, right? Like you can’t prove truth, and so the best you can do is actually just infer it through some sort of logical deduction, right? So I think the reason why you have the inference in there is to kinda say it’s not quite true, but it’s the best we can assume under all the right conditions.
04:09 MK: So when you’re talking to stakeholders, do you say causal inference or…
04:14 BF: No.
04:14 MK: Okay, so how… I guess…
04:18 BF: Not at all.
04:18 TW: That’s why, at data science conferences, you get to say causal inference, right?
04:23 BF: Yeah. Yeah. I mean, it sounds fancy when you talk to stakeholders, right? So if you wanna say, whatever you’re doing is like really cool, you could say causal inference, but…
04:32 BF: No, no, I think whenever you’re talking about it with stakeholders, you’re trying to say, you know, it’s most important you try and frame it in the question you’re trying to help them answer, and we’re really just trying to say, “We’re trying to help you answer the question of, if you move or make a specific decision, like here’s how much improvement, or how much cost we expect to happen as a result of that,” right? So like an example is, if I change my bid on Facebook, ’cause I’m bidding for advertising, like I wanna have some idea of exactly how much more I’m gonna spend, exactly how many more clicks I’m gonna get, because that sensitivity is gonna tell you how much you should actually change your bid, right? Like you don’t wanna just willy-nilly, just say like, “Oh, I’m gonna pay a $3 CPA, and that seems reasonable to pay 10,” like that’s kind of unfounded. We wanna make sure that we can like establish why do we wanna pay 10.
05:19 TW: But so there’s… I mean, essentially you say Facebook, or take Facebook or Google, you know, the big media players, which are both kinda coming out and saying, “Just kind of tell us what you want, tell us who you wanna target, set it up right, and we will kind of auto-optimize with the magic of machine learning.” Discuss.
05:44 BF: So yeah. So, machine learning’s great and optimization is good, but at the end of the day, they’re trying to hit a target under a bunch of constraints, and that’s not necessarily what, as a business, you wanna do, right? So the way I like to frame it is like they have machinery that works in a certain way, and it’s a black box to a point, and we don’t have any control over that. What we do have control over is the decision of what that target bid is, and like marketing, as you guys are familiar, like, no one really knows how much marketing’s worth, right? Attribution modeling’s like a super hand wavy thing.
06:14 TW: Yep.
06:15 BF: So what we wanna get at is, can you do a little bit of this? Can you get at building a better model or building a better mouse trap to set that bid so you don’t have to rely on third parties? They’re gonna do what they’re gonna do but you don’t wanna rely on them to do everything. They say set it and forget it. At least, in my mentality, set it and forget it for some sites is a little bit scary because things change over time and your goals change over time and those algos don’t necessarily update how they should.
06:42 TW: So with causal inference, you’re actually… Because there’s inference happening, you’re trying to get to some deeper understanding of the cause. There’s actual learning there, as opposed to if it’s just machine learning or it’s some neural network that’s just cranking away there which the knock against the deep learning and neural networks is, yet nobody knows the what and the why, it is completely black boxy. So, causal inference sort of, with that mindset, it sort of forces you to have some thinking upfront and then making sure that you’re doing kind of a design of an experiment and that might not be the right term to actually say, “Does it seem likely that this hypothesis that I have is holding up?”
07:33 BF: So the way I think about it is, machine learning and neural nets and all that stuff, really what they’re trying to do is fit a statistical distribution. So the outcome variable follows some random distribution. So if you’re trying to classify pictures of dogs or cats, it’s a binomial. Some probability that it’s a dog and some probability that it’s a cat. And it’s trying to use data to just do that. It doesn’t really care why it’s a dog or why it’s a cat, it just wants to make that prediction. When we’re getting into causal inference, the real thing we’re trying to answer is why. Why does this outcome change whenever we turn some knob up? And so, what it really takes I think, is this is where they talk about content area experts. It takes contextual knowledge to really understand or even theorize what are these possible relationships that could be causal.
08:18 BF: We know it’s pretty reasonable that if I change my bid on an ad platform, if I increase it, I should see an increase in spend. I should also see a corresponding increase in clicks. Those are pretty easy. Where it gets kind of a little bit nuanced or where you want a little bit more content area knowledge is when I start to change things on a website, why do I think that should cause that outcome? And then can we actually go through and establish that that does? Did that process actually change our conversion rate or whatever it may be that we’re concerned about?
08:48 MK: I keep going through Daniel Kahneman’s book again, which I know I’ve talked about a lot recently. But he goes on and on and on and he says this quote like a thousand times in it, “All you see is all there is.” And I’m thinking about this in this context because what worries me is how could you… You’re trying to think of all the causes for some particular uplift in something because you’ve changed a particular knob or button here or there. I suppose what I’m trying to think about is how can you be sure that that is actually causing or what you think is causing it is… Do you know what I’m trying to say?
09:23 BF: Oh, I totally get it. You’re trying to say there’s a bunch of moving parts and a bunch of reasons why performance could increase or decrease. There’s macro level factors, there’s micro level factors. It could just be the person that sees the ad has a good day or had a bad day. All of a sudden, puppies are just more friendly to them. I think the challenge is, you need to understand and have those in your mind, but that’s where when you start running regressions and stuff like that, you just… All that stuff falls in the error term. And this is where the whole of causal inference is about how do you deal with these unknowns and unobserved variables and things like that. Is there a way to actually establish what parts we can observe? And given that we can observe them, are we comfortable saying at least a portion of this is causal?
10:04 TW: So as I’ve been sort of talking a lot about an analyst sort of transitioning and doing some aspects of data science and the Venn diagram that I use the most often it says, “What is data science?” And it’s one bubble is either statistics or modeling, whatever you wanna call that. And then there’s kind of a programming or computer science piece, and then the third part is domain expertise or subject matter expertise. I wind up, usually, if I’m speaking at a marketing or analytics conference saying, “Look, we have this. This is the stuff that if you’ve been doing digital marketing for five or 10 years, you’ve got a lot of subject matter expertise,” and then I kinda move on.
10:42 TW: Whereas, if I understood what you were saying, it’s like, “Well no, no, bring that back up ’cause this is really the reason that you have to think really hard.” I’ll say that. I’ll say if somebody comes straight out of school and they’ve got a… They’re a data scientist. Chances are they have the programming, the computer science, the statistics, the modeling, but they can’t possibly have real depth of knowledge in marketing or financial services or manufacturing or whatever it is that is not taught. And that’s the third kind of leg of the stool.
11:16 BF: Yeah, I think that aspect gets underrated. I think in data science, the belief is if you can feed a machine learning model, you can solve a lot and like… You can, to a point, but the real power comes in understanding the context in which you’re working. You guys have been in marketing for a long time, and obviously, I spent a lot of time studying it, I still don’t feel like I’m necessarily an expert in marketing. There’s a ton of consumer behavior and different aspects that go into understanding how people make choices. And at the end of the day, you could only explain so much because as much as we’d love for there to be this perfect model to predict that if I serve this person an ad at this point in time, it’s totally gonna create a conversion, it’s not true. People are just generally weird.
11:58 BF: It’s super unpredictable. But it’s kind of funny. I’ll sit in a room and I’m supposed to be the analytic or the quant guy and I have to sit in there and say, “No.” People are weird. There’s a limitation on how much we can actually do. And let’s just be a little bit rational about this, but that’s where the content area I think comes in and it’s easy to assume that it makes sense to people but I think that’s a strong assumption and it oftentimes breaks down.
12:24 MK: I talked to Peter Fader, who I totally love, who’s been on the show before, has a quote which I often come back to, which is that you need to embrace the randomness of customer behavior because… There is some weird customer that’s gonna check the status of his order 180 times in one day. And it doesn’t matter how much analysis you do, there is something crazy going on there that you’re never gonna be able to explain.
12:48 TW: Well, but also, that even in both fronts, we’ve talked about this before as well, kind of the data literacy, the wanting the truth, and Bradley, you sort of set it up at the beginning, it’s inference because you’re never gonna prove the truth. You’re going to have a probability of something. And even that, because we’re talking about consumers, it’s not gonna apply to every one of them, right? To Moe, to your point that you’re talking about populations, and saying that they generally will behave this way, but that doesn’t mean there won’t be many, many, many exceptions to the rule. And that, I guess, gets to… you dropped in the error term a little bit earlier.
13:26 BF: Yeah.
13:27 TW: And actually, I just got a lecture about, “Well, that’s what your error term is,” from ROs. I was like, “Okay, it was in my equation, but I wasn’t really sure what it was.” Can you talk about that a little bit?
13:37 BF: Yeah, so I’m gonna touch on the population level, and then I’ll get back to the error term.
13:42 TW: Okay.
13:43 BF: So, the notion of looking at lift at the population level, the reason why we have to do inference on that is because people are generally weird, right? And we’re looking for on average, and I think that’s often lost. Whenever people do AB tests or any kind of online experiments, what you’re really solving for is, what’s the population average? You can get into more specifics of how much did I affect Person A or Person B, but that’s one where you can never observe the truth, so it makes it really, really hard. But then, how that kind of rolls into the error term is whenever you run a regression, so how I was taught, from an econometric standpoint, is like you have your observed and unobserved variables, and you work, when you fit a regression, under the assumption that whatever you can observe, so, whatever your X variables are… If you could be an omniscient econometrician, they always say, unobserved to the econometrician, you could actually put in all of the other variables you need to where you essentially have a deterministic equation. But the reality is, you can’t, right? So, you don’t know what someone’s emotions are in a given day, you don’t know what the weather is necessarily somewhere halfway across the world, I guess, you can figure it out. But…
14:46 TW: Yeah [laughter]
14:48 BF: Those things, they just explain such a small part of the variance, it’s not even worth the effort. So you just kind of assume that away and you just say, cool, the unexplained part of the variable that I’m trying to predict that’s just the error, but that error is composed of all of the things that I can’t observe, and all of these complex processes that are going on.
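Editor’s sketch (not from the episode): a small simulation may help make the "it all falls in the error term" idea concrete. It assumes a made-up world in which the outcome is fully determined by one observed variable `x` and one unobserved variable `z`; every name and coefficient here is hypothetical. Regressing on `x` alone leaves a residual that is almost entirely `z`.

```python
import random

random.seed(0)
n = 5000

# Hypothetical data: x is observed, z is not (mood, weather, and so on).
x = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
# With both x and z known, the outcome is fully deterministic.
y = [2.0 * xi + 3.0 * zi for xi, zi in zip(x, z)]


def mean(values):
    return sum(values) / len(values)


def corr(a, b):
    """Pearson correlation, written out to avoid extra dependencies."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


# One-variable least squares of y on x only, as the analyst who cannot see z.
mx, my = mean(x), mean(y)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x
)
intercept = my - slope * mx
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"estimated effect of x: {slope:.2f} (true value in this simulation: 2.0)")
print(f"correlation of residual with unobserved z: {corr(residuals, z):.3f}")
```

The estimate for `x` stays close to its simulated value only because `z` is generated independently of `x` here. If the unobserved variable were correlated with `x`, the slope itself would be biased, which is the harder case for causal claims.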
15:08 TW: So, error and noise are… Are those cousins?
15:14 BF: Sort of. So, it depends, I think it depends on the school you come from. So from my training, the belief is that there is no noise if you have perfect information. There’s no such thing as noise. I think if you just accept that there’s not perfect information, then you can kind of write it off to noise.
15:32 TW: Okay.
15:32 MK: So what is your training then?
15:34 BF: So my background was in marketing, but it’s really a hybrid of psychology and economics.
15:40 MK: Okay. And the belief is that you never have perfect information. I feel like that’s somewhat liberating.
15:47 BF: If you can accept it. Yeah.
15:50 MH: Yeah, it requires you to grab on to that uncertainty a little bit, and just be willing to ride it, and know that kind of the stuff you’re doing is not precise or not necessarily perfect.
16:05 BF: Yeah, right. It’s getting away from deterministic modeling, right? I cannot perfectly predict any outcome. That’s just how I kind of approach everything, and that’s super freeing. Because then it’s just like, “Well, how close can I get within reason?”
16:18 TW: Yeah. Which it may be… It may be back to AB testing. And I had this epiphany six months or so ago, that when you run an AB test, your sample is for the period of time that you have an A and a B. You’re trying to predict the future, and the further into the future you go, the more of the variables that are not accounted for, the more things will have changed. The economy, competitors, the season, all that other stuff. So, I feel like the CRO world is getting better at many things, and digital sort of realized they oversold themselves as being perfect. Optimizely kind of is the captain of that, right? They came out and they said, “Look, just run an AB test, everything’s… It’s perfect. It removes all uncertainty, it’s unbiased, it tells you everything.”
17:11 TW: And then they’ve kind of been slowly backtracking for eight years. And saying, “Well, but this, but that.” And I think the other side is that that’s great. We can’t run AB tests many, many, many, many times. To me, my understanding from when I saw your talk was, “Well, yeah, that’s because AB testing is just one narrow view, but you have these other things.” And if you take this mindset of, “How am I trying to figure out what’s going on?” it gets a lot broader. But then, it also comes back to AB testing.
17:47 BF: So I think the way to think about AB testing is, when you run an experiment, you’re doing that to generate data, to then analyze. So whenever you’re doing… the whole crux of causal inference is this notion of the counterfactual. And so really, what you’re saying is, “Here’s the truth,” or here’s… “My hypothesis is that if I change this button to green, I think I’ll see this outcome.” But what you really have to be able to observe, to establish that that’s true, is what would have happened had I not actually changed the button to green, right? So what’s the counterfactual in that situation? And so, AB testing is just a way to get at the counterfactual. We can say, “My hypothesis is change the button,” or serving an ad, or whatever it is, the only reason you need an AB test is to generate the data. But if there’s other ways to establish what that alternative is, if there’s other data where it already exists, then you don’t actually need an AB test. There’s other techniques that you can use to get there.
18:40 MK: So I actually wanna touch on that, because I had this exact situation where we ran an AB test, I think it ended up being six months, and the only reason we kept running it, because basically, people were like, “What if the variant doesn’t continue to outperform the control?” And we’ll only know that if we keep a control. So that we can continue to compare them. So, I’m interested to hear how you would tackle that.
19:02 BF: That seems super expensive, super expensive, especially if you have a big effect. If the test condition has a substantial, a meaningful lift over the control.
19:12 MK: Yeah.
19:13 BF: And you believe that, you make some assumption that, “Okay, I observed this in this one period and I think it’s reasonable that it’ll continue through.” The longer you keep that control, the more money you’re leaving on the table, and that’s a super expensive… It’s like a really hidden cost…
19:27 MK: Yeah.
19:27 BF: But it becomes a really, really expensive test.
19:29 TW: But isn’t that, and this is gonna get me in trouble, if not with you, then certainly with Matt Korshoff but the bandit… The idea of a multi-armed bandit would be that you would say… Well, I kinda keep my control but I’ll scale it way down so that it’s, I’ll kinda shift my traffic so that I’m kind of keeping it running in case and it can kind of bubble its way back up, but it seems like the bandit, bandits approach are inherently saying, “Maybe we’re not really trying to figure out why, we really are trying to just make sure we’ve got things kind of monitoring on an ongoing basis and shifting the traffic allocation to what seems to be performing better.”
20:12 BF: Yeah, with the bandits, really the whole design in the multi-armed bandit is about making experiments cost less. So the canonical example of the multi-armed bandit is, if I go to a slot machine, how do I figure out which slot machine is gonna pay out more? The AB testing framework would say, “Let’s go spin each one 1,000 times and figure out what my EV on each slot machine is if I spin each 1,000 times.” The multi-armed bandit is just an algorithm built on the idea that it’s really expensive to spin losing slot machines 1,000 times, and if you can learn that faster, you wanna get there faster. So, multi-armed bandits are actually super efficient in terms of learning as fast as possible, and the whole idea is like, let’s just minimize the amount of time that we’re running this sub-optimal choice.
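Editor’s sketch (not from the episode): the slot machine comparison can be simulated in a few lines. This assumes two hypothetical machines with made-up payout rates and uses epsilon-greedy, which is only one simple bandit strategy (Thompson sampling and UCB are common alternatives); the episode does not say which algorithm anyone uses in practice.

```python
import random


def pull(p, rng):
    """One spin of a slot machine that pays 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0


def ab_test(payout_probs, total_spins, rng):
    """Fixed equal split: every machine gets the same number of spins."""
    spins_per_arm = total_spins // len(payout_probs)
    pulls = [spins_per_arm] * len(payout_probs)
    reward = sum(pull(p, rng) for p in payout_probs for _ in range(spins_per_arm))
    return reward, pulls


def epsilon_greedy(payout_probs, total_spins, rng, epsilon=0.1):
    """Mostly play the machine that looks best so far; explore a little."""
    k = len(payout_probs)
    pulls = [0] * k
    wins = [0] * k
    for _ in range(total_spins):
        if 0 in pulls or rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick any machine at random
        else:
            arm = max(range(k), key=lambda a: wins[a] / pulls[a])  # exploit
        pulls[arm] += 1
        wins[arm] += pull(payout_probs[arm], rng)
    return sum(wins), pulls


rng = random.Random(42)
probs = [0.05, 0.15]  # hypothetical payout rates, unknown to the player
ab_reward, ab_pulls = ab_test(probs, 2000, rng)
eg_reward, eg_pulls = epsilon_greedy(probs, 2000, rng)
print("AB split:      ", ab_pulls, "reward", ab_reward)
print("epsilon-greedy:", eg_pulls, "reward", eg_reward)
```

The equal split spends half of its spins on the worse machine by design. The bandit typically shifts most spins to the better machine once it has enough evidence, at the price of a less precise estimate for the losing arm.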
20:51 TW: Got it. So I wanna go back, we dropped counterfactual. So you just dropped EV, which was expected value.
20:56 BF: Yep.
20:56 TW: It took me… So counterfactual and I think the potential outcomes kind of the… Are cousins. Can we define counterfactual? I had to sort of see that in a diagram before I really understood what the counterfactual… It’s kind of like one of those where it’s a fancy word and it’s a little, I think, twisted as to what it’s actually saying. Can you define counterfactual?
21:22 BF: I’ll give it a whirl. So, a counterfactual, I think the best way to define it is the alternative of a positive statement. So if I’m going to make a statement of truth, that like… Let’s put it in the context of web and analytics, if changing the button to green is good. So the counterfactual, you would have to state it as what is the alternative to that? So what is the mutually exclusive and comprehensive like other set of that? So it would be… Well, what would the world look like had that button not turned to green. So it’s all about thinking about what is the alternative outcome, but that alternative outcome, essentially, you have the set of all possible, and if you set it up correctly it essentially divides the world into two paths, one when you make a decision, and then the other one when you make the opposite of that decision.
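Editor’s sketch (not from the episode): the "two paths" idea is often written as two potential outcomes per person, only one of which is ever observed. The simulation below assumes hypothetical visitors and a made-up three point lift from the green button. Because it is a simulation, it can see both paths for every visitor, which real data never allows; it then shows that a randomized split recovers the average effect anyway.

```python
import random

random.seed(1)
n = 20000

# Hypothetical visitors. Each one carries BOTH potential outcomes:
# y0 = converts with the old button, y1 = converts with the green button.
visitors = []
for _ in range(n):
    base_rate = random.uniform(0.02, 0.10)
    draw = random.random()
    y0 = 1 if draw < base_rate else 0
    y1 = 1 if draw < base_rate + 0.03 else 0  # assumed +3 point effect
    visitors.append((y0, y1))

# Only a simulation can compute this: it needs both outcomes per person.
true_avg_effect = sum(y1 - y0 for y0, y1 in visitors) / n

# In reality each visitor reveals one outcome. Random assignment decides which.
treated, control = [], []
for y0, y1 in visitors:
    if random.random() < 0.5:
        treated.append(y1)  # y0 for this visitor is never observed
    else:
        control.append(y0)  # y1 for this visitor is never observed

estimated_effect = sum(treated) / len(treated) - sum(control) / len(control)
print(f"true average effect (unobservable in practice): {true_avg_effect:.4f}")
print(f"difference in means from the randomized split:  {estimated_effect:.4f}")
```

The control group stands in for the counterfactual of the treated group. That substitution is only justified because assignment was random, so the two groups are alike on average in everything except the button.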
22:09 TW: And a lot of that, I feel like the reading I’ve done on it has come kind of from social sciences. When we think about it AB testing, you’re like, “Well we can kind of, at least sort of see both of them but in many, many cases, we can’t, where we have to make a decision.” We’ve got historical observed data that we didn’t necessarily have control over. So put it purely in just like I’m analyzing the website, find opportunities for me to improve my website, setting AB testing aside. We wind up saying, “Well wow, it seems like more people are falling off at this step in my checkout process than maybe I would have expected,” but I don’t have the kind of definitionally the counterfactual cannot… Is not being observed. You have to… You can’t have two split universes outside of your partial the way that you have it in AB testing.
23:00 BF: Yeah, I think that’s what makes the whole causal inference thing challenging is ’cause you don’t necessarily have the counterfactual. So, there are situations in which you run an AB test and you do that to generate the counterfactual, like that alternative. There’s other situations where you have like, let’s say, you roll out a product feature and there just… It doesn’t make sense for you to roll it out to like half the population or parts of the population. In that situation, you don’t actually have data that represents the counterfactual, so then you have to get into more sophisticated techniques that allow you to generate data or to do some sort of analysis where you can logically establish that this pattern happened or this pattern would have happened had we not done this. And that’s like kind of the art of, and there’s some science behind it, but that’s like kind of the art of causal inference is like figuring out how do you solve those nuanced situations in which you can’t run just a clean AB test.
23:50 TW: But the science is literally, and a lot of times, we’re working with time series data, but like for instance, you say let me take the data up till I ran the change. Let me use that data to build a forecast of what I would expect, and then, this is very crude, but then compare my actual results to what that forecast was. And then there are other ways to say, “Well, let me look at some variables that I don’t think should have changed and I can compare that.” Again, I think diagrams worked a lot better. You had some nice diagrams in your presentation.
24:21 BF: Thanks.
24:21 TW: So that’s kind of a pre post. Right? We get asked to do a pre post analysis and it sounds like, yeah, causal inference plays a big role in that there are a lot of techniques.
24:32 BF: Yeah, yeah, totally. But the ability to generate the counterfactual comes down to how confident are you in that forecast. So if you believe that forecast is 100% accurate of what would have happened had that site-wide change not happened then, yeah, comparing post versus the forecast from the pre, that’s a totally valid thing to do. So, in my talk, I talked about correlation plus reasoning is causal inference. That reasoning, the logical argument of, “Can you actually establish that that forecast is what I think would have happened had I not made that site-wide change,” your ability to reason that is how strong of an inference you can make in terms of causality.
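Editor’s sketch (not from the episode): a crude version of the pre/post idea Tim describes. It assumes a hypothetical daily metric with a steady upward trend and a made-up lift of 15 after the change, and it uses a straight-line fit as the forecast. A real analysis would need a forecast you actually trust (seasonality, holidays, other launches), which is exactly the reasoning step Bradley is pointing at.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    mx = (n - 1) / 2
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in enumerate(ys)) / sum(
        (x - mx) ** 2 for x in range(n)
    )
    return slope, my - slope * mx


def pre_post_lift(pre, post):
    """Average gap between actual post-period values and the pre-period trend
    extended forward (the forecast plays the role of the counterfactual)."""
    slope, intercept = fit_line(pre)
    forecast = [intercept + slope * (len(pre) + i) for i in range(len(post))]
    lift = sum(actual - f for actual, f in zip(post, forecast)) / len(post)
    return forecast, lift


# Hypothetical data: 30 days before the change, 14 days after.
pre = [100 + 2 * t for t in range(30)]
post = [100 + 2 * t + 15 for t in range(30, 44)]  # trend continues, plus 15

naive_diff = sum(post) / len(post) - sum(pre) / len(pre)
forecast, lift = pre_post_lift(pre, post)
print(f"naive post-minus-pre difference: {naive_diff:.1f}")
print(f"lift against the forecast:       {lift:.1f}")
```

In this made-up series the naive comparison credits the change with the growth that was already under way, while the comparison against the forecast isolates the extra 15. The answer is only as good as the claim that the trend would have continued unchanged.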
25:12 MH: No. We’re never going to. But I wanna ask, I’m gonna give you an example of something, and then what I want to see is if you can take that and say, “Okay, so to take that one step further, what you would like to do is maybe this?” So as an analyst, at one point in time, we’re responsible for optimizing and making updates to things on the website to make things work better. It was a retail site, so we’re selling products, so we built out a model that basically said, here’s how people flow through the website in aggregate. So they come in, they look at products, they put stuff in their shopping cart, they check out. And then we would use that to drive our analysis to basically say, “Hey, if we incented or got people to stick their stuff in the shopping cart more,” so like if we made the product page a better product page in some way, then we would then increase the overall product revenue and those kinds of things.
26:05 MH: And so, we would do our analysis to then prioritize AB tests against that and so it sounds like kind of what you’re talking about, ’cause we’re basically making the assumption or making some kind of inference of, okay, we’ve got this activity, let’s say it’s how many pictures we show of the product on the product page or something like that, so that’s our hypothesis, is like by changing the number of pictures, we will get more people to put that product in their shopping cart. And then we’d run an AB test on it and then we would see if it worked or not. So then, okay, apply causal inference and say, “Okay, now to make that one step more sophisticated or better, what would you do from there?”
26:45 BF: The way you explain it’s actually fairly clean.
26:48 MH: Sure.
26:48 BF: You have your outcome variable defined, which is conversion rate or add to carts, let’s say add to carts, and then you said your hypothesis is like by changing the number of pictures we believe that people will increase their add to carts. And then by putting it in an AB framework you have… You’re creating two data sets, you’re creating one data set in which the number of pictures has changed and then you have your other data set which is where the number of pictures stays the same, that’s your counterfactual. So I think the only thing I would change is just being very explicit about what your counterfactual statement is, and that’s a very academic thing to say, it doesn’t change the power of the analysis. It just makes it much more explicit, what you’re trying to do. We believe in this world that changing the number of pictures is this, so we wanna know in an alternate world in which we don’t change the pictures, what would those outcomes be? But I think…
27:37 BF: So that’s the thing, as long as the test is randomized and you know there’s… And this is I guess stepping back a little bit, but as long as you know that the groups that you split on are identical in all ways outside of the treatment, then there’s really not, and that’s kind of the beauty of it is, it’s not hard if you can do certain things, it only gets hard when you can’t randomize well or you can’t hold people out or time series kinda comes in and forces you to do the pre post, that’s the only time when it gets complicated, the rest of it is just like, as long as you can establish that the data set for your control is what the counterfactual would be, like you’re good, it’s super nice, you can do t-tests, you don’t even have to run regressions, you just run a t-test and it’s amazing.
28:18 MH: So that’s helpful ’cause I think walking into this episode, I’m kind of like, holy shit everything I’ve ever thought of AB testing has got some massive flaw to it, I’m just… I’m afraid to even touch that stuff anymore. And it’s Tim’s fault mostly ’cause he’s always talking in values and pillars and all this stuff. And it’s like I get that there’s uncertainty but we still gotta get stuff done so it’s very helpful for you to drive it that way so thank you very much is all I’m saying.
28:46 MK: But okay. Sorry, sorry Bradley, what about the circumstance where you can’t AB test? What are some of the other methods that you might turn to?
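Editor’s sketch (not from the episode): what "just run a t-test" could look like for a clean randomized split. The add-to-cart rates and sample sizes are made up, the test shown is Welch’s version of the two-sample t-test, and the p-value uses a normal approximation that is only reasonable for large samples like these.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def welch_t(control, treatment):
    """Welch's t statistic for the difference in means (treatment - control)."""
    se = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    return (fmean(treatment) - fmean(control)) / se


def approx_p_value(t_stat):
    """Two-sided p-value from a normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


rng = random.Random(7)
# Hypothetical add-to-cart outcomes (1 = added, 0 = did not) per visitor.
control = [1 if rng.random() < 0.10 else 0 for _ in range(10000)]
treatment = [1 if rng.random() < 0.13 else 0 for _ in range(10000)]

t_stat = welch_t(control, treatment)
print(f"lift: {fmean(treatment) - fmean(control):.4f}")
print(f"t = {t_stat:.2f}, approx p = {approx_p_value(t_stat):.4g}")
```

Nothing in the arithmetic checks whether the split was actually random. The test is valid here only because the control group is a fair stand-in for the counterfactual, which is the condition Bradley spells out.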
28:52 MH: Yeah, great question.
28:53 TW: Wait can I just… Literally you said just rolling out, if you roll out a product feature where you… It doesn’t make sense to AB test, I would think if I ran a campaign, we’re gonna run this campaign, we’ve never run this campaign before, it’s gonna be in multiple channels. Sorry, the side-track is off of the question but it seems like every place where we’re like, “Oh crap, I wish I could AB test that,” falls into this.
29:21 BF: Yes, right, that’s a mutually exclusive set, you can either AB test it or you have to do something else so anything that you can’t AB test falls into, how do we generate that out? So one way to do it is do geo testing which is just like a variation of AB testing.
29:36 TW: But Geo testing… But doesn’t that start to bring into the… You have to find two Geos that you think are substantially similar and so that’s your starting, back to kinda what you were talking about that your…
29:49 MH: Good old test markets.
29:51 BF: Yeah, no, no, that’s exactly right. So it becomes increasingly more challenging as you get away from AB because establishing the quality of that counterfactual comes down to in Geo-testing, how matched are your geos? And how well do those geos actually generalize to the general pop if you were to do that.
30:07 MK: And especially in Australia, we did some Geo testing and it’s actually really tough because we have such a small population so trying to compare two cities is actually fairly difficult or two areas within a city.
30:20 TW: You just go door-to-door and ask people though, right? And you just kind of cover everybody.
30:23 BF: Yeah, so this is where if you turn to the academic research, there’s some cool methods of how do you hack around this and so there’s a couple of political scientists and I’m gonna… Abadie, I think is his last name, wrote a paper and basically the idea is if you don’t have a good Geo, you can essentially mash up a bunch of them together to make them look like what you want to do and if you believe that that, essentially, Frankenstein Geo matches or serves as a reasonable counterfactual and you can do all the historic comparisons and things like that, then cool, you just make a forecast on what that mishmash Geo would look like going forward and that becomes a pretty strong argument because it’s, instead of having to make the logical argument of why Pittsburgh is like Cleveland or whatever it may be, you could just say here’s a super analytical solution of these two things have been optimized to be as identical as possible on all of the things we care about so it’s reasonable that it serve as a counterfactual.
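The "mash up a bunch of geos" idea can be sketched as a toy example. This is not the estimator from the paper Bradley refers to; it only assumes that the blended geo is a weighted average of donor geos, with non-negative weights that sum to 1, chosen by a coarse grid search to match the treated geo before the change. All geo names and numbers are made up.

```python
from itertools import product


def fit_weights(target_pre, donors_pre, step=0.05):
    """Grid-search non-negative weights summing to 1 that minimise the
    squared error between the blended donors and the target's pre-period."""
    n = round(1 / step)
    best_w, best_err = None, float("inf")
    # Enumerate every way to split n grid units across the donors.
    for parts in product(range(n + 1), repeat=len(donors_pre) - 1):
        if sum(parts) > n:
            continue
        w = [p / n for p in parts] + [(n - sum(parts)) / n]
        err = sum(
            (t - sum(wi * d[i] for wi, d in zip(w, donors_pre))) ** 2
            for i, t in enumerate(target_pre)
        )
        if err < best_err:
            best_w, best_err = w, err
    return best_w


def blend(weights, donors):
    """Weighted average of the donor series, period by period."""
    return [sum(w * d[i] for w, d in zip(weights, donors))
            for i in range(len(donors[0]))]


# Hypothetical weekly visits before the campaign (treated geo, then donors A, B, C).
treated_pre = [150, 150, 165, 175]
donors_pre = [[100, 110, 120, 130], [200, 190, 210, 220], [50, 55, 60, 58]]

# Hypothetical weekly visits after the campaign launched in the treated geo only.
treated_post = [200, 212]
donors_post = [[140, 150], [230, 240], [60, 62]]

weights = fit_weights(treated_pre, donors_pre)
counterfactual = blend(weights, donors_post)  # forecast of "no campaign"
lift = [actual - cf for actual, cf in zip(treated_post, counterfactual)]
print(weights, lift)
```

The grid search keeps the sketch dependency-free; a real analysis would use a proper constrained optimizer and check the pre-period fit before trusting the forecast.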
31:18 TW: But at that point, in practical terms, now you’re relying on whoever, whatever the channels are that you’re executing through to actually, one, be on board and then able to actually execute based on whatever you set up, right? It’s like Google could do a lot of things but hey, you’re running through your Google rep who may or may not actually… There are enough parties involved that are…
31:42 BF: Yeah.
31:43 MK: And if you’re doing TV, then you just completely screwed.
31:46 BF: Yeah, there’s other techniques for TV and… I used to work at Wayfair and you look on their tech blog they’re like, I believe they actually, they probably do it as good as anyone but it’s based on, they use a signal processing method to generate their counterfactual and they basically try and forecast what would visitors at the site have looked like had we not shown a TV Ad in this 30-second spot and then they try and quantify how big of a spike in visits to the site, was it based on that counterfactual. And it’s a really solid method. I keep trying to convince them to write it up on a paper but… Alas! No one wants to do that.
32:19 MH: You got some secret sauce, you wanna hold on to it.
32:22 TW: But in a lot of these, yeah I guess in the case of Wayfair, they had scale, right? So, counterfactual. So, the other thing… One of our data Scientists talks about potential outcomes framework, is that just, when you say potential outcomes, the potential outcome is your counterfactual. It’s either your…
32:40 BF: Yeah, yeah, yeah. It’s just if you wanna say, the mathematical specification, it’s just saying, if my treatment is Y, then I believe it’s this outcome. If my treatment is X, then I believe it’s this outcome. And depending on the number of treatments or the nature of your treatment, it could be the volume of your treatment. You basically use that to specify what all of the different outcomes are, and then so there’s all of your potential outcomes are, aptly named. [chuckle] And then, and then the challenge is like, how do you start collecting data on all those potential outcomes?
33:08 TW: Okay. I think I’d probably sidetrack from Moe’s question a little bit. We said, alternatives to AB testing, we said geo being kind of a type of AB testing, but then what are some of the… We’ll hear regression as being like, “Gee, everybody understands that, basically has some intuition around that, we all learned the formula of a line when we were young”, so does regression play in? Does it have its place? Is it just you can trip yourself up?
33:36 BF: For what it’s worth, I think regression’s super under used. So, the AB test, the what do they sell it as, like you basically you run your AB test and then you can just run a T-test or a Z-test or whatever it may be, and get at it. And that’s your actual… And then the mean difference is your measure of true effect size. But the reality is, this is where the error term comes in. There’s a lot of other things that are correlated with the difference in errors, that gets kind of baked in there and what we wanna establish is, how much of the effect is actually due to the decision that we made? And so by running regression, you can essentially control for all of that observable correlation, right? So any other thing that may shift the outcome, you wanna throw those into regressions just to hold them constant. You don’t really care about improving the predictability of the model, you’re just trying to get rid of correlation. And that’s, that’s where I think, regression has a ton of value.
34:25 TW: What do you mean you’re trying to get rid of, you’re trying to… You don’t care about the predictive-ness of the model, you’re just trying to… You’re just trying to continue to find factors or variables that you’re adding so you’ve got as much observed as possible?
34:40 BF: So I’m gonna use a non-marketing example, but this is a really common example in Econ to teach us. So if I want to predict how much money someone’s gonna make over the course of their life, I really wanna know how much, one causal relationship that I could establish is like, how much more money am I gonna make if I get in an extra year of education, or an extra two years of education?
35:00 MK: Alright let’s say an MBA or a Masters of Data Science. Just hypothetically?
35:02 BF: Correct, right. So how much more, right? [chuckle] How much more am I gonna make, right? ‘Cause the reason why you wanna do that as a consumer is, you wanna do it because it’s a positive ROI. You actually get a return on your investment. So you could collect data on a bunch of people who got different levels of education, and a bunch of people who’ve got different levels of income. But the problem is, you’re gonna get these weird relationships because you have things like professional athletes who get not that much education but have huge salaries, right? And so, if you can control for the nature of the job or you can control for just natural ability, right? So let’s say everybody gets in the same…
35:37 BF: Let’s say everybody gets an analytics job right, and there’s variation like the analytics salary. One thing you actually wanna control for is geography, ’cause you’re gonna make more money in San Francisco or Boston than you are in, I don’t know. Nashville, Tennessee, just for cost of living adjustments. So you wanna make sure you control for all that stuff, so you’re actually getting at, How much does increasing education actually increase value. And so, regression allows you to hold those things constant and control for all those. And so you actually get the true effect size and not get some weird bias in your answer, to give you some misleading data.
36:09 MK: And so then how would you use that in your next step of analysis?
36:13 BF: So whenever I run an AB test, how do I throw regression in there? Or how do I leverage regression?
36:20 MK: Yeah, yeah how do they complement each other?
36:22 BF: So I think it’s one of those where, as an aside, I think, you could run an AB test, and you could run just your general T-test and be done with it, but I think what you wanna do is you wanna take what are all of the other things that correlate with it, like conversions, right? So if, let’s say we run a test over the course of two weeks and we change something on-site, and we wanna look at how conversion rate changes on-site, one thing I think you would wanna do is add days of the week, ’cause I’m assuming there’s some natural variation to conversions that are a function of it just being like the weekend or not the weekend.
36:51 BF: They could impact the treatment. And so, that’s where I would go through and starting to do that analysis is saying, what are all these other things that could affect the number of conversions we have, and start controlling for those. So the only thing that we have left is, how much did the treatment actually impact conversion? ‘Cause there also could be some seasonality trend, right? Like if I run, growth over time or if I run some tests over time, and there’s a bunch of other things that just changed general site performance outside of my specific test, confounding test is a… Not a fun thing. Then you wanna make sure you can kind of control for those things.
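A rough sketch of "controlling for" weekend versus weekday. It computes the regression coefficient on the treatment dummy with one dummy per day type, using the fact that this equals regressing day-demeaned outcomes on day-demeaned treatment (the Frisch-Waugh-Lovell result). The data are invented, and deliberately skewed so the treated group lands mostly on weekends, to show how the unadjusted comparison can mislead.

```python
from collections import defaultdict


def naive_effect(rows):
    """Plain difference in mean conversions, treated minus control."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def adjusted_effect(rows):
    """OLS coefficient on the treatment dummy with a dummy per day type,
    computed by demeaning treatment and outcome within each day type."""
    by_day = defaultdict(list)
    for day, treated, y in rows:
        by_day[day].append((treated, y))
    num = den = 0.0
    for obs in by_day.values():
        t_bar = sum(t for t, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for t, y in obs:
            num += (t - t_bar) * (y - y_bar)
            den += (t - t_bar) ** 2
    return num / den


# Hypothetical rows: (day type, treated?, conversions). Weekends convert
# higher regardless of treatment; the built-in treatment effect is +2.
rows = [
    ("weekday", 0, 10), ("weekday", 0, 10), ("weekday", 0, 10), ("weekday", 1, 12),
    ("weekend", 0, 20), ("weekend", 1, 22), ("weekend", 1, 22), ("weekend", 1, 22),
]

naive = naive_effect(rows)
adjusted = adjusted_effect(rows)
print(f"naive = {naive:.1f}, adjusted = {adjusted:.1f}")
```

With a well-randomized test the two numbers would be close, and the main benefit of the controls is less noise around the estimate.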
37:23 TW: Okay, so regression is starting to make a little more sense. You had also when you spoke, talked about difference-in-differences, and instrumental variables. Can you take a crack at explaining those? [chuckle]
37:36 BF: Yeah, so also a difference-in-difference ’cause that’s the easy one. Looking at pre-post-assessments. That’s effectively a difference-in-difference, in that case, you’re saying, “my counterfactual is the forecast and then my truth is like what happened”. Difference-in-difference is just the regression form to analyze that. So it’s not super complex or anything like that, but it’s the idea of, there’s two things and you wanna control for whatever the difference was at the beginning of time, and then whatever the difference was at the end of time, and if you subtract those two differences, that’s the true lift. And that just helps you from overstating or understating any kind of actual lift that occurred because of effectively pre-test differences.
38:15 MK: I’ve actually used that a 1,000 times, I didn’t know it had a name.
38:19 BF: Yeah, and there’s super handy ways to get at it in regression. So if you just throw in two terms, pre-post as a dummy variable and then treatment or non-treatment as a dummy variable, the interaction term gives you the true unbiased test and you don’t have to do anything fancy, it just works, it’s pretty amazing.
38:33 TW: To put it… If you had something that was trending steeply upward and you just said, “Oh, we’re just gonna look at the after and the before,” even though it was already trending upward, you don’t do anything to take into account that, no, what was the difference in the differences, not just what was the absolute difference. I don’t know if that…
38:54 BF: No, no, you do. That’s the whole thing. So if it’s trending in an upward direction, you assume that it would continue to trend in the same upward slope if you hadn’t introduced the treatment, and then all you’re trying to measure is how much does that upward trend adjust either up or down, and then that’s the actual effect of the treatment.
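A small sketch of difference-in-differences on made-up numbers. With only a pre/post dummy, a treatment dummy, and their interaction in the regression, the interaction coefficient equals the difference of the two group changes, so the sketch computes it directly from the four cell means rather than fitting a model. It relies on the assumption Bradley states: without the treatment, the treated group would have followed the same trend as the control.

```python
from statistics import mean


def diff_in_diff(rows):
    """(treated post - treated pre) minus (control post - control pre).

    Equals the interaction coefficient in the regression
    y ~ post + treated + post:treated fitted on the same rows.
    """
    cells = {}
    for treated, post, y in rows:
        cells.setdefault((treated, post), []).append(y)
    m = {key: mean(values) for key, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Hypothetical rows: (treated?, post-period?, outcome). Both groups trend
# upward by 10; the treated group gains an extra 10 after the change.
rows = [
    (0, 0, 100), (0, 0, 102), (0, 1, 110), (0, 1, 112),
    (1, 0, 120), (1, 0, 122), (1, 1, 140), (1, 1, 142),
]

effect = diff_in_diff(rows)
print(effect)
```

A simple after-minus-before on the treated group alone would report 20 here, which is the overstatement the pre-existing trend causes.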
39:11 MH: Got it. All right. Excellent. All right. We do have to start to wrap up, but this is very fascinating and maybe most fascinating for me because I started thinking, “I didn’t know anything about this but I’ve been using it all along.” Awesome.
39:27 MK: As opposed to me who’s like, “I need to go back and re-listen to this entire episode.”
39:31 MH: No, Bradley, I think that’s a testament to you making it really come through clearly, and so thank you on behalf of all of our listeners and me needing it in easier terms, that’s really helpful.
39:44 MH: One thing we like to do on the show is go around the horn and do a last call. Bradley you’re our guest. Do you have a last call you wanna share?
39:52 BF: Yeah. I wanna call out a book that I actually just finished reading called Prediction Machines. If you haven’t read it, highly, highly recommend it. The subtitle’s The Simple Economics of Artificial Intelligence. And basically it’s this book written by a couple academics at University of Toronto who work with the entrepreneurship and they go through and talk about what’s the value of AI and what’s the value of machine learning and predictions. And they don’t actually get into causal inference, so it’s a little bit off-brand. But I just found it such a fascinating book and I think anybody who’s managing data scientists or analytics who are wanting to get into machine learning or any of this stuff, you should definitely read this book ’cause it puts into context of what’s the actual value that these things are driving for your company. I think that lens is needed so people don’t just get drunk on the possibility…
40:35 BF: And try to stay grounded in reality of what are these things actually doing, and then what are the limitations?
40:41 MH: Very nice. Very, very cool. That sounds like one I should pick up. Hey, Moe, what about you? Last call?
40:49 MK: Yeah. I’m actually reading, I’ve just finished her first book and just started the second book, which actually came to me via my sister. It’s an author called Emily Oster and she’s an economist. But, she writes about the most interesting topic ever, which is all to do with women’s health through pregnancy and early childhood. And she uses her economist brain to really challenge all of the myths that are out there. So she basically, when she started having a family, kept going to the doctor and they’d be like, “Oh, there’s a really high probability of a risk of this if you do that.” And she was like, “Okay. Well, what’s high?” And no one could tell her so she started doing all of her own research, has literally gone through every medical study that you can possibly find. And because she’s an economist, has such a good way of framing the problem. And she’s basically like, “Here’s all the data on this topic. Now, you make your own decision.” And I just think it’s really cool. So if you’re a parent, I highly recommend having a read.
41:44 TW: So she’s like, “You say this could happen, but what’s the counterfactual?” Does she use language like that in the book?
41:51 MK: Oh and did I… Oh yeah, so the two books are Expecting Better and Cribsheet. And, yeah, she basically ends up circulating all of her research to her friends, and then eventually someone was like, “You should put this in a book.” And, yeah, she’s pretty awesome.
42:03 MH: Nice.
42:04 BF: Very nice.
42:05 MH: Okay. Tim, what about you?
42:07 TW: Well, I’m gonna do a book as well, but it’s gonna be very tangentially related ’cause I literally just finished it on the airplane I was on a few hours ago. You guys know this guy Simo Ahava?
42:20 MK: Rings a bell.
42:23 MH: Yeah.
42:24 TW: Yeah. So he’s not written a book, but this was courtesy of a Robert Miller, one of Michael’s and my co-workers, his sister is an author, a fiction author. So I just read Things that Fall from the Sky by Selja Ahava. Selja, I’m probably mispronouncing her name. I think it’s the only book that’s translated into English thus far. It’s in…
42:46 MH: That’s pretty cool.
42:47 TW: I mean, it’s pure novel told from multiple points of views. Kind of a little trippy. But there you go, Things that Fall from the Sky.
42:54 MK: Obviously some genius in the family.
42:56 MH: Very nice. So yeah, Simo, maybe write a book there. [chuckle]
43:02 TW: What about you, Michael?
43:03 MH: Well, I’m so glad you asked. I’m also reading a book right now, which I’m actually kind of enjoying. It’s called Finish by Jon Acuff. Has nothing to do with analytics and everything to do with finishing things, which is something that I am not fond of doing.
43:23 MH: No, I think people’s personalities go in a lot of different directions and my personality definitely loves to dwell on the potential of things.
43:35 MH: And so a lot of times I find myself sort of dragging out any decisions or finalization for a while. And so I’m enjoying this book.
43:45 MH: All right. You’ve been listening and you’ve probably been thinking, “Oh, I’ve got something I wanna say or talk about,” as it pertains to causal inference and or casual interference, but most likely the first. We would love to hear from you and the best way to do that is to reach out to us, that’s through the Measure Slack or our LinkedIn group or on Twitter. But we would love to hear from you comment on the show, whatever you wanna do. Also, if you’re a regular listener of the show on whatever platform, if you wanted to rate the show and give it a review, that is something that apparently helps some algorithm somewhere do something causal to our rating. And things like that.
44:25 TW: We’ve actually gotten a few reasonably recently that were really nice. I was like, “Oh, well that’s nice.”
44:30 MH: Yeah. We always get really nice reviews and we do appreciate all of them. So feel free to do that. Also, obviously wanna give a big shoutout to the tremendous team on the show, ’cause we have our engineer Darren Young and our producer Josh Crowhurst who are a huge help getting this show out there to you, and so we want everyone to know who all is making this happen. So once again, Bradley, thank you again for coming on the show, making some very keen statistical concepts much more accessible.
45:07 BF: Hopefully, I wasn’t too confusing.
45:08 MK: It was fantastic.
45:10 BF: Thanks for having me.
45:11 MH: Yeah. Very, very good. And I think I speak for my two co-hosts, Tim and Moe, when I tell all of you out there, now that you’ve got a little bit better understanding of causal inference get out there and keep analyzing.
45:30 Announcer: Thanks for listening. And don’t forget to join the conversation on Facebook, Twitter, or Measure Slack Group. We welcome your comments and questions. Visit us on the web at analyticshour.io, facebook.com/analyticshour, or @analyticshour on Twitter.
45:50 Charles Barkley: So smart guys want to fit in so they’d made up a term called analytic. Analytics don’t work.
45:56 Tom Hammerschmidt: Analytics. Oh, my god. What the fuck does that even mean?
46:04 TW: Moe, if I’m gonna watch you moving your bike forward and back, you are… You know, you’re dumb.
46:13 MK: Dude, like… Okay, this is my life right now. You know? I’m doing what I can.
46:18 MH: Oh, editing is fun.
46:20 TW: [laughter] Your facial expressions aren’t going to come through the audio.
46:29 MK: Oh, my god. Can we just scratch that whole thing?
46:35 MH: For me, and I’m more of your average analyst, as opposed to like Tim and Moe, but when you think about…
46:42 TW: So, you’re saying we’re below average analysts? Is that…
46:43 MK: Well…
46:43 TW: Did you just insult us?
46:44 MH: Above average. Come on, you guys. Come on.
46:52 MK: Every time you explain something, I understand it. And then Tim says something, and then I’m confused again.
46:56 TW: Blah, blah, blah, blah, blah…
46:58 MK: So, you know…
47:00 MH: Is that what they call confounding?
47:01 MH: And, you know, working with Tim makes it feel like you don’t know anything. So this has been really good. Yeah, Tim. Yeah, you’re up there with fancy words. Really?
47:10 TW: Really.
47:13 MH: Even prepping for the show, there were all these formulas that don’t have any numbers in them. That’s really difficult.
47:21 MH: So, anyway…
47:23 MK: Yeah, but then, he’d said it so much in the intro that then it was like burned in my memory, and I was like, “Don’t say the wrong thing. Don’t say the wrong thing. Don’t say the wrong thing.”
47:34 MH: Trust me, I was doing the exact same thing. It was like, “Say it right. Don’t say it wrong.” Oh… And then, “Say it wrong. Don’t say it right.”
47:44 TW: When will we start seeing results, Michael? That’s what I wanna know.
47:47 MH: Tim, this has actually all been ongoing work for many years so you’ve already seen some pretty strange results.
47:57 TW: That’s progress? Okay.
48:00 MH: Oh, fricking frick.
48:01 MK: So, wait, you haven’t finished the book? That’s what you’re telling me?
48:07 MH: No, I have not finished the book. It’s very unlikely that I will, too, y’all. I probably won’t ever finish it.
48:15 TW: The last, the last four chapters are just blank images anyway. The author is very self aware.
48:20 MH: Dark face. That’s right. Just… Yeah.
48:26 MK: You had to tell me to think about my sentence, which I did. [chuckle]
48:34 MH: Rack fraggin’ casual interference.