Did you know that, upon closer inspection, many a statistical test will reveal that “it’s just a linear model” (#IJALM)? That wound up being a key point that our go-to statistician, Chelsea Parlett-Pelleriti, made early and often on this episode, which is the next installment in our informally recurring series of shows digging into specific statistical methods. The method for this episode? ANOVA! As a jumping off point to think about how data works—developing intuition about mean and variance (and covariates) while dipping our toes into F-statistics, family-wise error rates (FWER), and even a little Tukey HSD—ANOVA’s not too shabby!
Photo by Gary Borenstein on Unsplash
00:00:05.73 [Announcer]: Welcome to the Analytics Power Hour. Analytics topics covered conversationally and sometimes with explicit language.
00:00:14.19 [Michael Helbling]: Hey everyone, welcome. It’s the Analytics Power Hour. This is episode 277. Given the chance, we’ll frequently take any excuse to do just whatever we feel like doing. But ostensibly, this episode is part of an unofficial series from our listener survey. It’s another deep dive into a statistical concept. That’s right. We’ve got the keys to an ANOVA, and we’re going to drive straight into a post-hoc analysis of all of our life choices that led to this moment. All right, we’re doing Analysis of Variance, or ANOVA. I really don’t know much about it. It was something that the SAS developers in the other department would talk about sometimes. But this one’s been out of my league for a long time. So I’m pretty excited to learn a little more. Julie Hoyer, are you stoked about another stats-focused episode?
00:01:08.63 [Julie Hoyer]: Of course I am. I can’t wait to get into it.
00:01:10.81 [Michael Helbling]: Awesome. And Tim Wilson, I know you’re probably rarin’ to go.
00:01:16.03 [Tim Wilson]: I’m looking forward to being more confused coming out of this than I am going in.
00:01:21.14 [Michael Helbling]: Well, I don’t know if that’s the result we’re looking for, but we’ll see if there’s any significant difference in your knowledge after the fact. All right, I’m Michael Helbling. And for our guest, well, we had to do it. We had to bring back our favorite statistician from just 10 episodes ago. And it’s her third time on the show, Chelsea Parlett-Pelleriti. She’s a consulting statistician at Recast, and she puts her PhD in computational data science to use teaching statistics and math at Chapman University. And once again, she is our guest. Welcome back again, Chelsea.
00:01:53.01 [Chelsea Parlett-Pelleriti]: Thank you so much. I’ll need to do something notable before the next time I come on, so you have a good bio for me that’s different than this one.
00:02:00.78 [Michael Helbling]: Well, you know, we’ll do a, we need to go deeper into like what your interests are. So like, you know, Corgis, Stardew Valley, those kinds of things. And we can start to put a bio together around that stuff.
00:02:13.24 [Chelsea Parlett-Pelleriti]: That’ll come up today, actually.
00:02:15.32 [Michael Helbling]: Perfect. It’s not coming from me, but I think a good way to start this process is maybe to start at the very beginning of analysis of variance, and even maybe start with how you feel about it, because I know it’s not your favorite thing.
00:02:34.02 [Tim Wilson]: It wasn’t a topic that Chelsea pitched us.
00:02:37.14 [Michael Helbling]: Yeah, yeah. We were like, hey, please, can you come talk about this?
00:02:41.98 [Chelsea Parlett-Pelleriti]: Yes. Well, I’d actually like to start with a poem if you don’t mind.
00:02:46.76 [Michael Helbling]: I love it. Yeah. That’s right up my… Absolutely. Exactly the kind of start I think we need here.
00:02:52.73 [Chelsea Parlett-Pelleriti]: Perfect. Okay. So this kind of encapsulates how I feel, and I want to clarify: I don’t have a problem with the ANOVA itself. It is more the way we communicate about it that sort of distracts people from things that are good about the ANOVA. So I’m not wholly against it. But if I may, from the archives of my Twitter account, May of 2020: There once was a model, ANOVA, who, along with their cousin, ANCOVA, made a great big confession: “We’re the same as regression, but we’ve established a separate persona.”
00:03:31.55 [Michael Helbling]: Wow. That’s a memory. Yeah. Yeah, GPT could never.
00:03:36.60 [Chelsea Parlett-Pelleriti]: Yeah. They could never come up with that. So that’s the basis of how I feel about ANOVAs, which is that they’re linear models. And when we teach them as separate concepts, people sort of lose that connection. And so that’s my biggest gripe with ANOVA. It’s not the actual math or anything behind it. It’s that often, especially in my, you know, original field of psychology, people teach these ANOVA models, ANCOVAs, MANOVAs, you know, all the different letters that you can cram in there. And they teach them as something that’s distinct from a regression model. And when you do that, people really lose two things. One is a connection to any of the really great linear regression knowledge and content that they have. And two is the generalizability of the concepts that you learn in an ANOVA. One of the things that I would run into a lot, especially in my psychology days, is people thought an ANOVA is one thing and an ANCOVA is another thing. A MANOVA is a third thing, a repeated-measures ANOVA is a fourth thing. And they didn’t see how they were related, because they weren’t taught in the linear model context. So they didn’t see, like, oh, an ANCOVA is just like you basically add a covariate to your regression model. And so that’s my main issue: when people talk about using ANOVAs, they’re typically talking about it in this framework of, this is a separate thing from a regression model, when really what you’re doing when you fit an ANOVA, when you use an ANOVA, is you’re fitting a linear model and then you’re looking at the outputs of it slightly differently than you might if you ran a traditional lm in R, where you’re running a linear model.
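For readers who want to see that point concretely, here is a minimal sketch in base R with made-up campaign data (the column and group names are invented for illustration): aov() is essentially lm() viewed through a different summary.

```r
# A sketch, not anyone's production code: the "ANOVA" and the linear model are the same fit.
set.seed(42)
d <- data.frame(
  campaign = factor(rep(c("A", "B", "C"), each = 50)),
  order_value = c(rnorm(50, 80, 25), rnorm(50, 85, 25), rnorm(50, 95, 25))
)

fit_lm  <- lm(order_value ~ campaign, data = d)   # the "traditional lm in R"
fit_aov <- aov(order_value ~ campaign, data = d)  # the "ANOVA" version of the same model

anova(fit_lm)     # the ANOVA table of the linear model
summary(fit_aov)  # identical sums of squares, F statistic, and p-value
```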
00:05:33.01 [Tim Wilson]: Putting aside all of the linkages there, defining what an ANOVA is or could or should be, what’s it doing? What’s the purpose of that class of methodologies?
00:05:46.77 [Chelsea Parlett-Pelleriti]: Yeah, it’s in the name. So an analysis of variance, or an ANOVA, is… Well, that just turned off the business users right there.
00:05:53.62 [Tim Wilson]: You’re like, come on, what more do you need to know?
00:05:56.25 [Chelsea Parlett-Pelleriti]: Yeah, of course, obviously. But it’s analyzing the variance in the data. So think about, let’s say, a data set that we could have: say there are three different ad campaigns that you trialed, and you’re trying to figure out, are they different? Are they all giving you the same click rate or are they not? Are they all giving you the same average order or are they not? When you look at an ANOVA, essentially what it does is it says, well, look. Let’s take the average value of the order. So if you have all of the order values during your experiment, for these three different marketing campaigns, say, that you sent out, one of the things that you might want to do is say, okay, there’s a lot of variance here, right? Some of my orders are for $70, some of them are for $20, some of them are for $400. What can explain that difference in the orders that we see? So we’re observing that orders are not all the same. Why are they different? And we basically take all of the variation that we see. Not everyone has the same order value. You can picture the mean order value, and everyone’s order value is hovering around that mean value. Some are really high, some are really low. And so what we’re doing is we’re partitioning that variance into sources that we care about. In the simplest case, like a one-way ANOVA you’ll often hear people talk about, we have two categories of variation that we really care about. One is variation due to the group, or in this case the marketing campaign, and then variation due to what we would call, quote unquote, randomness, right? So it’s variation within a group. And so at its simplest level, the ANOVA is basically taking that variance and partitioning it into those groups. So what variance can we attribute to the marketing campaign? What variance can we not attribute to the marketing campaign? And then it compares those things. And essentially what you’re doing when you’re running what we typically think of as an ANOVA is you’re seeing if there’s statistical significance. You could use a Bayesian framework, but usually you’re using a frequentist framework. You’re seeing, is this statistically significant? Is the amount of variance that this explains something that is notable or unexpected under the null? That’s what we’re doing. We’re just partitioning the variance, and variants like the ANCOVA just add another category. If we have a covariate, say age or… Wait, hold on.
00:08:30.70 [Tim Wilson]: Can we stop before we go one level deeper? So, just to take the dumb-dumb analyst, or the marketer, who’s just looking and says, I’m just looking at average order value and I break it down by campaign, and one average order value is $75, one is $80, and one is $90. Just by looking at the average order value, which is a mean, there’s a tendency to say, well, these are different, and it’s easy to say, well, that’s the difference between these. But everything you just described was saying, well, if order values are all over the place and it just happened to be that you dropped in and partitioned them, sliced them by campaign, yeah, you just happened to get bigger ones in one and not in the other. So it’s giving you a way to say, given these observed different means, how confident am I that the way I partitioned them is actually contributing to that? It’s not just that I’m arbitrarily seeing that it’s a noisy, wide spread. Am I playing that back accurately?
00:09:48.56 [Chelsea Parlett-Pelleriti]: Exactly. So you can imagine a scenario where, let’s say you have this magical campaign where everyone who gets variant A, their order is right around $80. Sometimes it’s $81, sometimes it’s $79, but it’s right around $80. And variant B, it’s right around $60. Sometimes it’s $61, sometimes it’s $59, but always around the same. In that case, it would be super clear, even without a statistical test, that you could visually plot that data out and you would see that the amount that orders vary within your campaign variants is so small compared to the amount that the two differ from each other. I think I said $20 difference between them. And that’s what you’re quantifying mathematically, for basically the cases where you can’t immediately see it on a graph like in the example I just described. So technically, in an ANOVA, the null hypothesis that you’re testing is about whatever groups you have, so it’s usually three or more, because if you had only two, you could use a t-test. But basically, you’re saying, all of the means of these groups, however many there are, are equal. That’s the null hypothesis. And the alternative hypothesis is that at least two of these means are different. And so this gets into something that maybe is too deep, you can stop me again and we’ll go back, but this is the F test in an ANOVA, which would say, okay, here I have three campaigns, or I have 10 campaigns. Is there a statistically significant amount of variance explained by campaign? You’re essentially doing something called an omnibus test, where you’re testing, is there a difference somewhere in this mess? But the F test by itself will not tell you where that difference is. The omnibus test is looking overall at the variance explained when you know what campaign someone has been exposed to, whereas typically we’ll often have questions that are a little more targeted than that. We want to know, okay, this is our business-as-usual campaign, here’s an experimental campaign, and here’s an amped-up version of that experimental campaign. In that case, what we really probably care about is, is business as usual different from the other two, and is our amped-up experimental campaign better than the regular experimental one? An ANOVA by itself won’t tell you that; you’d have to follow it up with post hoc tests, which you mentioned in the intro. And yeah, so that’s essentially what you’re doing at kind of the simplest level of ANOVA.
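To make the partition concrete, here is a hand-rolled sketch in base R with made-up order data: the total variation splits into a between-campaign piece and a within-campaign piece, and the F statistic is the scaled ratio of the two.

```r
# Illustrative only, with simulated data: partitioning variance by hand,
# then confirming against R's ANOVA table.
set.seed(1)
d <- data.frame(
  campaign = factor(rep(c("A", "B", "C"), each = 50)),
  order_value = c(rnorm(50, 80, 30), rnorm(50, 85, 30), rnorm(50, 95, 30))
)

grand_mean  <- mean(d$order_value)
group_means <- ave(d$order_value, d$campaign)   # each row's campaign mean

ss_total   <- sum((d$order_value - grand_mean)^2)
ss_between <- sum(table(d$campaign) * (tapply(d$order_value, d$campaign, mean) - grand_mean)^2)
ss_within  <- sum((d$order_value - group_means)^2)   # "random" variation within campaigns

c(between = ss_between, within = ss_within, total = ss_total,
  check = ss_between + ss_within)                    # between + within recovers the total

k <- nlevels(d$campaign); N <- nrow(d)
F_stat <- (ss_between / (k - 1)) / (ss_within / (N - k))
F_stat
summary(aov(order_value ~ campaign, data = d))       # same F statistic
```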
00:12:35.03 [Julie Hoyer]: So in that example, actually, would you be able to just run an ANOVA on the data for the business as usual campaign and the experimental one and just do that pairing? So you could choose the pairs of those to look at. So then you could have the clear answers you’re talking about. But traditionally, an ANOVA, somebody might be like, no, we’re going to throw all three in. And to your point, the result that would come out of that ANOVA would just say if there is actual variance between those three categories. It won’t tell you between which two. But with three, it’s easy to be like, I’ll just split it out. But to your point, like a lot of times if we have a ton of categories, it becomes very cumbersome and not realistic.
00:13:23.46 [Michael Helbling]: All right, let’s talk data. We all love insights, but let’s face it, setting up integrations, that’s not exactly a party. That is why there’s Fivetran, the smartest, easiest way to automate your data pipelines. Think of it like the ultimate set-it-and-forget-it gadget for the discerning data professional. Connect, relax, and let Fivetran handle the heavy lifting. Your data lands safely and swiftly in your warehouse, ready for action and analysis. Curious? I think you are. Head over to fivetran.com slash APH right now to stay updated on the latest news and events and see how Fivetran can make your data dreams come true. That’s F-I-V-E-T-R-A-N dot com slash APH. Trust me. If you’re in analytics, well, thank you.
00:14:16.51 [Chelsea Parlett-Pelleriti]: Totally. Well, not only come… I have so many things to say to that because that was such a good point.
00:14:22.34 [Tim Wilson]: Well, so I think you want to hit the, what if there were just two and then you want to hit the, what if there are a whole bunch?
00:14:27.10 [Chelsea Parlett-Pelleriti]: Well, if there’s just two, it kind of doesn’t matter what you do, because by running an ANOVA you’re essentially running a t-test between them. Something I loved, I don’t know why this fact was so fun to me back in the day, but when I first learned this, I learned that the F statistic you get under very specific conditions, including that there are only two groups, is just the t statistic squared that you would have gotten if you had run the same type of t-test instead of an ANOVA. So there’s a really one-to-one relationship there. But you said two things that I thought were really important. One is that it’s cumbersome to run a bunch of these different comparisons, which is true, but in a sense unavoidable if you’re interested in all of the pairwise comparisons. But I think the point that you’re implying but not saying out loud is there’s also a problem, if you’re using the frequentist framework, with multiple testing. Let’s say I have 10 groups and I want to compare every pair of two. I can’t do that in my head, but it’s 10 choose 2, I don’t know what that number is. Lots of comparisons that are happening. And usually in a frequentist framework, we are choosing an alpha level, usually 0.05, so 5% as our expected error rate under the null. So it’s like the type I error rate if there is no effect: this is how often we’ll be misled by the conclusions we make of the test. But if you’re running that many comparisons, suddenly your family-wise error rate, which is the error rate of making a mistake somewhere in that family of comparisons, is huge. That’s a problem. Another thing that is important is, yeah, we could just go filter the data, only include baseline, business as usual, and the experimental. But one of the things that can be really helpful with an ANOVA is that you’re actually increasing your power, statistical power that is, not, you know... I don’t know what other kind of power you’d be increasing. You’re actually increasing your power because the estimate of your error is going to be more precise with more groups. Because one of the assumptions of an ANOVA, I think this is just homoscedasticity, is basically that you’re assuming the variance is the same across your different groups. And one of the things that that gives you, if that’s true, is that you get a better estimate of what that error is if you look at all 10 groups that you have than if you truncated your data and only looked at the two groups that you, for instance, in this case, are interested in. And so you’re actually increasing your power of estimation a little bit, if assumptions hold, et cetera, et cetera. And so there’s actually a benefit to running the ANOVA itself power-wise. But also, like you’re pointing out, you could partition everything, but you’d have to be more thoughtful about what comparisons you want to run and control your family-wise error rate. Now, I’ll take it one step back to my critique of the ANOVA. In ANOVA, we usually call these comparisons contrasts, right? Which comparisons do you want to run? And I actually think it’s better to be thoughtful about that, correct for any family-wise error rate inflation that you’re causing, and just look at those, rather than rely on the omnibus test, unless you’re actually just trying to answer the question the omnibus test answers, which is, is there variation somewhere? Are all the means equal? It’s a bit of a weird way to say that.
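A quick back-of-the-envelope in R for that multiple-testing point (the raw p-values below are hypothetical, and the family-wise calculation assumes independent tests, which real pairwise comparisons aren't, so treat it as illustrative):

```r
choose(10, 2)   # 45 pairwise comparisons among 10 groups

alpha   <- 0.05
n_tests <- choose(10, 2)
1 - (1 - alpha)^n_tests   # chance of at least one false positive if every null is true: ~0.90

# The usual fix is adjusting the p-values, e.g. Bonferroni or Holm:
p_raw <- c(0.001, 0.012, 0.030, 0.049, 0.200)   # hypothetical raw p-values
p.adjust(p_raw, method = "bonferroni")
p.adjust(p_raw, method = "holm")
```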
I actually think it’s better to be thoughtful. And one of the things that is good about how people teach the ANOVA is usually you teach, okay, you run the omnibus test, but then you follow it up with post hoc comparisons, right? Different pairwise comparisons you might want to know about. And one of the things I really loved when I was learning this back in the day is that there are some really thoughtful frameworks. You can define contrasts however you want. If you’ve ever worked with an ANOVA in R, you know you can define your own contrast matrix. Whatever contrasts you want to run, you just put them in there and it’ll run them. But there are some established ones that I think are really thoughtful. And some of them have to do with the example we talked about of, okay, here’s business as usual, so in a sense, like a control group, and then here’s a moderate experimental and an extreme experimental condition. There are different types of contrasts that kind of predefine for you what you’re interested in. So I’m interested in control versus the average of the experimental conditions. That sort of answers, is my experimental condition working? And then I might be interested secondarily in the contrast between moderate experimental and extreme experimental, because then that tells me, hey, when I really take this campaign to the nth degree, force someone to click on my ad, essentially, is that actually helping compared to my more moderate, hey, click on my banner? Those contrasts are very thoughtful. It’s very specific to the situation you’re in. My overall critique of statistics as a whole is that sometimes we encourage people to not be thoughtful, and I’m always in favor of something that encourages someone to be thoughtful. It’s not the ANOVA’s fault per se, but it can encourage people to just look at the omnibus F statistic, the F test, when that’s not really what their question is. And because they haven’t thought about it, it’s just this, I learned an ANOVA five years ago, I’m gonna throw an ANOVA at it. You really lose a lot, both of statistical power as well as clear answers to your questions.
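Here is one way those two planned contrasts could be computed in base R on made-up data, using the pooled error term from the ANOVA fit. This is a sketch of the general idea; packages like emmeans or multcomp do this more carefully, including the family-wise correction mentioned above.

```r
# Illustrative planned contrasts: control vs. average of the experimental arms,
# and moderate vs. extreme experimental. Data and arm names are invented.
set.seed(7)
d <- data.frame(
  arm = factor(rep(c("control", "moderate", "extreme"), each = 40),
               levels = c("control", "moderate", "extreme")),
  order_value = c(rnorm(40, 80, 20), rnorm(40, 88, 20), rnorm(40, 92, 20))
)

fit <- aov(order_value ~ arm, data = d)
mse <- sum(residuals(fit)^2) / df.residual(fit)   # pooled error variance
m   <- tapply(d$order_value, d$arm, mean)         # group means, in level order
n   <- table(d$arm)

contrast_test <- function(w) {
  est <- sum(w * m)                 # weighted combination of group means
  se  <- sqrt(mse * sum(w^2 / n))   # standard error using the pooled error term
  t   <- est / se
  c(estimate = est, se = se, t = t, p = 2 * pt(-abs(t), df.residual(fit)))
}

contrast_test(c(1, -0.5, -0.5))   # control vs. mean of the two experimental arms
contrast_test(c(0, 1, -1))        # moderate vs. extreme
```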
00:20:39.51 [Julie Hoyer]: And to be a little specific, when you were saying you get more power by having more categories put into your ANOVA, is that because, to calculate the F statistic, it’s comparing the variance within the groups to the variance between the groups? And so if you have more groups, you get more inputs for both of those measures. So inherently, you’re getting an F statistic that’s more representative, or something you could generalize more across the categories. Am I getting close? But I’m thinking of sample size. So for the sample size of these variance measures, you’re getting more of them with the more categories that you give to the ANOVA. So that’s kind of where my brain was going, but I don’t know if that’s actually how that works.
00:21:31.01 [Chelsea Parlett-Pelleriti]: It’s even simpler than that. I think you’re correct. But also, even if I am just interested in category A versus category B, if I’m assuming that all of my groups have the same variance, That’s something I need to estimate with my model. I don’t know what the population variance is there. If I have seven groups and you’re saying bigger sample size to estimate what that variance is, even if it doesn’t help me with the between group thing, it helps me with the within group estimation, which is exactly what can happen here. But I will say that relies on the assumption that they’re all the same and that the pooled variance is a good estimate.
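A tiny illustration of that within-group estimation point, assuming equal variances really do hold (simulated data): the residual degrees of freedom behind the pooled error estimate grow when you keep all the groups instead of truncating to the two you care about.

```r
set.seed(3)
d <- data.frame(
  g = factor(rep(paste0("g", 1:7), each = 20)),
  y = rnorm(7 * 20, mean = 50, sd = 10)
)

fit_all <- lm(y ~ g, data = d)                                            # all 7 groups
fit_two <- lm(y ~ g, data = droplevels(subset(d, g %in% c("g1", "g2"))))  # only the 2 of interest

df.residual(fit_all)   # 140 - 7 = 133 df behind the pooled error estimate
df.residual(fit_two)   # 40 - 2 = 38 df if you throw the other groups away
```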
00:22:13.74 [Tim Wilson]: I have two questions, and you can choose to ignore the first one if it’s like, that is a whole other episode, but just some fundamental intuition about what a t-test is and does, and maybe it is a companion, because as you’re talking about it, you’ve got a control group and an experimental group, and that doesn’t necessarily have to be run in a controlled experiment. You’ve just got different groups. But when you run a controlled experiment where you do have multiple groups in an experimental fashion, you wouldn’t really use an ANOVA, or would you?
00:22:54.25 [Chelsea Parlett-Pelleriti]: In the example you gave, it sounds like there’s only two, like an experimental and a control group. And in that case, the t-test should give you roughly equivalent, if not exactly equivalent, results to an ANOVA on the same values. I’ve actually run this before, like when I was teaching, or back when I was working in psychology: you can set up data like that and then run it through the ANOVA function in R and run it through a t-test in R. You’ll basically see what I said earlier, which is that you’re going to get the same p-value, with rounding and computational error, and you’re going to see that the F statistic is the t statistic squared. You’re really answering the same question there. In that case, it does not matter.
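That equivalence is easy to check in R; a sketch with simulated two-group data (note the pooled-variance t-test is the one that matches exactly; R's default Welch t-test only matches approximately):

```r
set.seed(5)
d <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 60)),
  y = c(rnorm(60, 100, 15), rnorm(60, 105, 15))
)

tt <- t.test(y ~ group, data = d, var.equal = TRUE)   # pooled-variance t-test
av <- anova(lm(y ~ group, data = d))                  # one-way ANOVA on the same data

unname(tt$statistic)^2                             # t squared ...
av["group", "F value"]                             # ... equals the ANOVA's F statistic
c(t_p = tt$p.value, F_p = av["group", "Pr(>F)"])   # and the p-values match
```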
00:23:40.95 [Tim Wilson]: What’s the approach of the t-test? I get that you wind up in the same spot, but presumably if you’re teaching a t-test, you talk about it in a completely different way.
00:23:51.02 [Chelsea Parlett-Pelleriti]: I will say it’s a different framework. And the one thing I do love about how we teach ANOVAs is that in a t-test, what you’re testing is the difference, the delta, between the two means. And you’re comparing that to a distribution under the null and blah, blah, blah. In an ANOVA, you’re really thinking of things not as, okay, here’s a difference in means that I’m testing, but here’s the variance that’s explained by knowing what category someone’s in, compared to variance that’s not explained by that. And then again, you stopped me before, but if you have an ANCOVA, I’m going to squeeze it in now, you can partition into a third category, which is variance due to a covariate like age or location or something like that. So ANOVA is really focused on this partition of variance, how the data points vary about the mean. Can we explain part of that variance with your category and part with randomness? Whereas a t-test is mathematically, like you’re pointing out, exactly the same: under certain circumstances you’re going to get basically the exact same output. But you’re thinking about it in a different way. A t-test is looking at, what is the difference in these group means? Say one group mean is 10 and the other one’s 5, that difference would be 5. How likely are we to get a difference of 5 if there’s truly no population difference between these groups? Whereas an ANOVA is answering what is essentially the same question but from a slightly different perspective, which is, okay, if I know what group you’re in, how much of the variance of the scores I’m getting can I explain? And the benefit here is that an ANOVA technically generalizes to more groups, whereas with a t-test, you would run pairwise t-tests between them.
00:25:39.66 [Tim Wilson]: So for the ANCOVA, can you introduce multiple covariates? Yeah.
00:25:45.77 [Chelsea Parlett-Pelleriti]: You can do whatever you want. It’s just a linear model.
00:25:49.06 [Tim Wilson]: Or regression. So that starts to, OK.
00:25:51.13 [Chelsea Parlett-Pelleriti]: Exactly. Right. Okay. And so to get into that complaint, I think it’s Daniela Witten who had that series on Twitter months or maybe years ago, where she would just retweet things and say, it is just a linear model. Well, okay, let’s be clear. Technically, what you’re doing when you fit an ANOVA is you’re fitting a linear model, and then you’re using this framework of variance, the ANOVA, the analysis of variance, to analyze the results. But at its core, what you’re analyzing is just a linear model. And you can add more covariates. You can add tons of different things. And that’s my problem with the framing of how we teach ANOVAs: it doesn’t make it clear that that’s the case. Whereas when we teach linear regression, we’re a little bit better about saying, yeah, throw in whatever covariates you want, throw in random effects, do GAMs and some smoothing, transform your predictors with polynomials and then put them in. And that flexibility is not inherent to the way that people have been communicating about ANOVAs.
00:27:00.55 [Julie Hoyer]: And I’m having this maybe light bulb moment, unless I’m really taking it off the rails here. But Tim, remember when we talked about blocking in tests, RCTs and all that? And we’re like, you just use a linear regression to analyze the result of your test. You can put all these covariates in, and blocking is something you represent in there. So if you’re doing an ANOVA on a pair, like two simple values of a category, it’s similar to a t-test. They’re all linear regressions. And if I was doing this on an A/B test, I could run a linear regression. Like I’m having this kind of moment where, like Chelsea said, it all goes back to regression.
00:27:38.69 [Chelsea Parlett-Pelleriti]: It’s regression all the way down. Well, I don’t know if this is too soon to bring this in, but I think what you’re saying reminds me of something that I said when you reached out about this episode, which is, sort of jokingly, but definitely not jokingly, that CUPED, which people use for A/B testing, I’m pretty sure is just an ANCOVA, right? So the whole idea, and I believe, I’m a little rusty on my CUPED, but you take pre-experiment metrics that you have about the customers that you’re testing on, and you use that to reduce the variance in the data because you’re accounting for it, you’re partitioning variance. It’s blocking, I know. And you’re just getting a better estimate, a more precise estimate, because of all the variance that’s out there, you’re accounting for some of it that would have previously been attributed to random variation. You’re now accounting for it with your category. And that’s what an ANCOVA really is trying to teach you: you can add these additional non-experimental groupings or continuous variables, and it’ll reduce the amount of variance in the error estimation, giving you a more precise, more statistically powered result.
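A simulation sketch of that claim, not anyone's production CUPED code (the variable names are invented, and the theta formula is the commonly described one): adjusting for a pre-experiment metric, whether as an ANCOVA covariate or as a CUPED adjustment, shrinks the standard error on the treatment effect.

```r
set.seed(11)
n     <- 2000
pre   <- rnorm(n, 100, 20)                 # pre-experiment metric
treat <- factor(rbinom(n, 1, 0.5))         # random assignment
post  <- 50 + 0.8 * pre + 2 * (treat == "1") + rnorm(n, 0, 15)
d <- data.frame(pre, treat, post)

# Unadjusted: all of the pre-existing variation sits in the error term.
summary(lm(post ~ treat, data = d))$coefficients["treat1", ]

# ANCOVA-style: add the covariate; the variance it explains leaves the error term.
summary(lm(post ~ treat + pre, data = d))$coefficients["treat1", ]

# CUPED-style: subtract theta * (pre - mean(pre)) from the outcome, then compare.
theta <- cov(d$post, d$pre) / var(d$pre)
d$post_cuped <- d$post - theta * (d$pre - mean(d$pre))
summary(lm(post_cuped ~ treat, data = d))$coefficients["treat1", ]
# The adjusted versions give essentially the same estimate with a smaller standard error.
```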
00:28:50.00 [Julie Hoyer]: And so this goes, okay, it is all coming together. I am having like a mind-blowing moment. Because that makes sense, then, where you want to use covariates that you know explain the outcome variable. So if you know age is a factor that would affect the outcome whose variance you’re trying to understand for another category, like your campaign, you’re like, well, this age group and the next age group, we know they spend really differently. By adding that in as a covariate, like you’re saying, you are narrowing in on being able to detect variance from your campaign, because you’ve isolated and muted the noise of variance from age, which you know is a factor that affects it.
00:29:34.54 [Chelsea Parlett-Pelleriti]: Exactly. Whenever you add a covariate in a regression model, you’re essentially saying, what can the other factors tell me after I have accounted for this variable? And so if age is really important in explaining how people are behaving, then you’re basically saying, okay, if I know what campaign you got, after accounting for all of the noise that happens because of your age, what does that tell me? And you’re going to hopefully get a more accurate and precise measurement by including that. Now, finding things to include that are actually useful can sometimes be a challenge, but if you can find them, they really help the precision of your estimate.
00:30:17.99 [Tim Wilson]: My impression is that CUPED has become, over the last couple of years, at least in the CRO world, like, oh, it’s the latest kind of shiny bauble. You can reduce your runtime. This is great, I guess, and there might be at least one person who’s probably already been triggered, because I know every time he sees CUPED come up, I get text messages. And he’s essentially saying, but it’s not magic. And I think it is probably because of what you just got to: you can’t just assume that you’re going to have covariates that you can identify that actually have an effect on the independent variable or the dependent variable. You’re not going to have age in some cases. Even if it matters, you’re not going to have that data.
00:31:09.95 [Chelsea Parlett-Pelleriti]: And even more than that, I’ve seen some examples, again, not my area of expertise, but I’ve seen a lot of examples where you have the same cold start problem you have in recommendation models, where you may not have that data for a really important sector of the people that you’re experimenting on; this probably would come up most with new customers. People are right, CUPED would be so helpful. It means you can run shorter tests. It means you can run smaller tests. It means you can have more precise estimates. But there’s no free lunch, right? You have to have this quality data that’s going to behave in the way you think it will. And it’s the same idea as in ANOVA or in ANCOVA, right? Can we account for or partition out some variance that we sort of know is there, that is not the category of interest? Can we section that off? And if you can, then I imagine CUPED is incredibly powerful. If you can’t, maybe less so. But I agree with whoever you’re vaguely referring to. It’s not magic and we shouldn’t act like it is.
00:32:21.93 [Tim Wilson]: He might be the person that’s had, we think, as many appearances on this show as you have. That’s true. If I name him by name, I’ll definitely hear from him. But let me ask another question on those, because I can think of, in a simple digital website experience, that there are things like, what was the most recent traffic source? There are things like, what device type are you on? Both of those, if you’re looking at a conversion, seem like they would be legit covariates. When you’re talking about whether you’re doing ANCOVA or whether it’s CUPED, is that inherently part of the input to make your actual question of interest more useful, as opposed to the flip side: oh, we looked at the overall test results, and now we’re going to slice them by this other thing and see if significance pops up? Is that a fundamentally different thing, where one is continuing to slice, and this is saying, no, I’m identifying this as a covariate so that I can get a tighter, better answer to my actual question? I’m trying to use that to remove variability.
00:33:47.76 [Chelsea Parlett-Pelleriti]: Yes. You’re asking a slightly different question. If we’re going to go in the linear model framework, you’re asking a slightly different question when you say, is this relationship consistent between Android, iPhone, computer, whatever type of users? That would actually be an interaction effect in your model, where you’re saying, does the relationship between my campaign and order value change for web-based, phone-based, whatever? That would be an interaction term, which is something you can just add to a linear model, by the way, because it’s so generalizable, which sometimes we don’t realize with an ANOVA. But that’s a slightly different question than, I’m just having campaign in here, and I’m soaking up variance by telling you what platform someone was using. Because in that case, you’re just saying, if I know your platform, what additional information do I get from knowing about your campaign? Whereas the interaction specifically would model that relationship differently for each platform and would allow you to answer that question of, is it different? We’d probably look at the interaction terms there and see if they’re significant, or you could even use a mixed effects model for this type of thing, where you say, oh, all of the effects are similar, but they might deviate a little bit. How much do they deviate? You could answer the question that way as well.
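In R terms, the distinction is roughly the difference between the two hypothetical models below (simulated data, invented names): platform as a variance-soaking covariate versus a campaign-by-platform interaction, tested with a nested model comparison.

```r
set.seed(21)
d <- expand.grid(
  campaign = factor(c("A", "B", "C")),
  platform = factor(c("web", "mobile")),
  rep = 1:40
)
d$order_value <- 80 +
  ifelse(d$campaign == "B", 5, 0) + ifelse(d$campaign == "C", 10, 0) +
  ifelse(d$platform == "mobile", -8, 0) +
  ifelse(d$campaign == "C" & d$platform == "mobile", -12, 0) +  # campaign effect differs by platform
  rnorm(nrow(d), 0, 20)

fit_main  <- lm(order_value ~ campaign + platform, data = d)  # platform just soaks up variance
fit_inter <- lm(order_value ~ campaign * platform, data = d)  # campaign effect allowed to differ by platform

anova(fit_main, fit_inter)  # F test on the interaction terms: is the relationship different by platform?
```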
00:35:18.63 [Julie Hoyer]: Because with an ANOVA and covariates, you’re not actually interested in the difference between the covariates, like you’re saying. You’re just giving it extra information. But Tim was kind of posing it as more of a question of finding out the differences across that extra covariate dimension of device type, right?
00:35:39.57 [Chelsea Parlett-Pelleriti]: Yeah, so yeah, I mean, it’s just two different questions. I will say, now I’m having to rely on really years-old information that I haven’t thought of in a while. But I’m pretty sure for an ANCOVA, one of the first things that you’re supposed to do, I don’t think people do it, and I might make you cut this if I’m incorrect, but I’m pretty sure one of the assumption checks for an ANCOVA is that there’s no significant interaction effect, in terms of the covariate having different relationships across the groups, like the interaction effects being significant. And I’m fairly certain that you’re supposed to check that. And so in that case, if you thought that was happening, you wouldn’t want to just include the covariate. You would want to include interaction effects, because clearly they’re meaningful, but that’s a slightly different question that you’re answering. So yeah, and I think that’s a really good point: you should be thoughtful about, do you want to know that? If you do, include the interaction effects. It’s not a traditional ANCOVA, but because we’re all brilliant and we know that they’re not discrete, different models, it’s just different forms of a linear model, we can so easily just add an interaction effect and be like, okay, cool. We want to answer this question. We’ll add those interaction effects. That’s what I love about the linear model framework compared to the way that some people teach ANOVAs and ANCOVAs as separate tools that you can use.
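The check being recalled here is commonly described as the homogeneity-of-regression-slopes assumption; a sketch of how you might look at it in R on made-up data (she hedges on this, and so should the code):

```r
set.seed(8)
d <- data.frame(
  campaign = factor(rep(c("A", "B", "C"), each = 60)),
  age = round(runif(180, 18, 70))
)
d$order_value <- 40 + 0.9 * d$age + rnorm(180, 0, 15)

fit_ancova <- lm(order_value ~ campaign + age, data = d)  # assumes one common age slope
fit_slopes <- lm(order_value ~ campaign * age, data = d)  # lets the slope differ by campaign

anova(fit_ancova, fit_slopes)  # a significant result here suggests the common-slope
                               # ANCOVA is too simple; model the interaction instead
```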
00:37:10.44 [Julie Hoyer]: I think that’s the hardest part: a lot of that thoughtfulness, and the levers you can pull on these different statistical tools, and how people think about them, is really lost unless you deeply understand some of the math and the basics behind it. But as we know, a lot of times it’s just a simple, you know, command in your code to run this thing, and if you aren’t really good at checking all the assumptions and really thinking through the exact question you’re answering, it’s so easy to use a slightly wrong tool and get a number on the screen. And you think you’re answering the right question and you’re not. And I think that’s scary in two ways. One, you have to be really knowledgeable to answer that question of, is this the right number to answer the business question I’m asking? And two, it’s really easy for people to give you a number when they haven’t asked themselves that question or been thoughtful about it. Both of those equally scare me.
00:38:07.37 [Chelsea Parlett-Pelleriti]: And how are you supposed to be an expert both in like, you need the business expertise to know what question is actually important. And like you said, you need the statistical expertise to know if the number you’re getting is targeting that question and what the caveats there are. And that’s one of the things that scares me the most is like, How are you supposed to be an expert in both? I think the answer is you’re not, and you have to collaborate.
00:38:31.86 [Michael Helbling]: Yeah, thanks, Julie. My anxiety had been going down as I was understanding this better, and now it’s just going back up again.
00:38:39.53 [Tim Wilson]: But I think there’s the flip side. Maybe this is… This is part of the reason that I wanted to talk about ANOVA, because I have a very, very clear memory, and this was when I was sort of still learning R and I’d kind of gone down the, okay, there’s the benefit of just programmatically being able to do stuff that’s not clicking around on an interface. I kept being told, well, to learn R, you’re just going to inherently learn statistics. Which I don’t really think is true, but at some point, I mean, it was just sort of said that if you’re going to learn R, you’re going to have to learn the statistics, they’ll come hand in hand, and that didn’t really happen. When it comes to illustrating an ANOVA, and I don’t know if I’ve seen it since, I don’t know what came first, I wound up arriving at a spot where I said, show somebody who says, I want to have a deeper understanding, I don’t know that this is full-on a marketer, but it certainly could be an analyst, just show them distributions. If you’re showing normal distributions with different variances and different means, or the same variance and different means, or showing two different examples, like your example of $59 to $61 and $79 to $81, that’s a really tight distribution. It does seem like you can visually help someone at least understand the nature of the variability, so that when they go and interact with the statistician or the data scientist, there’s a more productive conversation. And it probably also injects the same anxiety I’ve been living with now for seven years: I don’t know anything. I can punch in and run the linear regression, but I am absolutely convinced that there’s something totally wrong with it. I think there’s a case where developing some of the intuition, without getting all the way to picking the right method and interpreting it correctly, still has value. Where I get terrified is people just looking at a chart and not even having any intuition about why, if they see $90, $80, and $70, they can’t just make a declarative statement about the difference in those groups. Sorry, I don’t know why now is the time for me to square that circle.
00:41:25.23 [Chelsea Parlett-Pelleriti]: Thank you for sharing. I think that’s a valid fear. That fear hasn’t gone away for me yet, that I’m doing something wrong. There’s something that I’m not thinking about that makes this not ideal.
00:41:37.54 [Tim Wilson]: Well, now I’m really, I give up. It’s time I’m going to go get a greener and a rose now.
00:41:41.97 [Chelsea Parlett-Pelleriti]: Maybe you should. There’s probably lots of more qualified people that are like, oh, I’m past that. Chelsea’s just not at that stage yet. But I do think it’s healthy to fear that.
00:41:50.48 [Tim Wilson]: You have a freaking PhD in statistics.
00:41:53.76 [Chelsea Parlett-Pelleriti]: Yeah. I mean, it didn’t help too much. It actually made it worse in some ways. I thought I knew so much about statistics back when I was learning the ANOVA. And now I go, uh-oh, I really know only a very little bit of statistics. But I will say that fear, I think, is a really good motivator to have the conversations we’re having about the assumptions of an ANOVA and what you’re actually getting out of an ANOVA. And I think that’s really important. And I will say, I had this thought when you were talking: we as statisticians, or whoever it is who’s putting out all this material on ANOVA, are not always good about talking about the real-world applications of these tools. For instance, you may often hear with linear regression, with t-tests, with ANOVAs, oh, it’s robust to violations of this assumption. That’s true, but we don’t really talk about that well. I think it can lead to this thing where, okay, technically there’s an assumption of normality for a t-test and for ANOVA, but we don’t really talk about, okay, what happens when you violate it? And that usually ends up going one of two ways. Either people care way too much about that assumption, and they’re like, oh no, my Kolmogorov-Smirnov test is significant or whatever, and they care way too much about it when the inferences you make are robust. Or they go the opposite way and people go, oh, it’s robust, I don’t care about it. And you go, no, no, no, no, no. It’s robust to minor violations of this. And so I think it does make the waters really muddy if you were trying to decide, am I going to use a t-test? Am I going to use an ANOVA? Am I going to use a nonparametric method to analyze my four-group experiment that I did? It makes it really hard to figure out what you should actually do, because it’s not always clearly communicated what the pitfalls are. So to validate your fears, you should be fearful, but also people aren’t really doing what they could do to help make it easy.
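One way to make the "robust to minor violations" claim concrete is a quick simulation (illustrative only, with arbitrary settings): generate skewed data under the null many times and see how often the one-way ANOVA's F test falsely rejects at alpha = 0.05.

```r
set.seed(99)
one_run <- function(n_per_group = 30, k = 3) {
  g <- factor(rep(seq_len(k), each = n_per_group))
  y <- rexp(n_per_group * k, rate = 1)   # skewed (non-normal) data, same in every group
  anova(lm(y ~ g))["g", "Pr(>F)"]        # p-value for the omnibus F test
}

p_vals <- replicate(2000, one_run())
mean(p_vals < 0.05)   # the false-positive rate should land reasonably close to the nominal 0.05
```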
00:44:04.20 [Michael Helbling]: And with that, we probably need to start to wrap up. No. Wait, what were you going to say, Julie?
00:44:10.15 [Chelsea Parlett-Pelleriti]: I have so many questions. You should ask them. I do have something to share that I should have shared at the top, which is, as you know, as I’ve talked about a million times, I got my start in psychology. So while I don’t use ANOVA that much in my day-to-day life, I have a soft spot for it, because it was the intro stats classes that made me fall in love with statistics. I love it so much my dog is named after an ANOVA. Her name is Nova, so she’s an analysis of variance, I guess.
00:44:40.14 [Tim Wilson]: Which I think we got after we stopped recording last time. So that’s Michael’s fault that we didn’t manage to insert that for you.
00:44:48.57 [Chelsea Parlett-Pelleriti]: So it was very apt for me to be the guest here because I love it so much.
00:44:55.46 [Julie Hoyer]: Are you actually going to let me ask the last question, Michael?
00:44:58.04 [Michael Helbling]: Well, I’ve got a lot of noise happening on my end, so yeah, go ahead.
00:45:03.64 [Julie Hoyer]: I just wanted to, and this is probably a little bit of a can of worms to be ending on.
00:45:09.09 [Tim Wilson]: You know what, Moe is not here, so you are just slipping right in.
00:45:12.69 [Julie Hoyer]: I have to take this honor and carry the baton for Moe. We talked a lot about covariates, which means it would be an ANCOVA. But then you talked about using an ANOVA, understanding the question that it’s actually answering, which is that variation is explainable by the category you chose, somewhere across those categories. And then you said you can follow it up with responsible post-hoc analysis. And we never really talked about which way you go: a little bit of covariates, or post hoc? And just the way you were talking about using ANOVAs in practice, do you tend to lean towards one of those options instead of just a pure ANOVA?
00:46:00.67 [Chelsea Parlett-Pelleriti]: Yeah. In complete transparency, I do not use these a ton in my daily life, but when I have, mostly back in my psych research days, it’s not an either-or. It’s a, what question am I asking? Because when you do a post hoc test, what you’re usually doing is something like, okay, I had four campaigns, I want to know which two are different, or which ones are different. And so post hoc tests can help you answer that, but you still could have covariates in there that are soaking up that variance. So it’s sort of a separate question of, do I want to include covariates to partition that variance, like an ANCOVA would, or not, and do I care about these pairwise comparisons? If I have more than two groups, do I care which ones are different? And honestly, I’m sure there are some out there, but I really struggle to think of a question where it would be better to use the omnibus F test, that there is some difference somewhere in here, versus most people have questions, and most people are going to take action on those post hoc comparisons. I can’t imagine many scenarios where you’d want to do some type of ANOVA, or something in the ANOVA family, and not want to follow that up with post hoc tests. Some might argue you should just start with those post hoc tests and control your family-wise error rate. But in any case, I think it’s a very important part of actually gaining actionable insight from the ANOVA.
00:47:39.37 [Tim Wilson]: Gotcha. That does seem like that’s kind of the weird part. Also, back when I was trying to get some intuition around it, I found myself going down the, and then you’ll need to do a post hoc, and the Tukey post hoc is the most common. And I feel like I wound up in the same spot: if you’re always gonna do the post hoc, it just feels like, ah, I did this thing, and then I’ll kind of do this other thing. Well, if you’re always going to do that other thing, somehow it has this Latin phrase on it as though it’s this kind of incidental tack-on, but you’re almost always going to use it. It does feel kind of weird.
00:48:15.06 [Julie Hoyer]: Do you have to do the ANOVA before you do the post hoc? Now this is the can of worms.
00:48:20.79 [Chelsea Parlett-Pelleriti]: This is like what I was taught. What I was taught is sort of, you do the omnibus test, and if the omnibus test is significant, it tells you something’s going on in there, so then you go follow it up. I don’t know that that’s widely agreed upon as the appropriate way to control your error rate. In fact, I might be wrong, but I feel like I’ve heard that might be overly conservative, especially if you’re also correcting for your family-wise error rate in your post hoc tests. So I would say my current recommendation, and ask me next time we talk about ANOVAs, my current recommendation is: be thoughtful about the post hoc comparisons you’re doing. So if you don’t need to do all 10 groups compared to all of the other groups, don’t. And then use some type of correction, like a Bonferroni correction, a Šidák correction, the Tukey HSD that Tim was talking about, and just correct for your family-wise error, right? There’s lots of arguments about what counts as a family and what you should correct for, but we’ll save that for the next episode I’m on.
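For reference, here is how those corrections might look in R on made-up four-campaign data. TukeyHSD() is built in; p.adjust() covers Bonferroni and Holm but not Šidák, so that one is computed by hand. Which comparisons count as the "family" is exactly the judgment call flagged above.

```r
set.seed(17)
d <- data.frame(
  campaign = factor(rep(c("A", "B", "C", "D"), each = 40)),
  order_value = c(rnorm(40, 80, 20), rnorm(40, 82, 20),
                  rnorm(40, 90, 20), rnorm(40, 95, 20))
)
fit <- aov(order_value ~ campaign, data = d)

TukeyHSD(fit)   # Tukey's HSD: all pairwise differences, family-wise adjusted

# Or take raw pairwise p-values and adjust them yourself:
raw <- pairwise.t.test(d$order_value, d$campaign, p.adjust.method = "none")$p.value
p   <- raw[!is.na(raw)]   # the 6 pairwise comparisons
m   <- length(p)
data.frame(
  raw        = round(p, 4),
  bonferroni = round(pmin(1, p * m), 4),   # Bonferroni
  sidak      = round(1 - (1 - p)^m, 4)     # Šidák
)
```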
00:49:26.82 [Michael Helbling]: That’s right. That’s for analytics power hour after hours.
00:49:31.94 [Tim Wilson]: For the Analytics Power Hour Plus listeners, they can get access to that.
00:49:36.07 [Michael Helbling]: All right. Well, before we start to fully wrap up, we do want to go around and share a last call, something that might be of interest. Chelsea, do you have something you’d like to share as a last call?
00:49:45.64 [Chelsea Parlett-Pelleriti]: I do. It’s a little out there, but it does relate to statistics and machine learning. You may have seen that the movie Project Hail Mary is coming out soon, based on one of the best books I’d read in years when I read it. It’s by Andy Weir, I think that’s how you say it, who wrote The Martian. And it is not only an excellent book, it apparently might be an excellent movie with Ryan Gosling, if you’re into that. But the reason I’m recommending it here, and it’s sort of related to statistics, is that I actually read a section of it in my stats classes or my machine learning classes, because there’s this really beautiful scene. I don’t want to give any spoilers, because it is quite a bit into the book, where they’re doing something, science stuff, I won’t go into it. And there’s this beautiful explanation where someone asks, did you use artificial intelligence to do this? And the person says, no, we have to be able to test it in thousands of ways and know exactly how it responds and why; we can’t do that with a neural network. And I thought that was just such a great explanation, in the context of the book, you’ll have to read the book, of why machine learning and some of the black box methods can be a little tough to swallow for some people. So for both statistical and literary reasons, I highly recommend both the book and the upcoming movie, Project Hail Mary.
00:51:10.92 [Tim Wilson]: Nice. I loved that book. My sister gave me that book for Christmas a couple of years ago. And I didn’t realize it was that he wrote the Martian until like after I was like, I gotta read something else by this guy. I was like, oh, he also wrote the Martian. So.
00:51:27.24 [Michael Helbling]: Nice. All right, Tim, what about you? What’s your last call?
00:51:30.96 [Tim Wilson]: So mine is a post. There are times where I feel like I’m going back to the same wells, but usually when Jason Packer writes something, it is entertaining and really thoughtful. And he, along with Juliana Jackson, wrote a post called The Duality of ChatGPT. And the premise is kind of that there are two sides, on multiple dimensions, around discussing AI: like, AI will write our code and do analyses for us, and/or AI produces slop and won’t make our jobs easier. And he just gets kind of thoughtful and slips in hilarious references, like a John Lennon reference that is actually just making a joke of a list. But it’s a good read where he walks around the duality and tries to square the circle in each case. I’m hooked on people who are not completely in the bag for AI and are also not completely anti-AI, and with his post I did actually, physically grin. I don’t know that I laughed out loud, but I was definitely smirking while reading it.
00:52:49.11 [Michael Helbling]: All right, Julie, what about you? What do you got?
00:52:53.10 [Julie Hoyer]: Mine is very off topic and just something I enjoy, not related to the industry literally at all, but I hope one of you listeners maybe is looking for this type of app, and I hope you enjoy it as much as I have. It’s called The Short Years, and I had such a fear when I was having my daughter a few years ago. I was like, how the heck do people work a full-time job, have a child, and keep up with a baby book? But I was also like, I want to remember these things. I want to have pictures. I want to do the baby book thing. So I was on the hunt for an answer to that problem, and The Short Years has been amazing. It’s just an app on your phone, and it can give you daily questions. And so as I would lay in bed at night, I could just go through and be like, oh, here are three questions for you about your kid lately. And you can upload photos, upload videos. And then as you finish chapters, they just mail them to you once you’ve bought the book. But you don’t even have to buy the book or pay for anything before you start entering photos and information. So you could just go along and be like, okay, I’ve really stuck with this, I’m six months in, I’m going to order the book now. They send you the chapters, you put them in the book, and then you can even extend it to the toddler years, which I think I’m going to do. But again, I’ve just been able to keep up in the app, and then I can decide to purchase or not. And I also feel like then with any subsequent kids, you can keep up with this, so you don’t have, like, the first child got it all and the next kids got nothing. I feel like this could help. So if you have any anxiety about baby books, I really love The Short Years.
00:54:32.16 [Michael Helbling]: That’s just funny because I was literally going to be like, yeah, that second kid. Yeah, not so many details.
00:54:37.79 [Tim Wilson]: I mean, I had to go there. As soon as you said The Short Years, the former D1 volleyball player, I was like, and what were the short years for you, Julie? Was that like zero to 18 months, at which point? So.
00:54:54.89 [Michael Helbling]: Well, my last call is, I was going to ask. OK, please, Michael, what’s your last call? So glad you asked. Mine is also AI related, because it seems like it’s dominating everything we do. But Anthropic ran a little experiment recently with an AI agent that they put in charge of a little shop in their office. And they basically gave it instructions to try to buy things and sell things to the people that worked in the office that would help it make money. And then they wrote up the results. It did a terrible job. And it’s kind of cute what it was trying to do, and kind of funny. But it just goes to show you that the level of complexity we can achieve with AI agents is not quite ready to replace us all yet. But it’s kind of a fun read. It dives into a little bit of the details of what the AI was trying to accomplish, where it went wrong and what it did right. And they’re going to keep running that experiment. I think they’re working with an AI safety company as well on that project. So kind of interesting. OK, Chelsea, who knew that a TikTok about Monte Carlo simulations would lead to all this? The one thing that we can say TikTok was good for way back in the day. But thank you so much. It’s incredible. I don’t know why this is, but statistics, statistical concepts, well, they feel hard to grab onto for us mere mortals. And you’re a very unique and special person, and I hope people recognize that all the time in the way that you’re able to bring those concepts to life. So I just really want to say thank you very much.
00:56:52.10 [Chelsea Parlett-Pelleriti]: Thank you. You’re thanking me by continuing to have me back on over and over and over at your podcast.
00:56:59.05 [Michael Helbling]: Yeah, we’re going to see. Secretly, we’re just going to shift the whole show over and just be like, oh yeah, now here’s the Statistical Power Hour.
00:57:10.28 [Chelsea Parlett-Pelleriti]: Statistical significance and why you shouldn’t ignore non statistically significant lift tests.
00:57:16.61 [Julie Hoyer]: Ooh, that sounds so good.
00:57:18.99 [Michael Helbling]: That sounds great. Give us 10 episodes.
00:57:22.37 [Tim Wilson]: Look, I’ve been on three times. I’m gonna tell you fuckers what you really need to talk about.
00:57:27.15 [Chelsea Parlett-Pelleriti]: Exactly.
00:57:27.93 [Tim Wilson]: Listen.
00:57:28.27 [Chelsea Parlett-Pelleriti]: I just don’t want to talk about it anymore.
00:57:32.40 [Michael Helbling]: No, that’s totally fair. Well, and this whole show came about because our listeners wanted more topics like that. And as you’re listening, maybe there are other things you’re like, please bring back Chelsea to talk about this. We’d love to hear from you. So reach out, let us know. And you can do that on our LinkedIn or on the Measure Slack chat group or via email at contact at analyticshour.io. So we’d love to hear from you. Yeah, I mean, this was awesome, really great, and I guess we’re going to ramp this up. And I think I speak for both of my co-hosts: whether it’s an ANOVA, a t-test, or an ANCOVA, keep analyzing.
00:58:32.67 [Announcer]: Thanks for listening. Let’s keep the conversation going with your comments, suggestions, and questions on Twitter at @analyticshour, on the web at analyticshour.io, our LinkedIn group, and the Measure Chat Slack group. Music for the podcast by Josh Crowhurst.
00:58:32.67 [Charles Barkley]: Those smart guys wanted to fit in, so they made up a term called analytics. Analytics don’t work. Do the analytics say go for it, no matter who’s going for it? So if you and I were on the field, the analytics say go for it. It’s the stupidest, laziest, lamest thing I’ve ever heard for reasoning in competition.
00:58:53.97 [Chelsea Parlett-Pelleriti]: I don’t think you ever said the title, ANOVA.
00:58:56.84 [Michael Helbling]: ANOVA?
00:58:57.46 [Chelsea Parlett-Pelleriti]: I’m hardly knowing it.
00:58:58.34 [Michael Helbling]: Yeah, the title is sort of a thing we don’t actually say. And yeah, sometimes guests have come on and then they just talk about something completely off the wall, and then we change the title. Not off the wall, but, like, a completely different direction.
00:59:15.76 [Tim Wilson]: That’s an option?
00:59:16.82 [Chelsea Parlett-Pelleriti]: What the hell? I didn’t know I could go without it. Wait, I can go rogue and you’ll just completely reconfigure? So it looks intentional? Good to know.
00:59:26.24 [Michael Helbling]: Yes, we will. Absolutely. Basically, the process of the show, Chelsea, just so you understand it, is Tim comes up with the titles and then I just say whatever I want in the intro.
00:59:36.26 [Julie Hoyer]: And then Tim reconfigures the titles if they don’t match up.
00:59:39.55 [Michael Helbling]: Yeah, and if they’re totally different, like, I just go off into left field, Tim will be like, let’s make the title. First off, there’s a committee for this, Julie, so now I’m just kidding.
00:59:52.88 [Tim Wilson]: It’s like a really poor implementation, because it can’t complete a sentence and kind of goes all over the place and doesn’t make sense. Or it’s like, no, that’s like the best that’s ever made.
01:00:01.22 [Julie Hoyer]: Representation.
01:00:01.66 [Michael Helbling]: That’s right, that’s like presentation. Got Tim to a team.
01:00:10.80 [Tim Wilson]: Rock flag and it’s linear models all the way down.
01:00:16.92 [Michael Helbling]: That’s right. I saved that one for you, Tim, because I had an inkling you might do that.
01:00:24.31 [Chelsea Parlett-Pelleriti]: That’s all I had. I love that choice. It’s so good.