#201: Getting to Clarity About (Statistical) Uncertainty with Dr. Rebecca Goldin

Our podcast junkie co-host heard the following statement on another podcast a while back when he was out for a jog: “I actually think the word ‘uncertainty’ is used in English in a very different way than the word ‘uncertainty’ is used in statistics.” He almost ran into a tree (causation is unclear: he’s not known for his gross motor skills, which may have been a confounder). Not only is that quote, essentially, the theme for this episode, but the person who said it, Dr. Rebecca Goldin from George Mason University, was our guest! And we are absolutely CERTAIN that it was every bit as enlightening a discussion as it was a fun one!

Blogs and Books Mentioned in the Show

Photo by ZSun Fu on Unsplash

Episode Transcript


0:00:05.9 Announcer: Welcome to the Analytics Power Hour. Analytics topics covered, conversationally and sometimes with explicit language. Here are your hosts, Moe, Michael and Tim.


0:00:22.0 Michael Helbling: Hello, everyone, welcome to the Analytics Power Hour. This is episode 201. Probability: it’s conceptually somewhat easy to understand, but it’s really freaking hard to actually leverage in practice. Why do our minds seek certainty, and yet the world of data is always cloaked in uncertainty? It’s a real puzzle, and it shows up in the work of data people all the time. Do you think it’s an accident that so many analytical people struggle with impostor syndrome while business executives seem to have an almost painful confidence in their ideas, sometimes with seemingly no supporting evidence? The more you know about how statistics works, the more you understand that uncertainty is at the heart of it. Honestly, it’s at this point that I sort of wish my life and chosen profession were just maybe a little simpler. But here we are, and we have a podcast to do, so let’s do this. Tim, are you ready once more to step into the uncertain space between reality and the numbers that represent it?

0:01:24.9 Tim Wilson: I’m honestly not sure.

0:01:27.0 MH: Okay. [laughter] Moe, are you ready to get all statistical once again?

0:01:33.5 Moe Kiss: Sure, I tried to come up with a witty team joke but it didn’t fly.


0:01:37.9 MH: It’s like the energy, we’re all lacking [laughter] after being busy. Okay, so we have a decent knowledge, but not a great one. So we needed a guest, someone who could take the likelihood of success past the 95% confidence level. Dr. Rebecca Goldin is a Professor of Mathematics at George Mason University. She also helped form and was the Director of STATS, which was a collaboration between the American Statistical Association and Sense about Science USA. They pursued a mission to improve how science, quantitative, and mathematical issues are covered in the media. She’s published numerous papers on topics such as Schubert Calculus, Equivariant Cohomology, and Symplectic Geometry. And even though [laughter] I have no idea what I just said, I am thrilled that she is our guest today. Welcome to the show, Professor Goldin.

0:02:27.1 Dr. Rebecca Goldin: Thank you so much.

0:02:28.9 MH: So right off the bat, as I’m looking at your bio, obviously, you’re a very accomplished mathematician and statistician, which means you’re somewhat of a realist, and yet you have participated in trying to help the public become more educated about statistics, which means you’re an optimist. How do you take these two opposing forces and live in the real world? That’s sort of, I don’t know, [laughter] I might be over-supposing, but you know what I mean? It’s hard work.

0:02:58.0 DG: No. No, you have a point, right? The kind of math that I do is really theoretical, and it’s a lot of game playing in a certain way. You get really sort of excited about the beauty of it, excited about the patterns that you’re seeing, but the kind of math that I do specifically is pretty far away from having a real-world impact. And I think I always wanted to have some kind of impact on the world, and you get that as a teacher. So I love teaching, but I also felt, when I had the opportunity… It’s been a long time, around 15 years ago, when I first started working with STATS; this was way before it was part of the American Statistical Association. We were really interested in how people in general think about numbers, and it was actually a really exciting chance to have real-world impact using math. So I think that’s what appealed to me when I first started, but it matters, and it matters at so many different levels. There’s that introductory level, like there are people who actually have number problems with percentages, right? And then there’s a more sophisticated level of confusion, like, what do we do when we have a lot of data and need to analyze it?

0:04:05.6 MK: So tell me a little bit more about, I guess, this mission to improve understanding in the media. What does that look like? Is it about education, or is it really about a partnership? From my perspective, I struggle to understand: when you’re trying to elevate someone else’s understanding of a particular topic, how much of it is you doing the leading versus them having the interest? It seems like it’s probably tricky.

0:04:40.0 DG: Yeah, no, actually, that’s a really great question. And I think the organization itself took some time to try to answer that question. I think when we began, we were doing a lot of writing ourselves, so there was a sort of weird… Let me put it more bluntly than you did, ’cause you’re really polite, like sort of a conflict of interest between being like a journalist where you’re trying to say something with numbers and you’re trying to interpret the studies that you’ve read, the numbers that are floating around the ether out there. And you’re trying to put it all together versus just like what’s an honest interpretation of what that is, like what’s a fair representation of it without putting any story around it. Because once you put a story around it, you’ve got a different layer on it, and I think at some point we recognized that there was a real conflict of interest there.

0:05:28.5 DG: So we stopped doing so much writing and we did a lot more advising. And I think that we were far more successful in doing that. So the answer to your question is actually to kind of separate those two processes. So the way it works in practice, we would have… My favorite kinds of things that we did were actually when we had journalists who would call us up and say, “Hey, I’ve got this question that I’m trying to answer or this study that I’m looking at, and I want some advice.”

0:05:53.1 DG: And that advice could take the form of, “Well, you shouldn’t look at just one study; here are a couple of others,” or it could be in the form of, “Gee, what you really need is an expert in this really specific sub-field within statistics.” And maybe we’d put them in touch with somebody. Or, if it was me directly talking with them, because I felt like I had the expertise to talk with them, it was really taking a look at, “What are the kinds of things that are being concluded in the community of researchers on the topic? And why were they making those conclusions? And where were their data coming from? How do they interpret the data?” And trying to get the journalist to think about the numbers the way that somebody who is trained in statistics or in data science might look at them.

0:06:34.5 DG: A lot of times we’ll get calls from journalists where they’ve collected a lot of data. Maybe they did a FOIA request, a Freedom of Information request, and they have tons of information, tons of numbers, and they don’t actually know how to interpret them at all. They’re not really sure: maybe they should take an average of these numbers, or maybe they should take a weighted average, or maybe they should do this or that. And we’re trying to kinda figure out what would be a correct thing to do that would at least not get them in trouble for making really big, bad errors.

0:07:05.1 TW: How did you actually get from… Is it fair to say that theoretical math is inherently deterministic and statistics is inherently probabilistic? And I’m embarrassed to say that it was only like six years ago when my son kinda got dinged on a college entry thing because he took statistics one year, ’cause some other math wasn’t available, and it was like, “Well, if you wanna go to this engineering school, you didn’t even take a math.” And I was like, “What do you mean? It was statistics.” I’ve been kind of in the field and I didn’t understand that. And you’re someone who went from being really into the math, and I think I even heard you speaking, it might have been on a podcast or somewhere, where you said it’s weird ’cause you have a math background. Like, how did you get yourself sucked into the statistics world, and am I framing them as being that distinct from each other?

0:08:11.9 DG: Right. Well, I guess when you first start studying statistics, it’s usually in a math department, so I don’t know what your son’s experience was, but a lot of times engineering schools wanna see calculus as opposed to AP Statistics or something like this. So that might be what you’re speaking to, which is really about higher-ed decisions, and maybe we shouldn’t go there so much as to how valid that decision is. But in terms of my own experience…

0:08:38.2 TW: Just a bone to pick. He did get into the school, and it’s a school that you have your advanced degree from, so it worked out okay, but it was that school, so a little irritated with them, but it was… But anyway, back to the topic.

0:08:50.2 DG: Well, actually, you guys should totally do a podcast on this fascinating question of what’s going on with high school math and higher ed and how they’re perceiving these math courses, because, so much of high school math is centered around what they think colleges want, and there’s a lot of push to try to change what colleges are doing. So that’s a whole other conversation, but… Okay.

0:09:14.1 TW: Okay. Well, when will we have you back? Yeah.


0:09:16.1 DG: Yeah, I don’t know if I’m the expert, but certainly I find it a fascinating topic. So… Okay, so when you first start studying statistics, for sure, there’s a lot of math involved. If you’re like, “Okay, I’m gonna study probability,” the typical thing, even in elementary school, is kind of basic probability: “Wait a second, if I’ve got an urn, and the urn has 20 blue balls and five red balls, and now you’re gonna pick a ball at random, what’s your chance of getting a red ball?” Right? So you’ve got these kinds of probabilistic things that are considered really theoretical. Now, you might then, in an elementary school class, even collect data on this and try to connect the data: “Well, when you pick out a particular ball, it might be red and it might be blue, and you’re gonna get some data when you have a whole bunch of people picking out a ball, each time putting it back. How close are we to getting that theoretical probability?” But in an introductory course we don’t even really make the distinction between the data, the likelihood of getting a particular outcome, and the theoretical conclusion that we would have a chance of one in five to get a red ball because there are 25 balls and five of them are red.
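Dr. Goldin’s urn example, theoretical probability versus the empirical frequency you observe in data, can be sketched in a few lines of Python. This is an illustrative simulation, not anything from the show, with a fixed seed so it is reproducible:

```python
import random

# The urn from the example: 20 blue balls and 5 red balls, so the
# theoretical probability of drawing a red ball is 5/25 = 0.2.
# Drawing with replacement many times, the empirical frequency
# settles near that theoretical value.
random.seed(42)  # fixed seed so the sketch is reproducible

urn = ["blue"] * 20 + ["red"] * 5
n_draws = 10_000
reds = sum(random.choice(urn) == "red" for _ in range(n_draws))
empirical = reds / n_draws

print(f"theoretical P(red) = {5 / 25}")
print(f"empirical frequency after {n_draws:,} draws = {empirical}")
```

With only a handful of draws the empirical frequency can wander well away from 0.2; with thousands of draws it hugs the theoretical value, which is the distinction the introductory course tends to gloss over.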

0:10:31.8 DG: So we kind of, I think, conflate these two things, but I think the first introduction you get to statistics is often quite theoretical. It’s mathematical: we’ve got a probability, and we’re gonna calculate other probabilities, and we’re gonna combine these probabilities, you need to know how to do fractions to do this, and there’s a whole lot of stuff that’s involved with discrete probability that really is mathematical, right? But once you start getting into data, we really recognize how much this is not a deterministic process, and the first kind of advanced idea that we learn about probability in a statistics course is really about uncertainty. It’s about sampling error and other kinds of things that come into play, by which we really understand it isn’t actually deterministic.

0:11:20.1 DG: But there’s a kind of weird mix. Like, even if you think about walking into a casino, you’re like, “I’m gonna get… ” Of course, you don’t know what the outcome is gonna be. You might have your hopes; of course your dream is that you’re gonna win [laughter] big. But I’ll tell you for sure the casino knows. They know how much they’re gonna make over time, and even presuming it’s an honest casino, there is a certainty. With certainty, they know that they’re gonna make a lot of money. They may not know exactly how much money they’re gonna make given how much is bet, but they have the law of large numbers really helping them out. So they actually are gonna get really close to that theoretical probability, whereas we, as individuals, can’t. We can’t count on getting the theoretical outcome, and if we thought we could, we wouldn’t go into the casino, ’cause it’s a losing deal. Right? So [laughter]
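The casino point can be made concrete with a small simulation. This is an editorial sketch, not from the show: assume a roulette-style even-money bet that wins with probability 18/38, so the house edge is 2/38, roughly 5.3%. One player’s handful of bets could go either way, but averaged over very many bets, the return converges to the theoretical edge, which is the law of large numbers working for the casino:

```python
import random

random.seed(7)  # fixed seed for reproducibility

def play(n_bets: int) -> float:
    """Player's average profit per $1 even-money bet over n_bets spins."""
    profit = sum(1 if random.random() < 18 / 38 else -1 for _ in range(n_bets))
    return profit / n_bets

# A single player's short session is highly variable...
print("one player, 10 bets:", play(10))
# ...but across a million bets the average lands close to -2/38.
print("the casino's view, 1,000,000 bets:", play(1_000_000))
```

The individual gambler experiences the variance; the casino experiences the expectation.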

0:12:10.7 MK: I know that Tim has a thousand questions off of what you’ve said, so I will hand over to him. But one thing I do wanna hone in on a little bit is, you mentioned that when we start learning about probability, it’s very theoretical. In my head, I immediately wonder: is that the right way around? Do you have to start with the theory, or is that maybe something that is problematic in the way that we’re learning about maths and stats, which makes it maybe less applicable or harder for children, and then the wider population, to understand?

0:12:47.2 DG: No, it’s a great question. I don’t actually know… Maybe in elementary school they actually do start with data and then kinda come to the theoretical probability, but I guess what I really need to say is the two things are kinda conflated. There’s not really a distinction made between flipping a coin and half the time it will come up heads in theory, or flipping the coin and you got heads this time and we’re gonna do some data on it.

0:13:15.0 DG: That distinction, it’s right there, it’s evident, but they’ll still say the probability was one half… They won’t really say, “Well, you got six heads out of 10, so therefore the probability is 0.6.” The assumption that those two outcomes will eventually be the same is right there and not really talked about, I’d say, in an introductory course. So the reason I was bringing that up was more to point to the extent to which the math of it… After you collect your data, if you’re going to first collect your data, you’re gonna move to that theoretical thing and then ask questions like, “What happens if we flip two coins? What’s our probability of getting two heads?” And that becomes theoretical. You move away from the data pretty quickly and start learning these basic ideas of probability that I think fit in the world of what I’d call theory.
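The “two coins” question is exactly the kind of theoretical counting exercise being described. As an illustrative aside (not from the show): with two fair coins there are four equally likely outcomes, so the probability of two heads is 1/4, and enumerating the sample space makes the counting explicit:

```python
from itertools import product

# Enumerate every outcome of flipping two fair coins.
outcomes = list(product("HT", repeat=2))
p_two_heads = sum(o == ("H", "H") for o in outcomes) / len(outcomes)

print(outcomes)     # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(p_two_heads)  # 0.25
```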

0:14:08.6 DG: So that’s why, typically speaking, if you’re teaching statistics at the high school level, you’ll be in a math department. There are also statistics departments, of course, at universities especially. But sometimes statistics is wrapped up into math, and statistics is used in every single field. So you’ll find statistics courses in psychology departments, you’ll find them in the nursing school, and then you’ll find another version in the economics department. I mean, everyone’s teaching their own statistics, even if many times it’s the same idea. But pretty quickly in the field of statistics, the reality of data and the reality of what it is you’re applying it to becomes so specialized that we tend to move away from, “Oh, there’s this theoretical framework that encompasses everything. We’re just gonna… ”

0:14:57.2 DG: We splash it into our individual departments because it becomes very much about data. And I think that’s one of the things that people often misunderstand about statistics: it actually is really hard to do if you don’t have any familiarity with the data that you’re talking about. Like, you can’t just say, “I’ve got data, I’m gonna do this theoretical thing, and it always works.” It doesn’t. And it’s because there is really domain expertise involved in any data.

0:15:24.1 DG: And I think that’s kind of the profound difference between statistics as I see it, or the data science view of statistics, let’s say it that way, where we’ve got massive amounts of data and we’re trying to understand it, versus mathematics. But there’s lots of theoretical statistics too, which is very, very mathematical. There are graduate courses in understanding the underlying mathematics of statistics, but I think that’s not necessarily the audience that we’re talking about when we say, “What’s the difference between them?” So I’m really moving toward how we describe data, and using statistics to describe data as opposed to using mathematics, but there’s overlap, of course.

0:16:04.6 TW: But I’m fascinated… I hadn’t thought about using the red and blue balls in an urn. And like… Okay, we talk about an urn that has 100 balls in it, and we’re drawing them out and mapping that. If I’m talking to a marketer and saying, “Imagine that urn is… ” We don’t know how many… We actually don’t know how many balls are in it, and we just know we’re gonna be able to pull 10 balls out of it. And that’s what’s happening when you are looking at the data for your website. The data… The website existed for the last 20 years and it’s gonna exist for the next 10 years, and you’re just pulling some traffic out of that to try to… ‘Cause it still feels like the challenge is once there’s really precise numbers, there is this idea that we’ve got large numbers. Maybe that’s the way…

0:17:02.8 TW: Like, the casino has a pretty good idea of what they’re gonna get ’cause they’ve got the law of large numbers, and statistics works in their favor: very, very small variance in what they’re going to see. We then turn to the business world and say, “Well, I have 10 million rows of data, so clearly I am the casino.” And it’s like, “Well, no, you have to know what those 10 million rows are representing.” That gets back, I think, to your point of saying you actually have to understand what the data is and the theoretical underpinning.

0:17:34.1 DG: Yeah, and I think if you’re trying to evaluate your website and you’re like, “Okay… ” If you knew that you had some static thing… Like if we took the urn analogy and we said, “We just don’t know what proportion of the balls are red inside of this urn.” And you are trying to kind of figure that out. That sounds like the kind of contrived questions that you get on the AP statistics exam. But the kind of reality of you’ve got a website and you’re gonna pull out some pieces of data on that. And you can make all sorts of calculations, but the problem that you have is that your website is a dynamic thing in a dynamic universe. And as time goes, it’s not going… The world doesn’t stay the same, right. So it’s not like you can say, “Well… ” You’ve got this sample that was a specific fixed thing you sampled from and now you’re gonna make an inference about the larger population. So your tools are a little bit limited.

0:18:30.9 TW: Well, I sometimes struggle to define what the population is. Generally, whether they realize it or not, they’re in causal inference land, and their population is really the future visitors to the website. And that’s, I think, definitionally not a random sample, ’cause you’re pulling data from now and then using that to represent a population that’s out into the future. And this is why marketers don’t like to talk to me, but…

0:19:03.0 DG: But…

0:19:03.3 MH: I don’t know if that’s the only reason, Tim.


0:19:06.4 MH: I’m sorry, you were saying.

0:19:09.8 DG: Yeah, no, I agree, I think that’s a really hard question: how are you gonna predict? But at the same time, it does seem like if you’re a marketer, the best you can do is try to understand who your clientele are right now, so it’s not impossible to put confidence in that.

0:19:31.2 MH: That’s great, so shifting gears a little bit. One thing analytics people run into all the time is working within environments or with co-workers and leaders who in the world of business don’t have… I don’t wanna say don’t have a respect for, but they certainly don’t have an understanding of statistics and the inherent uncertainty behind the data, and so communicating that is actually really challenging, and I think this is actually one of the reasons why we were so excited to talk to you, Rebecca, because I feel like this is an area where you’ve really put some focus and really some thought so I’d love to get some of your thoughts on some of those pieces, and I know you do this, you’re sort of at the policy level, or more like a bigger level than just like a single business, but I think there’s applicability, so… Yeah, maybe that’s a way we can… I’d love to hear from you.

0:20:22.9 DG: Yeah, for sure. So I think that there are two distinct issues: one is how to communicate what uncertainty is, what do we mean by the word uncertainty, and a different issue is how to communicate how much uncertainty you have, what’s the quantified thing? And we were talking about this earlier, but the idea of uncertainty… When we speak in English and we say “I’m uncertain,” it means you don’t know, like you’re just guessing, or there’s sort of a no-knowledge thing going on there when you say that something is uncertain. But in statistics, it really has a specific meaning. And I think even within statistics, sometimes we use the word uncertainty to mean more than just specific kinds of uncertainty. We might talk about whether we should use a model, and that’s the English meaning of uncertainty, which is to say that we have ideas around it but we’re not really certain of the answer, versus uncertainty speaking to, say, “I’ve got a confidence interval” or “I’ve got a p-value,” interpreting those statistical numbers to mean something that can be used by somebody in making a decision. [laughter] Right? So…

0:21:44.7 MK: Out of curiosity, Rebecca, do you think that you need to teach your stakeholders both? Do they need to understand the concept of uncertainty well to then understand when you’re sharing specific values around uncertainty? Do you think the two go hand-in-hand, or do you think you can do the latter without them necessarily… Like, do you see where I’m going?

0:22:06.6 DG: I think it’s really, really hard to talk about p-values if they don’t appreciate that any sort of uncertainty that you’re talking about has a specific kind of meaning. So I guess my answer is a little bit, “I’m trying to avoid your question,” because I would say it’s really super hard. You can even find websites that claim to be telling you what a p-value is and actually get it wrong. People think they know, and they don’t know. And actually, sometimes people have working definitions that are technically incorrect, but they work well enough. So if somebody says, “Well, the p-value is like the likelihood that my result is untrue,” that would be incorrect in a statistics class, that’s not what it is, but if you think of it that way, you’re not gonna go whole hog wrong about what you’re kind of concluding.

0:23:03.3 DG: Right, so there’s some value, let’s say in some ways that we think that aren’t correct technically, so I don’t feel like there’s some huge mission in my life to make sure everybody knows the exact meaning of a p-value, no, but I think it’s very valuable for people to understand something like the following statement, when I speak about uncertainty, it doesn’t mean that we don’t know. Okay, that’s something that’s really important. Uncertainty is a quantification of how much we don’t know or how much is coming about because of certain kinds of variability, or like that the world is a stochastic place, it’s not deterministic. So we can quantify that, and that’s what uncertainty speaks to, and I think that people can appreciate that very much in the same way that if you say, “Look, I’ve got a coin and the theoretical outcome is one half heads and one half tails, but if I’m gonna flip it 10 times, I might not get five heads and five tails.” And people are like, “Yeah, of course.” Like they get that right away. Right, so that description, I think it does really speak to people to say, “Look, in the same way, if you collect information about how well your medicine works, how much people are experiencing long COVID symptoms, how much… What your website visitors, what their income brackets are.” Whatever it is information that you’re collecting…

0:24:21.8 DG: You can have some variability in what happens. That might mean that your results don’t reflect exactly what would happen if you were to do it again, and it’s not always going to be the same. And it doesn’t mean that something is wildly off if you get slightly different results from time to time when you conduct some experiment. So I think there’s some value in people understanding that, for sure. And another way to put a spin on the question you just asked, my friend, I think it’s a great question: if I were to tell you… You did some sample, you wanna know who supports which candidate for President or something like this, and I say, look, candidate A had 54% support in the sample, and here’s your margin of error. And that margin of error is a measurement of how confident we are: if I were to run it again and get another confidence interval, 95 times out of 100 I would capture that true percentage. If you were to then get that, how meaningful would it be for the world that you get it?

0:25:25.5 DG: Well, probably if you can’t then turn it around and say it to everybody else, and when you turn it around and say it to everybody else, they don’t understand it… Well, anyway, the answer is, it’s not useful at all. Right, so in a way, sometimes when somebody has a kind of slightly false understanding, but everybody can sort of wrap their heads around it, which is to say, “Look, if I tell you it’s 54%, but my margin of error is plus or minus 3 percentage points that you understand… ”

0:25:50.9 DG: Look, that says, really, candidate A is ahead, ’cause it’s 51-57%, and you’ve got in your mind that somewhere in there is the true value. And that’s good enough. It doesn’t have to be that you have a very fine knowledge of it. And that will probably communicate way better. So I think there is a lot of value, if you’re trying to sell someone else the idea that it’s valuable to talk about uncertainty, in explaining it as something that is quantitative, not something that’s uncertain. We can be certain that this is our uncertainty. [chuckle] We can say “confidence interval” and other words that we might use.
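The “plus or minus 3 points” and the “95 times out of 100” in this exchange fit together, and a simulation shows how. As an illustrative sketch (the sample size of 1,067 is an assumption chosen because it yields roughly a 3-point margin of error, not a figure from the show): the 95% margin of error for a proportion is about 1.96 × sqrt(p(1-p)/n), and “95%” means that, over many repeated polls, about 95 intervals in 100 capture the true support:

```python
import math
import random

random.seed(3)  # fixed seed for reproducibility

true_p, n, n_polls = 0.54, 1067, 2000

def one_poll() -> tuple:
    """Run one poll of n people; return the 95% confidence interval (lo, hi)."""
    p_hat = sum(random.random() < true_p for _ in range(n)) / n
    moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

# How often does the interval capture the true 54%?
covered = sum(lo <= true_p <= hi for lo, hi in (one_poll() for _ in range(n_polls)))

print(f"one-poll margin of error ~ +/-{1.96 * math.sqrt(true_p * (1 - true_p) / n):.3f}")
print(f"coverage over {n_polls} repeated polls: {covered / n_polls:.1%}")
```

With n around 1,000 the margin of error comes out near ±0.03, i.e. the familiar 51-57% band, and the coverage lands near 95%, which is exactly the repeated-polling picture described above.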

0:26:30.1 TW: Well, or… Is there not a role for… I think there is a role for visualization, because it seems like… If I put error bars on a chart, generally people sort of intuit it, they understand what it is. In a Presidential poll, part of me says, instead of putting 54% with a margin of error of plus or minus 3%, if we represented it as 51-57% and never even showed the actual observed value, I feel like there can be intuition. Or if I’m looking at historical data, like time series data, and I put confidence bands to represent the variability of it, it does feel like people understand variability, whereas if we go to confidence intervals or, say, standard error or variance, I think it does scare people. But I wonder if that’s kind of an opportunity for analysts to say, when you’re visualizing data, how can you visualize the uncertainty? And then you’re not saying, “I’m representing the uncertainty,” you’re saying, “I’m giving you a range within which I have a degree of confidence that the truth falls,” maybe?

0:27:51.7 DG: Yeah. I think that those kinda visual representations are fantastic. So I love the visualization thing. I think visualization is always really helpful for people. And I also think that it’s valuable to recognize that there’s a lot of uncertainty that isn’t in specifically the kind of uncertainty that we try to quantify in statistics. So one of the things that… It depends on what thing we’re talking about. But a lot of times the data itself, the way that we collect it, may have uncertainty in that. And I think it’s valuable to kind of point out to people that there are other sources of uncertainty when you’re trying to quantify something. We can’t always know all of the ways that we don’t know something. [chuckle]

0:28:43.0 DG: And let me be specific. Suppose you’re doing research on how well a particular technique of teaching mathematics works. And you go about your experiment and you see that the test scores go up after teaching in this way. And maybe you even have a control for a group of students who were not taught in the same way, and you see some benefit. And you could then describe that, you would describe your increased test scores, and then you might talk about some kind of measurement of uncertainty for the fact that you’re trying to capture the idea that maybe if this new teaching method really didn’t do anything, we might still see a difference between these two groups of students, the ones who got the new teaching method, ones who got the old teaching method, and we might still see a difference, just because, well, when we take a random group of students, we might have higher test scores. And so you try to describe that. But there’s a huge amount of other kinds of error that is introduced. By error here, I wanna be very careful to say, statistical error, not human error. Okay? [chuckle]

0:29:51.4 DG: So, for example, we’ve got measurement error that comes about like, does the test measure what it is that you want to measure? Is there a virus that propagated in one of the two classrooms? [chuckle] Are there differences in these students in terms of who’s getting help at home, and whether the teaching technique that you used resonates well with the parents who are likely to help their kids? Or what about whether a teacher is better or not in this one classroom or another? Are there income differences? You can go on and on about the kinds of things that you want to bring in that make that research really complicated and difficult. So we can only quantify a certain amount of it, but ultimately speaking, I think it’s actually very valuable for people to understand the context in which you’re doing some research.

0:30:42.1 DG: So while the bands are really helpful, one of the things that I often worry about is being overly certain in our description of uncertainty, where we kind of go about saying like, “Look, this is the only range that it could be, based on a certain level… ” There’s certain kinds of uncertainty that you’re describing with that, which has to do with sampling. And that’s really important kind of uncertainty. But there are other ways in which we don’t know exactly, but we can quantify the extent to which we don’t know, or we can at least call it out to say, “These are things that might have impacted those results.” And I actually feel that it’s very useful to understand those so that we don’t get to a place as a society where we’re just rigidly saying, “It’s been proven… It’s been proven… ” It’s hardly proven proven, right? [chuckle] And that’s probably more relevant for social outcomes than it is for, say, a business decision where, ultimately, you just have to make a decision. So you can only do the best you can.

0:31:43.2 TW: Just to… I just wanna make… Were you completely dancing around… Is confounding… Part of what you were describing, was that just confounding, just to put the word on it? That that can be what’s going on, unobserved confounders or unidentified confounders?

0:32:00.1 DG: Right. So some of it is unidentified confounders. Sometimes, though, the… Confounding factors are sometimes things that people have identified and don’t know how to adjust for. And sometimes they’re things that are actually unknowable. So confounding factors are often listed sort of like things you could know, but you don’t, but sometimes you actually can’t know. And I could give you an example of one that’s kind of curious and weird, like. But maybe that’s not… Maybe not what you want, but anyway. Yeah, I think that there’s…

0:32:29.2 TW: Oh, wait. Did you say a weird good example? That’s… Please, do tell.

0:32:35.0 DG: Okay. Here is a weird good example. This is from research a long time ago, and I haven’t looked at the research in the past few years, so things might have been different, but there was this research on whether breastfeeding was particularly important for a baby’s health. And it was specifically looking at, if babies were put on formula very early in their lives versus if they’re breastfed, what would be the increased risk of death? So you’ll see lots of articles, scientific articles, that will argue that it’s better for babies if they’re breastfed. And I don’t think that’s particularly controversial, but there are questions of, like, how much better, and what the cost is to the mother who’s going to breastfeed that baby or not breastfeed the baby.

0:33:19.2 DG: And so, people try to analyze to what extent, how much better is it? And this is an article in that vein. And it was about the increased risk of death. They had a whole bunch of different categories of ways that babies might die, and were trying to compare breastfeeding to not breastfeeding. And this had… It was a meta-analysis looking at a lot of different studies kind of put together, and the most impressive, and only statistically significant, category in which a child was more likely to die if they were not breastfed versus if they were breastfed, was death due to injury. So this is a very strange sort of result, because the first thing you ask is, like, what’s the mechanism? How could it be that formula would make a child more prone to accidents?

0:34:07.9 DG: And this is a situation where, of course, people theorize: well, maybe mothers who don’t breastfeed are less likely to bond with a child and to care for the child, or maybe the fact that they have a bottle instead of a breast means that they’re not holding the baby as close, and that is causing more accidents, or that caretakers are more careless with the children than the mothers. People went on and on, and I think there’s kind of an obvious question here, which is that there might be other confounding factors, like child abuse, and we will not know. We could not know that sort of thing, if that’s what’s going on mechanistically. And it could well be that mothers who are abusing their children, or are in abusive households, are also less likely to breastfeed, for whatever reasons. Perhaps the women themselves are being abused and then find it is too much to also do this, or who knows what, right? So these are the kinds of things where you’re like, “What’s going on behind that?” We won’t know. We can only theorize, but I think it points to the importance of asking questions like, “Can we get experts in the room to talk about this data?” and to really understand the mechanisms as being essential to understanding that data. Right. So that’s an example. [chuckle] I’m certainly not an expert in this, but I do find it extremely… I found that particular result really stunning.

0:35:38.9 MK: So like that example, ’cause straightaway in my head, the whole time you were talking, I was like, “Is it randomized? Is it randomized?” ‘Cause often, studies particularly on women and babies and we’ve actually had Emily Oster on the show previously who writes a lot about this sort of stuff where often these studies are not randomized. And my understanding is like the whole reason that you do randomized controlled trials is to minimize confounding factors.

0:36:04.4 DG: Yeah, that’s a fair critique, but at the same time, you simply can’t do a randomized study on breastfeeding. You can’t tell women that they should or should not breastfeed their children. Right? So I feel like, either we say wholesale, we’re not going to be able to do any analysis of the relative benefits, or we have to ask these really hard questions of observational data. And nobody would have made the conclusion that smoking was unhealthy for you based on experimental data in which we assigned some people to smoke and other people not to… You couldn’t do it, because there was such strong evidence that it was actually dangerous for people that it would have been unethical to assign them to smoke. So you can’t do it. So I do agree, of course, it’d be better, right? Sure. But there’s also a kind of reality. And when we’re dealing with observational studies, we have to kind of try to find a way to take into account certain things that are really hard to take into account. And in that situation, I think maybe you could design a study where you try to have parents who are all in. I really don’t know how you would do it. I almost feel like I would adjust for that problem and say, “Look, that’s not gonna be a cause of death that we’re going to consider.” Because there are other things, like, say, leukemia, where you’re really looking at… You could imagine that there’s a mechanism or something like that.

0:37:34.0 MH: Alright. You know what time it is? It’s time for that quizzical query, the conundrum that stumps my two co-hosts. It’s the Conductrics quiz. The Analytics Power Hour and the Conductrics quiz are sponsored by Conductrics. They help companies build industry leading experimentation software for AB testing, adaptive optimization, predictive targeting. You can find out more at conductrics.com. Alright, Moe and Tim, are you ready?

0:38:01.3 MK: Yes, sir.

0:38:03.3 TW: I’m not sure. I am not sure. I am uncertain.

0:38:04.3 MH: Awesome.

0:38:04.4 MK: Oh Jeez.

0:38:05.2 MH: Yeah, uncertainty? Awesome. Well, Tim, you are representing a listener, and I’m sure of that, and their name is Jackie Malm. So do your best. And Moe, you’re representing listener Louis Christmas. So, good luck with them. Alright, here we go. I’m going to read you the question and the options, and then we’ll see what your answers are. Alright, picture this. The Analytics Power Hour team is having lunch together before they do a live version of the podcast, say at an analytics conference in Las Vegas, like we did a couple of months ago. As they chat, Michael, perhaps wanting to show a bit of statistical sophistication, casually, some might say too casually, mentions [chuckle] that for a recent AB test looking for an effect on revenue, he used a Mann-Whitney U test instead of a Welch’s t-test. “Well, why did you do that?” Tim asks. Michael replies, “Well, because the revenue data was skewed, so not normally distributed. A Mann-Whitney U is better than a t-test, isn’t it?” To which Moe replied, “Uh, not so sure about that. Even forgetting about the non-transitivity of Mann-Whitney U, how did you report the treatment effect, since the Mann-Whitney U isn’t a test of the raw treatment effect like the t-test? You would have to report something like the common language effect size, which the client almost certainly won’t understand.”

0:39:31.7 MH: And I said, “Oh, so what is the null hypothesis of a Mann-Whitney U test?” Is it A, the medians of two distributions are the same. B, the means of two distributions are the same. C, the probability that a draw from distribution A will be greater than a draw from distribution B equals 50%. D, the probability that the median value of distribution A is equal to the median value of distribution B. Or E, the probability that the top five deciles of distribution A has an average value greater than the average value over the top five deciles of distribution B.

0:40:13.5 TW: Good Lord.

0:40:16.6 MK: That is not where I thought this was gonna go once again.


0:40:19.2 TW: I’ve definitely, I have definitely heard of the Mann-Whitney test and I’ve definitely heard people ask… But I’m gonna go with an elimination of A, because I don’t understand the median of a distribution just feels weird. I don’t…

0:40:36.4 MH: It feels too open.

0:40:38.0 TW: It just feels like a distribution I, yeah. I’m gonna try to, I’m gonna eliminate A.

0:40:43.7 MH: You’re taking a risk and it pays off.

0:40:47.5 TW: Oh, man. Moe didn’t even have a chance to…

0:40:48.9 MK: Well, I can’t bandwagon on an elimination.

0:40:50.8 TW: Oh, yes you can.

0:40:51.3 MH: Yeah. The bandwagon is for, if you think you have the right answer, then you jump on it. But A yeah, it can’t be, the medians of two distributions are the same, that’s not correct.

0:41:00.4 MK: I was going to try and eliminate B.

0:41:03.0 MH: B, the means of two distributions are the same. Okay. Well, guess what, we are increasing our chances of a probability that either of you could still be the winners, because B is also not correct. So now we have C, D and E as possible options.

0:41:20.7 TW: So at the risk of being, rolling the dice twice and hoping for a good result, I’m gonna eliminate, I’m just not liking the median, and that may be the core thing of the Mann-Whitney test, but I wanna eliminate D the, probability that the median value of distribution A is equal to the median value of distribution B.

0:41:39.8 MH: Okay. Well, your, well, I don’t know what the right word is, but your attempt to eliminate medians is paying off. ‘Cause that is not the answer. So there you go. So now we have C the probability that a draw from distribution A, will be greater than a draw from distribution B equals 50% or E, the probability that the top five deciles of distribution A has an average value greater than the average value over the top five deciles of distribution B.

0:42:11.6 MK: So I thought it was C or E to start with, so this doesn’t help me.

0:42:15.5 MH: Oh, well, but [laughter] it’s a 50-50 at this point.

0:42:18.8 MK: A hundred percent. I’m really conflicted. Initially, I thought it was E but the deciles is actually now throwing me. So I’m gonna guess that the answer is C.

0:42:34.4 TW: I’m gonna bandwagon with that.

0:42:35.2 MH: Okay. Well, this is exciting. I think this is the first time Moe gave the guess and Tim has bandwagoned, but I wanna say something about that…

0:42:44.3 TW: We both lost.


0:42:44.5 MH: Because Moe, you are correct. [laughter] Yes. You are right, and Tim, because you bandwagoned like a genius, you’re also correct. Which means we’ve got two winners, Louis and Jackie Malm.

0:42:54.7 TW: I think it was the same. It was that whole deciles thing. Moe’s hesitation, I was like, yeah, that reinforces that.

0:43:01.0 MH: That sounded a little bit sus to you. Well, the answer is C, and good luck trying to communicate that and how to interpret standard effect sizes to your clients. Answer A is often confused as the correct answer, so it was really brave of you, Tim, to eliminate that one. But one can imagine distributions of different shapes such that the probability of selecting an element from B greater than an element drawn from A is greater than 50%, but the median of B is less than the median value of A. For example, if A has the following elements, 0, 0, 0, 6, 6, 6, 6, it has a median of six. And if B has the following elements, 5, 5, 5, 5, 7, 7, 7, it has a median value of five. Yet draws from A only have a 32.65% chance of being greater than a draw from B, obviously. Alright.

0:43:49.3 TW: Obviously. [laughter]

0:43:50.4 MH: That wraps up the Conductrics quiz. What an incredible outcome. Great job both of you, winners both. And thank you to Conductrics for sponsoring the Analytics Power Hour and the Conductrics quiz. Let’s get back to the show.
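
For the curious, the quiz’s closing example can be checked exactly with a few lines of Python; the 32.65% figure falls out of enumerating all 49 equally likely pairs:

```python
from itertools import product
from statistics import median

# The quiz's closing example, checked exactly: the medians point one way,
# while the "probability of a bigger draw" (the Mann-Whitney view) points
# the other.
A = [0, 0, 0, 6, 6, 6, 6]  # median 6
B = [5, 5, 5, 5, 7, 7, 7]  # median 5

# Probability that a random draw from A beats a random draw from B,
# by enumerating all len(A) * len(B) = 49 equally likely pairs.
wins = sum(a > b for a, b in product(A, B))
p_a_beats_b = wins / (len(A) * len(B))

print(f"median(A) = {median(A)}, median(B) = {median(B)}")
print(f"P(draw from A > draw from B) = {wins}/49 = {p_a_beats_b:.4f}")  # 16/49 ≈ 0.3265
```

Only the four 6s in A can beat anything in B, and each beats only the four 5s, hence 16 winning pairs out of 49: the answer-C null hypothesis (50%) clearly fails here even though A’s median is larger.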

0:44:05.4 MK: So I’ve gotta, like, get my head around this, because I feel like you’ve introduced some concepts to me today that I guess maybe I had considered, but not really. I feel like you’ve kind of shone a light on it and made it clear. So, like, typically when we do run some kind of test or experiment, like, I do understand confounding factors and all that sort of stuff, but I would probably go into it being like, yeah, this is the level of uncertainty we have. But it sounds like you’re kind of saying there is this next step, this other area of uncertainty, of unquantifiable confounding factors, that we also then need to consider. And in some cases we should be sharing that with our stakeholders.

0:44:52.9 TW: She’s thinking, I’m in academia. You’re using that… You’re using that stakeholder language again.

0:45:00.4 DG: Yeah. I’m like, yeah, taken. [laughter]

0:45:01.0 MK: I know I’m thinking about like, how much do I tell my stakeholders?

0:45:04.4 TW: Well, but can I take maybe a slightly different run at it? ‘Cause, Moe, you brought up Emily Oster, and that’s, like, economics. And Rebecca, you said earlier, like, you wind up with different sort of sub-specialties in statistics. And my understanding is, I think of it in buckets of true randomized controlled trial experiments, and as you just said, you can’t do that all the time. Economists, a lot of times, I think, are either looking for natural experiments, where you have to be super careful and really think about what might you be missing to make sure that it’s truly a natural experiment, or you… I think you wind up in the world of quasi-experiments. And that may be a specific area that… My understanding… It’s a little problematic term, because when I’ve used it internally at work, people are like, “Well, I don’t know that we can do quasi-experiments.” I’m like, “Well, I think I’m just talking about doing regression very thoughtfully.” Like, it’s not actually an experiment. I’m trying to look at observed data and make it behave like an experiment by accounting for as many of those other factors as possible, because I can’t go and tell these women to smoke and not breastfeed and these other women to not smoke.

0:46:25.6 TW: Because I can’t do that. So I have to… I don’t know that… That I think is language maybe for more of social sciences and economics and econometrics statistics though… Now you’re like, “Yeah… I don’t know whether you’re asking a question either Tim. Michael, it’s your turn to ask a question. Ask a question.

0:46:46.4 DG: Yeah… No, I’m not sure if I… I guess I’d say is… I think these are really great. I’m thinking about how Moe phrased the question. I think in the end, what you’re asking is how do we get from understanding that experimental things are hard to come by. We’ve gotta do a lot of observational kinds of studies. We can try to account for confounders, but there is a certain amount that’s unknowable, and how responsible do we need to be in communicating that to people as a whole? And I think that part of my personal answer to that is that it’s really important, when you’re talking about things that bring up a lot of emotions and a lot of really hard decisions, that we are honest about what we don’t know. And we’re honest when we are confused by the data that we see. I personally feel that it’s irresponsible to publish a study on why breastfeeding is better than formula and use the justification that there are children that die by injury from not breastfeeding, to suggest that the mechanism is that their mother isn’t holding them close. And I feel like that’s a bullshit sort of thing to say, to justify, post facto, this weird data that confuses you. Why don’t you just say, “We’re not sure of the mechanism of this, and some theories include this or that.”

0:48:17.3 DG: Because, for one, if you pump your breast milk and hand it to a baby in a bottle, it would be at risk of injury in the same way that it would be if you were feeding it formula… So that’s a wholesale shift from just saying that breastfeeding… Mother’s milk is better than formula or something like that. So I feel like there is a question of what the stakes are in terms of what you say. What are the implications of what you say? And this kind of gets to really hard questions about social stuff, and that would be a three-hour podcast.

0:48:55.6 DG: Like we could talk about women in mathematics, the data say this… What does the… Lawrence Summers, ex-president of Harvard. He at one point said, “Look at SAT scores, and look how few women are scoring really high on the top end of SAT scores.” And was this supposed to mean something about who becomes professional mathematicians and what’s holding women back? Certainly a lot of people have felt that’s what he was trying to say. It’s an irresponsible use of data. It may not be technically incorrect to look at that tail and make an observation, but the implications socially are really huge, such that it’s irresponsible to put it there, because it doesn’t reflect at all the reality as to why it is that we see women not becoming mathematicians and engineers. It’s really hard to look at something, pretend that it’s a normal curve when we know SAT scores have a top score, so it’s definitely not a normal distribution.

0:49:54.8 DG: And try to extract information about what happens 10 years further down the line, when women aren’t getting PhDs at the same rate as men in hard science fields, and say that’s because there weren’t as many scores over 750 on the SAT. Seriously, the level at which this is sort of disconnected from reality is really huge. So I feel like… Look, it’s a hard answer to the question. I have a lot of opinions on it. But ultimately, we have to be somewhat responsible for the kind of information that we share publicly.

0:50:29.1 DG: And if you’re a business coming to try to make a decision should I invest in this or should I invest in that within my website, like those stakes to me feel kind of lower. They’re I’m sure very high for the business owner trying to make a decision, but you’re trying to do that decision in good faith. And sometimes I think our biases come in for how we interpret data. And I would prefer a lot more honesty about the things that we don’t know and the things that we do know.

0:50:51.4 DG: And I think that would create a little bit more trust, because where we’re at right now, we could have a whole conversation about how data’s really untrustworthy. And one of the things that we’ve touched on a little bit, but was mentioned before we got on this podcast, is how much real experts in data don’t respect data as much. And I agree with that statement. As soon as you know a lot about data, about a specific set of data… The more you know about that specific data, the more you have a right to call into question how good the data is, how high quality it is.

0:51:26.6 DG: But at the same time, we all know that there are data that are higher quality than other data. And we know that there are sources that are more trustworthy than other sources. And we know that if that data is generated by an interest group, then we’re gonna get a biased view of it. We have all this sort of superficial-level understanding of that. But what we don’t know is how we could get to the place of creating trust over a really good data analysis that goes to the public questions. And I think that’s very hard, especially considering how information is spread about. But I don’t think it helps to kind of set it up with irresponsible conclusions based on things that are like, “Wait, why would we get this result?” If you don’t know, maybe it’s good to say, “We don’t know how to interpret this.” I don’t know if that answers what you’re…

0:52:18.8 MH: That was amazing.

0:52:22.5 DG: Okay.


0:52:23.1 MK: I know. I’m actually walking away, even just thinking about… So there’s a word at work at the moment called incremental that everyone is starting to really misuse, and I’m starting to think about it in the same way that you were talking about uncertainty earlier, of like, when is misuse of it kind of “dangerous,” and when is it okay, because it’s still achieving, I guess, a level of understanding to make the best decision. But anyway… You’ve left… My brain is firing on all cylinders.

0:52:56.0 DG: No, that’s a great. Yeah, yeah no, that’s really great ’cause there are… I guess there are times where you can have your understanding be slightly wrong, but you’ll still make the right decision based on it, and I think that sort of it kinda gets back to what we were talking about before. Does it really matter when someone understands what statistical uncertainty is and that it’s very context-driven, isn’t it?

0:53:17.8 TW: Well, you said it very, very quickly, and I realized Microsoft is probably gonna be calling us out if we don’t actually say… You called out, kind of, that it does matter what the stakes are for the decision that you’re making, and the stakes of the decision are a factor in how much uncertainty you can make that decision with, which is another just… It seems obvious, but boy, it’s hard to get people thinking that those complement each other. If it’s a low-stakes decision, flip a coin, maybe it doesn’t matter at all. If it’s really high stakes, what can you do to reduce that uncertainty?

0:54:00.7 MH: As analysts, we get stuck in this feeling of like, if we do this wrong, it’s really bad, and then it’s like, “Oh, maybe it’s not so bad.” If you’re listening… Maybe go back, take the last seven minutes, clip it, put it on your desktop, and every time you’re running into that kind of issue, just replay that, ’cause I think it’ll actually be very encouraging.

0:54:20.7 DG: But it kind of brings up one other issue that you guys haven’t mentioned yet, but I wanna bring it up because I expected to hear from you guys on this. There is this notion of margin… I’m trying to get the right word for it, but marginal benefit is the word I wanna use. So let’s go back to the casino. Right, you walk in the casino, you know it’s a losing game, because you guys study things involving data and statistics. You know that the house makes money. When you buy a lottery ticket, you know it’s a losing proposition. Lotteries fund all sorts of things in any given state, and you know it’s a losing proposition, so why would you bet on it? Right? Why would you bet knowing that it’s a losing battle on average? And I think that there is something really important about this question from the point of view of decision-making and other things that people are using analytics for, which is that if I were to win the lottery, my life would change for the good in like… I don’t actually play the lottery, but if I were, I would be thinking this way, right? It’s like, “If I won $2 million or $10 million, boy, I could do this and do that, and my life would be really positive, whereas if I lose $2, it actually doesn’t matter. Like, my life is the same.”

0:55:37.3 DG: And I think that that is… That that speaks to that benefit as small as it is, even if I can look at an expected value and my expected value is negative. I’m going to lose money on average, the fact is that I am not average, I am an individual, and we all think this way, right? So that’s how it is that lotteries make a lot of money, because people are all thinking that way. Like, I’d rather lose money on average, but know that I might win and change my life, and I think in the negative way, this can impact things like taking risks that involve the risk of death, right? What’s the risk of death worth to you like it’s everything, it’s everything you have, is your life, and so you could just… You could die by taking a risk, and yet we take risks all the time because there’s a benefit… Is it just that we’re looking for the average benefit? No, we aren’t… We aren’t calculating that. We’re just saying, I need this right now, I need to do this, or I want this. So I’m willing to take that risk.
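
The expected-value arithmetic behind Rebecca’s lottery point can be sketched in a couple of lines of Python; the odds and jackpot here are invented purely for illustration:

```python
# Back-of-envelope sketch of the lottery point, with made-up numbers:
# the expected value of a ticket is negative, but "average" is not what any
# one person experiences: you either lose $2 or your life changes.
ticket_price = 2.00
jackpot = 2_000_000
p_win = 1 / 2_000_000  # hypothetical odds, purely illustrative

expected_value = p_win * jackpot - ticket_price
print(f"expected value per ticket: ${expected_value:+.2f}")  # about -$1: a losing bet on average
```

The expected value says don’t play; the asymmetry between a trivial $2 loss and a life-changing win is what the average conceals.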

0:56:35.9 DG: We get in the car, we go on an airplane, we walk out our front door, right? It’s… Becomes… It’s so… Banal. It’s like everyday risks. Right? So I think that there is some value in recognizing that some of what we’re trying to quantify, maybe less so in the financial world, but some of what we’re trying to quantify in terms of our behaviors, our lifestyle is actually not about what the actual uncertainties are in that way, but really like a human imposition of its value to us onto some assessment, and it’s why I like to think of risk, not as like, what is the risk of this… What’s the risk of that but rather comparative risks, I like to take a risk and think, “Oh, it’s… ” Compared to driving, which I do with regularity, then this is the risk. Compared to… What are those comparisons? And I think it’s why some of the things that are really controversial, like climate change and how much climate change is going to happen, and how it’s gonna… That have really… Like they do, there are error bars, but the cost is actually extremely hard to quantify, could be all of human existence or could be just some number.

0:57:48.9 DG: Billions of dollars. And some company… Some estimate put out there, like, we don’t really know how to deal with that, right? And partly it’s that question of, like, we don’t know what those expected values are, but then we also don’t know what is that imposition of human value onto it.

0:58:07.3 S1: Well, I’ll gladly take a B on the podcast just to get that, so thank you for making sure that we brought it up. That was awesome. It’s good. That was really good. Thank you so much I…

0:58:17.9 DG: I’m not giving a B.

0:58:20.2 MH: Okay.

0:58:22.2 DG: I think you guys are like super A plus like I had a lot of… This is a lot of fun.

0:58:24.7 MH: No we do have to start to wrap up. This has just been so great. And you’ve said so many amazing things. But a couple of things that really stood out to me were, when I speak about uncertainty, it doesn’t mean we don’t know, and that statistical error is not human error, and I think those are really just encouraging things for analysts and data scientists to really just remember sometimes so thank you for your perspective and the way that you think about this Professor Goldin, it’s been super amazing to have you on and just like… Yes. Very, very awesome. Okay.

0:59:02.7 TW: You looped back around to Professor Goldin. I feel like… We were comfortable in Rebecca land for a while, and then we got very formal.

0:59:09.4 DG: [laughter] Good.

0:59:09.8 MH: Well, we’re kind of like… I’m going back to sort of like, “Hey, this… Rebecca we’re friends, but you also represent a lot more.”

0:59:18.4 TW: Can I audit your class? [chuckle]

0:59:19.4 DG: No, it’s been great.

0:59:21.7 MH: We do have to start to wrap up, but one thing we do like to do is a last call. That’s anything that might be of interest to our audience that we could share and doesn’t have to be just one thing. Rebecca, you’re our guest. Do you have a last call or two or three you wanna share?

0:59:35.7 DG: Yeah. I have a few blogs that I recommend looking at. One I really like, that I’ve enjoyed recently, is Freddie deBoer’s blog, and he does some writing also about education statistics, which I think are super interesting, but he’s got just a huge amount of awesome commentary on how the world is turning. For some things that are a little bit lighter, I like Math with Bad Drawings. This is a blog by Ben Orlin, and he does a lot of super cute kind of cartoons and thinking about how math is done. And a third one I’ll just mention is Math Babe’s blog, which is run by Cathy O’Neil.

1:00:11.2 TW: Oh, yeah.

1:00:12.0 DG: She does a lot of really cool kind of… Yeah. Perspective on how people think, but she is truly a mathematician by training and she’s gone into huge amounts of data science and has wonderful ideas that she shares on her blog.

1:00:26.8 MH: Awesome. Those all sound like excellent reads. Thank you for that.

1:00:31.0 TW: And I think her second book came out this… ‘Cause she did Weapons of Math Destruction.

1:00:35.0 MH: Yeah. Shame.

1:00:36.2 TW: And I think her… The Shame Machine just came out this…

1:00:38.1 MH: Oh, okay. Yeah.

1:00:39.8 DG: Shame is her second five digit themer [1:00:39.9] ____. Yeah.

1:00:43.3 MH: Oh wow. Okay. All right. Well Tim, what about you? What’s your last call?

1:00:45.4 TW: Well, I am gonna, I’m calling an audible and I’m gonna go with one that is a little topical and it’s gonna be a Cassie Kozyrkov, so deal with it people.


1:00:55.8 TW: Sorry. See, we’re kind of fan people of Cassie, but she came out with a blog series, 10 Differences Between Amateurs and Professional Analysts, and each of the 10 links to a different post. Some of the differences link to one post that covers two or three, but I think, Rebecca, you said something that matches her data pro versus amateur difference number three, which she calls immunity to data science bias, which is a little bit of a misnomer. But the summary of it is that another big difference between an amateur and an expert analyst is that the expert has developed an all-encompassing disrespect for data. They never pronounce data with a capital D, because they know it’s dangerous, why it’s dangerous to put data on a pedestal, which I think you were saying earlier, something where I was like, “That’s from that Cassie thing.” But it is a list of 10 things that, if it doesn’t make you think, then I don’t think you should be an analyst.


1:01:53.5 DG: That’s a perfect way to say it. Yeah.

1:01:56.8 TW: Cassie is so awesome, so.

1:01:57.1 MH: Awesome. All right.

1:02:00.9 TW: Kept it to one, Michael, are you proud of me?

1:02:01.0 MH: Yeah. That’s good. Very nice.

1:02:02.7 TW: Did I get an A? [laughter]

1:02:05.9 MH: That’s for Professor Goldin to decide, so, I don’t do the grading. No.


1:02:09.5 DG: No grading, no grading.

1:02:11.6 TW: No. It’s the…

1:02:12.4 DG: I refuse.

1:02:13.7 MH: All right Moe what about you? What’s your last call?

1:02:16.5 MK: So it’s really strange. We were at marketing analytics summit in Las Vegas a while back and I realized, I kept going through my last call list and I don’t think I’ve actually mentioned this before, which I find really strange, but the blog that I’m probably the most obsessed with at the moment is Eric Weber’s blog on data as products and I don’t think I’ve mentioned it. If I have mentioned it, people can tell me and then I’ll steal one of Rebecca’s because we’re all open for stealing and pilfering…


1:02:46.6 MK: But Eric Weber’s blog on Substack, Data as Products, is fricking phenomenal. I’m getting a lot of really incredible information from that. Yeah, so.

1:02:57.2 TW: I don’t think you’ve actually mentioned it on the show we’ve definitely had…

1:03:00.3 MK: I don’t think I have. It’s really strange because it’s the blog that I read most regularly.

1:03:03.0 MH: Yeah. All right, Michael, what’s your last call? Well, sort of glad I asked myself.


1:03:09.4 MH: So I recently ran across this. One of my partners here at Stack Analytics shared it. It’s from the company Evolytics, which is another analytics company in the space, and they put together just a calendar timeline for analysts around COVID, which is really helpful, because a lot of very confounding things happened with data during that period of time. Businesses grew dramatically, things changed in your company, lots of events were happening in the world.

1:03:37.2 MH: And so when you do some things like year-over-year analysis, you’ll find all kinds of things. So having a calendar you could overlay, to explain potential externalities in what you’re seeing, may actually be helpful to your analysis. I just thought that was a really nice resource that they created, and I wanted to shout that out. They created a COVID-19 calendar for analysts site. It might come in handy if you have to do some analysis in your work.

1:04:08.4 DG: These are great. I feel like I need to go back to all of your podcasts and listen to the last few minutes to get all of those.


1:04:12.7 MH: Oh, yeah. [laughter] You can also find them on our website. They’re there as well, so it might save you a couple minutes. But we do also love to hear from our audience, so if you would like to reach out to us, feel free to do so. And the easiest way to do that is on the Measure Slack or on Twitter or on our LinkedIn page. We’d love to hear from you, and of course there’s much that you can get from this episode. So Professor Goldin, Rebecca, whatever, I guess we’re friends now, so that’s pretty cool.


1:04:43.9 DG: Awesome.

1:04:45.7 MH: Thank you so much for coming on the show. It was such a delight to have you. Thank you.

1:04:49.6 DG: Thank you. This has been a lot of fun.

1:04:51.4 MH: This is great. Awesome. And of course, no show would be complete without a huge thank you to Josh Crowhurst, our producer. Thank you, Josh, for everything you do. And remember, even if your data is incomplete or not the best or you’re confounded by many different things, remember, the most important thing you can do is keep analyzing.


1:05:16.2 Announcer: Thanks for listening. Let’s keep the conversation going with your comments, suggestions and questions on Twitter at @analyticshour, on the web at analyticshour.io, our LinkedIn group, and the Measure chat Slack group. Music for the podcast by Josh Crowhurst.

1:05:34.9 Charles Barkley: So smart guys want to fit in, so they made up a term called analytics. Analytics don’t work.

1:05:43.5 Tom Hammerschmidt: Analytics.

1:05:45.4 Tom Hammerschmidt: Oh my God. What the fuck does that even mean?


1:05:49.9 TW: Josh, Josh, don’t put that in the endings.

1:05:53.2 MK: Oh, why do they always say that when we’re recording?


1:06:00.6 TW: Rock flag and plus or minus three eagles.
