#269: The Ins and Outs of Outliers with Brett Kennedy

How is an outlier in the data like obscenity? A case could be made that they’re both the sort of thing where we know it when we see it, but that can be awfully tricky to perfectly define and detect. Visualize many data sets, and some of the data points are obvious outliers, but just as many (or more) fall in a gray area—especially if they’re sneaky inliers. z-score, MAD, modified z-score, interquartile range (IQR), time-series decomposition, smoothing, forecasting, and many other techniques are available to the analyst for detecting outliers. Depending on the data, though, the most appropriate method (or combination of methods) for identifying outliers can change! We sat down with Brett Kennedy, author of Outlier Detection in Python, to dig into the topic!

Articles and Other Resources Mentioned in the Show

Photo by Timothy Chan on Unsplash

Episode Transcript

0:00:05.8 Announcer: Welcome to the Analytics Power Hour. Analytics topics covered conversationally and sometimes with explicit language.

0:00:13.7 Tim Wilson: Hi everyone. Welcome to the Analytics Power Hour. This is episode number 269. I’m Tim Wilson from facts & feelings, and confession time: I am mad about MAD. And by MAD, of course, I mean the latter one in that statement, the acronym MAD, which stands for Median Absolute Deviation. I came across that technique, I think, eight or nine years ago. Actually needed to go look in GitHub to figure out, like, my timestamp on that, when I needed to be able to detect likely outliers in like hundreds of very small data sets, like just 10 or so data points each. It’s a long story. It’s not important. But Median Absolute Deviation is one of many, many techniques for detecting outliers, and outliers are the subject of today’s episode. With that in mind, we’re operating very much within one standard deviation of our mean host count, in that I’m joined by two co-hosts for this episode. Julie Hoyer from Further, do you have a secret favorite outlier detection technique?
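
[Editor’s note: for readers who want to see what MAD-based outlier scoring looks like on the kind of tiny data set Tim describes, here is a minimal Python sketch. The data, the conventional 0.6745 scaling constant, and the 3.5 cutoff are illustrative, not anything specific to Tim’s project.]

```python
import numpy as np

def mad_outliers(values, threshold=3.5):
    """Flag likely outliers via the modified z-score (Median Absolute Deviation).

    Works reasonably well even on small samples, where the mean and standard
    deviation are easily distorted by the very outliers you are looking for.
    """
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))        # Median Absolute Deviation
    if mad == 0:
        return np.zeros(len(x), dtype=bool)    # all values (nearly) identical
    modified_z = 0.6745 * (x - median) / mad   # 0.6745 makes MAD roughly comparable to a std dev
    return np.abs(modified_z) > threshold

# A tiny data set like the ones described above: ~10 points, one suspect value.
daily_values = [12, 14, 13, 15, 14, 13, 12, 14, 55, 13]
print(mad_outliers(daily_values))   # only the 55 should be flagged
```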

0:01:21.0 Julie Hoyer: No, I don’t. Maybe I will after today.

0:01:25.2 TW: Right down the middle. Okay. Yeah, it’s unhealthy, my MAD affection, but we’ll get into that.

0:01:34.8 Brett Kennedy: But very reasonable. It’s an excellent choice.

0:01:37.6 TW: Yes.

0:01:40.3 JH: Validation.

0:01:40.4 TW: Okay, now I’m validated. And we’re done.

0:01:43.1 BK: The whole show’s out.

0:01:47.0 TW: And Val Kroll from facts & feelings, do you try to just operate pretty close to the mean as a matter of course?

0:01:54.1 Valerie Kroll: Oh, yeah, you know me.

0:01:56.3 TW: Nope, there was definitely not a conversation about how cold water needs to be in order for it to be drinkable right before we started this recording. So, yeah, so we’re likely going to be chatting about some different outlier detection techniques. Shout out to our listener survey last year that asked that we dip our toes into specific techniques and methods here and there. But we’re also going to get a little more philosophical perhaps about when and where and how outlier and anomaly detection are useful. For that discussion, we needed a guest. The perfect guest would be someone who, like, devoted so much time and thought to the subject that they’d written an entire book about it. But that’d be crazy. I mean, such a person would surely be an outlier themselves. Oh, wait, wait. What, what. What’s that? Oh, our producer is telling me we found such an outlier. That was the voice you’ve already heard. Brett Kennedy is a freelance data scientist and he’s the author of a book, Outlier Detection in Python, which is 17 chapters of outlier goodness. And I’m not just saying that because Median Absolute Deviation, MAD, gets introduced and described in chapter two. Today Brett is our guest, so welcome to the show.

0:03:11.3 BK: Well, thank you very much.

0:03:13.0 TW: All right, so maybe a good place to start is with how you came to write an entire book about Outliers. I think I had a good sense that there are multiple different techniques depending on the nature of the underlying data and kind of what the question at hand is. But what sparked you to think, hey, there’s a whole book’s worth of material here and I should be the one to write it?

0:03:36.6 BK: Yeah, well, yeah, there definitely is a whole book’s worth of material there. I mean, there are a lot of techniques that are used, or can be used, to identify outliers, and they all have different nuances to them. There’s a lot of kind of gotchas with outlier detection. How it started on my end is I was working for a company and we were building software that was performing outlier detection. The software we were building was to be used by financial auditors. So the idea is someone who’s performing a financial audit on their clients. They would go through their client’s data, which includes their bookkeeping records, as well as, we had code in there where they can check text documents, contracts and invoices, meeting minutes and the like. We introduced outlier detection for those as well, but we started with the bookkeeping data. And so what often happens when auditors perform a financial audit is they’ll get a set of transactions in the bookkeeping records, or sales, purchases, payroll records and the like. And there will often be millions or hundreds of millions or billions of these records. So it’s not really feasible to manually check them. So what can be done in that case is just to randomly check, spot check them.

0:04:57.8 BK: Or you can use outlier detection to try and find the ones that are the most unusual in there, with the idea that those are at least among the most relevant records to check. Auditors would also want to check the ones that have large financial values, and other kinds of criteria as well. But the ones that are the most unusual can be the most likely to suggest either errors or fraud, or working outside of the normal controls, and situations like that. So when we were writing the software, it’s kind of a different experience when you’re writing software that’s going to be given to other users, where they can be encountering just any sort of data, who knows what, and we won’t be there to be able to update the software as it’s needed or work them through the issues. So we had to make the software very robust going out the door, so it can handle basically any situation and can provide meaningful explanations of what’s being found. So we were doing that. And at the same time we were also working with regulators around the world. The regulators are organizations, most countries have them, that regulate how financial auditors work, how they perform financial audits.

0:06:14.9 BK: And they were interested in looking at this kind of new wave of technology that companies like ours were creating, and, you know, things like outlier detection. And they were raising a lot of important questions. Well, a couple in particular. One was: when your software flags an outlier for an auditor, how’s the auditor going to know why it’s unusual? It’ll just say, look, this sales transaction is statistically unusual. Okay, why?

0:06:45.8 BK: And also a point they were making was, if the software flags a bunch of transactions as unusual, with a little bit of effort you can go through those and say, okay, these are or are not statistically unusual, these are or are not interesting, they do suggest fraud or errors or circumventing controls or something. But how can we check the converse? How can we check all the 99-point-something percent of the transactions that weren’t flagged? How do we know there weren’t transactions in those that were even more unusual or more important to flag? And those were a couple really good questions. And I think that really got myself and the team thinking deeply about interpretability and testability in outlier detection.

0:07:38.0 BK: And we got into some areas where I think there really weren’t other people doing much. I mean, I went through a lot of academic papers, I think like 200 or something, just to try and figure out what the state of the art was. Now, having said that, outlier detection literature is kind of a lot easier to read than a lot of other areas of computer science.

0:07:58.6 VK: Well, that’s interesting.

0:08:00.9 BK: Yeah, that was my experience anyways. Now maybe that was an artifact of once you’ve read the first hundred or something.

0:08:10.5 JH: The second hundred are a breeze.

0:08:12.0 BK: Yeah, they don’t seem as bad. I mean, they’re a little shorter and a little more manageable than, say, deep learning or some other fields, which aren’t always as manageable. But yeah, so I went through a lot of the literature and kind of came up with a good sense of where the state of the art was. But also we were doing some kind of original research in terms of making outlier detection interpretable, explainable, and doing work around testing these things, which is actually a difficult problem. I mean, it’s solvable, but it is a difficult problem.

0:08:49.3 S1: Picture this. You’re stuck in the data, slowly wrestling with broken data pipelines and manual fixes. Suddenly, streaking across the sky, faster than a streaming table, more powerful than a SQL database, able to move massive data volumes in a single bound. It’s not a bird, it’s not a plane. It’s Fivetran, the hero you need for data integration. Fivetran, with over 700 pre-built, fully managed connectors, seamlessly syncs your data from every source to any major destination. No heroics required on your part. That means no more data pipeline downtime, no more frantic calls to your engineers, no more waiting weeks to access critical insights. And it’s secure, it’s reliable, it’s incredibly easy to deploy. Fivetran is the tool you need for the sensitive and mission-critical data your business depends on. So get ready to fly past those data bottlenecks and go learn more at fivetran.com/aph. Unleash your data superpowers. Again, fivetran.com/aph. Check it out.

0:10:00.1 TW: So, I mean, to me, kind of the nature of... I had a question when I was kind of starting to scan through parts of the book as to what the difference is between an outlier and an anomaly. And in the book you said, I think, that you kind of use them interchangeably. Which I was like, cool, this is an easier thing to read than I would have thought if there was some distinction. So maybe that backs up your point. But it seems like the concept of an outlier is one of those where if you know data deeply and you look at it, you’d say, oh, well, that’s clearly an outlier. It’s this kind of nebulous idea. And the question is, well, the brain may, with certain types of data, be able to kind of look at it or sense it, or say, well, that temperature reading, the peak temperature was 150 degrees, centigrade or Fahrenheit, either one, clearly is an outlier. You know, that’s just logic. But is that a fair way to think about it as you work through different techniques? It’s kind of like, whatever would have a human logical interpretation of, well, that’s out of the norm, is something we’d call an outlier. But it’s not like it’s a definitionally precise line. I’m like, it kind of can’t be, right?

0:11:27.5 BK: No. And it’s not. That’s one of the challenges of outlier detection: there’s really no definitive definition of an outlier. And one of the consequences of that is there are a lot of different ways to try and find outliers. Like you mentioned Median Absolute Deviation. That’s one I go through, plus, I don’t know, a couple dozen others. And there’s a reason there are so many outlier detection techniques. It’s just that there is no single definition. If there were, we would just have one outlier detection algorithm and that would be it.

0:12:00.3 BK: It would just say for sure these things are statistically unusual and these things are not. And what makes it difficult as well is there’s kind of a difference between what’s statistically unusual and what’s useful, what’s interesting. So, like with the example I just gave, with going through financial data, you can find things that are statistically unusual, but there’s nothing noteworthy about them. They just don’t occur that often. I think in the book I gave the example of annual payments. Companies will have these payments they make once a year. So they’re unusual because there’s only one per year, but there’s nothing otherwise interesting about them.

0:12:39.3 VK: Can I actually just call out something about that quote? Cause I actually love that. That was one of the things that I pulled out: when you’re looking at those transactions, some of those annual payments, because they are once a year, they’re not problematic, but they’re understood by the analyst. And so I thought that was really interesting, because when I think of, like, outliers and all this detection, I think of a lot of things that are happening not always with human touch, or human touch considered first, like the analyst role in all of that. And so it just really struck me as an interesting concept that there are processes and different methodologies, but that it really does take the interpretation to even say whether or not it is an outlier, because there’s no definition. And I was like, that is... I had never thought about that before. I thought that was super interesting.

0:13:25.9 BK: There’s still a very important role for an analyst in all this, and financial data is a little more straightforward than some. If you get into a lot of scientific data, and we get into other domains too, like image data or video, audio and so on, it can really require an analyst to take a look at that to say, okay, this is something we’re interested in pursuing further, or this is not, depending on the domain. Well, somewhere that might be even a little bit more straightforward than financial data would be where you have sensors to monitor industrial processes. I mean, that’s another application of outlier detection. And if you have a certain assembly line or something like this, it could be well understood what the normal behavior of the system is. And really anything outside of that is probably worth looking at. But even there you can have certain things that are statistically unusual that are just known to not be of interest, but they can reoccur. So you might need an analyst to look at that and say, okay, if this specific combination of sensor readings over this time window occurs, this is known to be of interest or this is known to not be of interest.

0:14:48.7 BK: So the analyst may only have to look at it once. And going forward, the system may be able to say, okay, this is of high importance, or, you know, mid importance, so we’ll send out alerts but we won’t shut down the system. Or it’s known to be a non-issue, and it can handle that.

0:15:04.5 JH: When I was reading the credit card fraud detection part, it reminded me of one time when my fraud alert went off on my credit card, because it was a purchase made in the suburbs that the credit card company knew was, like, that’s not right, she never goes out to the suburbs. And I remember being like, no, that was me. It was an accident though. So be sure to flag this again in the future, because something has gone wrong.

0:15:29.7 BK: Yeah, yeah. I think a lot of us can remember too. Like, the fraud detection in credit cards now, it’s not perfect, it’s a very hard problem, but it works so much better than it did, say, 20 or 30 years ago. It used to be, with something like that, even for people that did go to the suburbs routinely, it would still just shut them down, shut down the card, even if it had seen it before. So it’s a little bit smarter. They’re a lot smarter now. But yeah, it’s a very, very difficult problem, because, you know, there are a lot of kind of subtle ways people, if they have a stolen card or something, can go about committing credit card fraud.

0:16:12.0 VK: Yeah. So going back a little bit to the methodologies, because you said you were able to write about a lot of them, it’s not just like a handful, like, oh, three or five; it sounds like there are over 10 methods, at least. So in past scenarios or in general, when you’re trying to decide which method to use, what are you using to help you make that choice? Are you looking at attributes of the data? Does it have more to do with, like, the business context of the data? Or is it even, I’m thinking, the use of the actual outlier once it’s detected? So is an outlier bad and you want to remove it? Is it good and you want to understand why it happened? Or is there a big issue with, like, a false positive of an outlier, like, how sensitive is the scenario to that? I’m just wondering what goes into helping you consider and choose your method.

0:17:16.6 BK: Right, right. Yeah, well, just to take the very end quickly: yeah, there’s definitely a concept of false positives and false negatives, you know, type 1, type 2 errors, with outlier detection. You can over-flag things or under-flag them. And interpretability is very, very important. It’s usually much more important than in other areas of machine learning. Like if you’re creating a predictive model or, say, a generative model, where, if you have a model you want to, say, produce a picture or produce a sound clip, you don’t necess... I mean, if you’re debugging it because you’re the person developing it, you would need to know why it did what it did. But if you’re a user, you don’t really need to know why it generated what it did. Or if it’s making a prediction about something, in some contexts you do and some you don’t, but you often don’t need to know exactly why. But it’s very, very common in outlier detection to need to know why it flagged something, why it gave something a higher score and something else a lower score.

0:18:15.7 BK: And that can really come into your choice for which detectors you use, partially because, unfortunately, very few of them are interpretable. They tend to be black boxes. And, you know, you can imagine, if you’re using outlier detection for, like, investigating fraud or security, or you’re looking for criminal activity or something like this, or even if it’s just that outlier detection has flagged something and says, oh, your industrial system is behaving unusually, you should shut it down. Well, you need to know why, so you can investigate that quickly. And if there’s a security threat or a safety threat or something like that, you want to be able to investigate it quickly and efficiently. And unfortunately, most outlier detectors don’t let you do that. So that can certainly affect your choice. Some are a lot faster than others, or slower. And some do detect different types of outliers. Like Median Absolute Deviation, for example, is an excellent detector. But it’s intended to detect extreme values in numeric data, which means it won’t detect outliers in categorical data or date or text data. It won’t find internal outliers, which means values that are kind of in the middle but unusual. They’re not extremely large or extremely small.

0:19:36.3 TW: Well, so, I mean, I think back to early in my analyst career, when I was trying to get the team to not look at every... And this was with digital marketing data. So it was time series data, and, I didn’t want... like, it was never going to move in a perfect line. And I was struggling with getting told every single week or every month, yes, the number was going to go up or down, and trying to figure out, had it gone up or down enough?

0:20:09.4 BK: An unusual amount.

0:20:10.6 TW: Yeah, an unusual amount. And I do remember at the time, it’s like I was trying to rediscover statistics, or discover statistics from the beginning. Like, it was a well-intentioned but horribly executed exercise where I would take the data and calculate the mean, and then I’d do a little plus or minus. I think I did like a best fit line, like a regression, and then I put plus or minus a standard deviation on either side of that line. Horrible, because the data was always trending. So it was time series, non-stationary data, and I was calculating the mean, which was kind of dumb because if...

0:20:54.8 BK: It’s moving, it’s not a constant mean. Yeah, yeah.

0:20:57.9 TW: So, I mean, that was one where... Yeah, I mean, it was definitely trying to be... I mean, it served the purpose in that it gave me the ability to put a band around it. It was horrible, but it gave something, and people sort of believed that, oh, it needs to go outside this range. But I think that’s when I discovered, like, I was trying to use z-score with like 10 data points, and it’s like, well, that wasn’t going to work. That’s where I wound up finding MAD, because that’s one that’s generally a little bit better when you have a small... Yeah, so I guess.

0:21:39.2 BK: Yeah, yeah. So yeah, no, that, I mean actually I think a good piece of Julie’s question as well is some detectors are just better suited when you have large amounts, very, very large amounts of data. They’re just very efficient and some are not and some handle smaller data sets as well.
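
[Editor’s note: a slightly safer version of the approach Tim describes, sketched in Python: fit a simple trend line first and then score the residuals (here with MAD rather than a standard-deviation band). The data and the 3.5 threshold are invented for illustration.]

```python
import numpy as np

def trend_residual_outliers(y, threshold=3.5):
    """Fit a linear trend, then flag points whose residuals are extreme by MAD."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)       # simple best-fit line
    residuals = y - (slope * t + intercept)      # distance of each point from the trend
    median = np.median(residuals)
    mad = np.median(np.abs(residuals - median))
    if mad == 0:
        return np.zeros(len(y), dtype=bool)
    modified_z = 0.6745 * (residuals - median) / mad
    return np.abs(modified_z) > threshold

# A steadily trending weekly metric with one genuine spike in the middle.
weekly = [100, 104, 109, 113, 118, 160, 127, 131, 136, 140]
print(trend_residual_outliers(weekly))   # only the 160 should stand out
```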

0:21:55.6 TW: And I think a lot of the, like, talking about all the patterns: when you do talk about time series, you have data moving along and then there’s like a step function or a rupture and it changes. You call out that that’s different from, like, a spike or an extreme dip, and it’s different if the data is kind of trending. So it does seem like you need to understand the nature of the data that you’re looking for outliers in, and that narrows down the techniques that might be appropriate. Is that…

0:22:26.9 BK: Yeah, yeah. I mean, part of the answer to your question, Julie, is it depends on the type of data, like you were suggesting. So if you’re dealing with image data or video or audio data, there are certain tools that are appropriate. Basically, if you’re dealing with that type of data, it’s got to be a deep learning-based technique; that’s likely the only thing that’s going to work. And with text you might want to use a deep learning-based method or a simpler method. With tabular data there are certain techniques, and with time series data there are certain techniques, and it comes down a bit to some of these issues like performance and, you know, what’s going to tend to balance your false positives and false negatives well. But yeah, it’s also what type of outliers you’re interested in. Because maybe if it spikes a lot, that’s not interesting; it just, you know, does that once in a while. And yes, it’s statistically unusual, but it’s not something you would investigate. Maybe in some scenarios, if you get a spike, you don’t want to look at that.

0:23:32.2 BK: But if you get a whole bunch of spikes within a short time period, you might want to look at that. Or sometimes the opposite could be interesting: if it’s unusually flat over a period of time, or unusually smooth, a straight line, which might suggest, depending on the situation, just a sensor failure. Like, it’s not particularly sensitive, so it’s just giving a flat line. Yep.

0:24:01.2 JH: I was gonna say, is that an example of what you were saying of an outlier hiding in the middle? Like, it wasn’t an extreme high or extreme low, but contextually, if we’re monitoring a sensor, it would be unusual that it was flat and in the middle for a while. Is that a good example of that one? Because I was like, whoa, an outlier in the middle? Like, they’re everywhere…

0:24:23.7 BK: Yeah. Yes. That’s one of the things about outlier detection: it depends how you look at your data. And again, that’s why there are so many outlier detectors, because if you just look at your data one way, you’ll miss other ways in which it’s unusual. Well, what I was getting at before with inlier detection, maybe a simpler example of that is, say you just have a list of sales records for a company, and generally most of their sales are around $10. Or say you sell some items that are $10, and you also sell items that are $100. So a sale for $60 is kind of in the middle. It’s kind of weird. So it’d be an internal outlier. So, yeah, where you have kind of bimodal or multimodal distributions, you can get values in just a numeric series that are unusual, even though they’re not extremely large or extremely small. And it’s kind of a similar issue with time series, where, yes, I guess the same thing, where over time, maybe your temperature, if you have a...

0:25:40.5 BK: Or say you have a sensor that reads sound volume, and maybe the volume tends to be either very low, like zero, meaning machinery is off, or it’s high, meaning machinery is on. If it gives you a reading kind of halfway in the middle, well, if that occurs rarely, then that would be unusual. But it’s not an extreme value. It’s the opposite. It’s what’s called an internal outlier. Yeah. Okay.
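
[Editor’s note: one way to catch the kind of internal (inlier) outlier Brett describes, a $60 sale in data that is almost all $10s and $100s, is to score each point by how sparse its neighborhood is rather than by how far it sits from the center. A minimal sketch using k-nearest-neighbor distances from scikit-learn; the data, the value of k, and the cutoff are purely illustrative.]

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Mostly $10-ish and $100-ish sales, plus one value sitting in the sparse middle.
sales = np.array([10, 11, 9, 10, 12, 10, 99, 101, 100, 98, 100, 60], dtype=float).reshape(-1, 1)

# Score each point by the distance to its k-th nearest neighbor: points in a
# crowded region get small scores, points in sparse regions (including the
# "middle" of a bimodal distribution) get large ones.
k = 3
nn = NearestNeighbors(n_neighbors=k + 1).fit(sales)   # +1 because each point is its own nearest neighbor
distances, _ = nn.kneighbors(sales)
scores = distances[:, k]                              # distance to the k-th real neighbor

# A purely illustrative cutoff: several times the typical (median) score.
cutoff = 5 * np.median(scores)
for value, score in zip(sales.ravel(), scores):
    if score > cutoff:
        print(f"possible internal outlier: {value} (kNN distance {score:.1f})")
```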

0:26:11.5 JH: So we’ve been talking about a lot of these outliers that are indicators of something going wrong. But I also want to talk about, because this is something that makes sense, but again, seeing it framed the way you put it in the book was helpful, that it’s not always necessarily looking for problems. Like, I think about financial markets looking for dislocations so that people can move money and take advantage of that. Or there is an audience that’s being underserved, if it’s a marketing example, and so now we can think about offerings for them. So I guess I’m just curious about the balance of those concepts, ideas, and how they potentially work differently with the type of approach that you think about for identifying them and the processes that lead up to it and then happen because of it.

0:26:55.9 BK: Yeah, that’s one of the key points about outlier detection: anything that’s unusual is not necessarily a problem, depending on the context. I mean, again, if you look at, like, an industrial system, then probably anything that’s unusual, odds are it is a problem, because the normal functioning is also the optimal functioning. Machinery can’t spontaneously behave better than normal. But a financial market or, yeah, marketing... Well, in science, a lot of what’s done is just looking for the anomalies, because they’re what’s scientifically interesting. So if you have a collection of records about instances of certain animal species, or plant species or something like that, if you find instances that are anomalous, and it’s correct, it’s not a data artifact, then that can kind of expand your sense of what’s possible, or maybe trigger some research trying to figure out, well, how did that come about? It’s used a lot in astronomy, for example, as well, because that’s one of the fields of science where we’re just accumulating monstrous quantities of data.

0:28:09.3 BK: And it’s just well beyond possible for a person to go through all the images every day. And even if they could, there’s just so much in some of the images coming back from astronomy. So if we’re looking for a kind of new phenomenon, or phenomena that maybe we have seen before but only rarely, then just running outlier detection on the images coming back from the telescopes can be a very effective way to find something ideally new, but, yeah, at least rare.

0:28:45.5 JH: Do you have a favorite, coming back a little bit to the time series thing? Because I think that’s what we work with, at least.

0:28:53.0 BK: Okay. Yeah.

0:28:55.0 JH: For time series data, which is mostly numerical, like you said, what are some of your favorite methods? Maybe based on, like, interpretability. I just feel like when we have run into outlier detection that’s maybe built into certain analytics tools that we work with, there is no interpretability. So you keep talking about, oh, based on interpretability. So I’m interested: which ones out there help us do that? Which ones have the best? What are your favorites?

0:29:24.5 TW: Can I?

0:29:26.0 BK: Sure.

0:29:26.7 TW: Can I throw in? Just because Julie and I both have worked with, like, a Bayesian structural time series. A black box, to me, is the magic of exactly how all that works. But conceptually, having a forecasting technique, having a forecasting engine of some sort, that then says: based on the data up to this point, we would expect the future data points to be X or Y, and if they’re outside of that range... Like, that I actually feel like is interpretable. Not that somebody needs to understand exactly how Holt-Winters forecasting or something worked, but I’ll just throw that in as background. Actually, I think Val’s sort of seen us mess around with that too. But, like, that forecasting is one mechanism of outlier detection. But what else is there?

0:30:23.6 BK: Oh, well, just as, Julie, you were asking that, in my mind I was thinking the forecasting-based ways are probably what I’d go with, generally…

0:30:31.6 JH: Oh look at that.

0:30:35.4 BK: Now, some of the other methods are simpler than forecasting-based ones and therefore might be a little bit more interpretable. But I think I agree with you, Tim, that if you make a forecast and you say, we predicted on Thursday that we would have $2,000 in sales, and we actually had $16,000, that’s well outside of what was predicted, and you can see it’s clearly an outlier. It’s not completely interpretable necessarily, because, well, why did it predict 2,000? Now, it could be that if you look at your history and you see, well, here’s the general trend and here’s the seasonal patterns and the other information it took into consideration, it could be that a prediction of 2,000 is pretty straightforward. So you can see why. I think in general time series data lends itself to interpretability a little better than a lot of other types of data, just because it lends itself to plotting better than other types of data. But with looking for outliers in time series data, the simplest method is just looking for extreme values.

0:31:44.9 BK: You’re basically just ignoring the sequential nature of the data, ignoring the time component entirely, and just looking for very, very large or very, very small values. That’s the simplest. And if you plot that, it’s really easy to see why something is extreme. Like if you get a...

0:32:05.6 TW: But if you don’t plot it, and you actually did have... if the data was, like, honestly on an upward trend, then it would tell you that, oh, your outliers were at the start of the period and the end of the period, because they were the lowest and the highest. Right. And that’s...

0:32:23.5 BK: Yeah, you do risk that. And that method can be a little bit overly simplistic. That’s why I do, kind of like yourself, lean towards the forecast method for outlier detection.

0:32:37.0 TW: But is that one where, like... we had a recent episode about linear regression. We did not talk about, like, ARIMA, but... okay, is ARIMA like the base level sort of forecasting? Or where does ARIMA fit? I’m trying to bucket them in my head.

0:32:57.4 BK: Yeah, I mean, when you talk about outlier detection with time series data using a forecasting method, you can use any forecasting method. So ARIMA methods would be common, or exponential smoothing, but you can use recurrent neural nets or really anything that’s able to project into the future. Even standard machine learning techniques like random forests and XGBoost and CatBoost and so on can be used for time series prediction. They have some limitations, in that if you have the situation you just described, where you have a strong upward trend and the future values are going to be generally a little bit larger than anything that’s occurred in the past, those types of models can struggle to make good predictions. But if you have a time series that’s fairly stable over time, where there’s not an upward or downward trend, they can work well. And in that situation, just looking for extreme values can be one way to find a certain type of outlier. Now, that will miss other outliers, like more contextual outliers, where you have a spike or something, where you have a value that’s... like, say you’re looking at daily sales figures or something like that.

0:34:12.9 BK: You might have a value that’s normal, but not for that time of year or that day of the week or something like that. It’s just different than the values that have occurred recently. So it’s still an outlier in that sense, but it’s not an outlier in the sense of being unusual over the last year or something. Yeah.
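
[Editor’s note: a bare-bones sketch of the forecast-then-compare idea. To stay dependency-free it uses a seasonal-naive “forecast” (the value from the same weekday one week earlier) and scores the residuals with MAD; any real forecaster, ARIMA, exponential smoothing, and so on, could be swapped in, as Brett notes. The numbers are invented to echo the $2,000-versus-$16,000 example.]

```python
import numpy as np

def forecast_based_outliers(series, season=7, threshold=3.5):
    """Flag points that deviate sharply from a seasonal-naive forecast.

    The "forecast" for day t is simply the value from `season` steps earlier;
    residuals (actual minus forecast) are then scored with MAD.
    """
    y = np.asarray(series, dtype=float)
    forecast = y[:-season]
    actual = y[season:]
    residuals = actual - forecast
    median = np.median(residuals)
    mad = np.median(np.abs(residuals - median))
    scale = mad if mad > 0 else 1e-9          # guard against a perfectly repeating series
    modified_z = 0.6745 * (residuals - median) / scale
    flags = np.abs(modified_z) > threshold
    return [(i + season, y[i + season]) for i in np.flatnonzero(flags)]

# Three weeks of daily sales with a weekly pattern; one day comes in far above its forecast.
daily_sales = [2000, 2100, 2050, 1980, 2500, 3000, 1500] * 3
daily_sales[17] = 16000   # the $16,000 day from the conversation
print(forecast_based_outliers(daily_sales))   # should point at index 17
```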

0:34:31.2 VK: And what’s interesting too, with the technique that Tim brought up, I was just thinking, the context behind that, or what makes it interpretable, is that you’re choosing your pre and post period based on what’s changing in the environment. So you have that knowledge going in, assuming that everything else is held constant enough over the period, that you’re isolating the mechanism that would cause an outlier, you’re hoping in whichever positive direction, an improvement. So it’s interesting that the interpretability comes from that, again, context of the situation and you choosing these parameters to then apply it.

0:35:11.1 TW: But if you were monitoring something, why wouldn’t you say, I’m going to keep rerunning it and it’s always going to use the data up through yesterday or through a week ago? And then, like, I don’t have a planned intervention. I’m just going to... I guess I have to have an assumption that any outliers that occurred before last Saturday aren’t going to muck the model up so much…

0:35:33.8 BK: Which they can…

0:35:35.1 TW: Detect.

0:35:35.7 BK: Yeah.

0:35:36.3 TW: So that’s, that’s the risk. That’s the issue.

0:35:38.0 BK: Yeah, there is a risk. Yeah. I mean, that’s actually one purpose for doing outlier detection on time series data: even if you’re not interested in delving into those outliers, just removing them, because outliers in your time series will affect your sense of what’s normal, which can preclude your ability to really find outliers.

0:36:03.1 TW: If you remove too many Outliers and then use that to build your forecast, then.

0:36:07.3 BK: Yeah, yeah. It’s a balance you’re going to keep running into.

0:36:10.2 TW: Oh, yeah. Wow.

0:36:11.4 BK: Yeah. But you can imagine, say you train your model on one day’s worth of data in order to see if there are any outliers the next day, but the day you train it on is Christmas or Black Friday or something, an anomalous day, or it’s just a day when your computer system went down for a bit. So if you use that... It’s a really good point you make, because this is one of the tricks of outlier detection: what do you use to define normal? And sometimes what’s normal can be different than what’s desirable. Like I was saying, in an industrial system, often they’re fairly close to the same thing. But if you’re looking at marketing data or something like this, yeah, there can be a distinction. And with outlier detection, it’s always compared to what? So if you have a day where, I don’t know, say you have a website you’re monitoring and you’re making sales on the website, maybe getting a lot of sales that day, and you’re asking, was that anomalous? Well, compared to what? So you can look at the last day or the last week or the last month or the last year, and all of these can be kind of useful to compare to, but also misleading to compare to.

0:37:36.8 BK: Right. So I think my tendency, for what it’s worth, is to have kind of multiple frames of reference when you’re looking at outliers. So you say, okay, the sales we’ve had the last hour, that’s unusual compared to yesterday, but it’s normal compared to, say, last Tuesday, or it’s unusual compared to this day of the year last year. And then you have a little bit more context to it. It gives you a kind of sense. Like, if it’s unusual in all those senses, then you say, okay, this is more unusual.
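
[Editor’s note: a small sketch of the “compared to what?” idea: score today’s value against several reference frames and only treat it as more unusual when it stands out in all of them. The data, the window sizes, and the 3.5 cutoff are invented for illustration.]

```python
import numpy as np
import pandas as pd

def robust_z(value, reference):
    """Modified z-score of `value` against a reference window (MAD-based)."""
    ref = np.asarray(reference, dtype=float)
    median = np.median(ref)
    mad = np.median(np.abs(ref - median))
    return 0.6745 * (value - median) / mad if mad > 0 else float("inf")

# About a year of invented daily sales with a weekend bump.
idx = pd.date_range("2024-06-01", periods=400, freq="D")
rng = np.random.default_rng(0)
sales = pd.Series(1000 + 300 * (idx.dayofweek >= 5) + rng.normal(0, 50, len(idx)), index=idx)

today = sales.index[-1]
latest = sales.iloc[-1]
frames = {
    "the last 7 days":              sales.iloc[-8:-1],
    "same weekday, last 12 weeks":  sales[sales.index.dayofweek == today.dayofweek].iloc[-13:-1],
    "the whole trailing year":      sales.iloc[:-1],
}
scores = {name: robust_z(latest, ref) for name, ref in frames.items()}
for name, score in scores.items():
    print(f"vs {name}: modified z = {score:.1f}")
print("unusual in every frame:", all(abs(s) > 3.5 for s in scores.values()))
```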

0:38:15.0 TW: It does. And this kind of goes back to where you were starting, with trying to build sort of a generally repurposable outlier detection into kind of the financial world. In the digital marketing world, every platform has at least once, if not twice, come out with a, hey, here’s a way to just get, like, intelligent alerts, which is basically, we’re gonna do anomaly detection at scale on the behavior on your website.

0:38:50.5 BK: Sure.

0:38:51.1 TW: And those have, like, comically failed, like the naivete coming out of the product, because they find things like, oh, sales have tanked in California, and in this part of California, and it’s right when there are wildfires where everything shut down.

0:39:09.0 BK: Yeah.

0:39:09.5 TW: Or they generally find things that really are kind of legitimate noise, because it’s, like, trying to figure out that threshold. Or, I mean, Adobe Analytics had this, where you could set your threshold for what you wanted to monitor. And the technique was basically that you’d set a 95 or 99% threshold. And the idea was this was gonna just kind of magically tell the analyst where to go look for stuff. But it was always the calibration: not having it identifying outliers every day, in which case you’re overwhelmed and can’t track them down, or finding outliers so rarely that there’s some suspicion that stuff is being missed. Like, as you’re describing it, it seems like it’s so much more about... you have to kind of think through the, call it, the process. It doesn’t have to be manufacturing or sales, I mean, it can be. What are these things that you’re looking in? What is the nature of that underlying data? What is it you can impact? What do you truly care about, and what can you impact? And then what is the right technique, and what’s the right way to do it?

0:40:38.7 TW: I don’t know, maybe I’m mounting my soapbox. But there were just times when each one of those rolled out, they’d say the analyst job just got easier. Which I actually think with AI, there’s some thinking of that too: that you point the AI at the data and it will tell you, give you insights. And it’s like, well, it’s gonna find outliers, but without the context or…

0:41:02.9 BK: The context, or intelligently. Yeah, I think what you’re just saying is kind of bringing me back to your first question: why did I want to write the book on this? Partly because that is such a difficult problem. I mean, I don’t think it’s unsolvable, but it is a very difficult problem. And we were kind of in that situation at the time too, where you’re trying to make something that’s gen... Like, if you’re making software for your own company, and you’re writing it and you’re using it, it’s a simpler process, because you can just constantly tweak it, tweak it, tweak it until eventually it gets better over time. When you’re making software and having to send it out, you kind of have to go through all that process ahead of time. You’ve got to do a lot of thinking about what can go wrong here. And it’s exactly what you’re saying: a lot of over-reporting and a lot of under-reporting, or a lot of where it’s not under-reporting, but you don’t really know that, because you can’t...

0:42:00.2 BK: It’s hard to determine. Yeah, these are tricky problems. And this is... So what I was finding too, going through the literature on outlier detection, is almost all of it was just talking about how outlier detectors work. And there’s a lot of discussion, I guess, as well, around how to combine multiple outlier detectors, how to set thresholds on the scores that they’re producing, and things like that. So these sorts of technical issues are very, very important, and I cover those in the book too. But what’s always being left out of the discussion is just what you’re talking about now: how do you actually maintain an outlier detection system that gives you the outliers you want without giving a whole ton of other ones? And realistically, when you first set up a system, it’s probably just going to flag anything that’s statistically unusual, for whatever reason. You need a way to kind of tune it over time. And there are ways to do that. And some of these things aren’t necessarily bad. Like, I mean, when there’s wildfires in California, probably certain things were down, and they were legitimately outliers.

0:43:12.5 BK: It’d be kind of weird if it didn’t flag that. But the point you’re making is it just generates all kinds of noise, right? It just gives you things that are unusual in ways that are not pertinent to you. And there is a little bit of a process involved. Well, a lot of it comes down to categorizing the outliers that it’s producing. So you say, okay, if it finds this type of thing, I know I’m interested, and if it finds this sort of thing, I know I’m not interested. And if it finds anything else, then I haven’t seen that before, I want to investigate and determine that. And then over time, you can kind of tune the system to find what you want more than what you don’t want. And part of that comes down to a question you asked before: just choosing the detectors you use. You may, as you get going, realize, well, I’m not actually that interested in extreme values. In some contexts you are, but maybe in some contexts you’re not. Or maybe you specifically are interested in extreme values, but it’s flagging a lot of things that are unusual combinations of values, which I’m not interested in.

0:44:20.5 BK: So you might want to downplay those outliers unless it finds something super extreme: don’t bother me, just tell me about those. Or maybe you’re not interested in very small values, only very large values. So you can do some filtering on the output of the outlier detection. But a lot of it also comes down to tuning it, tweaking it to put the emphasis on the sorts of things that you are most interested in. And what happens with projects, too, is over time people change their mind, partly because something that was interesting, that legitimately was interesting, just no longer is; at this point it’s understood. And so you say, okay, I’m glad it flagged it for a while, but not in the future. Or it starts flagging things where it’s, geez, I never thought of that. It’s the unknown unknowns that can really be interesting and important. So you don’t want to dampen down what it’s producing too much, because sometimes those are the most important ones, even though they can be outliers among the outliers that are flagged. So…

0:45:35.3 VK: Levels.

0:45:35.4 TW: How much of an Outlier?

0:45:36.0 BK: Yeah, yeah, yeah. See, you can actually, and I’m not being facetious, you can actually run outlier detection on the results of your outlier detection. And that can be quite legit, because say you have a case where you’re just analyzing millions of documents or millions of table rows or something like that. If you have a billion transactions that come through because you’re a credit card company or something, even if you’re only flagging a hundredth of a percent of the transactions, that’s still a lot, which means it’s still probably beyond the ability of a person to go through them. So you really... I mean, you probably at first need someone to go through them just to see what you’re dealing with. But yeah, over time you want to kind of reduce that workload, because otherwise it’s just kind of flagging the same thing over and over and over again. It truly is statistically unusual, it only occurs once every 10 billion rows, but by the time you’ve gone through a trillion rows, it’s come up over and over and over.

0:46:41.7 VK: So Moh isn’t here, but I have a scenario, in homage to one of our other co-hosts. Honestly, preparing for the show unearthed so many memories for me. I used to be in market research, and we used to do different types of, what was called a conjoint study or conjoint analysis. And basically it was about understanding which features and attributes, and combinations of those things, were most attractive at what price points. So there was a study that was commissioned by US Cellular, RIP. They didn’t go out of business because of the study, I swear. But we did this study for them to understand what features they should consider, in what combination, for the next smartphone, and what price point they should potentially sell it at. And we discovered through a couple of those analyses that there was this group of people who were not sensitive, that there was just no combination that would make this an attractive offering. Turns out, through some secondary analysis, that they were Apple loyal. And this is, like, 10 years ago, before people were as divided as they are today.

0:47:42.0 BK: Okay.

0:47:42.4 VK: But we decided that we were going to remove those “outliers,” air quotes. So I don’t know if this was, A, the right thing to do, or B, the right label for it. So I’m just interested in your reaction. We removed them, but in future times we ran this study, we would ask, like, do you currently have an iPhone and how much do you like it? And if they did and they liked it, we would just remove them. We didn’t even say thank you for participating, here’s your $10 Starbucks gift card. No longer interested in your... Because they didn’t behave like everyone else who was sensitive to the addition or subtraction of features at different price points. But I’m just curious about your take. First of all, were those actually outliers? I honestly don’t know how you would define them. And, like, just the application of that in the context of that business purpose.

0:48:23.7 BK: Yeah, I mean, they might be outliers or not, just depending. Again, there’s no real great definition of that, but: are they rare in your data? Are they rare in society, or were they? If you go back to the 1970s or something, yeah, that was probably pretty rare.

0:48:43.0 TW: Val has never owned an Apple product in her life, so she considers Apple users outliers. She’s like, yeah, freaks, and we’re like, okay, no, it’s an undeniable part of the population.

0:48:53.8 BK: Well, I think there’s kind of a statistical sense of normal and there’s the Platonic sense of normal. So you can argue that no matter how statistically normal they become, they’re still... you can argue that however you wish, I guess. Yeah. When you’re doing that kind of research... Well, another example that comes up in my line of work is with creating predictive models, which is the same sort of idea. You train your model on a certain type of data, and then once you put it into production, if it’s encountering data that’s unlike the data that it was trained on, it’s not going to know how to behave, and it’s just going to basically be behaving kind of half randomly. So it could be, in a situation like you’re describing, you can do market research on a certain cohort of the population that has certain properties, and then if you try and create a marketing campaign for people who are just different than the people that you looked at, their ages are unusual, or the country they live in, or the language they speak, any way that might be relevant...

0:50:08.4 BK: If it’s different than the group that you did your market research on, it’s going to be hard to extrapolate your findings. You know, extrapolating your findings is going to be a little bit suspect. Yeah. So it can be a useful exercise to do…

0:50:24.5 VK: So were we wrong when we called them Outliers? Again, context, Tim. This was like 15 years ago. I was a baby analyst, so. Okay, well, it’s prevalent.

0:50:37.8 BK: I’ll go with the statistical definition and say that depends on how many of them there were. So it sounds like there weren’t too many. So I’m going to say, if that’s true, then yes, they were statistical outliers in that sense. But one of the things that’s important too is, like, everybody is statistically unusual in some sense. If you look at us in enough different ways, we’re all odd. Have you ever heard of what’s called the myth of the average? Yeah. Okay, so the US Air Force did these studies, looking at fitting cockpits to their pilots. And so they looked at the average leg length, the average arm length, the average torso width, and I forget how many dimensions they measured everybody in. But what they found was their pilots were all generally average in most of these dimensions, but there was no single pilot in the entire US Air Force that was normal in every single dimension. And that’s just looking at our physical dimensions. If you start introducing, like, hundreds of ways to look at people, like, do they like Apple versus whatever, Android or whatever...

0:52:07.9 TW: Yeah.

0:52:09.5 JH: The group that shall not be named.

0:52:13.6 BK: Yeah. There could be people that are just die-hard fans of something. Like, I’m sure there are still BlackBerry fans out there. Actually, I’d bet 100% there still are. Because, I mean, one, there’s nothing wrong with it, and two, very few people buy them. So, yeah, the people you’re looking at when doing market research, you can always define them as being unusual with respect to the product you’re trying to sell, in some sense.

0:52:44.9 TW: I mean, look at car manufacturers, right? The whole invisible women thing: all women were outliers removed from the data set for the crash tests. On that front, I think, like, well, treating them as outliers to be removed probably wasn’t the way to go.

0:53:05.0 BK: Yes. Yeah.

0:53:06.0 TW: Well, on that note, we didn’t get into z-score or modified z-score or so many other techniques that maybe would be a little tough to actually get too, too deep into. So before we move into last calls, I will plug the Outlier Detection in Python book again. It is very readable, has both the code examples and the explanations and visualizations of the data sets. So if you’re interested in more on this topic, I cannot recommend that book highly enough. But before we go, we like to do a last call and go around, have everyone share a link or a thought or a post or a book or a movie or an outlier, something that stuck out to them as being interesting. And Brett, you’re our guest. Would you like to go first?

0:54:00.9 BK: Sure, yeah. I mean, one of my interests, as well as outlier detection, is interpretability. So I talked a bit about how interpretability is important for outlier detection; there’s not nearly enough research in that field. But yeah, I’m interested in interpretability in general, and explainability as well. And one of the techniques for that is SHAP, which relates to feature importances. So one of the…

0:54:28.8 TW: Julie just perked up.

0:54:30.0 BK: Yep.

0:54:30.6 JH: We just talked about this at work. It was very timely that you said that.

0:54:33.8 BK: Oh, okay. Yeah, I’ve been interested in SHAP values for many years, and permutation importances as well. I do read a lot of articles on Medium, and one that caught my eye a little while ago is by a writer named, I think it’s Samuel Mazzanti. I believe it’s called “Your Features Are Important? It Doesn’t Mean They Are Good.” And I thought it was a really nice article that kind of explained how SHAP values tell you what features your model is using, but they’re not telling you what features it should be using. So the features you’re using are contributing to your model predictions, but some of them are also contributing to your model error. There’s actually a set of articles on Medium talking about that sort of thing, and talking about how you can use SHAP for feature selection and combine it with permutation importances to get a better sense of what your model is doing and why.
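
[Editor’s note: a minimal sketch of the pairing Brett mentions: SHAP values to see which features the model is actually leaning on, and scikit-learn’s permutation importance to see whether those features actually help on held-out data. It assumes the shap and scikit-learn packages and uses an invented regression dataset.]

```python
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Invented data: only a few features actually drive the target; the rest are noise.
X, y = make_regression(n_samples=1000, n_features=8, n_informative=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# What the model is using: mean absolute SHAP contribution per feature.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)          # shape: (n_samples, n_features)
shap_importance = np.abs(shap_values).mean(axis=0)

# Whether those features actually help on held-out data: permutation importance.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

for i in range(X.shape[1]):
    print(f"feature {i}: mean |SHAP| = {shap_importance[i]:8.2f}   permutation importance = {perm.importances_mean[i]:.3f}")
```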

0:55:40.8 TW: Nice. Julie, can you share the nature of the discussion you were having about SHAP values?

0:55:48.1 JH: Yeah. One of my co-workers actually got asked by a client... Their data science team ran a model, and they got all this output with a bunch of SHAP charts. And it comes from Shapley, right? Tim, I thought you and I and Ben had a whole conversation about Shapley values, but that’s what this SHAP was. And he had a ton of the charts, and it was kind of crazy. He was then asked to make the model output, in this 25-page technical readout, more interpretable for the data scientists to take to their stakeholders. So we were kind of talking about, well, what matters here? And we were getting down into, like, per location, because the model was trying to predict locations for customers. And it was interesting because you had a SHAP chart for each outcome of the model. And so then we got into the discussion of, all these features look different per location outcome. But I think what’s most important is, you just want to know: with the features available, do you trust your model? Is it accurate enough to then use it to predict these outcomes, and then you can go market based on those predictions? So it was kind of like, let’s take a step back. But we got really deep into the SHAP before we got there.

0:57:04.0 BK: You can go extremely deep into them. There are a lot of nuances with them.

0:57:10.1 JH: Yeah.

0:57:10.7 TW: Wow.

0:57:11.4 JH: And the big question that you said: do you have the right features listed? Like, you can have a bunch of crap features and have a SHAP graph for it, but it doesn’t really matter if they’re crappy features. It won’t tell you that.

0:57:22.8 BK: No, it won’t tell you that. No, no, it won’t. But yeah, that’s the thing. I think they’re very, very useful charts, but I think sometimes they’re made out to be something that they’re not.

0:57:35.6 JH: Yeah, definitely.

0:57:37.3 TW: Yeah. Wow. Well, Julie, do you have a last call to top your.

0:57:42.1 JH: I do, but I’m glad I got to comment on SHAP, because my last call is not anything close to that.

0:57:48.6 TW: Start with an SH too.

0:57:50.7 JH: No, actually a CH. So my last call is just something that has brought me joy, and it has an AI feature in it that now I’m thinking maybe uses outlier detection. The app is Chatbooks. It’s just an easy way, as a busy mom and parent, to keep some memory books printed and coming monthly. And it has a feature, though, that I really like using, which is: select my best 30 photos from the previous month to get my book started, and then I can edit from there. And I always wonder, how does it choose? Some months, it’s spot on. Some months it’s giving me, not screenshots, but random photos that I’m like, I don’t need this printed.

0:58:31.6 BK: So that’d be a good question. Would they be the most typical for the month or would they be the most atypical?

0:58:37.7 JH: Ooh, right, and that’s what I’m wondering. I’m like, is it. It’s gotta be based off the mix of photos I had for the month. And maybe some months, it’s very clear that I was, like, at an event and it’s choosing, like, the clearest photo from, like, a run of similar photos or something.

0:58:52.8 BK: Yeah, that’s. If I wrote it, that’s what I’d be tempted to do. But is that really the best?

0:59:00.8 JH: So the technology part of it had me wondering, because I’m just interested, like, how are they doing it? And seeing as a user, like, do I find this feature actually useful or not? But in general, like, I’ve just really enjoyed that app and getting the photo books and it’s fun.

0:59:17.0 BK: Good.

0:59:17.7 TW: Nice.

0:59:18.3 JH: I like it.

0:59:19.3 TW: Val, what’s your last call?

0:59:21.8 VK: We might have to edit this because there’s a word that I can’t say in the title and the subtitle. Okay, ready? And I know the word. I just can’t pronounce it. Our human habit of anthropomorphizing everything.

0:59:38.0 TW: Yeah, anthropomorphizing.

0:59:38.6 VK: Should we be anthropomorphizing? Anthropomorphizing. I can’t say that word. I practiced this five minutes straight before I had Google say it out loud to me, before I joined this recording. And my mouth can’t make the motion.

0:59:54.8 BK: Anyways, not an issue, I 100% know what you mean.

0:59:57.4 VK: Okay, so we all know what I’m talking about. So this was also a Medium article published by Doc, written by Daley Wilhelm. And I thought it would be interesting because we all know about the “should we be nice to generative AI” question, like saying thank you or please to the interfaceless AI. But as I was reading this, I saw so many more examples that I hadn’t even thought of, which makes it a really interesting read. Even saying the AI “listens” or “learns” is doing that, versus saying the AI is designed to, or built to. If you say that it “sees” this or “looks for” this, it’s actually detecting or finding patterns. It runs so deep. Anyways, it talks about some of the problems with doing that: it actually changes what people expect from the output in terms of accuracy, or the role it can play. So anyways, it’s just kind of a thought piece, but I really enjoyed it. And hopefully you all can practice saying that word with me. Anthropomorphizing.

1:01:00.3 BK: Does it talk about the different ones that will show “Thinking…” while you’re waiting? Yeah.

1:01:07.8 TW: Yeah. I think they encourage us to think of them in kind of human terms just because of how they communicate with us. They do say, like, I’m thinking about this or I’m looking at that. Yeah.

1:01:21.0 VK: And I’m someone who anthropomorphizes everything. Like, I’ve done this for years. I remember trying to explain to my roommate why our DVDs needed to be in alphabetical order in the case. And she’s like, why? And I was like, because that’s how they like to be. And she’s like, you need some sort of exotic pharmaceuticals to chill out, because DVDs can be in any order you want. Anyways, it’s an interesting read. All right, Tim, how about you?

1:01:52.5 TW: Well, I guess I’m going to keep the streak of Medium posts going, because this one has been out for a while. It’s a post called Optimizing at the Edge: Using Regression Discontinuity Designs to Power Decision Making. It’s coming out of the tech team at Instacart, kind of publishing how they do it, but it’s a good explanation. Regression discontinuity is one of those things that someone like Joe Sutherland would reference as being a technique, but I hadn’t found something that gave a good, simple explanation of, like, what is it? How does it work? So it falls in that whole little pot of, oh, here’s another technique that belongs in the hands of a more skilled individual. I mean, for me, it’d be running with scissors to actually try to do it. But it came through my feed and I read it and kind of got into it, and I was like, oh, that’s cool. For a brief moment, I understand what that’s actually doing.
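
For anyone who wants the five-line version of what a regression discontinuity design is doing, here is a bare-bones sketch of a sharp RDD estimate. It is not the Instacart team’s actual approach; the running variable, cutoff, and simulated data are all invented for illustration.

```python
# Bare-bones sketch of a sharp regression discontinuity estimate.
# Not the Instacart post's method; cutoff, data, and names are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
score = rng.uniform(-10, 10, n)       # running variable, centered at the cutoff
treated = (score >= 0).astype(int)    # treatment switches on exactly at the cutoff
outcome = 2.0 + 0.3 * score + 1.5 * treated + rng.normal(0, 1, n)

df = pd.DataFrame({"score": score, "treated": treated, "outcome": outcome})

# Fit separate slopes on each side of the cutoff; the coefficient on `treated`
# estimates the jump in the outcome right at the discontinuity.
model = smf.ols("outcome ~ treated + score + treated:score", data=df).fit()
print(model.params["treated"])
```

The idea is that units just above and just below the cutoff are assumed to be comparable, so the jump in the fitted outcome at the cutoff (the coefficient on `treated`) is read as the effect of crossing it.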

1:03:00.5 VK: Those are always good moments when you’re like, oh, I think I get it. And then 10 minutes later, you’re like, I don’t know if it’s…

1:03:05.4 TW: Well, then I made a note of it.

1:03:06.7 VK: Close that tab yet.

1:03:08.0 TW: I’m gonna use this for a last call in a few weeks. And now, as I’m using it for a last call, I’m like, yeah, I hope nobody asks me to totally explain it. Gotta go read my own last call again. Damn it. All right, well, this has been a fun chat and super informative. I think we’ve definitely proven this is a broad and deep topic. So, Brett, thanks so much for coming on and joining us for this.

1:03:41.0 VK: Awesome.

1:03:43.0 BK: Yeah, thank you very much for having me. It was very nice. Cool.

1:03:46.6 TW: No show would be complete without thanking our producer, Josh Crowhurst. I should probably also call out that starting a couple of episodes ago, we changed our production methods slightly. I’d be kind of curious if you’ve noticed, because it’s been a couple of episodes now. But I actually do want to say that behind the scenes, behind Josh, was a guy named Darren Young, who for years had been doing the final, final step of production. We contracted him to do that, and he did it patiently for a long time, and we threw him curveballs as well. So I just want to call out that Darren was way behind the scenes and did not get acknowledged; we should acknowledge him at least once. Josh is running with how we figure out the podcast without Darren. So thanks, as always, to Josh Crowhurst, and thanks to Darren, at least this one time publicly acknowledged. If you enjoy the show, please leave us a review on whatever platform you listen to us on. We do really appreciate ratings and reviews. Or just share it with a colleague, throw a note out on LinkedIn, tag someone, send them an email or a text, or make a TikTok about it.

1:05:11.5 TW: Now that would be weird. But pass along the show’s existence if you enjoy it. If you want to reach out to us, you can contact us through LinkedIn, you can reach out to any of us on the Measure Slack, or you can just send an email to contact@analyticshour.io. So with that, for my co-hosts on this episode, Julie and Valerie, only one of whom is an Outlier in everything she does, and she has shared a couple.

1:05:43.1 VK: Which one?

1:05:43.7 TW: What do you think? So I know they join me in saying thanks for listening and keep analyzing.

1:05:54.1 Announcer: Thanks for listening. Let’s keep the conversation going with your comments, suggestions and questions. On Twitter @analyticshour, on the web at analyticshour.io, our LinkedIn group and the Measure Chat Slack group. Music for the podcast by Josh Crowhurst. Smart guys wanted to fit in, so they made up a term called analytics. Analytics don’t work. Do the analytics say go for it no matter who’s going for it. So if you and I were on the field, the analytics say go for it. It’s the stupidest, laziest, lamest thing I’ve ever heard for reasoning in competition.

1:06:30.1 TW: I don’t know, Brett, if you’ve signed your book for people, but I’ve had a couple of book signings.

1:06:37.4 VK: Julie, did you hear this story yet?

1:06:40.7 TW: It’s not that funny.

1:06:43.0 VK: Obviously it is.

1:06:44.4 TW: I did a book signing at the Columbus Data Analytics Wednesday meetup. And I mean, I googled, like, what do you frickin’ write? I can’t come up with spontaneous stuff.

1:06:54.9 BK: Yeah.

1:06:55.4 TW: So I went down the history of book signing and kind of landed on: whether it’s “For” versus “To,” whether they bought it or you’re giving it to them, it’s whoever, dash, and then it seems like “All the best.” Period. Sign your name. So I was sitting there signing “All the best,” sign my name, “All the best.” Somewhere in the middle… is that funny?

1:07:22.2 JH: I gotta hear the punchline.

1:07:23.8 TW: I signed one of them. Best of luck.

1:07:27.2 JH: No, good luck.

1:07:30.1 TW: Good luck.

1:07:31.5 VK: Good luck. Okay.

1:07:34.0 TW: Val remembers it better than I do. Oh my God, this is good. And it was just a random person, so I don’t know who it was. I don’t know how they interpreted it.

1:07:44.8 VK: Hopefully they didn’t compare it to the other 35 people’s.

1:07:50.2 TW: You need this book. Good luck to you. Good luck.

1:07:53.6 JH: How do you know that? I don’t know.

1:07:55.2 TW: Dumbass. Oh, my God.

1:08:04.4 JH: Rock, flag and unusual doesn’t mean useful.
