#286: Metrics Layers. Data Dictionaries. Maybe It's All Semantic (Layers)? With Cindi Howson

Semantic layers are having something of a moment, but they’re not actually new as a concept. Ever since the first database table was designed with cryptic field names that no business user could possibly understand, there’s been a need for some form of mapping and translation. Should every company be considering employing a semantic layer? Is the idea of a single, comprehensive semantic layer within an organization a monolithic concept that is doomed to fail? These questions and more get bandied about on this episode, where we were joined by industry legend Cindi Howson, Chief Data & AI Strategy Officer at ThoughtSpot.

This episode’s Measurement Bite from show sponsor Recast is an explanation of multicollinearity from Michael Kaminsky!



Episode Transcript

00:00:05.76 [Announcer]: Welcome to the Analytics Power Hour. Analytics topics covered conversationally and sometimes with explicit language.

00:00:14.64 [Michael Helbling]: Hey everybody, it’s the Analytics Power Hour. This is episode 286. It’s Tuesday and I know what you’re thinking: I sure hope revenue and active customers still mean the same thing they did yesterday. A lot of you know firsthand the pain I’m describing. Data ping-pongs around the business, taking on shapes and definitions that were never really intended. Well, the semantic layer was supposed to take care of all that. And to be fair, there are some nice, tidy businesses out there doing a great job. But most of us are still trying to figure out where it should live, what it should be written with, and who should own it. So I think we should dig into it. But first, let me introduce my co-host, Moe Kiss. How are you going?

00:00:58.27 [Moe Kiss]: I’m going great. I’m really excited about this.

00:01:01.08 [Michael Helbling]: I’m excited too, and excited to do the show with you. And Tim Wilson, howdy. I think it’s just all semantics. It’s all, oh, so good. Well, that’s an interesting potential cop-out. Okay, no. Well, to really get into this topic, I think we found the perfect guest. Cindi Howson is the Chief Data and AI Strategy Officer at ThoughtSpot. She was previously Vice President at Gartner, along with many other distinguished roles throughout her career. She is the host of the Data Chief podcast and has authored many books on BI and data. And today she is our guest. Welcome to the show, Cindi.

00:01:45.36 [Cindi Howson]: Thank you for having me, everyone. I’m so excited to be here.

00:01:49.17 [Michael Helbling]: I am excited that you’re here too. You’re kind of, to me, an OG of the data space. I love people who can provide depth, background, and historical perspective on all the things we’re struggling with in the world of data today that were struggles 20 years ago and still remain with us, just with different tools and names. But today we’re talking about sort of... Oh, go ahead, Tim.

00:02:16.97 [Tim Wilson]: Well, I was going to say, having done my little forensic sleuthing, I found that I saw Cindi speak at a TDWI summit back in 2004, which I think is amazing since we’re both like 35 years old. Just look at him. I was moving from technical writing to marcom, slightly into web analytics, and that conference was my entree into the world of analytics and BI. Yeah, so I just felt like I had an appetite. He’s a big fan. I’m gonna say that.

00:02:53.94 [Cindi Howson]: He’s a big fan. That’s the summary. Tim, now I feel like I have to send you an original BI Scorecard black bear that I used to use as giveaways for class participation. We’ll see if I still have one.

00:03:08.12 [Tim Wilson]: Now, I couldn’t remember the topic, but I feel like I was coming back, and there was some other lady who spoke. It wasn’t Jill Dyché, and I cannot remember her name, because we actually wound up... Claudia Imhoff. Yes! Like two months later, we called and had Claudia come out and just spend two days explaining data warehouses to our team. So, doing the conference circuit as a consultant, you’re like, oh yeah, nobody’s really just going to call you up. But we literally did. She’s like, what do you want me to do? And we’re like, you’re smart. Please come just sit in a room and answer our questions for two days. And our dev team was thrilled. And I’m so glad that you can remember who that was. Okay. Yeah. Okay. Michael. Okay. But now we’re going to do the episode too.

00:04:00.24 [Michael Helbling]: All right. Fast forward just a few years. Yeah. So nowadays, no. So let’s talk about star schemas. No. Well, I mean, we can start there. So, semantic layers. Cindi, obviously everyone talks about those, but there’s a history here. Maybe just to get us started, and for people who aren’t as familiar with the concept, a quick primer: what does that even mean? What are those? And we can use that as a launching point.

00:04:31.55 [Cindi Howson]: Sure. So in the simplest terms, a semantic layer provides a representation of the business model, in business terms, mapped to the physical structures in your data warehouse, data lake, cloud data platform, whatever you want to call it. And it is important that it is in business terms. My German has only ever served me well when I look at the original SAP tables. VBAP was the customer table in SAP R/3. You could never show a field called VBAP to a business person. Instead, you would say, this is customer name.

00:05:21.95 [Moe Kiss]: And I keep hearing the word “context” when people talk about semantic layers. When you say business terms, is that what you mean? The context of the data, how it relates to each other, what the definition is? Is that the same thing, or when you say business terms, are you thinking about something different?

00:05:42.42 [Cindi Howson]: No, I think it’s both, because if we get precise, take something like revenue. In an inventory and supply chain context, I’m going to look at revenue based on when somebody placed an order. But the Office of Finance is going to want to know when that invoice was paid, or whether I’m on a cash basis or an accrual basis. So the context of that revenue field matters. Did that answer your question, Moe? Yeah, absolutely. Okay, totally.
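To make Cindi’s revenue example concrete, here is a minimal Python sketch in which the same hypothetical order records produce two different “revenue” figures depending on which date the business context keys on. The `Order` fields, the two context names, and the sample data are invented for illustration; real accrual-basis revenue recognition follows more involved rules than either date shown here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Order:
    # Hypothetical order record; field names are illustrative only.
    order_id: str
    amount: float
    order_date: date                   # when the customer placed the order
    invoice_paid_date: Optional[date]  # None until the invoice is paid


# Each business context picks the date that "counts" for revenue.
# These two contexts are assumptions for this sketch, not a standard.
DATE_BY_CONTEXT = {
    "supply_chain": lambda o: o.order_date,
    "finance_cash": lambda o: o.invoice_paid_date,
}


def revenue(orders, start, end, context):
    """Sum order amounts whose context-specific date falls in [start, end]."""
    pick_date = DATE_BY_CONTEXT[context]
    return sum(
        o.amount
        for o in orders
        if (d := pick_date(o)) is not None and start <= d <= end
    )


ORDERS = [
    Order("A", 100.0, date(2025, 3, 28), date(2025, 4, 5)),
    Order("B", 50.0, date(2025, 3, 10), date(2025, 3, 20)),
    Order("C", 200.0, date(2025, 3, 30), None),
]

march = (date(2025, 3, 1), date(2025, 3, 31))
print(revenue(ORDERS, *march, "supply_chain"))  # 350.0
print(revenue(ORDERS, *march, "finance_cash"))  # 50.0
```

Same rows, same month, two defensible answers: the difference lives entirely in which date the definition points at, which is the kind of context a semantic layer is meant to make explicit.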

00:06:26.17 [Tim Wilson]: Not to head... I struggle with... And revenue is a great example, because part of the challenge is that we’re trying to find a technology or a tool or a process to solve for something where the person in finance, when they think revenue, is always thinking in a revenue recognition world. And when somebody’s in inventory, they’re always thinking of it another way, and they both may complain: I see reports from the other department, they have revenue, and it’s wrong. It becomes a business understanding challenge that a data processing technique or mechanism is trying to solve. Or am I just being cynical about that?

00:07:15.78 [Cindi Howson]: No, it is. I could get annoying and say, yes, it is semantics, Tim. But sometimes people conflate data literacy with technical literacy, which is wrong. We’re really talking about what the data means in a business context, and where the data originates. If I’m talking about an order system versus an invoicing system, sometimes that’s different. So a finance person is always going to assume I am talking about when it was invoiced. A salesperson is going to come from the context of when their commission is going to get paid. We come to the data already thinking about it through our own lens, our own business function, and yet it may have very, very different meanings. Even somebody I was working with, I won’t name him, but it was hysterical: we were both working off what you would have thought was the same dataset. I’m like, why are your numbers different from mine? Here’s what I thought the number was, and you’re coming up with a different number. Yet in his dataset, he only included software licensing and did not include professional services. I was like, why would you exclude that? I’m really just looking for total revenues related to this particular segment.

00:09:01.05 [Moe Kiss]: Can I then ask, do we risk... I’m going to definitely come full circle on this, because it’s definitely been a topic that’s on my mind a lot. One of the things I’ve seen play out is this very precise, I don’t want to say business domain, but this very specific interpretation of a metric by a particular area of the business. And I’m going to give the typical example in my world: let’s say you have 12 different products. One team is like, well, we’re going to talk about video MAUs. Another team is like, we’re going to talk about search MAUs. And another area of the business is, I don’t know, template MAUs. I’m making up stuff that’s relevant to my world, of course. Then we come up with this fundamental problem: if you summed every department’s version of their metric, you would never end up with the company-wide MAU number. We also end up with these very precise definitions that might work within the business context they’re in. But the thing I struggle with from that viewpoint is that sometimes I feel like we over-orchestrate things for a specific domain, and then we can’t roll up and think about the bigger picture across the whole company. When we say MAU, what do we mean? Because a user might have had interactions with lots of different products, for example. I feel like there is an overlap here with semantic layers, which I’m sure we’ll get to. But do you see that problem playing out a lot? And is that part of why semantic layers are becoming the new hot topic of the moment?
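Moe’s roll-up problem comes down to the fact that distinct-user counts are not additive: a user active in two products is counted once per product but only once company-wide. A minimal Python sketch with made-up activity events (the user IDs and product names are hypothetical):

```python
from collections import defaultdict

# Hypothetical (user_id, product) activity events for one month.
EVENTS = [
    ("u1", "video"), ("u1", "search"),
    ("u2", "video"),
    ("u3", "search"), ("u3", "templates"),
    ("u4", "templates"),
]


def product_maus(events):
    """Distinct active users per product."""
    users = defaultdict(set)
    for user_id, product in events:
        users[product].add(user_id)
    return {product: len(ids) for product, ids in users.items()}


def company_mau(events):
    """Distinct active users across all products."""
    return len({user_id for user_id, _ in events})


per_product = product_maus(EVENTS)
print(per_product)                # {'video': 2, 'search': 2, 'templates': 2}
print(sum(per_product.values()))  # 6
print(company_mau(EVENTS))        # 4
```

Summing the per-product figures gives 6, while only 4 distinct users were active, so a company-level MAU has to be defined and computed from user-level data rather than rolled up from each team’s number.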

00:10:38.61 [Cindi Howson]: Well, let’s go back and say, you just used a term, Moe. If I was a new employee at Canva or at any kind of SaaS startup, what the heck is it?

00:10:48.20 [Moe Kiss]: Is it MAU?

00:10:48.60 [Cindi Howson]: Or what the hell?

00:10:50.04 [Moe Kiss]: Oh, fair.

00:10:50.44 [Cindi Howson]: MAU. A monthly active user. Well, and maybe I’m really going to split hairs here and say, if I only clicked on the video and it was a one-second interaction, are you going to count me as a user, or should I have actually watched at least two minutes of the content? So we can parse these definitions a lot of different ways. But I want to come back to why semantic layers started more than 30 years ago and why they are coming back now. Thirty years ago, Business Objects really patented the first semantic layer and won in court, and Cognos, at the time with Impromptu, actually had to pay a license fee to them. Prior to this, you had to code your own SQL. You would have to select some cryptic name, like VBAP L333, from this table, and that was terrible. The semantic layer gave report writers a way to click on business terminology to generate the SQL. That was the first purpose of the semantic layer. Now, as the industry moved to, let’s say, in-memory tools, with the likes of Qlik and Tableau, there was a whole generation of, let’s say, 10 years, maybe 15 years, where people didn’t think about this. They just loaded their data, did maybe one big SQL extract, and loaded it into an in-memory file. So they were only working with their subset of data, and of course, MAUs meant what I wanted them to mean. There was this loss of knowledge about what semantic layers are. Now, here we are in 2025, and we’re all trying to build agentic AI systems. What we’re learning is that without this context, or clear business definitions, we get hallucinations. We get incorrect results. So the more context you give the LLM, the more accurate your answers will be. And that is why I think semantic layers have become more important: because of agentic AI. But also, even before that, cloud data platforms and the whole modern data stack have given rise to, hey, I don’t have to subset my data. I don’t have to load just a small dataset into an in-memory engine. Let me get to all of it, whether it’s in Snowflake or Databricks or Google BigQuery. People don’t want to move the data, but they do want to trust it, whether they’re doing agentic or not.
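Cindi’s description of the original semantic layer (report writers click on business terms, and the layer generates SQL against cryptic physical names) can be sketched in a few lines of Python. The model format, the table and column names, and the `generate_sql` helper below are all hypothetical; this is not how Business Objects, ThoughtSpot, or any other product actually stores or compiles its semantic layer.

```python
# Hypothetical semantic model: business-friendly names mapped to
# cryptic physical names. Every identifier here is made up.
SEMANTIC_MODEL = {
    "table": "T_ORD_L333",
    "dimensions": {
        "Customer Name": "C01_NM",
        "Region": "RG_CD",
    },
    "measures": {
        "Order Revenue": "SUM(AMT_X)",
        "Order Count": "COUNT(*)",
    },
}


def generate_sql(model, dimensions, measures):
    """Build a GROUP BY query from business terms picked by a report writer."""
    dims, meas = model["dimensions"], model["measures"]
    # Reject any term the semantic model does not define.
    unknown = [d for d in dimensions if d not in dims]
    unknown += [m for m in measures if m not in meas]
    if unknown:
        raise KeyError(f"terms not defined in the semantic model: {unknown}")

    # Physical expressions are aliased back to the business names.
    select = [f'{dims[d]} AS "{d}"' for d in dimensions]
    select += [f'{meas[m]} AS "{m}"' for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {model['table']}"
    if dimensions and measures:
        sql += " GROUP BY " + ", ".join(dims[d] for d in dimensions)
    return sql


print(generate_sql(SEMANTIC_MODEL, ["Customer Name"], ["Order Revenue"]))
```

The user only ever sees “Customer Name” and “Order Revenue”; the same mapping is also the kind of context Cindi says an LLM needs to generate correct SQL.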

00:14:09.83 [Tim Wilson]: Well, so this whole notion of context, using agentic AI as an example: is it moving down the path where a semantic layer will help AI demand some explicit context? If I ask, tell me how many customers we had last month, will a semantic layer start to say, that’s not enough? I know, Tim, what role you’re in, and I can guess what your definition of a customer is, but I’m going to require that you give me more business context in order for me to pull the right information. Is that a feature of the semantic layer, or is that something that’s got to be built into the intermediary tool that’s using the semantic layer to interface with a business user?

00:15:08.27 [Cindi Howson]: Yeah, I follow you, Tim. And this is where I think what people want is one semantic layer to rule them all. And I just think that’s a fallacy. Will I ever see that? I don’t ever see that. What the industry, at least right now, is trying to get to, and I will also say this is the second attempt, maybe the third attempt in the industry, is Snowflake’s Open Semantic Interchange: at least let there be a common set of standards so that everyone can interoperate. And that alone would be a huge sea change. Otherwise, everyone’s building proprietary integrations. I mean, I will say, working for ThoughtSpot: ThoughtSpot integrates with the Looker metrics layer in LookML. ThoughtSpot integrates with the dbt semantic layer, and that has gone through different incarnations. There are a few others. Some have built integrations with Cube, some have built integrations with AtScale, there are others. But let’s just take those. Well, those are all point solutions. We have to keep up with dbt’s latest protocol and Looker’s latest protocol. It would be great if we all just said, here are the approaches we’re going to use, so it’s all common rather than point solutions. That is the vision and the hope of Snowflake’s Open Semantic Interchange. However, and this is a very long-winded answer, we will have separate incarnations. I have to say, in every customer conversation I’ve had about this in the last month, they’re like, we should only have one instantiation. And I’m like, no, you do need separate instantiations, because every downstream tool, and even backend database, has its own limitations. So say I’m going to create a metric called top 10 customers. Well, there are some databases that don’t support a ranking function. Denodo virtualization tried to do this for a while. And it’s like, great, in ThoughtSpot we have an object called top 10. If I hit the Snowflake database, it’s working. If on the back end it’s hitting, I forget which database didn’t support it, some variation of SQL Server or whatever, then Denodo is not working, not giving an answer. Or, we have a very cool visualization, my favorite visualization, a KPI chart. It’s too complicated for the Looker metrics layer. So you’re always going to have these separate instantiations of a semantic layer, because nobody is going to want to dumb down their semantic layer to the least common denominator. Okay.
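Cindi’s “top 10 customers” point, that one metric definition may compile on one backend and not on another, can be sketched as a per-backend compiler that refuses rather than silently degrading. The `Backend` type, its `supports_rank` capability flag, the backend names, and the `orders` table are hypothetical; the sketch does not describe how Denodo, ThoughtSpot, or any real database behaves.

```python
from dataclasses import dataclass


class UnsupportedMetricError(Exception):
    """Raised when a backend cannot express a metric definition."""


@dataclass(frozen=True)
class Backend:
    name: str
    supports_rank: bool  # hypothetical capability flag for window ranking


def compile_top_n_customers(backend: Backend, n: int = 10) -> str:
    """Compile a 'top N customers by revenue' metric for one backend."""
    if not backend.supports_rank:
        # Rather than silently change the metric's meaning, refuse to compile it.
        raise UnsupportedMetricError(
            f"{backend.name}: no ranking function, cannot compile top {n} customers"
        )
    return (
        "SELECT customer, revenue FROM ("
        "SELECT customer, SUM(amount) AS revenue, "
        "RANK() OVER (ORDER BY SUM(amount) DESC) AS rnk "
        "FROM orders GROUP BY customer"
        f") AS ranked WHERE rnk <= {n}"
    )


for backend in (Backend("warehouse_a", True), Backend("legacy_b", False)):
    try:
        print(backend.name, "->", compile_top_n_customers(backend))
    except UnsupportedMetricError as err:
        print("no answer:", err)
```

A fallback such as `ORDER BY revenue DESC LIMIT 10` would run on more engines but treats ties differently from `RANK()`, which is exactly the kind of lowest-common-denominator compromise Cindi says nobody wants to make across every tool.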

00:18:39.39 [Moe Kiss]: I’ve got to make sure I’m following this, though. Okay. What I’m observing is that folks seem to want to pull their semantic layer further and further up the chain, right? So you want it to sit less in a downstream tool and more internally, and I obviously have a biased view. They want to bring things like semantic layers in-house so that they also keep their options open about whichever AI they choose to leverage. But what you’re saying is it’s unavoidable that we’re going to end up with a semantic sandwich or cake or whatever you want to call it, where you might have something at one layer, and then when you go to a BI tool or some other type of tool or integration, you might end up having to have a second layer, just because they have different features or attributes that you want to leverage. Am I hearing that correctly?

00:19:36.60 [Cindi Howson]: Yes. And if by downstream and upstream you mean the database, people want it closer to the database because that’s where the data lives. But as you get closer to the business decisions, you’re going to have derivations and metrics and context that may not exist in the database. I would also say we have to think about how these things get defined. Working with one team, they’re like, okay, we’re going to build everything out in the database. I’m like, great, so your DBA is going to do all this? Or here I have a really strong SME. If we bring in data mesh operating principles and domain ownership, I have this great marketing person, and they know the difference between a video MAU and a web click MAU. I’m going to want them to add a little more context to it, so I’m going to want an easier interface. And guess what? That interface does not exist in the one that was designed for the DBA.

00:20:53.38 [Moe Kiss]: OK, but I don’t want to take things in a totally different direction, but I’m like dying a little bit.

00:21:00.96 [Cindi Howson]: I feel like a dymo.

00:21:03.42 [Moe Kiss]: I opened another can of worms, I can tell. The thing I’m really struggling with in this whole discussion about semantic layers is that it doesn’t feel new, and I feel like what you’ve written about it makes that very clear. But part of me is also really grappling with: if it’s not net new, why now? Is it the way we want to use agentic AI on top of our data? Or is it the fact that we have gone towards this data mesh approach with less, I don’t know if structured is the right word. Cindi, you can definitely insert better terminology, because you are the queen of exceptional terminology. But we used to have such structured datasets. We had star schemas; they had context. Is part of this just our own doing, because we wanted to move faster and have less structure in our data, and this is just the consequence?

00:22:04.24 [Cindi Howson]: So that was a two-part question. So, is it new? Is the semantic layer new? It’s not new. It has gotten more robust over time. And not all semantic layers are created equal. I can show you one semantic layer, and it only supports a single star schema. Or maybe even worse, it only supports one big table. I can show you another semantic layer, and it supports multiple fact tables and different design approaches, star schema, snowflake schema. It even includes capabilities for aggregate table navigation or query compilation, so that the most efficient query path is taken. So not all semantic layers are created equal, and I do think that has changed over time. And for sure, the openness has changed over time. If I go back to the original query tools, whether Business Objects, Cognos, whatever, those were largely closed. Some boutique consultancies had open APIs to access them. OBIEE’s model was open and nobody used it. You could expose it as an ODBC connector to other BI tools, but nobody used it; performance was not good. So what we have now is definitely more openness, but I do believe agentic AI is why we’re demanding them, why we need them more. It’ll just make AI better. The second part of your question was, are we decentralizing these things? And yes, I think that’s part of it too.

00:24:03.60 [Tim Wilson]: This is either going to be so obvious that it’s dumb, but are there people out there who, if some generic person came in and looked at their setup, would be told, you have built a wonderful semantic layer, and the people who built it would say, I built something that functioned for what I need; I didn’t know that’s what it was called? On the flip side, that has me thinking that semantic layer sounds cool, and it gets treated as a binary: if you have one, things are good; if you don’t have one, they’re bad. It sounds like what you’re saying is, you could try to boil the ocean with one grand semantic layer, and it would probably be bad. We treat it as though there’s this label, and if you have it, then things are fixed. But there are always gradations of whether you do it well, well-architected and appropriately. That probably happens with everything that gets a fancy new label.

00:25:07.99 [Cindi Howson]: Yeah. Yeah. So I don’t know, Tim. Do I want one mega semantic layer? Oh, please no. It becomes overwhelming to maintain. And maybe, if I’m just using natural language to ask questions, I don’t care what it’s hitting on the back end, but I would be skeptical that that would work. There is a belief in the industry that we’re going to go towards verticalization of some of these semantic layers. And maybe this is where, I kind of bristle at it, we throw new terms out there, like ontologies. Well, can we just talk about domains? That makes more sense to me, and that aligns with data mesh. But could we have an insurance industry semantic layer? Could we have a marketing web analytics semantic layer? I think we could. I think we could. We would get to common metrics. The physical pointers back to which table it’s hitting might change a little bit, but I think that business representation, we could possibly get to.

00:26:43.49 [Moe Kiss]: Okay, one thought, just asking for a friend, of course, that has been on my mind: we can approach this from a business domain perspective, just like the examples you gave, right? So you might have one that’s more like marketing and acquisition, or one that’s like, I don’t know, finance, and whatever other business domains. The thing I keep wrestling with, though, is: are we just doing this again, where we’re overlaying our thoughts about what a domain is on top of the business user and how they want to interact with data? What I mean by that is, if I’m a business user, what’s my business question? What are the questions I want to ask? Let’s say I want to ask a question about our users, or I want to ask a question about, I don’t know, now I’m going to struggle to think of a comparative example. I might want to ask something about, someone help me with an answer. A marketing channel. A marketing channel, sure. Or I want to understand something about how experiments have done. Are we making semantic layers representative of business domains that make business sense but don’t reflect the way our business users want to interact with data when they have questions?

00:28:15.26 [Cindi Howson]: Well, so to me, if you build a semantic layer that doesn’t work that way, what is the point? Like, go home. If you know SQL and want to code your SQL, you don’t need a semantic layer. You might want it for some reusability, but the semantic layer gives the business user the ability to ask questions without knowing SQL. And then it gives the LLM more context to generate better SQL. All these companies that have tried to do text-to-SQL without a semantic layer are largely failing. And guess what? They’re adding semantic layers so that it works. So semantic layers bring reusability; that was the original purpose. Then it is a business-friendly interface. And now, in agentic AI, it’s the context for the LLMs to ensure accuracy. So if you’re going to give me a semantic layer that is just a bunch of cryptic, technical names, and it’s not presenting things the way the business sees them, it’s a waste of time. It’s a poorly architected semantic layer.

00:29:37.66 [Moe Kiss]: So hypothetically, if you just took all your YAML descriptions, that probably wouldn’t be good enough, because they’ve been written by a data scientist in their own specific domain, for use by someone who deeply understands that area.

00:29:51.28 [Cindi Howson]: Well, if they deeply understand their area, there might be a lot of useful context in there. But if it’s a lot of code and technobabble, then I think it’s going to be less useful.

00:30:06.98 [Tim Wilson]: Back on the... I may blend two things together. Referencing Snowflake’s Open Semantic, what is it? Open Semantic Interchange. That brings to mind the xkcd cartoon about people complaining that there are 13 different standards. We need one standard, and then the next panel is, well, now we have 14 competing standards. There does need to be a first mover or a dominant player. Is there a race? Obviously, Snowflake wants to be the owner, the driver of that. I guess the same thing applies when you talk about verticalization. Say something like digital analytics, and you’re like, let’s just have one common marketing digital analytics semantic layer. Well, now the players in that space are all going to say, yes, the way that we think about that data is the way that the industry... Does the effort to have some sense of standards not lead to self-interested competition to pull the market towards whoever’s on point for defining the standard? Or maybe my third example would be the W3C. I mean, we go back 30 years trying to define what HTML is supposed to do. And Microsoft doesn’t even conform to the W3C standards because

00:31:40.40 [Cindi Howson]: Tim, you just answered that question. Microsoft would love us to revert to MDX instead of SQL, for the most part. But it is true. So look at who is not part of that effort. Was Databricks invited to the party? Was Google BigQuery invited to the party? Will they invite themselves? Will they become part of it? Standards get adopted based on who leads them, but also on who uses them and who asks for them. So when I look at how we prioritize our product strategies, we are very much listening to the customers. And sometimes we’ve gone down rabbit holes, and I’m like, why did we build that integration? I won’t say which integrations were, to me, a waste of time. But some of them, I’m like, why did we do that? We thought something would have legs. We listened to the customer, and it never really took off. And then some will change strategies. We thought dbt’s initial effort would take off, and instead they’re on version two, and now... So Snowflake is hugely influential in the industry. We’re very proud to be part of the committee defining these standards, but we have to see how broadly adopted they are. The market will decide.

00:33:29.23 [Michael Helbling]: And certainly right now, AI is kind of a forcing function for the industry, where there hasn’t been an imperative like that for a lot of companies before. Does that seem fair?

00:33:41.61 [Cindi Howson]: I think that seems fair. Yeah. And there’s more willingness to be open and to focus on where you add value in this data-to-insight-to-action chain.

00:33:59.08 [Moe Kiss]: That actually triggers an interesting thought. One of the things I’ve observed is this push for semantic layers. I feel like it’s come out of left field. I don’t know if that’s fair or not. It just seems to have swirled up very quickly. And maybe the products aren’t yet as mature as they need to be. I almost feel like a lot of companies are building while they’re gathering requirements, as customers are trying to build out with them. Do you think that’s a fair representation? Has this happened before with a particular tool that’s had to develop very quickly because of the pressure? I feel like AI is the pressure: everyone suddenly needs these semantic layers to make AI, quote unquote, work. Has that happened with product development before, or is this a net new thing that data companies are dealing with, where they’re trying to build at pace while customers already want to leverage and use it?

00:35:00.49 [Cindi Howson]: Yeah. So I don’t want to sound like a commercial, and you can edit this out afterwards. All semantic layers are not created equal. Now, fortunately, ThoughtSpot, whether it was purposeful or luck, always generated SQL on the backend. So the semantic layer was always super robust. So, did we get lucky or was it intentional? And the cloud data warehouse and agentic AI have just helped that. Others have only just started to embark on natural language processing and agentic AI. They tried to do it without a semantic layer, and that’s why now they’re dabbling in it, and they’re like, oh, it takes a lot to build this. Some of them are starting out simple: one big table. That’s all they can handle, and it’s code-based. And I think about a blog that our co-founder Amit Prakash wrote about four years ago, I think it was, about the metrics layer, which is just a subset of the semantic layer: the metrics layer has some growing up to do. And even as a former Gartner research vice president, I have to give credit to Gartner: they still say that the time to maturity for metrics layers is five to 10 years. That’s a long time.

00:36:45.52 [Tim Wilson]: Yeah. But how unfair is it to draw a parallel to master data management? I remember it having a moment: oh, things are getting fragmented, we need to just do an MDM initiative. And to my earlier point, it was kind of a binary: if we do MDM, all these problems get solved. The companies that already had MDM under the hood, because they’d architected their setup well, could do MDM. The ones that had built kind of a hot mess and were then trying to apply a whole bunch of duct tape and baling wire to do MDM never really got there. Is that a fair parallel, or is that too much of a stretch?

00:37:45.56 [Cindi Howson]: Yeah, well, is it a fair parallel? I would just say it’s valid. It’s valid. And I remember, the first eight years of my life in this industry were at Dow Chemical, and we had a master data management system called INCA. I don’t even remember what it stands for. It was homegrown. Then I worked at Deloitte, and I was like, wait, you don’t have clean product codes? You don’t have a single product hierarchy? You don’t have clean customer data? It was a foreign concept to me that not everyone had clean master data. And Moe, you asked this earlier, so I wanted to come back to this point. Semantic layers right now are mainly for structured data. But I think there’s a time in the not-too-distant future when they will also encompass semi-structured data. And this data is a hot mess, frankly, because we’ve never applied to it all of the data governance and data management disciplines that we have been applying to structured data. So I think the organizations that are best positioned for the agentic AI era got to cloud, had clean data, had good master data, and then of course did the culture and people change management. If they already did that, they have such a leg up. Now we’re throwing in generative AI, agentic AI, semi-structured data, a lot more data that we couldn’t get to before. And yeah, it’s not that easy.

00:39:48.31 [Michael Helbling]: It’s nice to know we’ll continue to have jobs going into the future though.

00:39:51.87 [Cindi Howson]: Yeah. That’s why I’m like, why is everyone worried about not having jobs? They’ll just be different jobs. Yeah.

00:40:00.90 [Tim Wilson]: I’ve got one more. This could be a complete non sequitur, but I feel like Cindi could tee off on this, and I want more color. It was from the post you’d written, where the quote was, “Our industry has also now raised the generation of data analysts who never learned proper data modeling.” And I kind of wanted you to elaborate on that.

00:40:23.75 [Cindi Howson]: Well, I’m going to say first, tell me if you disagree or not. But I follow the work of people like Joe Reis and Sonny Rivera, a Snowflake superhero. And I work with a lot of, let’s say, visualization experts who are just used to one-offs: let me load the data and let me visualize it. They never really learned proper data modeling techniques.

00:41:02.35 [Tim Wilson]: I guess my question is, is that a way of saying that there are analysts who aren’t really thinking about the structure of the data and the ramifications for how the data fits together? They’re just trying to get to an output. I don’t know if I agree or disagree. I probably agree, because I’m just generally negative and that’s a negative statement.

00:41:31.71 [Cindi Howson]: Let’s not take it negatively. Let’s challenge these people. To me, it’s about empowering them. So, you know what? You’re great at visualization and you’re great at building dashboards. But if you want to continue to have a career in this space, I want you to learn some data modeling fundamentals. And I don’t care which methodology you follow. Learn some data modeling. That’s on the technical side. But also, we talk about data literacy; we also need to bring in business literacy. And so to me, it’s not just about where the data is coming from. It is also how it is used, and that there really might be two different definitions. I mean, when I talk to somebody in airlines, I’m like, oh wow. I think of on-time performance: did it leave the gate on time, or did it arrive on time? Which one is really more important to you? And by the way, when you’re crossing the International Date Line, it gets a little more complicated still. So I want these analysts to learn both skill sets.
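Cindi’s airline example shows how one business term can hide two metric definitions. As a minimal sketch, assuming a hypothetical flight record and an assumed 15-minute on-time tolerance (not an official industry rule), the Python below computes departure and arrival on-time rates as separate, explicitly named metrics. It uses timezone-aware timestamps so that a flight crossing the International Date Line is measured in absolute time rather than by local clock readings. The fixed UTC offsets are illustrative only; real airports follow daylight saving rules that a production system would take from a time-zone database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Illustrative fixed offsets; a real system would use a time-zone database.
LOS_ANGELES = timezone(timedelta(hours=-8))
SYDNEY = timezone(timedelta(hours=11))

# Assumed tolerance for "on time"; each business defines its own.
ON_TIME_TOLERANCE = timedelta(minutes=15)


@dataclass
class Flight:
    """Hypothetical flight record with timezone-aware local timestamps."""
    scheduled_departure: datetime
    actual_departure: datetime
    scheduled_arrival: datetime
    actual_arrival: datetime


def departed_on_time(f: Flight) -> bool:
    """Definition 1: did it leave the gate on time?"""
    return f.actual_departure - f.scheduled_departure <= ON_TIME_TOLERANCE


def arrived_on_time(f: Flight) -> bool:
    """Definition 2: did it arrive on time?"""
    return f.actual_arrival - f.scheduled_arrival <= ON_TIME_TOLERANCE


def on_time_rate(flights: list[Flight], rule) -> float:
    """Share of flights that satisfy the chosen on-time definition."""
    return sum(rule(f) for f in flights) / len(flights)


def time_in_air(f: Flight) -> timedelta:
    """Aware datetimes subtract as absolute instants, so UTC offsets are handled."""
    return f.actual_arrival - f.actual_departure


flights = [
    # Leaves 40 minutes late but makes up time: late departure, on-time arrival.
    Flight(
        datetime(2024, 1, 10, 22, 30, tzinfo=LOS_ANGELES),
        datetime(2024, 1, 10, 23, 10, tzinfo=LOS_ANGELES),
        datetime(2024, 1, 12, 8, 30, tzinfo=SYDNEY),
        datetime(2024, 1, 12, 8, 40, tzinfo=SYDNEY),
    ),
    # On time at both ends.
    Flight(
        datetime(2024, 1, 11, 22, 30, tzinfo=LOS_ANGELES),
        datetime(2024, 1, 11, 22, 35, tzinfo=LOS_ANGELES),
        datetime(2024, 1, 13, 8, 30, tzinfo=SYDNEY),
        datetime(2024, 1, 13, 8, 20, tzinfo=SYDNEY),
    ),
]

print(on_time_rate(flights, departed_on_time))  # 0.5
print(on_time_rate(flights, arrived_on_time))   # 1.0
print(time_in_air(flights[0]))                  # 14:30:00

# Subtracting naive local clock readings overstates the flight by 19 hours.
naive = (flights[0].actual_arrival.replace(tzinfo=None)
         - flights[0].actual_departure.replace(tzinfo=None))
print(naive)                                    # 1 day, 9:30:00
```

The same two flights yield a 50% or a 100% on-time rate depending on which definition is chosen, which is exactly the kind of ambiguity a shared metric definition is meant to settle.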

00:42:59.57 [Moe Kiss]: I have one last question. Just hypothetically, if you were implementing a semantic layer, what would be the top three things you’d want to avoid?

00:43:09.34 [Cindi Howson]: The top three things? Okay, well, I’m going to start with the first thing I would want to do, so I’d have to flip it. Things to avoid, or what do I want to do?

00:43:20.30 [Moe Kiss]: Or you can do the top three things to make it successful, either way, whichever way your brain works.

00:43:25.43 [Cindi Howson]: You want to avoid bringing in absolutely everything in the physical storage and exposing that to mere mortals, because that’ll be overwhelming. So I always start with who is going to use this, and what are the top questions they’re going to want to ask of it? Not because I’m going to hard-code that, but so that I get an idea of the context in which they’re operating.
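Cindi’s advice to start from the audience and its top questions, rather than exposing everything in physical storage, can be sketched as a small curated model. The Python below is a hypothetical illustration, not ThoughtSpot’s or any other product’s semantic layer format: the table name `ORD_FACT`, its cryptic columns, and the `SemanticModel` class are all invented for this example. It exposes only the fields one audience needs, under business-friendly names, and translates a question such as “Revenue by Region” into SQL.

```python
from dataclasses import dataclass

# Hypothetical physical table: cryptic names, plus plumbing columns that
# business users should never need to see.
PHYSICAL_COLUMNS = {
    "ORD_ID", "CUST_NO", "RGN_CD", "ORD_DT", "NET_AMT_USD",
    "SRC_SYS_CD", "ETL_BATCH_ID", "ROW_HASH",
}


@dataclass(frozen=True)
class SemanticModel:
    """A curated, audience-specific view over one physical table (illustrative only)."""
    table: str
    audience: str
    dimensions: dict[str, str]  # business name -> physical column
    measures: dict[str, str]    # business name -> SQL aggregate expression

    def exposed_fields(self) -> list[str]:
        """Business-friendly names this audience can ask about."""
        return sorted(self.dimensions) + sorted(self.measures)

    def to_sql(self, measure: str, by: str) -> str:
        """Translate 'measure by dimension' into SQL; unexposed names raise KeyError."""
        column = self.dimensions[by]
        expression = self.measures[measure]
        return (
            f'SELECT {column} AS "{by}", {expression} AS "{measure}" '
            f"FROM {self.table} GROUP BY {column}"
        )


# Start from who will use it and the questions they ask, not from every column.
sales_model = SemanticModel(
    table="ORD_FACT",
    audience="regional sales managers",
    dimensions={"Region": "RGN_CD", "Order Date": "ORD_DT"},
    measures={"Revenue": "SUM(NET_AMT_USD)", "Orders": "COUNT(DISTINCT ORD_ID)"},
)

# Sanity check: every curated dimension maps to a real physical column.
assert set(sales_model.dimensions.values()) <= PHYSICAL_COLUMNS

print(sales_model.exposed_fields())
print(sales_model.to_sql("Revenue", by="Region"))
```

Plumbing columns such as `ETL_BATCH_ID` and `ROW_HASH` stay in physical storage; they simply never appear in the model, so the audience is not overwhelmed by them.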

00:43:56.65 [Michael Helbling]: Cindi, wow. So cool to talk to you. Thank you so much. This has been really, really good. I’ve got a ton of notes that I’ve been writing down, so I know our listeners are probably also gaining a lot from this episode. All right, let me switch gears really quickly, because we need to take a quick break with our friend Michael Kaminsky from Recast, the marketing mix modeling and GeoLift platform helping teams forecast accurately and make better decisions. Michael’s been sharing bite-sized marketing science lessons with us over the last couple of months, and they’ll help you measure smarter. Okay, over to you, Michael.

00:44:38.12 [Michael Kaminsky (Recast)]: Multicollinearity strikes fear into the hearts of many analysts and executives, but it’s also one of the most commonly misunderstood concepts in analytics. Some amount of correlation across variables is expected in most real-world analyses, so it’s critical to understand what multicollinearity is, why it causes issues, and whether or not it’s a problem for your particular analysis. Multicollinearity means that two of your variables share some of the same signal. This causes problems for a regression model, which will not know how to allocate credit between the two variables, and that makes the results of your regression harder to interpret. Let’s imagine you’re modeling the drivers of home prices in some geography, and you want to include home square footage and the number of bedrooms as predictors. These two variables share some amount of signal, namely about the bigness of the house. If you include both variables in a simple linear regression, you’ll often get strange results, where one of the two variables is highly impactful with a large coefficient, and the other might be very small or even negative. Slightly different data sets might even flip which one is positive and which one is negative. This happens because the model doesn’t know how to apportion credit for bigness, which is present in both variables. So the core problem of multicollinearity is that when there’s shared information across variables, a simple regression won’t know how to apportion credit between them. This means that you either need to accept more uncertainty in your results or change the variables you’re using to account for the shared information.

00:45:52.32 [Michael Helbling]: Thanks, Michael. And for those who haven’t heard, our friends at Recast just launched their new incrementality testing platform, GeoLift by Recast. It’s a simple, powerful way for marketing and data teams to measure the true impact of their advertising spend, and even better, you can use it completely free for six months. Just visit getrecast.com/geolift to start your trial today. Okay, we’ve got that done. One thing we love to do is go around the horn and share something we call a last call, something that might be of interest to our listeners. Cindi, you’re our guest. Do you have a last call you’d like to share?

00:46:29.26 [Cindi Howson]: Well, I want to ask a question on the last call, if I can. When you think about how quickly our industry is moving and innovating, what do you see as your best medium for keeping up? Is it listening to podcasts, reading Substack or Medium articles, or how do you feel about books?

00:46:53.37 [Michael Helbling]: Are we supposed to answer that?

00:46:55.23 [Cindi Howson]: Well, I’m looking for feedback because, you know, even though I’m a podcast host, I’m a writer at heart. And yet, is the industry moving too quickly for another book?

00:47:08.29 [Moe Kiss]: Yeah. I mean, I can speak for myself. I listen to podcasts and host a podcast. That’s a big part of how I stay up to date. But I also love books. I’m a book person. Probably books more than articles. But you listen to a lot of the books, right? Yes, I do, but that’s just because of my life stage of being time-poor. I end up listening to books on Audible a lot. Yeah, for sure.

00:47:34.88 [Michael Helbling]: What about you, House? I would say my number one source is articles. In my day-to-day travels, I’ll run across an article, bookmark it, and read it later. I buy a lot of books and then don’t read them. Oh boy. In fact, that’s… Right behind me. Michael, have you finished the book? I have not finished your book, Tim. Oh, well, you haven’t finished that either. So yeah, for me reading is sort of an enjoyable pastime. And unlike Moe, I can’t pay attention to audiobooks when someone’s reading aloud, so I have to sit down and read. And then when I do finally get a chance to read, I end up reading sci-fi or fantasy novels instead of business books. So it’s a tough one. And then of course, podcasts are very important. I have to believe that, right? So there you go.

00:48:32.18 [Cindi Howson]: This feels like confessions of a podcast host.

00:48:35.88 [Michael Helbling]: Yeah, that’s right. Exactly. What do you think, Tim? I listen to a ton of podcasts. Yeah, he does.

00:48:43.27 [Tim Wilson]: I listen to a ton of podcasts, and very few of them are business or data analytics related. So I am very much a Medium, Substack, daily and weekly newsletter fiend, which starts to feel a little overwhelming. But yeah, with the occasional book. The books feel like a chore, though. Well, I feel like if someone else is doing a good job. So cool. I’m just going to be clear: I don’t listen to the podcast, even though I make one. And I struggle to read the books, even though I wrote one. So yeah, I’m the worst.

00:49:23.03 [Cindi Howson]: Wow. So I think Tim summed it up. Wait, are you telling me two-thirds of our time spent is a waste of time? Why am I writing books, and why am I hosting a podcast? I’m just going to get on with building stuff. Okay.

00:49:36.79 [Michael Helbling]: I don’t like the data that we’ve uncovered here.

00:49:42.04 [Tim Wilson]: I mean, I get a lot of value out of hosting the podcast, because it gives us an excuse to say, hey, why don’t you come on and explain semantic layers to us?

00:49:50.66 [Michael Helbling]: So yeah, actually doing a podcast is one of the ways I learn new things. So that’s something you could add to the mix. Yeah. So when is your next book coming out?

00:50:03.30 [Cindi Howson]: I don’t know. Can I take a break from the podcast or stop something? I don’t know.

00:50:09.77 [Michael Helbling]: This is what I was trying to figure out.

00:50:12.20 [Cindi Howson]: What should I do next? Yeah. Yeah.

00:50:14.24 [Michael Helbling]: Fair point. All right. Tim, what about you? What’s your last call?

00:50:19.94 [Tim Wilson]: Well, I guess as a follow-on: there is a Substack that I discovered a couple of months ago called We Have the Data. It’s kind of silly, kind of data visualization candy, but it’s WeHaveTheData.net. I think it comes out a couple of times a week, and it’s like NOMLAC news, but data visualizations instead. They’re pretty lengthy: a collection of often kind of trivial data visualizations, but it’s a fun scroll in my inbox.

00:50:54.66 [Michael Helbling]: Outstanding. All right, Moe, what about you?

00:50:59.06 [Moe Kiss]: I want to do a plug for Cindi’s podcast. I was lucky enough to be a guest back in October and it’s called The Data Chief. And as you can tell, I ended up hanging out after the show and picking Cindi’s brain for like another 30, 40 minutes about all of these topics, which is why she’s here today. And she just has such a range of like really incredible guests. It’s a really different format to our show. So really encourage you to go check out The Data Chief podcast.

00:51:29.54 [Michael Helbling]: Outstanding. And yeah, we’ll put a link to that in our show notes as well, so people can find it.

00:51:36.37 [Tim Wilson]: You’re supposed to hand her at the beginning.

00:51:37.99 [Michael Helbling]: It’s fine. We’ll hand her all over the place. What’s your last call? Well, I’m so glad you asked. A good friend of mine, Mary Gates, actually made me aware of this. INFORMS, which I’m sure we’re all familiar with, has an initiative called Pro Bono Analytics. I’m a big fan of analytics initiatives that help nonprofits, and I’ve been able to be part of some of them over the years. They let people give their data and analytics skills to nonprofits, offer mentorship, and things like that. Pro Bono Analytics is run by INFORMS, and I just wanted to give it a shout-out. I was not familiar with it before, but it looks like a very cool organization. So if you’re a nonprofit and you’re listening, that might be an amazing place to partner to get help with data initiatives. And if you’re a data professional and you want to find a way to give back, that might be an amazing way to do that. We’ll put a link to that in the show notes as well. OK. As you’ve been listening to this episode on semantic layers, I’m sure you have thoughts. I’m sure you have questions. We would love to hear from you, so go ahead and reach out to us. There are three main ways you can do that: through LinkedIn, through the Measure Chat Slack group, or by emailing us at contact@analyticshour.io. Cindi, once again, this has been a very information-rich and awesome episode, primarily because of your deep knowledge and expertise in this field. So thank you again so much for joining.

00:53:21.43 [Cindi Howson]: Thank you for having me. I feel like we should do this over a cup of coffee or a glass of wine at some point.

00:53:28.71 [Michael Helbling]: Yes, I wholeheartedly agree. That’s how this whole podcast started: we were all drinking at an analytics conference and said, we should put this on the radio. That’s a great idea. That’s right. Another drunken great idea. All right. Also, and this is not directed at you, Cindi, this is back to the audience: if you’re someone who puts stickers on your laptop or whatever, we do have stickers and we’d love to send you one. You can request one on our website. And obviously, no show would be complete without saying a huge thank you to all of you listeners who go out and share ratings and reviews and tell us how you’re enjoying the show. So please continue to do that. We look forward to that feedback, and we appreciate it very much. All right. As we wrap up, whether you’re trying to build a one-ring-to-rule-them-all semantic layer or spreading it out across verticals, I know both of my co-hosts, Tim and Moe, would agree with me: you should keep analyzing.

00:54:41.77 [Announcer]: Thanks for listening. Let’s keep the conversation going with your comments, suggestions, and questions on Twitter at @analyticshour, on the web at analyticshour.io, our LinkedIn group, and the Measure Chat Slack group. Music for the podcast by Josh Crowhurst. Those smart guys wanted to fit in, so they made up a term called analytics. Analytics don’t work.

00:55:06.36 [Charles Barkley]: Do the analytics say go for it, no matter who’s going for it? So if you and I were on the field, the analytics say go for it. It’s the stupidest, laziest, lamest thing I’ve ever heard for reasoning in competition.

00:55:19.07 [Michael Helbling]: We’ll just do our best with it. It’s why we have an audio engineer. Hi, Tony. Hi, Tony.

00:55:37.04 [Tim Wilson]: Rock flag and semantic layers are 30 years old.
