#251: The Continued Rise of the Analytics Engineer

We’re seeing the title “Analytics Engineer” continue to rise, and it’s in large part due to individuals realizing that there’s a name for the type of work they’ve found themselves doing more and more. In today’s landscape, there’s truly a need for someone with some Data Engineering chops and an eye towards business use cases. We were fortunate to have one of the co-authors of The Fundamentals of Analytics Engineering, Dumky de Wilde, join us to discuss the ins and outs of this popular role! Listen in to hear more about the skills and responsibilities of this role, some fun analogies to help explain to your grandma what AEs do, and even tips for individuals in this role on how they can communicate the value and impact of their work to senior leadership!

Photo by Hanson Lu on Unsplash

Episode Transcript

0:00:00.0 Michael Helbling: Before we start the show, we have a special announcement. This fall, the Analytics Power Hour crew is headed to MeasureCamp Chicago.

0:00:08.9 Moe Kiss: That’s right. Even your co-host from all the way in Australia will be there on Saturday, September 7, to join in all the unconference MeasureCamp fun.

0:00:18.1 Julie Hoyer: Oh, I’m so excited that we’re all gonna be together. Well, except we’ll be missing Josh, but we’ll have him there in spirit. But I’m curious, I’ve never been to a MeasureCamp. What’s it like?

0:00:27.2 MH: What’s it like? Okay. Well, I’ve been to one of them in Europe and I’ve been to, I think, all of the ones that have been in person in the US. And to me, the most iconic feature is that the schedule is created on the day of the event and everyone who attends is encouraged to actually lead a session based on whatever they’re finding most interesting or most useful, or even maybe what’s vexing them the most of late. So it’s really all about an exchange of ideas and having some really in depth and rich discussions with your peers.

0:01:00.9 MK: I’ve also been to quite a few and I’ve also helped with planning the one we run in Sydney. And the truth is, it’s just phenomenal. It’s better than Christmas Day, honestly. And one of my favorite parts of MeasureCamp is that they’re held on a Saturday, so it doesn’t interfere with your work. And the tickets are always free.

0:01:15.7 MH: Yeah. And I loved my experience at MeasureCamp Austin earlier this year. It was so accessible to everybody and it was so fun. Okay. So what are we gonna be doing there?

0:01:27.1 Val Kroll: So we’re gonna be doing a couple things. So the first is we’re gonna have a room booked for us all day long where you can stop by and visit a couple of the co-hosts and talk about what you’ve been talking about throughout the day or maybe one of the sessions you’re presenting. And we’re also gonna have a couple questions posted up on the board, day of, and you can come in and give us your answer to those prompts. And then at the end of the day, during the happy hour, we’re also gonna do a short live show.

0:01:52.5 MH: Will there be shots?

[laughter]

0:01:55.5 JH: So mark your calendars for Saturday, September 7th at 9:00 AM at the Leo Burnett building, downtown Chicago, right on the river and just a couple of blocks from Michigan Avenue.

0:02:06.1 VK: Get your free tickets now by heading to bitly/aph-chicago and start thinking now about what you might like to present or talk about.

0:02:15.4 MH: Awesome. We’re headed to Chicago, but now let’s start the show.

[music]

0:02:26.2 Tim Wilson: Welcome to The Analytics Power Hour, analytics topics covered conversationally and sometimes with explicit language.

0:02:34.3 MH: Hey, everyone, welcome. It’s the Analytics Power Hour. This is episode 251. You know they’re everywhere now. It’s like the hottest job in analytics. It’s the analytics engineer. Even at companies where they already have analytics engineers, it always seems like they could use a couple more. The analytics industry has been in a state of near constant evolution for more than 20 years now. And in the last five years, we have seen this role grow massively. So we wanna talk about it. So in light of that, let me introduce my co-hosts. Val Kroll, how are you doing?

0:03:11.2 VK: Pretty good.

0:03:13.0 MH: Awesome. And Moe Kiss, how are you?

0:03:14.6 MK: I’m doing wonderfully.

0:03:16.1 MH: Yeah. Moe, you started talking about this before any of us, so I’m really excited to bring this full circle. We can all act like we’re on board now, but you were definitely the one pushing this on us as a podcast. And I’m Michael Helbling. We also have a guest today to join us in this conversation. Dumky de Wilde is a senior analytics engineer at Xebia and he’s held numerous other analytics engineering and data roles throughout his 10 years in the analytics industry. He is a co-author of the new book, Fundamentals of Analytics Engineering, and today he is our guest. Welcome to the show, Dumky.

0:03:51.6 Dumky de Wilde: Thank you. Thank you. Happy to be here.

0:03:53.8 MH: Well, we’re glad you could come too. This is something that’s pretty exciting and we wanna keep covering it ’cause it’s definitely grown so much and it’s a really important part of the analytics ecosystem. But I think maybe a great place to start, because there’s a number of our listeners who are analytics engineers, but there’s quite a few who are not. Maybe we start just with, from your perspective, a definition of what is analytics engineering?

0:04:19.6 DW: Yeah. That’s a really great question, and I will say that this is not… The jury is not out on this yet. So what we define in the book is really that the analytics engineer is kind of a bridge between business and data engineering. And I think traditionally, the way I compare it is back when building websites was a new thing, you had this concept of a full stack developer, some person that did everything like designing the website, building the front end, building the backend. And then as time goes on, really what you see is that these websites get bigger and bigger, and so no one person can do the job anymore. And so similarly in analytics and in the data field, what you see is that, traditionally, you had this data engineering role where people would ingest data and also make it available in a data warehouse.

0:05:15.6 DW: And then you have your BI team or BI developers that then work with that data. And every request for changes, for changes to a data model or changes to new sources would have to go through that data team. And what we saw in the last couple years is that as companies or as the data teams at companies grow and the number of companies using data and analytics is growing as well, you see this new role emerge of analysts who are also more technical and take up these best practices from software development like version control in their way of working. And so that is also when tools like dbt or Dataform come up that make that kind of development workflow a lot easier and facilitate the growing need for data and analytics at companies.

0:06:13.7 VK: So, one thing that would be super helpful for me, ’cause that was a great definition, is, could you talk about the differences between that and a data engineer? Because I have to admit that sometimes, I feel like I get those confused, or understanding the differences. I would love to hear your thoughts.

0:06:30.0 DW: Yeah. That’s a really good point. And so what it means for me is that a data engineer is usually a person that will do a little bit more low level stuff. So, they will set up ingestion, so basically taking data from a source system to a destination system, which is usually your data warehouse or a data lake. They will either use existing tools or connectors, something like Fivetran or Stitch or similar tools to get data from point A to B, or sometimes they will write custom connectors to facilitate the process. And that’s something that I often see for old on-premise systems where you wanna get data into your data warehouse. And so they do a lot with that kind of stuff, but also the network connectivity, and then input validation or basically validating your data that’s coming in.

0:07:31.0 DW: And then on the other side, you will have your analytics engineer and they will be working more, actually in, let’s say SQL or in a way where they take that input, like the raw data that’s coming in, and they make it… They map it to their business processes and the architecture of their organization so that in the end you will have models coming out that are more meaningful, semantically meaningful, to the people that will actually do the analysis. And so what I see is that for an analytics engineer, there’s a lot of knowledge required around data modeling, which when I first started into this space, I was like, “How hard can it be?” [chuckle] But it’s the same for any analyst, right? You think like, how hard can it be to really answer this question? And then you come into all these different types of processes, and between teams in an organization, the processes are different and so the definitions are different. And basically what analytics engineering is starting to do is take that a little bit away from all the different analysts and put it into one central place so that everyone can use that same definition. And that requires a lot of data modeling, data modeling techniques to make that happen, basically.

0:08:50.6 MK: Yeah, it’s funny, I feel like I have been on this real journey myself, having worked at quite a, I guess, a tradition… I don’t know if traditional is the word, but previous version or iteration of job titles where it was very much like a star schema with a data engineering team. And we would put in requests to then… I probably went through a phase where I actually just thought the change from data engineering to analytics engineering was like, “Oh, they’re just calling themselves something different, but it’s the same job.” And actually, I did really enjoy reading the book that you co-wrote, The Fundamentals of Analytics Engineering, and I probably didn’t sufficiently grasp the difference of skillset as well. Even particularly from a language perspective of Scala and Java and stuff, and I guess the analytics engineer being so much more SQL based and so much closer to the business.

0:09:44.5 MK: And I think that that is probably the biggest difference that I have personally noticed is that when we had that previous model, the data engineering team were quite… They sat so far from the business that they’d… They’d produce something, but they didn’t understand the reports that you wanted to build with it, or how you… The measures that you wanted. And so you’d end up just going back and forth a lot. And I feel like analytics engineers have really filled that gap of, once the data’s taken in, to how to actually get it to the state that the end user, and I’m calling the data scientist and data analyst in this case an end user, but I’m sure that’s up for debate too. And yeah, I just wanted to say how much I enjoyed learning, I guess, more the detail about some of these topics that I guess I took for granted. I don’t know. There wasn’t really a question in there.

[chuckle]

0:10:35.7 DW: Well, thanks. But it is a really interesting point. And I see this a lot in… So I work for a consultancy, so I see a lot of different businesses from the inside basically, and they all have their different processes, but you do indeed see that the analytics engineers or the people that are most suited to be analytics engineers are the technically minded analysts, because the data engineers usually come from a software development background. And that’s not to say… This is generalizing, so it’s not to say that you can’t have either of these roles do different things. But what you see is that in software development, there’s just this traditional way of working where you get your Jira tickets and you work on your ticket and that’s it, and then someone else checks your work. Whereas in the analytic space, I think people have been forced to find ways to answer their question… They have this urge to answer their question, and they find ways to make that work regardless of whether the data is available in their system currently or not.

0:11:48.4 DW: And so that kind of mindset of thinking that, “Hey, my stakeholder really needs an answer to this question to make a well-informed decision,” that really helps to then say, “Why can’t we do this a little bit faster? Why can’t we do this in this way or that way, or hack it together a little bit?” And I think that really helps in this type of work to really think along with your business stakeholder about how to make it work instead of saying, “File a request with my product owner and he’ll handle everything.”

[music]

0:12:22.6 MH: It’s time to step away from the show for a quick word about Piwik PRO. Tim, tell us about it.

0:12:28.7 TW: Well, Piwik PRO has really exploded in popularity and keeps adding new functionality.

0:12:33.1 MH: They sure have. They’ve got an easy-to-use interface, a full set of features with capabilities like custom reports, enhanced e-commerce tracking and a customer data platform.

0:12:44.7 TW: We love running Piwik PRO’s free plan on the podcast website, but they also have a paid plan that adds scale and some additional features.

0:12:52.5 MH: Yeah. Head over to piwik.pro and check them out for yourself. You can get started with their free plan. That’s piwik.pro. And now, let’s get back to the show.

[music]

0:13:04.3 VK: So I know we’ve jumped in to ask all these definitions and the differences in comparisons, but in the book, there is this amazing supermarket analogy that I think our listeners would love to hear. So if you wouldn’t mind walking that through, I would feel selfish if I only read it myself and we didn’t have you walk through it on the show.

0:13:24.9 DW: Yeah, no, I think that that is a really interesting one, and full credits for that to my colleague Ricardo who came up with it. So what is interesting in that analogy is that if you imagine a supermarket, there’s a lot of different elements that happen to get your groceries and basically the whole process before you have your groceries in your kitchen. So in a supermarket, you could have, let’s say, an analyst who wants to understand a little bit more about what articles are being sold, maybe what articles are being sold together or where to find new articles. And you can imagine that a data engineer, the articles… The groceries will have to come, the fresh produce will have to come in in some way. So your data engineer could basically bring in the fresh produce from their source.

0:14:24.7 DW: But that is not enough, because then you have this analyst who says, “Hey, I have this idea about how to organize my supermarket,” and then there’s this guy who shows up with a truck, says, “Here’s a bunch of fresh produce.” But actually, what the analytics engineer does is be that man in the middle to kind of organize your shelf space, think about, how can I facilitate that? The shelves are also organized in order from first in… I never worked in a supermarket, to be honest, so [laughter] I’m not too familiar with this first in, first out stuff. And so you can imagine that there are different levels to think about how this produce and these items in the supermarket go from one step to another. And yes, it’s very important to have that analysis of, how are my customers navigating the supermarket?

0:15:19.2 DW: What are they actually buying? What do they need? And can I predict that? And it’s super important to have that produce coming in, but you can really make a difference as a supermarket by bridging that gap between, we’re not just gonna put everything out, we’re gonna have someone who thinks about how to organize it, how to model it, think about how is that customer gonna walk through the store and where to put stuff in the right place. So yeah, that to us is kind of the role of the analytics engineer.

0:15:50.0 MK: So like I said, I did feel at some stage there was definitely a misconception by me that data engineers and analytics engineers were the same thing. And I think that has definitely shifted, even through my own experience at work. But I guess one could argue that the scope then has had to narrow for both specialties, almost less full stack and more like specialization. But then I also think, particularly, ’cause I have three analytics engineers in my team in marketing, it’s like… That I also look at the breadth of tools in marketing and the modern data stack, how complicated it is, the privacy and governance stuff, and I’m like… I also am torn by, yes, there’s specialization, but then there’s maybe more complexity or difficulty. Do you feel that tension as well?

0:16:44.9 DW: Yeah, yeah. That’s definitely been a key point in the things that I think about in terms of… Or especially what I wanna help people understand when I’m on a job. I want ’em to be able to understand the space that they’re in and the problem in that space in their organization and find the right solution for that. So analytics engineering is basically a solution to a problem that didn’t really exist before. So it’s only come to existence because… In the last years. And I think part of this is maybe even due to COVID accelerating businesses being online and growing in their data space, so there’s a need for bigger teams and more structured way of working. But that doesn’t mean that you cannot have a single person data team in, let’s say, a startup or a very small company that can do a lot of work with the tools that you currently have available.

0:17:49.1 DW: So one of the things where I think you can really see this, and we discussed this in our chapter on observability and data quality, is that you don’t necessarily have a problem with data quality at first, and so it doesn’t always make sense to say, from the start, that, “I need this data quality or observability tool that does everything for me,” and then you pay a ton of money to make that work. It’s only when you see that something really influences the way you make decisions in your organization or the quality of those decisions is impacted, that you start to think about, “Okay, well maybe I do need a data quality or observability tool.” But then you can start to think about, “Do I actually need observability for everything that I own within my stack, or is it just that I need to write some tests in dbt to say like, ‘Hey, I want this… The output of this column needs to be unique, or there don’t need to be any… null values.’” You really need to think through the problem that you have and then identify the tools that will help you solve those problems. Now, I will say that identifying those tools is a lot harder than it seems at first, because there’s just so many of them.
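
The two checks mentioned here (a column must be unique, a column must not contain nulls) can be sketched in a few lines. This is only an illustration of the idea, written in Python against an in-memory SQLite database rather than in dbt itself; the `orders` table, its columns, and the helper names `find_duplicates` and `count_nulls` are all hypothetical.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the two checks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 11), (2, 12), (3, None)],
)


def find_duplicates(conn, table, column):
    """Return the non-null values of `column` that occur more than once."""
    query = (
        f"SELECT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1"
    )
    return [row[0] for row in conn.execute(query)]


def count_nulls(conn, table, column):
    """Return how many rows have a NULL in `column`."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


# A check "passes" when it finds no offending rows.
duplicate_ids = find_duplicates(conn, "orders", "order_id")
null_customers = count_nulls(conn, "orders", "customer_id")
print("duplicate order_id values:", duplicate_ids)
print("rows with NULL customer_id:", null_customers)
```

Both checks are just queries that return the offending rows, which is why a handful of them can be enough before a dedicated observability tool is needed.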

0:19:06.6 MH: So many.

0:19:06.7 DW: And part of the job is keeping up with that, in a way. But what we do try to do in the book as well is to say it’s not always about the tools, even though some of the technology changes have been very revolutionary. We can talk about that later, but it’s not just about the tools. It’s about certain concepts or problems that those tools solve. So for example, getting data from a source system to a destination system and specific types of data like marketing data or sales data, those are problems that many companies have, right? And they have the same exact problem. And so you see these ETL vendors like Fivetran and Stitch and Talend fill up that space. And so if you understand that you as a company are not necessarily a unique snowflake in that sense, but that your problem is similar to the problem that other companies have, you can start using tools that cater to the needs of many.

0:20:06.2 DW: And those can be way more efficient than what you could do on your own. And I think this is actually very similar with what you’ve seen in the web analytics space as well. So I’ve been to a few companies that try to build their own web analytics tool, and yes, it works, but it’s just way more efficient and effective if you take one of those off-the-shelf web analytics tools, because you just can’t think of every possible use case out there, and your requirements will change and your team will change. And it’s just not your core business. So yeah, I think for a couple of these elements, data ingestion, data warehousing, quality and observability, you want to identify the problem that you have and just go and look out there to see what’s already available, and identify if it solves your problem.

0:21:01.7 VK: Love that. So I have a quick question just on org structure based on what you were just sharing there. Because unfortunately, I haven’t had the pleasure of working directly with the team, with someone called capital A, capital E analytics engineer. [chuckle] Although, now that I understand a little bit more about what it is, I think there were some people who had that as a part of their role. And you were starting to tee this up a little bit too, Moe, about the endpoint and the end users. So I thought in my head, traditionally, that one of the first analytics roles that you would hire if you were hiring a team for your business or this need would be the analyst, and that they would be working with people in IT, similar to what Moe said. But do you see… Even when you’re talking about the business, I wonder if you mean even sometimes some of the analysts that sit inside of some of those business roles.

0:21:51.6 VK: And so maybe the first hire of the team could be an analytics engineer, and that would be that layer, that the center of excellence of the analytics team could really be stocked more with those engineering roles servicing the analysts that are embedded within those teams. Or I’m just curious if you could talk a little bit about how you’ve seen the org models of different teams change or organizations who are really ramping up their hiring of these analytics engineers, where they’re even sitting inside the organization. Is it in a marketing org like at Canva, or is it within IT? Would just love to hear all your thoughts and reactions.

0:22:29.5 DW: Yeah, that’s really interesting. And I think… I’ve seen them all over the place, to be honest. So maybe to start from the smallest possible organization, so it’s fine sometimes to start with something like emailing Excel files around, and you will have an analyst who will analyze those Excel files. And Excel has been great because it solves those problems with one tool. And then as the team starts to grow, you see that indeed you will usually have some kind of data engineer because it gets harder to get in source data. And those will often be in IT teams at first, at least. And then as it grows, you get more analysts and then you get the analytics engineers in between, and those could be… Traditionally, I’ve seen a lot of analytics or more like the BI people at the finance departments, but organizations that come from, let’s say, a more web-oriented approach, they will have their analysts or they will have a lot of analysts in the marketing and web and sales team.

0:23:45.4 DW: And so sometimes, you see that that’s where the analytics and analytics engineering organization tends to grow as well. And this is actually a really interesting point because we’ve also seen companies, organizations where you get kind of two teams growing in different ways and they start competing with each other, right? So you have a BI team that is building out their own BI needs and finance dashboards and that kind of thing. But then, you have a marketing team that has a very strong analytics approach. And sometimes, they’re further ahead, and sometimes it’s the other way around. But they need to find a way to consolidate these different platforms, and that can be very interesting, so to say. But yeah, I think that in general, what you see is they can be in different departments, and in the end, every department will usually have some kind of analytics role for reporting or dashboarding. And for some departments like marketing, like finance, those tend to grow a little bit bigger than other departments. And then you can also… When you go up another level, you can get into the whole data mesh thing where all these departments are so independent of each other but have shared definitions, or at least agree on definitions of their output through data contracts. But that gets very complicated very quickly and I don’t see a lot of organizations really doing that very well.

0:25:21.8 VK: Not quite ready.

0:25:25.1 MK: Would you have seen more centralized or more embedded models? Which would you say the industry is leaning towards, from your experience?

0:25:33.9 DW: I might be a little bit biased because we get brought into centralized teams and build central data platforms. So that is definitely biased. But I do see a lot… I see a mix, really. When I speak to others in the analytics engineering community, there is a lot of decentralized… Like I said, I think a lot of it comes from, let’s say, marketing and web analytics as well, or from BI teams. And in the case of BI teams, they’re either centralized or with the finance organization. So it’s really a mix.

0:26:09.2 MH: And it seems like a lot of companies are growing organically into this as opposed to more top-down planning a lot of times, and so you end up with little spots here and there, it feels like. And it’s fascinating you brought that up, Dumky, about how finance analytics teams or traditional analytics organizations, these newer ones with maybe marketing data, they really have a lot to do for each other, but somehow never met before, in most companies. [chuckle] And it keeps blowing my mind. I’m like, well, you have this whole other analytics org. Why aren’t you talking to them? And they’re like, oh, yeah, well, they just don’t know anything about what we do. And it’s just one of those things that I hope as time goes on, we’ll see that combine or coalesce into more robust analytics organizations that are holistic in their approach to the business. I do understand it to a certain extent because the data sets that people are working with and the methods and things like that are much different on different sides. And so, if you’re a data scientist on the operational side or the finance side, you’re not solving the same problems as a marketing data scientist or a marketing analytics engineer. And so, they’re a little bit different. But yeah, it’s really strange how they’re not well connected usually.

0:27:31.3 DW: This is a really interesting point to me. And I think what I’ve seen… So a lot of us come from a web analytics background. We’re very familiar with more event-driven data or like…

0:27:42.2 MH: Streaming data.

0:27:46.8 DW: Streaming data as well. And interestingly enough, all these data engineers, or not all of them, but a lot of the data engineers are not as familiar with that type of data. And even the BI teams are also not as familiar with that type of data. So I do also see, and this is really a change in the last couple of years, that these different worlds come together in a way where all of a sudden the data engineers also need to quickly learn about how does all this web analytics and event streaming data pipelines work, and how can we integrate it into these more bigger static types of data, and how does that affect the way we view our customer? Because all of a sudden, a customer is no longer just a row with like, “This is my customer and this is their revenue.”

0:28:37.9 MK: With five orders and $100 worth of purchases.

0:28:40.1 DW: With five orders, exactly, yeah. And now you have to think about, okay, so if I have like 2000 events for this customer in the last week, how do I make a meaningful aggregation of that over a certain period of time? And that then requires data modeling that these data engineers are not familiar with. And yeah, so it’s like a big pot, I feel, that’s being stirred at the moment, and the dust really hasn’t settled on how this should be organized.
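
As a rough illustration of the modeling question raised here, the sketch below rolls raw events up into one summary per customer per ISO week. The event names, the sample data, and the choice of a weekly grain are assumptions made for the example, not something prescribed in the conversation.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical event stream: (customer_id, event_date, event_name).
events = [
    ("c1", date(2024, 6, 3), "page_view"),
    ("c1", date(2024, 6, 4), "add_to_cart"),
    ("c1", date(2024, 6, 4), "purchase"),
    ("c1", date(2024, 6, 11), "page_view"),
    ("c2", date(2024, 6, 5), "page_view"),
]


def weekly_summary(events):
    """Count events per (customer, ISO year/week) and event name."""
    summary = defaultdict(Counter)
    for customer_id, event_date, event_name in events:
        iso = event_date.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        summary[(customer_id, week)][event_name] += 1
    return {key: dict(counts) for key, counts in summary.items()}


summary = weekly_summary(events)
for key, counts in sorted(summary.items()):
    print(key, counts)
```

The grain (customer per week) is the modeling decision; a different grain, such as per session or per day, would answer different questions from the same events.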

0:29:06.8 MH: And I think that is sort of a hook into something else we can talk about in this space, which is this transition from traditional ETL to what’s now being called ELT. And maybe you could talk a little bit about the terms around that, as well as why is that transition happening?

0:29:24.6 DW: Yeah, exactly. And so, to me, the fundamentals of this is really understanding a little bit about the technology that provides this change. And so back in, let’s say 2012, what you would have is if you wanted to bring in large amounts of data, you would need some kind of distributed system for transferring that data from place A to B. So you’d have to split up that data in different chunks and different computers, different servers. Basically, you would be able to move that data into your data warehouse and your data warehouse would be oriented row by row. So, your one million transactions would go in row by row. What we’re currently seeing is that there’s been two really big changes and the first one is that we went from these row-based storage systems to column-based storage systems. And why is that important? Well, the important thing about that is that when you physically put stuff together on your hard drive… So you can imagine a row with a transaction ID, a transaction amount, and a lot of details about that transaction.

0:30:44.1 DW: If you put that block by block, you can see all these rows after each other on your hard drive. If you want to aggregate those transactions, so let’s say we want to have the sum of… Or the total transaction value for our customer or for all our customers or for all our customers in a specific segment, you would have to skip lots of columns in between. So, what the bigger analytics data warehouses decided to do really is to create a columnar format where you have all the values of that specific column together. And so all of a sudden, your hard drive is just very fast at going from the first transaction to the last transaction, because they’re all together in the same space. And so that paved the way for BigQuery, for Snowflake to make use of that. Of course, then you get all these data warehouses in the cloud, so to say. So they’re way more accessible for a lot of customers.
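
The difference between the two layouts can be sketched with plain Python data structures. This is a toy model of the idea only: real warehouses add compression, encoding, and on-disk block formats that are not shown, and the field names and values are made up.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"transaction_id": 1, "customer": "a", "amount": 20.0, "note": "gift"},
    {"transaction_id": 2, "customer": "b", "amount": 35.5, "note": ""},
    {"transaction_id": 3, "customer": "a", "amount": 44.5, "note": "refund due"},
]

# Column-oriented layout: all values of one column sit next to each other.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Summing over rows means visiting every record and stepping past the
# fields that are not needed for the aggregate.
total_from_rows = sum(row["amount"] for row in rows)

# Summing over a column reads one contiguous list and nothing else.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both totals are the same; what changes is how much unrelated data has to be read to compute them.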

0:31:45.6 DW: And the last trend that we’re seeing with a database like DuckDB, not sure if you heard about it before, it’s not super important, but what they’re trying to do is to say like, “Hey, we used to have computers that were not as powerful as we needed them, but we’ve abstracted away the whole computer layer and right now computers are basically so powerful that you can have one server with a giant amount of memory, a giant amount of processing power, and it’s more than enough to accommodate for like 95% of all analytics use cases.” And if you have like a top 5% analytics use cases, you’re probably Google or Netflix or Meta, and you have the engineering capacity in-house to facilitate that. So I think we hopefully see a little bit of a trend where things get a little bit simpler because you don’t have to think about how to distribute your workloads.

0:32:56.1 DW: And I’ve done this in the past with tools like Databricks where you need to actively think about, is the analytics function or computation that I want to do here, is it more memory intensive or more compute intensive? And how do I make sure that my cluster scales up perfectly for that and I don’t spend too much money on it? And those are all very low level types of thought processes that I don’t necessarily want to be bothered with, because I think a lot of the good tools out there just abstract that away. So hopefully, that gets a little bit easier. And so that’s been, yeah, basically the three trends that I’ve seen in the past, like these columnar databases, a move away from distributed systems. I’m blanking out… [chuckle]

0:33:49.5 MK: And so that has then shifted though, the order in which we ETL or ELT, because of the different structure of…

0:34:00.4 DW: Yeah, exactly. Yeah, so that was the original point, right? Yeah, so now what you can do, because you were restricted on that initial load of your data, you had to think about what types of transformations do I want to apply, because I don’t have the capacity to duplicate my entire source system. Well, it turns out that right now, we do have that capacity and so you don’t need to think about all the use cases you have in advance. You can just bring in all that data. It’s still good practice to think about, like, don’t bring in too much, you want to prune things a little bit. But you can have a non-destructive mechanism on top. So you keep that original data and then you build your transformations on top of that. So that’s why you do the extraction and the loading first, and then the transformations. Because then when someone says, “Hey, actually, we made an error here, or we want a different type of transformation,” you can just go back in and say, “Let me adjust that for you,” and you run your SQL code again and it just uses the same data, but it builds the new system with the new transformation needed.
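
The "load first, transform later" idea described above can be sketched with Python's built-in `sqlite3` module standing in for a cloud data warehouse. The table names, columns, and the refund scenario are assumptions made up for this example, not something from the book or the episode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + load: land the source rows untouched in a raw table.
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1000, "paid"), (2, 2500, "refunded"), (3, 4000, "paid")],
)


def build_revenue(transform_sql):
    # Transform: rebuild the derived table from the raw data on every run.
    conn.execute("DROP TABLE IF EXISTS revenue")
    conn.execute(f"CREATE TABLE revenue AS {transform_sql}")
    return conn.execute("SELECT total FROM revenue").fetchone()[0]


# First version of the transformation counts every order.
first_total = build_revenue(
    "SELECT SUM(amount_cents) / 100.0 AS total FROM raw_orders"
)

# Someone spots the error: refunds should be excluded. Only the SQL changes.
fixed_total = build_revenue(
    "SELECT SUM(amount_cents) / 100.0 AS total FROM raw_orders WHERE status = 'paid'"
)

# The raw table is never modified by either run.
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Because the raw table is left alone, fixing the transformation is a matter of editing the SQL and re-running it, with no need to go back to the source system.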

0:35:10.9 MK: And the funny thing was, the thing that has been rolling around in my mind for like the last period of time is about privacy and governance. And I’m like, wait, why does it sit with the analytics engineer? And I don’t feel like I ever had an answer that clicked in my mind. And I feel like now, reading your book and also understanding the difference between… I knew that it was ETL and now it was ELT… No, wait, it was ETL and now it’s ELT. There we go. But I couldn’t articulate and understand why. And I didn’t understand why that would then mean that analytics engineers would be managing a lot of that privacy and governance piece, which I think was a really interesting takeaway for me. Here I go again without an actual question.

[laughter]

0:35:55.0 VK: I mean, the number of visualizations that I saw talking about the differences between the two, and my brain being like, “Okay.” But yeah, that made so much sense, so I appreciate that description and that walk through. [chuckle]

0:36:09.8 DW: I’m glad. But yeah, to your point, Moe, about the privacy part, I think it’s not necessarily the responsibility of the analytics engineering team to take care of privacy, but they are the ones that can identify where privacy issues might arise. So what we do a lot is just have… Sit down with a data owner who is… Usually, that’s the team that provides the data or is responsible for inputting data if it’s, for example, a sales system. And then we can say, hey, actually, these here are email addresses or they’re home addresses for people, so we need to apply some kind of masking strategy so that the analysts from other departments or other teams will not be able to see those values. And we still might want to be able to use them for certain computations, so our underlying system needs to be able to see them. But the analysts within a specific group or a specific team are not supposed to see them. And so that is a very technical challenge in that sense and also requires a process where there is one person or a role that identifies this specific field in the database and assigns a value to it to say like, “Hey, this should be masked,” or “This should be deleted after x amount of time,” for example.
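
As a rough illustration of that masking process, the sketch below tags each field with a policy and masks it for everyone except the underlying pipeline. The policy dictionary, role names, and `mask_value` helper are all hypothetical; in practice this would more likely be handled by the warehouse's own masking and access-control features, and a plain unsalted hash like this one is not strong protection on its own.

```python
import hashlib

# Hypothetical column-level policy agreed with the data owner.
COLUMN_POLICY = {
    "email": "mask",
    "home_address": "mask",
    "order_total": "show",
}


def mask_value(value):
    # Replace the value with a short, stable hash: equal inputs still match
    # each other, but the original text is no longer readable.
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return "masked_" + digest[:8]


def view_for(role, record):
    # The pipeline role sees everything; every other role gets masked fields.
    if role == "pipeline":
        return dict(record)
    return {
        # Columns without an explicit policy are masked by default.
        column: mask_value(value)
        if COLUMN_POLICY.get(column, "mask") == "mask"
        else value
        for column, value in record.items()
    }


record = {
    "email": "jane@example.com",
    "home_address": "1 Main St",
    "order_total": 42.5,
}
analyst_view = view_for("analyst", record)
pipeline_view = view_for("pipeline", record)
```

Defaulting unknown columns to masked is a deliberate choice in this sketch: a newly added field stays hidden until someone has actually classified it.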

0:37:39.0 VK: Super interesting. The one thing that’s in the back of my mind, how people kind of find themselves into this role, ’cause I saw you wrote somewhere about how so many people from the analytics field, regardless if it’s on the data engineering side or analyst side, that we really pride ourselves of being self-taught in a lot of ways and learning on the job and reading those blog posts. And so I think you said something about, this is the book that you could have used five years ago to help give you some of those hard skills or explain some of those concepts. But I’m curious your thoughts on just that and the role that this book can play to help people who are interested, but also, I’m curious… My second loaded question here is, do you see a lot of people coming into the analytics engineering role from data engineering roles or more from the data analyst side? I’m curious which one is kind of merging, or if it’s a completely different background or industry all together.

0:38:31.6 DW: Yeah, it’s a really good question. I think what I see is that a lot of people already have this type of role, they’re just not aware that there’s a name for it yet, so that’s the first group of people. And I have quite a few people who come up to me, even before we organized analytics engineering meet-ups here in Amsterdam, and so people would come up and say, “Hey, this is really interesting. I’ve been doing this, I just didn’t know there was a name for it.” But having a name for it makes it easier to find resources online, find other people. So that’s been one part of it, which I think is really great. You create this kind of community around it. And then you also have, indeed from both sides, from data engineering or analysts, you will have people that say, “Actually, I like this more technical stuff as an analyst, so I’m gonna go in that direction.” Or, as a data engineer who says, “I kinda missed that connection with the business, with talking to stakeholders.” So I see that quite a lot as well. I think in general, it’s easier for people to say, “Hey, I like talking to stakeholders and I’m gonna add… Brush up on my technical skills.” But this also depends really, right? So some of the data engineers are more traditional software engineers where you work in big teams and you’re just comfortable in your team and having assigned tasks for you.

0:40:09.8 DW: But I also see a lot of data engineers that go in the other direction and say like, “Hey, I just enjoy the best of both worlds.” So to me, it’s more of a… Maybe that’s my final answer to your question, is it’s more of a role that doesn’t necessarily apply to one single person. So one person in an organization can be, let’s say, both a data engineer and an analytics engineer as in that role. And you can kind of play around with the terminology there. And so it helps to shape a role and you can, as a person, identify if that is the type of work that you wanna do. And I hope for people, it will give them direction to say, “Hey, this is the type of work that I wanna do. And now, I have a guide to understand which direction I need to take, what kind of skill set I need to add to my skills.”

0:41:06.0 MK: And Val, in my team… Well, at Canva in general, we’ve had lots of data scientists who have started, realized that our data scientists actually do a lot of analytics engineering. Now that we have analytics engineers, they tend to do layers further down, more like model and report layers and less transform and source. But we’ve had quite a few of our data scientists who really enjoy the work in the data warehouse, who really enjoy data modeling, who then transitioned from data science into analytics engineering. It’s been really interesting to see it build out as well, as a function. Actually, now, selfishly, that is a great question I should ask on behalf of some of the analytics engineers at my work. Dumky, it does sound like you have a great community where you are in Amsterdam. But one of the things that I have found that is really difficult is professional development, because the industry… Well, the role itself, what was it? Like five years ago, six years ago that it kind of became a thing. And so we have really struggled with… Everyone’s kind of progressing at the same level. I mean, there is obviously lessons we can take from data engineering or analytics or whatever, or traditional software engineering. But I’ve definitely struggled with how to help people with their professional development in a space that is quite new. What advice do you have?

0:42:30.2 DW: Yeah, it’s… I mean, of course, I can say buy the book.

0:42:33.3 MK: They should definitely buy the book. Yeah.

0:42:37.9 DW: That’s kind of cheating. But there is… So there’s a couple of areas. And this, again, it depends on the profile that you already have. Maybe you know Python, for example, but you don’t know SQL, or you know Python and SQL, but you’re not as good with managing stakeholders or setting up workshops with your stakeholders. So it depends a little bit on what you as a person have already in terms of skill set. In general, I think what we look for, for example, when we hire new analytics engineers in our team is we want them to have an idea of what software best practices are. So, that’s like version control and testing your code. We want them to have an idea of what data modeling means and how to apply that in SQL.

0:43:35.4 DW: So, have an understanding of what a star schema is, what dimension and fact tables are, maybe some other types of data modeling. If you wanna really show off, then we look at what is your experience with cloud computing, because I think… It’s not for everyone, but it is good to have an idea of, where does my code actually run? Where do my transformations run, and what does it mean to do that in a data warehouse like BigQuery or Snowflake? How do I set that up? How do I manage that in terms of costs and permissions? And then on the other side, it’s more like consulting skills. So, are you able to ask the right questions? Are you able to really work with your stakeholders to summarize their needs and requirements? And are you entrepreneurial enough to set that up yourself and manage your way around an organization? So that’s the five pillars that we use for assessing analytics engineers.
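
For readers who have not met the terms, here is a minimal star-schema sketch, again using Python's built-in `sqlite3` module as a stand-in for a warehouse. The table and column names (`dim_customer`, `fact_transaction`) are assumptions chosen for the example: the fact table holds one row per transaction, and the dimension table holds the descriptive attributes you group by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: one row per customer, with descriptive attributes.
    CREATE TABLE dim_customer (
        customer_id INTEGER PRIMARY KEY,
        segment TEXT
    );

    -- Fact table: one row per transaction, pointing at the dimension.
    CREATE TABLE fact_transaction (
        transaction_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        amount REAL
    );

    INSERT INTO dim_customer VALUES (1, 'retail'), (2, 'wholesale');
    INSERT INTO fact_transaction VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 40.0);
    """
)

# Typical analytical query: join facts to a dimension, then aggregate.
totals = dict(
    conn.execute(
        """
        SELECT d.segment, SUM(f.amount)
        FROM fact_transaction AS f
        JOIN dim_customer AS d ON d.customer_id = f.customer_id
        GROUP BY d.segment
        """
    ).fetchall()
)
```

Here `totals` maps each customer segment to its total transaction value, which is the kind of question this modeling style is meant to make easy to answer.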

0:44:37.9 VK: That’s a great summary. I think I saw that each of the authors had different chapters that you were responsible for in the book, depending on your expertise and your areas of interest. So I’m curious, which ones were yours or which ones were you super excited to write about? And a part of me wonders, were you writing this book as a letter to your former self of, [laughter] these are the things that you’re gonna fall in love with, with this role? But I’m just curious, what parts of the role excite you the most?

0:45:08.2 DW: Yeah, that’s an interesting one. And so, to me, I think what was really great about having a different set of authors is that… What I already said. It’s almost too hard to do this as a single person. So, especially if you wanna have the latest and best of everything, you need someone with a specialty in, let’s say, BI or dashboarding or the history of data warehousing or that kind of stuff. And so in that sense, it was really great for us to come together as a team and have discussions about the structure of the book and how to do that. And so for me personally, I wrote the chapter on data ingestion and on data quality and observability. I think especially for me, data ingestion has been really interesting because I’ve… For the last 10 years, I think I did my job pretty well, but I’ve always struggled to understand why do we need to move data from point A to point B in the first place, and what’s happening in between? And so this is kind of the culmination of that thought process of what happens there and what you need to think about to make that work. Because, so often, I’ve done that. I was like, yeah, it makes sense now, but then a week later, it breaks or something misses. And this is almost like my personal guide for myself to have a checklist and say, “If I go through these steps, I can make sure that I have a fail-safe data ingestion pipeline to move my data around.”

0:46:50.4 MH: Yeah.

0:46:51.7 VK: Nice, nice.

0:46:52.0 MH: Your data ingestion pipeline is always throwing off errors, whether you’re looking at them or not.

0:46:56.8 DW: Exactly.

0:46:57.5 MH: This is great. All right. Well, this… Oh, go ahead, Moe.

0:47:00.5 MK: I have one last question.

0:47:00.4 MH: [0:47:00.5] ____ and I want you to ask it.

[laughter]

0:47:06.6 MK: Just because it is something that I think a lot about, and to be honest, I think about this trade-off for many areas of data work, including machine learning models and all sorts of stuff. But I guess, how do you know, as an analytics engineer, when you get the balance right between the business logic requirements, the optimization to run faster, especially when compute keeps getting cheaper and cheaper? You might not have as much drive to go back and optimize old code, and also the time to build and deploy. Is there just a spidey sense you have? Or how do you feel like you know when you’ve got it right?

0:47:47.4 DW: Yeah, that’s really hard, always. I think the key point is to understand where the business value lies. If you have an idea of whether something is going to add business value or not, that’s your first starting point. That being said, I do like the idea which we apply quite a lot, is to have a dedicated percentage of your work for eliminating tech debt, basically. So thinking about, there’s always gonna be a little bit of stress. There’s always gonna be deadlines which you will need to meet. But you can account for the tech debt that you build up by saying, “Hey, I’m okay with doing this now. But then, over the next two, three months, I’m gonna slowly work on turning this into a more generalized approach so we can also use it in other places.” Or, “I’m going to clean up these tables that we’re not using anymore, and at the same time, create a process around that.” So yeah, I do feel as a team, it’s good to make a conscious decision about, you can’t allocate 100% of your time to building out use cases. You have to account for some kind of percentage, basically, of cleanup time. Yeah.

0:49:10.2 MK: I like that.

0:49:10.4 MH: Yeah, that’s a really good thing to bring up, because I find a lot of organizations really struggle with the balance of that. And it’s hard as an analytics engineer to get people to understand that work, because they’re sort of like, “Well, you’re not creating any new reports, or you’re not creating any new insights or anything. So what are you doing?” And it’s like, “Well, I’m making it run. I’m making it keep running, okay?”

0:49:36.9 MK: Well, so, tip for young AEs out there, I was helping one of our AEs go through a promotion application, and they were kind of like, “Oh, I’ve got nothing to put on there.” And I was like, “Are you mad? You are responsible for these models that drive tens of millions of dollars of, I don’t know, marketing budget or revenue or whatever for the business that have not had any issues come up, or they have been totally reliable, or they’re used to feed this, I don’t know, marketing program that’s then retargeting users. You are responsible for that being able to happen because of what you do as an AE.” And they were like, “Oh, I never really thought about my impact that way.” And I’m like, “Yes, tens of millions of dollars right there. Write it down on your application.” [chuckle]

0:50:25.4 MH: Yeah. All right. Well, this is good. We could talk about this for a while ’cause there’s so much to this. And it’s crazy how much this has blown up in the past few years, but this has been a really good start to the conversation. And thank you, Dumky, so much for joining us to talk about it. It’s been really good. All right. Well, one thing we like to do is go around and share a last call, something that might be of interest to our audience. And so, Dumky, you’re our guest, do you have a last call you’d like to share?

0:50:54.8 DW: Thanks, yeah. Actually, last week, I was reading an article by Gergely Orosz, if I pronounced his name correctly. It’s called The Pragmatic Engineer, on the Internet. And the article is actually a revisit of something he’s done before, which is called the Trimodal Nature of Tech Compensation. And what he does is he looks at, how are people in tech compensated? What are their salaries? And why does there seem to be this disconnect between some companies paying a lot and some companies saying, “Where does this come from? I’m not paying that.” And so what he identifies, or what… His original theory was, there’s local companies that have a local market maybe in their own language that they wanna address. And then there’s the kind of super A players in that market that can pay a lot more. And then, you have these global companies like Google and Netflix and Meta that bump up the market and also the hiring prices for other companies, basically. And so now, he’s gone back and collected data since his first article. And so he has this great analysis on how… Because he originally did it in Amsterdam, and now he’s looking at other places as well and sees that it holds in all kinds of cities across the world.

0:52:20.6 VK: Very cool.

0:52:21.8 MH: Very cool. All right, Val, what about you?

0:52:25.2 VK: So, no one’s surprised. Another Medium article published on the UX Collective, which is just… It’s my favorite email to open every week. This one is talking about emerging UX patterns in generative AI experiences. And it’s a long one, but there’s so many awesome visuals in the way that they break things down. And it’s talking about the historical context of back… Command line interface and where we are today, and different graphical user interfaces and the experiences we’ve become accustomed to. But some of the things that they break down is a lot of AI tools say that they’re conversational, but they’re basically saying, no, you’re not, this is not conversational yet. But it’s talking about some places things might go, which I think is really interesting. And another one, which I hadn’t really thought about, but after I read it, I see it so many times, is so many AI features that software tools are adding is really just combining a lot of the features that already existed. [laughter] So it’s doing things faster, but you lose control, like some of the finite control of each of those individual features. But that’s one of the ways that they’re trying to assist to be working alongside you to do your job faster. But anyways, the breakdown of some of those systems and the comparison to legacy and predictions of the future, it was just a really well done piece.

0:53:47.3 MH: Nice. All right. And Moe, what’s your last call?

0:53:52.0 MK: Okay. So I stumbled upon this article on FS blog about first principles through our engineering handbook. I was deep in something to do with our engineering values. It was a whole thing. Anyway, first principles is generally a concept that people are pretty used to. It’s about reasoning and removing assumptions and conventions and that sort of thing. But that’s not the bit that I actually found interesting about the article. The interesting bit about the article, it was talking about the coach and the play stealer. So basically, a coach… It comes from an anecdote in the article about, not everyone who’s a coach is really a coach. Some people who are a coach are play stealers. So the coach who is really a coach is the one that creates new plays, that really deeply understands the game, understands their team, how to shuffle things around to get the outcome they want.

0:54:48.5 MK: Then, there are the coaches that are play stealers, that just be like, “Oh, that other team did that thing and it worked, and now I’m gonna try it out.” This is not to throw shade on either ’cause I’m definitely a play stealer, not a coach. But the bit that really blew me away was it talked then about, when a play isn’t working, the play stealer can’t figure out why it’s not working because they don’t know why it was created, whereas the coach can. And I guess in my mind, I’m actually just thinking about this in terms of the team and the different people that you have on your team. And I guess the takeaway is you need to have enough coaches on your team and not just play stealers. Otherwise, everything kind of becomes derivative. And there’s a lot of arguments in tech that that’s kind of the thing where everyone’s just kind of moving between the different tech companies, ultimately taking their playbook with them from their previous company. And it just kind of got me thinking. And it also made me really appreciate the people on my team that are the coaches, that come out with those really crazy, audacious plays and are like, “We’re going to try this crazy new tech or this bonkers idea over here.” And I’m like, “Yes, let’s do it!” But I still don’t understand why. So anyway, it was just a really interesting read. And it just made me self-reflect a bit and reflect on the team, and that was really nice. So, yes.

0:56:08.7 MH: All right. Well, my last call is from Cedric Chin, who was on the show a couple of months ago. We talked to him about Becoming Data Driven, From First Principles. That was the article that we talked about. But he’s now finished that series, which was about The Amazon Weekly Business Review, and posted that article, which is an excellent read, as most of his stuff is. And so I highly encourage you, if you’ve been following that thread at all, to go read those articles, and especially now the capstone article there over on commoncog.com.

0:56:40.0 MK: I’ve got two more things.

0:56:42.4 MH: Yes, go ahead, Moe. Sorry.

0:56:44.7 MK: I know, but I’m completely breaking the rules ’cause I didn’t even do the two at my time.

0:56:50.6 MH: It’s okay.

0:56:51.6 MK: Firstly, I really do want to do a shoutout to the book that Dumky co-authored, Fundamentals of Analytics Engineering: An Introduction to Building End-to-end Analytics Solutions, because it really is a terrific read. And we got to see a couple of things that weren’t in the book, and I’m still like, “Oh, that should have been in there. That was so good.” [chuckle] But yeah, thoroughly enjoyed it. And the other thing I…

0:57:10.0 DW: Thanks.

0:57:12.0 MK: Wanted to quickly do a shoutout about is MeasureCamp. It is happening in October in Sydney, which is just a couple of months away. It’ll be Saturday, October 26 in Sydney, if you are ANZ based.

0:57:27.2 MH: Awesome. Yeah, a lot of MeasureCamp going on, [chuckle] ’cause we’ll be at one in September, in Chicago. So, all right. Well, this has been excellent. Dumky, thank you so much. Thanks for taking the time to come on the show. Really appreciate you sharing your experience and expertise with us today.

0:57:43.5 DW: Yeah, thanks. It’s been a great pleasure and it’s been nice as a long-time listener, first-time guest here.

0:57:51.2 MH: Oh, you see how the [chuckle] sausage is made, so to speak. So, yeah, [laughter] nothing to it. All right. And of course, no show would be complete without a huge shoutout to Josh Crowhurst, our producer. We really appreciate him. And as you’ve been listening, you may have been thinking, “Oh, I’d like to learn more, or I’d like to read more about this.” We would love to hear from you. Please, reach out to us. Well, you can reach us on our LinkedIn page or via email or on the Measure Slack chat group, which you mentioned, Dumky, which we’re also very active on. So, love to hear from you. And that’s a great community for sharing things together. All right. Well, let’s wrap this up. But I know that as you’re going out there and you’re grappling with things that are happening in the data and analytics space, and you’re figuring out what to do, I know I speak for both of my co-hosts, whether you’re a data engineer, a data scientist or an analytics engineer, or an analyst, remember, keep analyzing.

0:58:53.6 TW: Thanks for listening. Let’s keep the conversation going with your comments, suggestions and questions on Twitter at @analyticshour, on the web at analyticshour.io, our LinkedIn group and the Measure Chat Slack group. Music for the podcast by Josh Crowhurst.

[music]

2 Responses

  1. rui wang says:

    I’m a one-person data team at my company and your podcast really helped me a ton! Thank you!
