#133: Server-Side vs. Client-Side Tracking with Mike Robins

Once upon a time, website behavioral data was extracted from the log files of web servers. That data was messy to work with and missing some information that analysts really wanted. This was the OG “server-side” data collection. Then, the JavaScript page tag arrived on the scene, and the data became richer and cleaner and easier to implement. That data was collected by tags firing in the user’s browser (which was called “client-side” data collection). But then ad blockers and browser quirks and cross-device behavior turned out to introduce pockets of unreliability into THAT data. And now here we are. What was old is now somewhat new again, and there is a lot to be unpacked with the ins and outs and tradeoffs of client-side vs. server-side data collection. On this episode, Mike Robins from Poplin Data joined the gang to explore the topic from various angles.

Tools and Tips from the Show

Episode Transcript

[music]

00:04 Announcer: Welcome to the Digital Analytics Power Hour. Tim, Michael, Moe and the occasional guest discussing digital analytics issues of the day. Find them on Facebook at facebook.com/analyticshour and their website analyticshour.io. And now, the Digital Analytics Power Hour.

[music]

00:24 Michael Helbling: Hi, everyone. Welcome to the Digital Analytics Power Hour. This is episode 133. In the earliest days of analytics, we used the log files generated by the web server as the underlying data set for reporting. As time went on, we were able to get more data and interesting data using JavaScript. And by 2005, it became pretty much the norm for most companies to use that method. Well, here we are in 2020, and with page speed being the ever-banging drum on this Viking ship we call the internet, we’re going back to server-side technology, but this time with APIs, calling each other up and sending information without adding code and weight to the page that the visitor is on. So Moe, are you more of an IIS or Apache log connoisseur?

01:18 Moe Kiss: That’s just mean.

[laughter]

01:20 MK: That’s just such a brutal way to start.

01:24 MH: I just want to, I… What that question is for is not to make you feel less than, Moe, but to recognize that we've come a long way. And actually, a lot of people are grappling with this for the first time in lots of different ways, so it's just good to know there's a whole spectrum. Tim, you excited to re-litigate what was a fun part of the Yahoo forum of old?

01:46 Tim Wilson: I’m looking forward to it. It was different ’cause the first thing you used to have to do with the weblogs was basically strip out all of the image request, like 95% of the stuff you had to have all your rules to just rip it out. And if nothing else, the server-side…

02:00 MH: Hits.

02:01 TW: Yeah.

02:01 MH: How many hits did we get?

02:03 TW: How many hits and people intuitively meant hits to the webpage, but then the analyst would be like, “Well, that one webpage generated 47 hits.”

02:12 MH: Well, since I added 37 spinning flaming skull GIFs to the bottom of every page, we get lots of hits.

02:18 TW: In our hit counter.

02:20 MH: I’m Michael Helbling and I know how to raise your hits. Alright, well obviously we needed a guest, someone to help us sort out the what’s what when it comes to server-side versus client-side tracking in data collection. Mike Robins is the CTO and co-founder of Poplin Data, and he’s held senior technical and data science roles at the University of Sydney and Bauer Media as well as other companies, and today he is our guest. Welcome to the show, Mike.

02:47 Mike Robins: Thank you very much indeed for having me.

02:49 MH: Alright, so why are we back doing server-side analytics? What’s happening in our industry that’s driving this?

02:58 MR: Well, I don’t think we ever stopped. I think we hit a point where we went, “Okay, let’s do everything client-side,” but we’ve never really stopped doing anything server-side. So my… When I first started in web analytics in a very informal capacity, the first kind of tooling that I think a lot of people used was server-side analytics. It was log file analysis, it was stripping out images, it was stripping out CSS files. So one of the first tools I used was AWStats. And this year, that’s gonna become a 20-year-old tool. It’s still around apparently, I did a quick Google and it’s still being released. And that really was the state of the art. It was generating your bar graphs, it was generating your hits, it was generating beautiful multi-colored, three-dimensional pie charts. And I don’t think that, I don’t think that world’s ever left. So I think we’re still there, I think we’ve been biased a little bit by the prevalence of JavaScript and going, “Look, we can do everything in JavaScript now, we don’t need to do this complicated server-side code,” but I’m not convinced that we’ve ever left that world. The pendulum’s just swinging back in the opposite direction.

04:00 TW: I feel like there are plenty of people who entered… I mean there definitely were AWStats users, and there may have been people who were cranking away on some of that, but I think there’s a huge portion of people who are full-time digital analysts today who arrived in the industry and immediately were learning about client-side, JavaScript, and tagging and they sorta know…

04:21 MK: Look, look, here’s me over here, one of those exact use cases.

[laughter]

04:28 TW: So I think saying that it didn’t go away, it doesn’t mean that the technology wasn’t still there, and there weren’t users who were on it, but there was so much excitement of all these wonderful things that could be done with page tags, and I don’t think at the time any of us really stopped and realized that client-side tagging was a big ass hack to the internet, right? It was a clever hack, and it offered so much upside that it never really occurred to anybody that, “You know what? The fact that we’re kinda piggy-backing on the way the internet works means this could come up and bite us in the ass at some point 10 or 15 years hence,” I think.

05:09 MR: Yeah, I think it was that explosion of novelty, and I think a lot of that was driven by people seeing JavaScript for the first time. Everyone had come from these static HTML pages with your rotating GIFs and your Geocities websites; and all of a sudden, you could do all this amazing stuff with JavaScript. You could follow people's mouse, you could have a little pop-up, you could do all this amazing stuff. And I think part of what fell out of that was, "Okay, now this provides an opportunity to do analytics in a very different way to the way that it's been done traditionally."

05:40 TW: So why are we going away from that wonderful…

[laughter]

05:43 S1: Yeah. No, I mean, that’s where I’m at. Everyone seems to be talking about server-side, so what is driving that?

05:51 MR: I think there’s a lot of different answers to that ’cause I think the main thing driving it at the moment is probably not the best reason, but I think it’s one of the biggest things driving it, which is the way that ad-tech has kind of flourished, and the way that things like ITP, intelligent tracking protection, and enhanced tracking protection have basically come; I think that seems to be one of the biggest drivers. It’s not necessarily the right driver, but people are now looking for an alternative to go, “Okay, all this stuff we’ve been doing on the JavaScript traditionally, whether it be in mar-tech or ad-tech has worked brilliantly up until now.” All the sudden, the notion towards privacy is starting to change, and we’re becoming more and more considerate of, “What about the performance of the browser, what about the performance of what the client’s device actually is, how do we move some of that performance to the server instead?” But I think fundamentally, it’s still being driven by privacy and the fact that some of the things that were working fine in the past have suddenly stopped working and there’s lots of people who aren’t happy about that.

06:46 MH: Yeah, and I would tack on an additional reason that happens, maybe it’s just more common for it to happen here in the States, but SEO is a big reason why companies are concerned about this.

06:57 TW: For performance?

07:00 MH: Yes.

07:00 TW: Okay.

07:00 MH: Because the speed of your page is a huge ranking factor in SEO, and there's all these studies that people have done that say how fast your page loads also impacts your conversion rate. So people really believe that, "If I load my page extremely fast, then my conversion rates will be higher, I will rank higher in Google." And so, IT departments everywhere have these metrics of, "We have to get the page speed below this number," or time to first whatever-it's-called, display or whatever. I don't know all the right words, like how the browser breaks everything down, but basically those all have these numbers associated with them. And so suddenly, we come in and we're like, "Hey, we'd like to load up another pixel and remarketing tool on our page." And IT already hates marketing anyway, and marketing already hates IT, and so basically, there's a fight. And being able to take this and put it on the server side is a great resolution step, right? That way they don't have to add weight to the page and make it take longer for the page to show up. It helps our SEO, it helps our page speed metrics that we get a bonus on, and everybody can go home happy.

08:09 MK: But, okay, so what are the… 'Cause I've had this conversation with our frontend engineers who, in a long thread, were basically like, "We're ripping out all of the client-side tracking." And I obviously was like, "Ah, what the hell are you doing? That's not cool." So what are the arguments then for keeping it? What can't server-side do?

08:30 MR: Look, I think a lot of the arguments for keeping it are things that you can often only get uniquely from the browser or the device, and they have to be sort of client-side. And typically when people talk about client-side, you're talking about the JavaScript API, the JavaScript API that runs within the browser. But just as easily, when you talk about client-side, you can be talking about a native app that runs on a device, you could be talking about an IoT device in agriculture or something like that, and you're also talking about things like wearables as well. So oftentimes there's information that you might wanna know, and it could be something really basic like the size of the browser, or the size of the viewport, or the position of the cursor of the mouse, for example; and you're not gonna be able to get that information server-side, it wouldn't really make sense to get some of that information server-side. So there's a lot of information you can get on the client that isn't available to the server, but there's also nothing to stop you sending that information from the client to the server as well, which is what most tools do when they use the JavaScript API.
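As a rough illustration of Mike's point, here's a sketch of collecting a couple of browser-only signals and forwarding them to your own collection endpoint. The `/collect` endpoint and the payload shape are made up for the example.

```javascript
// Browser-only signals (viewport size, cursor position) can only be read
// client-side, but nothing stops you forwarding them to your own server.
// The "/collect" endpoint and event shape are assumptions for this sketch.
function sendClientContext(eventName) {
  const payload = {
    event: eventName,
    viewport: { width: window.innerWidth, height: window.innerHeight },
    screen: { width: window.screen.width, height: window.screen.height },
    timestamp: Date.now(),
  };
  // sendBeacon survives page unloads and doesn't block rendering.
  navigator.sendBeacon('/collect', JSON.stringify(payload));
}

document.addEventListener('mousemove', (e) => {
  // Example of a purely client-side signal: last known cursor position.
  window.__lastCursor = { x: e.clientX, y: e.clientY };
});

sendClientContext('page_view');
```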

09:28 TW: So, quick back-up just a little bit to the SEO driver. I'm curious how often… So many things, historically, in our industry are… Something gets out as, "This is an impact and here's the solution," and people jump straight to it. It's a fact that running server-side is going to take less out of the page; it will positively impact the load time as opposed to running client-side. But in practice, the investment to switch to server-side, and the tradeoffs on what might be available in data, and the actual impact on page load performance, combined with how that actually tracks through to SEO, has that been quantified? It feels like the way you articulated that…

10:14 MK: We’ve just done a test on it, which is intentionally… It’s still using client-side, not server-side, but we did wanna understand how the speed impacted conversion rate, which you might think, “We’re terribly shit human beings,” but we intentionally increased the delay for the variant to see how it did impact performance. And that’s not like about SEO performance, that’s actually about the user conversion rate, but it was something we wanted to understand. Because in one case, for client-side, we’re having lots of issues where our pages… Our redirects load too fast and therefore, it’s missing some of our event tracking. And if someone’s like, “Let’s just slow the page down,” it’s like, “Well, that’s not very good for our users,” so why don’t we actually test some of these ideas and see how they perform.

11:04 TW: That seems like a great way to answer that question. I just, I will go on record as being a little skeptical that you can take "A is true, B is true, C is true, therefore A plus B plus C means we have to do D." I don't know that that has really been… Or even looking at organizations that have a slow website and saying, "We're gonna switch from client-side to server-side." The reality is, have they first done the assessment of what's really driving the slow speed? There's the opportunity to rearrange deck chairs on the Titanic, where this is a fix that is very in vogue, but it may be that, gee, the Lotus Notes backend that you're running to serve your content might be much more expensive to address. So sorry, I just… a backtrack 'cause my wheels were spinning.

11:51 MH: Well, it is always funny to run into those folks who are like, “We can’t put this JavaScript on our page,” when all their images are uncompressed or something like that. You’re like, “Really?”

[laughter]

12:03 MH: Yeah, maybe you should look at that thing.

12:04 MR: Yeah, I think there’s a non-sequitur argument you can make there which is often analytics can be really, really heavy, but it’s often not the heaviest thing on pages, it’s, “How is the server serving response? Do you have compressed images? Are you serving third-party CSS or JavaScript or something else from somebody else’s CDN that could be on the other side of the world?” So I think analytics has a part in that, but whether or not it’s the number one player is… Look, I agree, I think sometimes it’s rearranging deck chairs when really it should be a priority, but in the list of priorities, it’s probably not number one.

12:20 MH: Yeah, but IT hates marketing. And they’ve asked for all these pixels, so that’s why it gets on the hit list.

12:20 TW: Which it could be.

12:20 MK: Yeah, but that’s the thing that pisses me off, is those pixels, in our particular case, are driving the revenue of the company which is paying your salary. So just let it go, let it go. It’s a necessary evil.

12:20 TW: That’s a really effective argument, that’s a really effective negotiating technique between… Through organizations that have historically battled each other for…

12:20 MK: Wait, are you being sarcastic? I feel like…

[chuckle]

12:20 MH: Yeah. I don’t…

12:20 MK: Okay, how would you do it? You tell me what I should be saying.

12:20 MH: Moe, I don’t think it’s in the scope of this podcast episode, trying to tackle this problem.

[laughter]

13:20 MH: We’re just here to discuss the underlying technologies…

13:24 TW: Well… 'Cause the reality is, they are… I recently found a link, actually a while back. There were so many… IT often has a legitimate case that there's such piss-poor governance of the pixels that it really is cratering performance. And then again, it becomes…

13:43 MH: Sure.

13:43 TW: It’s easier for them to say, “Binary marketing and pixels are bad.” And it’s like, “Well no, but maybe the 30% that ought to be cleaned up or that are particularly you’re responsible.

13:52 MH: And the real bugaboo is JavaScript that's render-blocking, right? So nothing else can go until this JavaScript is done. And historically, there were a lot more of those. I think as an industry, we've seen a lot of that cleaned up, and we've gotten better about not letting that happen. But five years ago, that was pretty common: a pixel from some random third party would just be dragging the page along. And it was just like, "What on earth, why am I waiting for statsblahblah.com to load?" And you're like, "Oh, that's our media server." And you're like, "Well, your media server is gone, we can't have that." But AB testing tools are still a pretty common use case for render-blocking JavaScript. There's ways to get around it, but it's pretty complicated. But there are some AB testing vendors that actually will let you set up tests server-side, so there's definitely ways to work around it.
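For reference, the usual way around a render-blocking tag is the asynchronous loader pattern Michael alludes to. Here's a minimal sketch with a placeholder vendor URL; the exact snippet any given vendor ships will differ.

```javascript
// A common pattern for avoiding render-blocking tags: inject the vendor script
// asynchronously instead of including it as a blocking <script> in the <head>.
// "https://tags.example-vendor.com/loader.js" is a placeholder URL.
(function loadTagAsync() {
  const s = document.createElement('script');
  s.src = 'https://tags.example-vendor.com/loader.js';
  s.async = true;                    // don't block HTML parsing or rendering
  const first = document.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(s, first);
})();
```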

14:42 TW: Isn’t SiteSpect one of ’em that’s primarily server-side?

14:44 MH: Yes, historically. Yeah, they’ve got… I think they’ve now updated to both ways. But yeah, they originally many, many years ago, they were the kinda primary server-side AB testing vendor.

14:55 TW: So can I throw in another reason… And I just recently ran across this in doing some of the research for this, and Mike may be able to weigh in. I can remember, I had a client where it just happened that we had backend system data that actually included parsed Adobe data feed stuff kinda stitched to it. And with that, just incidentally, we were able to see that for about 7% of the revenue for a one-month period, no Adobe Analytics data was captured at all, and this was Adobe deployed client-side. And I didn't initially get what was going on, and the people who supplied the data were like, "Oh well, yeah, we know that happens. Those are cases where the Adobe pixel is being blocked for one reason or another. Ghostery, I think, will block it by default." But they're like, "Our backend system is still capturing it."

15:46 TW: So they weren’t an official server-side implementation, they just had their systems stitched together, so they could say, “This was a transaction clearly placed on the website, the transaction went through.” So I think that’s kind of an analog to… I mean, that is a server-recorded thing just not for the purposes of analytics. And so we could actually point directly to what the gap is in revenue that Adobe where… And that traffic wasn’t being tracked at all. So it’s that another case where, if you’re firing stuff server-side, you’re capturing more of it? It’s kind of, maybe that’s a cousin to the blocking and privacy. But if people are interacting with the website, in order to get the web page to return the content to them, that sorta data can be captured server-side because it’s happening off of the client’s browser.

16:38 MR: Yeah, absolutely. It’s a very common use case for transactional data in e-commerce particularly is people do try and do it on both sides of the equation. So you have the client-side activity, and you’ve got the server-side stuff as well. And depending on how you’ve implemented it, it can be very difficult to marry up those logs or it can be quite easy, depending on how that’s being implemented on both ends.

17:00 TW: Can you describe the very easy versus very difficult? I can imagine the very difficult, very easily. [chuckle] I can’t imagine the very easy.

17:07 MR: Sure. So the very difficult is you’ve got a set of server logs sitting there, and they might have a user ID associated with them and a timestamp associated with them. And then you’ve got your client-side stuff there, and that’s also got a user ID and a timestamp associated with them. So if you wanna do a very naive kinda stitch, you can stitch on those two dimensions and go, “Look, these events roughly look like they belong to the same user and they look like they fit into the same timeframe.” That’s pretty difficult. It becomes even more difficult when you start to consider things like, “Well, I’ve got a standard website and there’s multiple products on there. And at any one stage, I know that a user may have a mobile device with my page open, and they may have a computer device with my page open, and they may even have multiple tabs open.” So it becomes quite difficult even with timestamps to see exactly what the user is doing.

17:54 MR: The easier way of doing that tends to be what a lot of systems do, particularly when measuring page performance incidentally is to send some kind of transaction ID. So in the client-side, whether it’s JavaScript or something else, you send a transaction ID that’s unique to that session or unique to that page request, and then you’ve also got a record of that server-side as well. So rather than stitching on user ID and timestamps, you stitch on essentially this shared transaction ID between the two systems. So generally, you always really wanna have one joint key there and you want it to be as specific as possible and the easiest way to do that is to have something that’s generated either client-side or server-side and then synced between the two.
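Here's a rough sketch of the shared-transaction-ID approach Mike describes, with hypothetical endpoint names and assuming `crypto.randomUUID` is available. The point is simply that the client-side hit and the backend order carry the same generated key, so the join is exact rather than a fuzzy user-ID-plus-timestamp match.

```javascript
// Sketch of shared-transaction-ID stitching: the client generates one ID and
// attaches it to both the analytics hit and the backend order, so the two
// datasets join on a single key. Endpoint names are assumptions.

// Client-side: one ID per transaction, sent with both calls.
const transactionId = crypto.randomUUID();

navigator.sendBeacon('/analytics/collect', JSON.stringify({
  event: 'purchase',
  transaction_id: transactionId,
  revenue: 49.99,
}));

fetch('/api/checkout', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ transaction_id: transactionId, cart: ['sku-123'] }),
});

// Later, offline: join the two logs on transaction_id instead of fuzzy
// user-ID + timestamp matching.
function stitch(clientEvents, serverOrders) {
  const byId = new Map(serverOrders.map(o => [o.transaction_id, o]));
  return clientEvents
    .filter(e => byId.has(e.transaction_id))
    .map(e => ({ ...e, order: byId.get(e.transaction_id) }));
}
```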

18:34 MK: Okay, so this is another time when Moe uses the show to get free advice. So what about the predicament which, I don't know, someone may be facing; it's hypothetical, of course. So once upon a time, I actually talked on this show about a whole bunch of bot traffic I was getting from Boardman, Oregon, which it turns out is not actually bot traffic; it's traffic coming from Segment. And the reason that it looks like bot traffic (well, it actually doesn't when you dig into it, 'cause it's got real user stuff) is basically that it is back-end triggered events. So, basically triggered by the system, not by the user, so it doesn't have any user context such as their IP or user agent or all that sort of stuff. And so a lot of analytics tools attribute that traffic to a user that is Segment. And it creates this whole shitshow, but all the engineers love it because they think that a back-end triggered event is somehow inherently more reliable than a front-end triggered event… And I just end up embroiled in this complete debacle.

19:18 MH: So Moe, what kind of events are we talking about like context data or…

19:46 MK: Oh, just… Hypothetically, if you had a subscription business and, 30 days after you start a trial, the system will automatically trigger a payment…

19:57 MH: Ohh okay.

19:58 MK: Now the user is a paying customer. And it’s not like the user goes to the website, and actually is like, “I’m going to pay now,” it’s like something generated in the system…

20:08 MH: Yeah, or I receive a product return and the return comes back through my e-commerce system, but Segment will be in communication and pass that back.

20:16 MK: Exactly. And so, just for context, the issue with the analytics tools, and this is very much a hypothesis, though a few people think I'm on the right track, is that our user counts are completely blown out. Because what basically happens, from an analytics tool perspective, is front-end, front-end, back-end, front-end, front-end, back-end over the course of what looks to be one session. And so every time you swap between front-end and back-end, it's triggering a new user ID because the context is different, so therefore none of your counts are correct. So it's a big schemozzle.

20:52 MR: So I think there’s a few different aspects there. So one is that we often think of analytics events as being driven by the user. So it often helps to think of it as being driven by an agent, where an agent is a person but an agent could just as easily be a server or a device or a smartphone or a watch or again some kind of agricultural IoT device sitting in a field somewhere. As soon as you think of it that way, you then go, “Well we need some way to stitch this data together.” Usually it’s a user ID or an agent ID or something like that. Typically what a lot of analytics tools do is they do filter out that data or they do detect it as bots.

21:28 MR: The biggest problem tends to be either there's a user agent which they don't recognize, as in it's not easily parsable as, "This user is using this version of a browser," which particularly happens with custom-set user agents but often happens with a number of other user agents as well. And the second one is, it comes from a list of known IP addresses or known IP address blocks that are known to basically be used by some sort of bot traffic. And this is the major way that the IAB, for example, determines bot traffic. And GA uses the IAB list. So it's either coming from a certain list of user agents or a certain list of IP addresses, and it goes, "Well, this is a bot." It may not necessarily be a bot, but it's gonna flag it as one. And then regarding the reliability one, I think: are the events more reliable? Maybe. But I think the biggest problem is not necessarily reliability but more around trust, which is essentially that client-side events can't exactly be trusted. So if I ask all of you what your height is… What's your height?

22:35 MK: 170.9 centimeters.

22:38 MH: Wow.

22:38 MR: Okay, so that’s a very exact example. [laughter] So it’s a perfect example which is, you’ve told me your height, and that’s a perfect example of how client-side instrumentation works, is I’m asking you for some information, you’re giving it to me. But fundamentally, if I really wanted to know your height, I wouldn’t trust you to tell me your height. I would get out a measuring tape and I’d actually measure it. So what we often forget with client-side analytics is that it all happens in the user’s browser for the most part. So they could send anything they want. It’s inherently un-trusted. It’s end-user input, it could be malicious input, it could be truthful input, but we really have no way to differentiate between the two. So if I wanted to tell you I was 150 centimeters and then I wanted to tell you 180 centimeters, it all looks the same from client-side analytics because you’re essentially trusting the user that the data and the information they’re sending to you is trustworthy.

23:29 MR: If that’s happening in the server-side environment where the user doesn’t really have any influence over the code that’s being executed or the actual data that’s being captured, it can be considered to be more trustworthy or reliable or accurate or whichever one of those terms you wanna use. Inherently client-side analytics is anyone can send you anything. I can send you an e-commerce transaction for $9,000 or $1 or I can skew whatever metrics I want to, and this happens probably more often than people think it does.

23:58 MK: So would you never… What are the use cases that you do use client-side or you just… Everyone should move to the server and that’s the end of the story, and I’m gonna wrap up the show. [chuckle]

24:07 MR: Well it’s always a bit of both, ’cause you can’t collect everything on the server. If you wanna collect mouse movements, scroll depth, the impressions of ad units or media units or something you’ve really gonna do that client-side. But you do have to treat some of those things with a bit of a grain of salt, particularly with regards to bots. Bot detection now is moving. It used to be very, very server-side based. So like I said, IAB heavily based on IP addresses in the user agent of the thing that was actually sending the analytics events. Now and now it’s getting more essentially JavaScript-based and client-based. So if you look at the latest version of reCAPTCHA, which there’s been a little bit of talk about, inherently Google’s trying to fingerprint you based on the way you move your mouse, how long it takes you to tick boxes, how you fill things in to determine whether you’re a robot or not. So it’s not really a binary, should I be using client-side or server-side? It’s… You should be using both.

25:00 MK: But what about all those helpful flow charts?

25:02 MR: Burn the flow charts. [laughter]

25:04 TW: So to add on to what Mike was saying, as I was trying to do some prep for this, 'cause this is a topic that's fascinating but where I had a very, very surface-level understanding, and it's crazy how many… It's clearly a question that a lot of people are asking. Tealium has a detailed flow chart, and Cognetik has a post, and Segment and Snowplow have great posts, and BLAST, and a lot of those either have a flow chart or kind of a side-by-side comparison. It feels like it's the sort of thing that, given no resource constraints, you should do both, but everybody has resource constraints, so now which is it that you care about? And I feel like, even with the old server-based web logs and then moving to tag-based client-side, all along, all of these were kind of incomplete approximations. It's just that with the heavy, heavy reliance on page tags, it's interesting how the data can change.

26:07 TW: We’re trusting the user, I think is a good way to put it Mike, except we’re trusting the user who… Big chunks of the users are putting their trust in whatever browser they’re using. So if Safari makes a change and a bunch of users have basically decided to trust Apple by accepting the default settings, that can change… So it’s not even… It’s a big chunk of people who are changing the way… They’re just kind of handing over their trustworthiness over to Apple or Microsoft or Google and that can kind of swing it. But it goes back to analysts, I think, in certain way our stakeholders have that belief in the completeness and the accuracy of the data and we wanna put our head in the sand when somebody’s like, “This is incomplete and there are parts of it that you will never be more complete or it is cost prohibitive and in some cases it’s really hard to tell how incomplete it is.” If you really care then you need to do some sort of test if you really wanna know what percentage of your visitors you’re missing that they loaded your homepage, then guess what? You need to go and do a server-side deployment and do a side-by-side comparison. But oh guess what? You’re gonna have a lot of challenges matching up similar bot detection logic. So you were gonna be heading down a rat hole that is very deep and dark and likely not ultimately adding value to the business.

27:35 MR: Yeah, absolutely, and I think that comes down to a business thing. And I think that's what the flow charts tend to over-simplify as well: when we talk about this stuff, what is an acceptable level of uncertainty to have about things? Is the level of error or the level of uncertainty measurable? And if it is, can we measure it quite easily? And if so, you can report that. You can go, "Well, this is what we think the scenario is. Is this an acceptable level of uncertainty?" And sometimes the uncertainties are themselves uncertain. So if you've got someone who goes, "Look, I don't really know how much of my traffic is being blocked by ad blockers, can we figure it out?", then server-side and client-side instrumentation together is a classic way to figure something like that out, and you can report back a number. But a lot of people just want a way to make a decision, and it's naturally human to go, "I wanna pretend the uncertainty doesn't exist," and just say we wanna do X or Y and Z. So I think it's a communication thing of going, "Are we happy to make these decisions with a certain level of uncertainty?" and how you go about decision-making in those circumstances.

28:37 MK: So if you were starting something up from scratch, you would use a bit of both depending on the use case, and you would wanna measure the uncertainty for the client-side. Would that be accurate?

28:49 MR: Yeah, and look… It's sometimes uncertainty on both sides. You can have a very good client-side implementation that 5% or 15% of the time gets blocked by a variety of ad blockers. But you can just as easily go with a server-side implementation and then discover that every 15 minutes at a certain time there's a CDN, maybe it's self-hosted, maybe it's third party, that occasionally goes down. Or actually, you've implemented something server-side, but for 5% of the pathways where it goes through server-side, let's say it's a subscription activation, there's a pathway where the actual analytics call for the event doesn't get made. So there's nothing about server-side that inherently prevents bugs or makes something more reliable. You're just as easily gonna run into bugs or communication issues or queueing events or something like that. It just provides a little bit more certainty rather than eliminating all uncertainty.

29:40 TW: On the uncertainty we're talking about… To this point, we've been talking about uncertainty as kind of what data we're missing on either side. This has kinda struck me, like the bots we keep bringing up: when implementing server-side, has the server-side tracking typically been trying to sort of figure out bot detection as well, or is there a risk that… Let's say you're starting from scratch and you're implementing, and you're really just trying to see how many people, call it people, came to the homepage of our website, and I've got a client-side web analytics tool and I've got a server-side tool. Is the likelihood that the client-side will under-count because of various blocking reasons, while the server-side will likewise potentially over-count because of the bots?

30:25 MK: Couldn’t the client-side also over-count? Like when it double, like something… Like if it’s on a page load or something like that?

30:32 TW: Well, I mean it could still… It can still have bots that it’s missing. So bots can certainly get through the…

30:37 MH: There’s lots of different ways you could create abnormalities on both sides. So on client-side, somebody refreshes their confirmation page four times, Google will be like, “You had one transaction that had four units in it.” And really you just had one. So a lot of the work in analytics is just going and doing clever things to try to figure that out. The same with bots, Tim, basically we’re just all trying to figure out how to get that out of our data. So when we see something new and there’s always new, it’s just sort of…

31:07 TW: But it’s a Type 1 and type 2 balance that you’re… And back to Mike’s original point, how much uncertainty do you need to reduce? I had the voice of Mark Ashoff in my head that it’s all about making conditions under… Making decisions under conditions of uncertainty. Like if your heat map is missing 50% of the traffic and… It’s a heat map, it’s inherently imprecise.

31:26 MH: You don’t need all the traffic to look at a heat map. The only reason you need all the transaction IDs is just because those are actual things that tie into other data storage systems that map to how much money you made as a company, which has to be pretty exact. But even in JavaScript world of collecting that data, we’ve always been wrong by some percentage, sometimes a bit pretty decent one. So 5 to 7% difference in revenue from back-end systems to client-side JavaScript is not that uncommon. It’s just a matter of coming to sort of a little bit of peace with it.

32:01 MK: Speaking about reliability, so, use case: you have a bunch of engineers that have a KPI about something like, I don't know, number of designs created, and because it's their KPI, they decide that front-end is completely inappropriate and they wanna use back-end event tracking, because then they can be certain of it. How would you go about that conversation? Just out of curiosity. [laughter]

32:29 MR: I think this speaks less to the problems of client-side and server-side tracking and more to the point of KPIs between conflicting teams where you optimize to one at the detriment to the other.

32:40 MH: Yeah.

32:41 MK: Yeah, potentially, potentially. But I think the crux of the problem to me is that engineers wanna measure stuff where they feel that… Not that they have the most control over it, but that they understand it the best, so therefore the inherent belief is that it's more reliable, if that makes sense.

33:00 MH: Yeah, so Moe the way that we get paid on the marketing side of the house is by increasing how many people come buy our products or whatever it is we do as a company, we want more of it. The thing that IT or engineers are doing is trying to ensure that the systems that are necessary to create that outcome remain reliable and stable and understood, so that they can replicate or support that success. So everybody’s pulling for the same thing, but in two very different ways. And so, in IT, our job is to go in and make sure that we are up and ready to take orders when the biggest order day of the year happens and every other day of the year we have uptime and the marketing side, it’s sort of like, how can we maximize all of our opportunities all the time and really push the envelope. And so, it’s almost normal that there is a tension. I made the joke earlier that they’re always fighting, and even in healthy organizations there can be a tension ’cause we’re actually doing two different things.

34:00 MH: One of us is actually trying to take risky actions, and the other side's job is to mitigate those risks. And so when you hear somebody on the engineering side of the house say, "We want that more reliable thing," understanding the context of where it's coming from is really helpful. 'Cause there's lots of analytics tools that get used in engineering organizations, application uptime monitoring, New Relic, all those kinds of things that people put into applications and servers to be able to monitor, like, "What kind of load are we under right now? Do I need to kick off another cloud instance of this compute thing?" I don't really know all that stuff. It's all very complicated and really important, 'cause none of the stuff we wanna do as marketers and analysts would ever happen if those guys in the closet over there didn't figure out how to keep our website up.

34:47 MR: I didn’t realize that we were all in the closet. [laughter]

34:51 MK: I’ll use that reference next time that I’m trying to convince them to do something.

34:54 MH: Whatever. Server rooms with loud fans. And I'm sure that's not the way it works anymore, 'cause it's been a while since I was in any kind of environment like that. Now it's all cloud-based systems, no more racks of computers.

35:08 TW: So just to… Some things that we haven't quite covered that I want to try to be really clear about: we now have tag management systems, and we'd say most digital analytics is initially deployed through a TMS. But when we're saying server-side versus client-side, I can deploy my TMS… My TMS can actually be… Where does the client-side versus server-side distinction come in when you're talking about the tag management system versus the digital analytics platform doing server-side? Do they both have to be operating server-side? Is there a both, where part of that is happening client-side, but then there's a server-side component? Is that explainable in a clear way? Like, what do we mean when we say going server-side: am I going everything server-side, or some stuff?

35:57 MR: Usually some stuff. Going everything server-side is difficult but doable, but it does mean that you do lose things in the process, and if that's an acceptable loss, that you don't wanna collect that information or you don't have permission to collect that information, then you can go server-side. It becomes quite easy to anonymize everything, throw IP addresses out at the load balancer level, do all that sort of stuff, but typically you're doing a little bit of both. So you can have server-side tag management systems; they're less common because, typically, when tag managers evolved, they evolved client-side first, but increasingly we're seeing more and more server-side systems. Now, you don't have to use a tag management system on the server-side. There's nothing to stop you directly calling out to the Measurement Protocol or Adobe Analytics or Heap or Segment or anything like that directly, so you don't have to use a tag management system. What it really makes easier is proxying out those requests to multiple locations. So if you're using multiple vendors, it's much, much easier to have one place to send all those requests once, and have that third party send four or five different network requests out, than it is for you to essentially do that yourself.
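Here's a rough sketch of that server-side fan-out idea: one inbound event, proxied to several vendors from your own server. It assumes Node 18+ (for global `fetch`) with Express; the property ID and second vendor endpoint are placeholders, and the Measurement Protocol parameters follow the classic Universal Analytics format that was current around the time of this episode.

```javascript
// Sketch of server-side fan-out: one inbound event, forwarded to multiple
// vendors from the server instead of firing N pixels in the browser.
// Assumes Node 18+ global fetch; the second vendor endpoint is made up.
const express = require('express');
const app = express();
app.use(express.json());

app.post('/events', async (req, res) => {
  const { clientId, category, action } = req.body;

  const gaParams = new URLSearchParams({
    v: '1', tid: 'UA-XXXXXX-1',        // placeholder property ID
    cid: clientId, t: 'event', ec: category, ea: action,
  });

  // Fire to each vendor; individual failures shouldn't block the response.
  await Promise.allSettled([
    fetch('https://www.google-analytics.com/collect', {
      method: 'POST', body: gaParams,
    }),
    fetch('https://api.example-vendor.com/track', {  // hypothetical vendor
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ user_id: clientId, event: `${category}:${action}` }),
    }),
  ]);

  res.status(204).end();
});

app.listen(3000);
```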

37:04 TW: But if you've got a pixel… If you've got something that normally you would deploy through a TMS and it would be pushing the pixel client-side, and then it would be deploying… The system that's taking in that data has to support either server-side or client-side. Could I be running my TMS server-side, but now all of a sudden I've got some media partner and they say, "We don't operate with server-side calls"? And now I'm kind of screwed, I have to have something client-side as well.

37:33 MR: Yeah.

37:33 TW: Does that make no sense?

37:34 MR: So that absolutely happens. There's plenty of vendors who will have an API where they fully expect that the client calls out to that API, or a TMS calls out to that API, and sends a certain set of parameters, and sometimes they may well be parameters that you can only get on the client-side. That's very common with ad-tech vendors, particularly when they're trying to do fraud detection and cookie syncing and all that sort of stuff. So there are vendors who do have methods to send data server-side, and there are probably more and more now, but it's not unusual for a vendor to turn around and go, "Actually, we don't support that method of sending data."

38:07 MH: Yeah, Tim, there's actually a couple of folks at your company who have done a lot of work in this space…

[laughter]

38:16 MH: You should talk to them. If you want, I can introduce you.

38:18 TW: [38:18] __ on a company. I try not to interact with people at the company, that didn’t work out well for anyone.

38:24 MH: Well, with using a "data layer," which I think is probably the latest thing that's been happening in the digital analytics space, that allows you to do a couple of things and think about things a different way. And so, Mike, you mentioned Segment, and I think that's a good example of a company that's grown really rapidly with the rise of… Leveraging, like, "Hey, there's an API, so we don't need to deploy code on the page. You just collect it when they come, send it to us, we'll take care of federating it across everybody else, and look at all the wonderful unity that we get by sending the exact same thing to five different people, and that will be great for you." And they have the ability to go and collect a lot of things that you would normally see in what I'd call your more robust digital analytics deployment today. So you can make client-side calls back into that tool to join it or combine it with your Segment stuff, at least that's the sales pitch.

39:19 TW: Which was one of the things… So put aside the user stuff: if you're pushing metadata about the page onto your data layer, I think I read somewhere that, when that data layer is being populated server-side, the server is saying what the content type is or something. Isn't it kind of insane that you've got your server figuring out what goes on the data layer, pushing it out to the client, and then the client's sitting there reading it and passing it back? Why wouldn't you just have that collected from the server? Obviously, that works for some data that goes on the data layer, not all of the data, but a little light bulb went on for that.
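As a rough sketch of the pattern Tim is describing, here's what a server-populated, event-driven data layer can look like: the server templates page metadata into a push, and later state changes are just more pushes that a tag manager (or server-side relay) consumes. The array name, event names, and fields are illustrative only.

```javascript
// Minimal sketch of an event-driven data layer: rather than a static object
// rendered once into the page, every change is pushed as an event onto an
// array that the TMS or a server-side relay consumes. Names are illustrative.
window.appEventData = window.appEventData || [];

// Page load: the server can template this push with server-known metadata.
window.appEventData.push({
  event: 'page_view',
  page: { type: 'product', template: 'pdp', language: 'en-AU' },
});

// Later, a user-driven change is just another push, no page reload needed.
window.appEventData.push({
  event: 'add_to_cart',
  product: { sku: 'sku-123', price: 49.99, currency: 'AUD' },
});
```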

40:00 MH: Well, that’s where event-driven data layers then become a really good model for both client-side and server-side tracking. So refer up on the lingo, EDDL. Shout out to Jim Corden. Alright. So we are getting short on time, so we need to start probably wrapping up. One thing we love to do on the show is go around the horn and talk about anything we’ve seen recently that we think would be of interest to call the last call. Mike, you’re our guest. Do you have a last call you’d like to share?

40:27 MR: Yes, thank you. So one of the things I wanna share is a really interesting bit of research that someone did last month on audio fingerprinting. So traditionally, there's been a few different ways to fingerprint devices; often in browsers it's looking at resolutions, user agents, IP addresses, what fonts or extensions you have installed. What this audio fingerprinting API does is kind of a workaround. I don't recommend anyone using it, but it is a very interesting read. It looks at essentially generating an audio file and rendering that audio file on a device, and using that to actually fingerprint the device. What that does is quite sneaky, because it means that you can fingerprint not just across multiple different websites, but also across different browsers on the same device. It's a really interesting read, it's quite technical, but it comes with code and examples of how to do it. So yeah, that would be my one.

41:21 MH: Very nice.

41:22 TW: Interesting.

41:23 MH: Moe, what about you?

41:24 MK: Well, I have a very shameless plug, but I am plugging it because it's fantastic, and I know it's fantastic 'cause I QA'd a lot of the content. My sister, Michelle Kiss, has done a course on Google Data Studio, which is available at the CXL Institute, and it has some amazing recommendations by Simo Ahava and Will Prior. And, yeah, she knows her shit. Whenever I have a Data Studio question, she's pretty much the first person I ask. So if anyone wants to upskill in that, I really recommend her course.

41:58 MH: Very nice.

42:00 TW: Nice, you wanna go next Michael?

42:01 MH: I would be glad to go next. As I was looking through this show topic and thinking about some things around the end of the year, I realized there are a few people on Twitter that I refer back to again and again, as in terms of better understanding what’s happening in this world of browsers and cookies and privacy and the emerging regulations. And so I just wanted to give sort of a little last call to highlight a couple of those. Obviously, most of our listeners are familiar with Simo Ahava, who’s been on the show. And Tim, I think your last call last episode was about a new site that he put up about browser tracking. I forgot the name of it.

42:38 TW: It’s cookiestatus.com.

42:41 MH: Okay, perfect. And anyone can contribute to that. So there's three people: Zach Edwards, whose Twitter handle is @thezedwards. He is prolific in terms of that kind of stuff. There's also Arvind Narayanan, I don't know how to pronounce his name correctly, so I apologize for getting that wrong, @random_walker. He's a professor at Princeton who does quite a bit of research into privacy issues and things like that, so a really great guy to follow. And then the last is Serge Egelman, and he's @v0max, and he does quite a bit of stuff on privacy and cookies and stuff like that. So those are the ways that I stay informed, and now they're the way that you can too, if you are on Twitter, I guess.

43:25 TW: Very nice.

43:26 MH: Okay Tim, what about your last call?

43:29 TW: Well, so, I’m glad we had your in-between, ’cause I’m gonna actually go with one that I also did some degree of QA on. I, recently Brent Dykes who was on Episode 42, way back when in 2016 of this show, talking about data storytelling, has a new book out called Effective Data Storytelling, How to Drive Change with Data Narrative and Visuals. And I got to, as he was writing it, I was getting to get a preview of each chapter as he was getting them written. It is… I look at it as kind of a cousin to Nancy Duarte’s book. His is a little bit denser, but it goes through a lot of history, historical examples, and kinda ties them into specific concepts that you can kinda use as you apply. There’s some overlap, a venn diagram of the two books would actually have a healthy overlap, but it’s a darn good book. I recommend it.

44:22 MH: Very nice. Okay, I am sure you've been listening and you've been thinking, "Oh, client-side, server-side, this is the argument I've been waiting 15 years to re-engage in." Go ahead, we'd love to hear from you. The best place to reach us is on the Measure Slack or on Twitter or our LinkedIn group, and we'd love to hear comments or questions about the show. Also, we are delighted that in two weeks we will be in Budapest at Super Week, and so if you're coming to that… Tickets may have already sold out, but you will see us there. You could at least double check, or if you know somebody, maybe give him a ring and be like, "Hey, hook me up with this mysterious shadowy figure who is Zoley, and see if he can sort me out with Super Week tickets." Obviously, we'd love to give a shout out to our producer, Josh, and all of his great work on the show. Mike Robins, thank you so much for being our guest, it's been illuminating.

45:20 MR: Thank you very much for having me.

45:22 MH: And it's a tricky issue. I think, you know, we will see more server-side stuff happening, it seems to be the trend. I wouldn't be surprised if we saw more product offerings in the future from other people like Google, maybe even Adobe, in this space. So, you know, this will probably be something that continues to have an impact on our work in digital analytics. And I think I can speak for my two co-hosts, Moe and Tim: whether you collect your data server-side or client-side, just remember, keep analyzing.

45:56 Announcer: Thanks for listening and don’t forget to join the conversation on Facebook, Twitter, or Measure Slack Group. We welcome your comments and questions, visit us on the web, at analyticshour.io, facebook.com/analyticshour or at analytics hour on Twitter.

46:15 Charles Barkley: So smart guys want to fit in, so they've made up a term called analytics. Analytics don't work.

46:23 Tom Hammerschmidt: Analytics, oh my god, what the fuck does that even mean?

46:30 MH: I think we’re ready to get started, ah I’ll just kick us off and then uh we’ll talk for a while about something that we’ve been talking about for twenty years. But this time, totally different. [laughter]

46:47 MK: And I just end up embroiled in this complete debacle of… We wish you a merry Christmas, we wish you a merry Christmas, we wish you a merry Christmas and a happy new year. [47:02] __.

[music]
[laughter]

47:02 MH: We interrupt this podcast to introduce… [laughter]

47:06 Speaker 6: Umm, you realize this show comes out two weeks after Christmas. Two weeks after Christmas, so…

47:16 S?: [47:16] __.

[laughter]

47:18 S6: Jamie was saying last night, like hey if you interrupt the show you’ll be in the outtakes. So mom was like, wow you blew an opportunity.

47:30 MH: Uh challenge accepted.

47:32 S6: Fuck.

47:32 MH: Uh I love your mom.

47:34 S6: Umm, no but this was, this was me last night being like guys I’m recording from 9 til 10.30, 9 til 10.30 there are going to be two sets of doors shut, please do not come in. Every single person in the house had to come in, every person.

47:48 TW: Jamie was wearing a towel, am I right? I’m thinking that Jamie was wearing a towel.

47:52 S6: He, he went to aqua with my mother.

47:54 MH: Oh, what’s aqua?

47:56 S6: Umm, it’s like water aerobics. Umm, and apparently some of the old ladies when they see Jamie around, cause of course mom’s in a group with like a bunch of women who are 70 and 90 and they’re like, he’s a bit of eye candy isn’t he? And I think Jamie actually secretly loves it.

48:15 TW: Really wanna stop recording, but Moe just keeps serving up the gold…

48:20 MH: And I just keep going.

[laughter]

48:26 TW: Rock flag and be nice to IT.

48:29 MH: That’s a good one. That’s right be nice to IT.

[laughter]
