#019: R U Curious About R?

In honor of Talk Like a Pirate Day (and by popular demand), we donned our eyepatches, poured ourselves a few tankards of grog, and commandeered the wisdom of Eric Goldsmith from TED (maybe you’ve seen one or two of their videos?) to explore the whats, whys, and hows of R. If we’d recorded this episode with Excel, it would have taken an hour, but, with R, we pulled it off in 42 minutes.

Episode Transcript

The following is a straight-up machine translation. It has not been human-reviewed or human-corrected. However, we did replace the original transcription, produced in 2017, with an updated one produced using OpenAI’s WhisperX in 2025, which, trust us, is much, much better than the original. Still, we apologize on behalf of the machines for any text that winds up being incorrect, nonsensical, or offensive. We have asked the machine to do better, but it simply responds with, “I’m sorry, Dave. I’m afraid I can’t do that.”

00:00:03.71 [Announcer]: Welcome to the Digital Analytics Power Hour. Three analytics pros and the occasional guest discussing digital analytics issues of the day. Find them on Facebook at facebook.com forward slash analytics hour. And now, the Digital Analytics Power Hour.

00:00:24.85 [Michael Helbling]: Hello everyone, welcome to the Digital Analytics Power Hour. This is episode 19. Guess what, everyone? I’ve got wonderful news. Are you curious about R? We sure are. And, you know, this Saturday is Talk Like a Pirate Day, so we figured, hey, let’s do a show about R. The opinions expressed by Michael Helbling are not necessarily those of Jim Cain and Tim Wilson. That’s a great disclaimer. Now, since Jim Cain and Tim Wilson, who are my co-hosts, and I don’t know anything about R, we decided we needed a Sherpa on this journey. So we’ve invited Eric Goldsmith. Eric, welcome to the show.

00:01:11.02 [Eric Goldsmith]: Thank you. Thank you.

00:01:12.80 [Michael Helbling]: Let me tell everybody about you because you’ve got a pretty awesome resume.

00:01:16.10 [Tim Wilson]: And he’s going to be a Sherpa on a pirate ship. So already there’s R. Just go with it, maybe.

00:01:24.28 [Michael Helbling]: Shiver me relational databases. Let’s get into this. So yeah, Eric Goldsmith, he’s currently the head of analytics at TED. Maybe you’ve seen one of their videos. And prior to that, he did some time at AOL/CompuServe. And he’s a guy for all seasons. He’s done a few things in this industry, and he’s certainly somebody we look up to when we think about R and all things data science related. So once again, welcome, Eric. We’re delighted to have you on the show.

00:01:54.27 [Eric Goldsmith]: Thank you. Pleasure to be here.

00:01:56.67 [Michael Helbling]: Well, let’s jump into it. Let’s talk about R and other things related to R, like the number two and the letter S. So, yeah, R is one of those tools, right? It’s a cool tool for analytics. From my perspective, in the short little pieces I’ve tried to use it for, it’s just got a ton of power. It’s very flexible, and it seems like the ultimate analyst weapon. Should every analyst use it? If you do use it, does that automatically make you a data scientist? Answer some of our questions, Eric. What do you have to say about R?

00:02:37.26 [Eric Goldsmith]: Well, the way I like to think about it, R is a combination of a statistical programming language and a language for manipulating data. It’s built to manipulate data. I come at things from a programming background. My formal training many, many years ago is software engineering, and I’ve always had that bent toward programming. So it’s a natural fit to have a language that allows you to really program, like R. But where I think it really excels is that the data types it uses and the manipulation techniques are all built to make working with data as easy as possible. You can work with data in any language. When I first started programming years ago, it was in C and then C++ and then Java and other things. And today, Python’s real popular for data analysis. These are all fairly general-purpose languages. You can do data manipulation with them, but a lot of times you either have to rely on external libraries or you have to develop tools or methods yourself to really make it easy. All that is built into R. It was built from the ground up to make data manipulation and extraction as simple as possible. So there are so many shortcuts, so many things that would take me 5, 10, 15 steps in a general-purpose language like Python that I can do in one line of code in R. So by allowing me to get rid of those details, I can jump ahead and focus on the higher-level work. I don’t have to worry about the underlying details of data manipulation.
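
To make that “one line instead of fifteen steps” point concrete, here’s a minimal sketch in base R; the data frame and column names are invented for illustration. A general-purpose language would typically have you loop over rows, accumulate sums and counts per key, and divide at the end.

```r
# Hypothetical data frame with one row per session
sessions <- data.frame(
  channel = c("organic", "paid", "organic", "email"),
  revenue = c(12.5, 40.0, 7.2, 19.9)
)

# One line: mean revenue per channel, no extra packages needed
aggregate(revenue ~ channel, data = sessions, FUN = mean)
```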

00:04:08.46 [Tim Wilson]: So you’ve been kind of an R user for about five years, so that’s back to your AOL/CompuServe days. I feel like I’ve been hearing about R for maybe a couple of years, although it’s been around for a long time. It was free to go and take the Coursera course on it, free to download and use. So anything that’s free and open source is appealing. For you, was there something you were doing where you were hitting a wall and said, “I have to find a better way to do this”? Or did it kind of come along, you said, “Huh, that’s interesting,” and you poked around and kind of gradually evolved to using it more?

00:04:41.15 [Eric Goldsmith]: It was really a combination, but I think what drove it the most for me was the visualization methods available in R. I’m a very visual thinker, and whenever I’m manipulating data and trying to explain data to people, I try to do it visually. And for years, I had done that kind of work in Excel and was just hitting too many limits. I just couldn’t do what I wanted to do. And I’d always heard about the capabilities of R, specifically in that case toward visualization. And that’s what led me to spend the time to learn the language. And there is a learning curve, I won’t lie. It’s a different way of thinking about moving data around. And the building blocks, I suppose you could call them, are different, because there’s so much that’s built into the language. They’re larger building blocks, so you can get more done with fewer commands. So that results in a bit of a learning curve in the way you think about data. But for me, the visualization piece was hugely important. I would run into so many limitations. And it’s not that there’s bad math in Excel; it’s certainly a tool that I use every day still. But for complex visualizations, lots of dimensions, any kind of sophisticated manipulation prior to the visualization, it was just too much work in Excel or couldn’t be done in Excel. And I was able to get past all that with R.

00:05:57.29 [Tim Wilson]: There are few analysts that aren’t working in Excel, and if they’re working hard in Excel, they’re hitting limits. It’s interesting you say visualization, and we’d actually talked before that we don’t have any Tableau experts, I don’t think, amongst the four of us tonight, so we can’t go down that path either, but it almost seems like there’s a parallel. They’re two very different types of platforms. It feels like R is probably a steeper learning curve, but probably also has more power. And plus, depending on the role you’re in, you’re not shelling out a couple grand for a Tableau license. But that still feels like kind of a false comparison. I’m not trying to say Tableau and R are the same thing.

00:06:38.34 [Michael Helbling]: I don’t think I’d put those two in the same bucket.

00:06:42.01 [Tim Wilson]: I don’t think they’re the same thing, but they’re both… I mean, what Eric just said was, “I was hitting visualization limits in Excel and I went to R,” and there are certainly people who are saying, “I’m hitting visualization limits in Excel and I’m going to Tableau.”

00:06:55.96 [Michael Helbling]: Yeah, I mean, that’s certainly true. I think, if I’m following what you’re putting down there, Eric, one of the things that is often a challenge with Excel is you run into things where you’re trying to work through a data set to understand what’s meaningful in it, and Excel just won’t give you the tools to pry it apart the way you need to pry it apart, either to look at it the right way in a visual sense or to run statistical analysis against it. And so kind of extending into R just gives you a better tool set for understanding that particular data. Whereas, Tim, I would say Tableau is like, once I know what I want to show, I’ll put it in Tableau.

00:07:39.21 [Tim Wilson]: We need to have Leah back on, because she actually has said that she’s always liked Tableau because of the exploration capabilities within it. But I do feel like it’s exploration within a finite…

00:07:51.29 [Michael Helbling]: Well, because there’s so much that you need to understand about the data before you can even put it into Tableau. You have to position it correctly and get it all kind of normalized so that you can actually visualize it in a meaningful way.

00:08:03.24 [Jim Cain]: I don’t think anybody goes and jacks around with R for four hours. Like, you know, if you put enough data that’s interesting into Tableau, anyone will go in and say, look at the pretty pictures, and jack around for four hours and not do anything of value. And that’s, again, not a bad thing; exploration can be powerful. But it seems to me, and again, stop me if I’m wrong, R is a little bit more surgical in its approach: clearly defined problem, clearly defined data sets, and then how do I model an appropriate answer? Is that nuts?

00:08:32.49 [Eric Goldsmith]: Let me preface this by saying I don’t have any firsthand experience with Tableau. But I have heard that a lot of what Michael said is true, that you have to have the data essentially pre-arranged, pre-normalized, pre-aggregated in whatever format Tableau is happy with in order to be able to use it to its fullest extent. So where do you do that work? Something that I’m a little leery of with GUI-based data exploration tools is that sometimes the details behind the data get hidden. You see an aggregate result and you don’t know how many data points are behind it. It may be two, it may be a hundred, so it may be significant or it may not. And you don’t have that exposed to you as you’re moving through a GUI. It’s not to say you can’t make those same mistakes with a tool like R. It’s that there’s a little more knowledge required to work with the data. So by gaining that knowledge, by developing the expertise to be able to use R, you’ve had to go through that learning curve, and sometimes get burned, for that information to stick in your head. One of the comments made earlier was that R is really helpful with the data exploration piece. A lot of times, depending on where the data comes from, whether it’s internal sources or you’re scraping it from somewhere else or you’re getting it from some third-party API, the quality of the data is really poor and there’s a lot of cleanup that needs to happen. Missing values need to be removed, consistent naming schemes need to be applied to columns, and so forth and so on. R makes it really easy to do that kind of thing. You can work with it in an interactive mode to explore and figure out what needs to be done, and as you’re doing that, you’re building up all the steps that need to be put into a script that can be used for automation later. So you go through that exploration process once, you learn what you need to do, you build the script, and then you can throw data at that script ad infinitum, and it’s automatic from that point forward.
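
Here’s a minimal sketch of the explore-then-automate cleanup workflow Eric describes, using only base R; the file name and column names are invented for illustration.

```r
# A hypothetical cleanup script, built up line by line in the console
raw <- read.csv("talks_usage.csv", stringsAsFactors = FALSE)

# Apply a consistent naming scheme to the columns
names(raw) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(raw)))

# Remove rows with missing values in the fields we care about
clean <- raw[!is.na(raw$talk_id) & !is.na(raw$views), ]

# Once each step works interactively, the same lines become the
# automation script: point it at next month's file and rerun.
```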

00:10:20.87 [Tim Wilson]: So I think that’s really important, and I was about to kind of go there as well. Something that seems like it’s really unique to R, and I’m coming at it as the guy who took the Coursera course, so I sort of have what the guy at Johns Hopkins taught me and the exercises that were taught there. That was close to a year ago, so I’m kind of working off of what’s stuck since then, and what you just described seems like one of the things that is super, super unique. Like, when I’m in Excel and I need to clean up the data, I wind up in VBA, in kind of a trial and error of trying to hack out VBA that I can run to do sorting, renaming, replacing missing values, so on and so forth. But I’m doing the kind of classic writing code and running it against the data set. Usually I’m doing a save before every time I run it, because I want to be able to reopen the file and try to run it again. Whereas something that seems crazy powerful about R, as you were describing it, is that it’s not just that you iterate through it step by step and then go take that knowledge and create a script. You can literally copy and paste the iteration that you did in the console into a script. So it is about as seamless as it can get when it comes to iterating through doing, effectively, ETL-type work that I want to save so that I can then have my clean-enough data set to actually visualize. I can bring in a new data set or an updated data set and run that same script on it, and it’s going to perform those same functions, whereas macros in Excel are kind of a clunky way to come at that.

00:12:09.09 [Eric Goldsmith]: Yeah, that’s the way I generally work, especially when I’m working with new data or a source that I’m not too familiar with. And I should say that I use a development environment on top of R called RStudio, made by the company also called RStudio, also free and also open source. And it provides multiple panes. One pane is your script. One pane is the interactive environment. Another pane is the data: you can examine all the data that you’ve collected and all the data variables and so forth. So as I’m working with a new data set or accessing a new API, I’ll work in the interactive section to figure out how I want to manipulate the data. And then when I get it right, I copy it into the script section of the IDE. And then when I’m done, I have a script that I can use on any future data set. And being able to script things and program like this, for me, has been very helpful with the reproducibility of the work. A lot of times I’ll do some analysis one month and then I won’t do anything like that again for six more months. But I’ve written the script. I’ve gone through all that process of learning how to clean up the data, documenting it, documenting the output, doing the visualizations. It’s all defined in my script. So when I want to create an updated version or point to a new data source six months from now, it’s simple. In my experience, it’s very difficult to do that kind of thing effectively in an Excel-type environment.

00:13:37.21 [Tim Wilson]: When you say documentation, are you commenting within the…?

00:13:42.31 [Eric Goldsmith]: Absolutely. I comment very liberally. One of the big creators of external packages…

00:13:46.52 [Tim Wilson]: We don’t need to get political here.

00:13:48.66 [Eric Goldsmith]: A guy named Hadley Wickham has created many of the extensions to R; they’re called packages. And he has a quote that every project has two collaborators: you and future you. So make sure that you pay attention to documentation when you’re writing it the first time, because six months from now you’re going to look at that and say, what the hell was I thinking?
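
In that spirit, here’s a sketch of what “commenting liberally” can look like at the top of an analysis script; the file names and purpose line are invented for illustration.

```r
# purpose: monthly talk-usage summary for the exec deck
# inputs:  talks_usage.csv (one row per session)
# output:  usage_summary.png
# notes:   GA sessions are sampled above 1M; see episode discussion
#
# Comments like these are written for "future you," who will rerun
# this script six months from now with no memory of the details.
```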

00:14:10.22 [Jim Cain]: Eric, in some of our… we did a session once on big data, and we kind of rambled around and all agreed that we don’t understand it well enough, and that in a lot of cases it’s a great excuse for IBM to sell a million dollars of their crap. That’s the short version.

00:14:44.04 [Eric Goldsmith]: Yeah, I’m not going to disagree.

00:14:45.60 [Jim Cain]: So the thing is that we’ve never had anybody on the show before who approaches, at least to me in the way that you’re doing it, quantitative analysis as a software engineering challenge. It’s like you’re talking about doing a piece of quantitative analysis the way the software engineers who work for me talk about solving an engineering challenge. And you’re kind of winning me over right now, I’ve got to be honest with you.

00:15:08.91 [Eric Goldsmith]: You know, it’s funny, I get that feedback in the industry from other peers of mine who are data scientists. A lot of them come from different disciplines. A lot of them come from a traditional stats background, and I don’t. I come from a computer science, computer engineering background; I picked up the stats along the way. And a lot of times they ask me, what’s different about your approach? Because they’re used to their standard statistical approach, where everything’s about modeling, and they didn’t really have that background of what it’s like to look at this as a software engineer. It works for me. That’s the way I think. Programming was the beginning of my career, and there’s always been a touch point for me in everything that I’ve done that ties back to programming. For me, it’s just natural to approach these kinds of problems in this way.

00:15:51.76 [Jim Cain]: You probably have version control. Oh, absolutely. You know what I mean? I could just see the desktop on your computer right now.

00:15:59.21 [Eric Goldsmith]: Every script I write is checked into GitHub.

00:16:01.01 [Tim Wilson]: So we didn’t cover in the intro that Eric is based in Columbus, Ohio, my fair city. So I’ve gotten to know him over the last six or seven…

00:16:08.10 [Michael Helbling]: But he’s our friend now, Tim.

00:16:09.70 [Tim Wilson]: Yeah, well, yeah. He’s our friend. He’s our friend. And he was at a Web Analytics Wednesday this year. This is a guy working at TED, with video data coming into his web analytics, and he was kind of reminiscing about the days when he had really big data sets to work with. Even with the large clients that I’ve got, I’m not fighting the million-row limit in Excel; that’s not the sort of data challenge I think many of us run into. On that front, if you’re capping out GA Premium limits on an API call, does R handle that well? Are there aspects where it croaks? Do you need a powerful machine?

00:16:50.77 [Eric Goldsmith]: Well, there are two aspects to that. In that particular case, the limit that I was hitting was the million-session sampling limit. When you access Google Analytics via the API and you try to query a date range that will encompass more than a million sessions, it caps it at a million and samples everything beyond that. But it’s doing all that server side. The data that you’re getting back on your machine in R is usually much, much smaller than that. I wasn’t requesting details on every session; I was looking for some aggregated data that just happened to span a million sessions. One of the limitations of R is that it only works with data that fits in memory. But today’s systems… I think I have 16 gig on my machine, my MacBook Pro. You can run R on a server (there’s a server version of it), and you can go to multiple tens of gigabytes on servers. I’m not even sure now if you can get beyond that.

00:17:50.12 [Tim Wilson]: So where are you running it? Are you running it on a server, or are you running it on your laptop today?

00:17:55.27 [Eric Goldsmith]: I’m just running it locally on my personal MacBook, my work MacBook. But at some point in the future, as we build out the data capabilities, we’re going to be building a data warehousing system, and it will utilize R on a server and some other server-based tools to build out the data warehouse.

00:18:14.35 [Tim Wilson]: With the MacBook Pro, 16 gigabytes of RAM, do you find times when you’re like, ah crap, I’ve got to close Chrome because I’m trying to…

00:18:25.32 [Eric Goldsmith]: I’ve never run out of memory. There have been a few times… a lot of times I’m pulling data from multiple databases, multiple MySQL databases. And when I can, I try to do any kind of aggregation that’s possible on the SQL database, so I’ll structure the query that I’m making from R, the SQL query that R is making, to do as much of the aggregation via SQL as I can, and then pull the data back into R and manipulate it. But if I can’t, if it’s something where I’ve got to join data sets from two different databases, for example, then I’ve got to have the full data set from both in R. I’ve never run out of memory, but I have run into times where a manipulation may take 15, 20, 30 seconds. Instead of instantaneous, it’s noticeable.
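
As a minimal sketch of pushing the aggregation down to the database before pulling results into R: the connection details, table, and column names here are invented, but the DBI/RMySQL calls are the standard ones.

```r
library(DBI)      # generic database interface
library(RMySQL)   # MySQL driver

con <- dbConnect(RMySQL::MySQL(), host = "db.example.com",
                 user = "analyst", password = "...",
                 dbname = "events")

# Aggregate server side so only the daily summary crosses the wire
daily <- dbGetQuery(con, "
  SELECT DATE(viewed_at) AS day, COUNT(*) AS views
  FROM   video_views
  GROUP  BY DATE(viewed_at)
")

dbDisconnect(con)
```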

00:19:11.00 [Tim Wilson]: When it comes to pulling in data from multiple data sources, is that kind of another… hey, I’m regularly having to pull in data from Adobe Analytics or GA, and I’ve got a common key and I want to merge that with my MySQL or some other database. Are there packages for pulling in from a SQL Server or MySQL?

00:19:34.16 [Eric Goldsmith]: Well, there are packages for accessing those data sources. I use one package for accessing MySQL databases. It’s called RMySQL, clever name. And there’s another package for accessing the Google Analytics API called RGA, another clever name. So within the same script, I’ll pull the data from GA, I’ll pull the data from MySQL, and do whatever manipulation I need to do in R. If I’m going to do some visualizations, I generally use a package called ggplot2, and then I’ll load that in and do the visualization.

00:20:06.57 [Tim Wilson]: And, just to give a flavor, you can have a script that actually loads in those packages. Am I right?

00:20:11.96 [Eric Goldsmith]: Yeah. At the top of the script, you just list all the packages you need in that script, and they all load. Just like standard include programming in any other language.
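
Putting those pieces together, here’s a hedged sketch of the kind of script Eric describes: packages loaded at the top, data pulled from two sources, joined on a common key, and visualized with ggplot2. The GA pull is represented by a hypothetical helper, fetch_ga_daily(), because the exact RGA function signature varies by package version; the table and column names are likewise invented.

```r
# Packages load at the top, like includes in other languages
library(RMySQL)
library(ggplot2)

# Hypothetical stand-in for the RGA query; assume it returns a data
# frame with one row per date and columns `date` and `sessions`
ga_daily <- fetch_ga_daily("2015-08-01", "2015-08-31")

# Talk metadata from MySQL, keyed by the same date column
con  <- dbConnect(RMySQL::MySQL(), dbname = "talks")
meta <- dbGetQuery(con, "SELECT date, talks_published FROM publish_log")
dbDisconnect(con)

# Join on the common key, then visualize
combined <- merge(ga_daily, meta, by = "date")
ggplot(combined, aes(x = talks_published, y = sessions)) +
  geom_point()
```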

00:20:20.13 [Tim Wilson]: Yeah, but not everybody’s doing standard include programming for other languages. Absolutely.

00:20:25.44 [Michael Helbling]: Well, I think we’re developing a list of things we all need to learn. And actually, that brings me to another topic for discussion, which is, you know, I’m sure, Eric, as you’re growing your team and you’re kind of doing things, you’re hiring some millennials and young folks who are getting into this space who may not have exposure to R. What kinds of skill sets do you look for? What increases someone’s aptitude to be successful using R?

00:20:56.59 [Eric Goldsmith]: Well, I think having any kind of a programming background, or at least that interest in programming… let me take a step back and qualify some of this a little bit. When you’re writing an SQL query, that’s considered, or known as, declarative programming, where you just declare what you want to happen: I want this data, these fields, and I want it sorted this way, joined this way. The details of how that happens are up to the SQL engine. You don’t specify how that works. R-type programming, and most programming languages, are procedural, where you describe things step by step: do this, load this data, manipulate it in this way, now load this data, now join it here, and so forth. So that way of thinking, step by step, here’s how I want to approach this problem, I want to break it down into these steps, here’s where I want to pull data from some other source, here’s where I want to combine it, and here’s how I’m going to visualize it, lends itself well, I think, to complex data analysis. Some would argue that the SQL approach, that declarative approach where you just describe what you want and let the optimizer deal with it, is simpler and frees you up for the higher-level thinking and so forth. And to some extent that’s true, but you’re also giving something away. You’re not able to define how you want this stuff to work; you’re giving away control to somebody else’s optimizer. It may do the right thing and it may not, and you don’t really know, because all that’s hidden from you. So in my mind, I want somebody who wants to work procedurally, who wants to get into the details, who wants to describe step by step how things work. That’s the level of knowledge I think you need to be really effective as a real deep analyst.
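
A small sketch of the contrast Eric is drawing, with an invented sessions table: the procedural version spells out each step in R, while the declarative SQL equivalent (shown in a comment) describes only the result and hands the how to the engine.

```r
# Procedural, step by step in R
sessions <- read.csv("sessions.csv")                      # 1. load
paid     <- sessions[sessions$channel != "organic", ]     # 2. filter
by_chan  <- aggregate(revenue ~ channel, paid, mean)      # 3. aggregate
by_chan[order(-by_chan$revenue), ]                        # 4. sort

# The declarative equivalent describes only the result:
#   SELECT channel, AVG(revenue) AS revenue
#   FROM sessions
#   WHERE channel <> 'organic'
#   GROUP BY channel
#   ORDER BY revenue DESC;
```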

00:22:50.53 [Jim Cain]: So I actually had a question, because it’s one of the ways that I view tools like R, or just data science in general. It seems to me that the best fit for that level of scientific, raw analysis of data is right out at, or just past, the bleeding edge of what UI-driven tools can do. So for example, Google Analytics four years ago was kind of junky, and in a lot of cases you’d have to pull the data out and do sophisticated things to answer questions. The tool’s gotten way better, which, like you said, frees up your time to do more first-class, more complex work. So again, we’ve had a hell of a time trying to wrap our heads around a good definition of data science and big data and what these things mean. So is a big part of the value proposition of R, to you, that it allows you to stay just past the cusp of what UI-driven tooling allows you to do? And as that starts to fill in, it lets you push farther forward. You know what I mean?

00:23:53.89 [Eric Goldsmith]: I think it’s a combination. Using GA as an example, any kind of reporting within the GUI for GA is limited to secondary dimensions. What if you want more than that? You just can’t. Many times, for most of what I do, I need more than that. So I have to develop it in R or something. It doesn’t have to be R, but something where I’m accessing the data via an API and manipulating it myself, visualizing it myself, because I have more complex reporting needs than what the UI can satisfy. That’s one area where I really lean on R. And again, it doesn’t have to be R; it could be other tools as well. That’s just my tool of choice. Another area is combining data from different data sets. I mentioned earlier we use Google Analytics Premium. We also have some internally developed tools that we track different things with, and a lot of times I need to combine the data from both of those. We have multiple internal databases that house the details about our talks, and I need to combine that. I need to take the data from GA that tells me the usage I’m seeing externally, and I need to bring in details about the talks: metadata, transcripts, whatever it might be. I can do that very simply in R. And again, it’s not just R, but I find that a domain-specific language (DSL is the term people use), where R is created for manipulating data, makes it so easy. I could do the same things in Python. I could do the same things in any number of languages. But it would take me a lot more work, a lot more effort, because they’re general-purpose languages; they’re not domain-specific for data manipulation. So for me, I find that I can get things done a lot faster with a language like R. We haven’t really talked too much about the statistical part.

00:25:42.48 [Tim Wilson]: There’s mean, median, and mode, and that’s about it, right?

00:25:46.47 [Michael Helbling]: No, you can have also a standard deviation as well.

00:25:50.23 [Tim Wilson]: Oh, standard deviation, yeah. Yeah. STDEV, yeah. Those are all on Excel. OK. Now we’re done.

00:25:56.30 [Michael Helbling]: Good to go.

00:25:57.80 [Eric Goldsmith]: So in addition to all this data manipulation that R makes easy, there are all these statistical tools that don’t exist in things like Excel. Like, if I want to compare two different medians and I want to know if they’re statistically different from each other, those tools are built into R. Is that core to R, or is that a package? That’s all core to R. It was developed by statisticians for doing this work.
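
A minimal sketch of the kind of built-in comparison Eric describes, on invented data. The Wilcoxon rank-sum test shown here ships with base R and is one standard way to test whether two samples differ in central tendency; that it’s the exact test Eric had in mind is our assumption.

```r
# Two invented samples of per-session watch time, in minutes
before <- c(3.1, 4.7, 2.8, 5.0, 3.9, 4.2)
after  <- c(4.4, 5.1, 3.8, 6.0, 4.9, 5.3)

# Built into base R (the stats package); a small p-value suggests
# the two groups' distributions genuinely differ
wilcox.test(before, after)
```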

00:26:23.93 [Tim Wilson]: But you’re not a statistician by training. How much does the tool cover for that? Just like you were talking about needing to be knowledgeable about the data, how much can the tool make up for statistical knowledge deficiencies?

00:26:41.37 [Eric Goldsmith]: Well, it’s a combination. I have a working knowledge of a lot of areas of statistics, but I’m certainly not PhD level. But the people who wrote these packages for R are at that level, so you’re getting the benefit of all their work and their knowledge. Let me give you an example. One of the things I do is forecasting. We have people watch the videos, and the rate at which they watch videos has a periodicity to it. Different months of the year have different usage levels. Different times of the month have different usage levels. There’s trending involved: we see upward trends, downward trends, et cetera, et cetera. And it’s common, whenever you’re trying to do forecasting of usage data, to try to separate out the periodicity, the cycle of the data, separate out the trending aspect of the data, do a forecast on what’s left, and then add back in the periodicity and the trending. That’s called decomposition, forecasting, and then recomposition. To do that manually is very laborious, but there are experts in forecasting who have written packages that make it simple. It’s three or four function calls and you’re done. Now, you still have to understand enough about how it all works to make sure that you’re sending the right parameters into the functions and so forth. But you’re able to stand on the shoulders of giants, so to speak. Somebody else has done all that low-level work. It’s all been validated, it’s all adherent to the standards around forecasting, and you can just leverage that work. And if you Google it, the community, the Stack Overflow world… you will find somebody who says, oh, this is the package, and this is…

00:28:29.43 [Tim Wilson]: This is what to do with it.
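
The decompose-forecast-recompose workflow Eric outlines maps nicely onto the forecast package’s stlf() function, though the episode doesn’t say which package he actually used, so treat this as an illustrative sketch on simulated data.

```r
library(forecast)  # community package for time-series forecasting

# Simulated monthly usage with an upward trend and yearly seasonality
set.seed(42)
months <- 1:48
usage  <- ts(100 + 2 * months + 20 * sin(2 * pi * months / 12) +
               rnorm(48, sd = 5), frequency = 12)

# stlf() decomposes out the seasonal component, forecasts the
# seasonally adjusted series, and reseasonalizes the result:
# decomposition, forecasting, recomposition in one call
fc <- stlf(usage, h = 12)   # forecast 12 months ahead
plot(fc)
```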

00:28:33.94 [Eric Goldsmith]: There’s so much out there. You Google on anything and you get so many results; you don’t know how to narrow it down sometimes. But Stack Exchange helps with keeping the quality of the responses high. There’s a whole section of Stack Exchange for statisticians called Cross Validated, and a lot of times the answers will be in there. But there’s so much community support for R and all the R packages. It doesn’t take too long to go through that iterative cycle of Google searches, and pretty soon you’ll zero in on what the right packages are to use, what the most commonly used packages are, what the most commonly used approaches are. And then it’s experimentation to find out what works best for you. That’s a good segue into something else that I’ve really started to use recently with R: a web application environment called Shiny, S-H-I-N-Y, also developed by the RStudio people. It’s a web development environment for R applications. And these people have done some incredible work to take all the detail and all the grunt work out of developing applications. And these are GUI applications, very well done applications. They provide all the building blocks; you still have to do the work. I remember developing this kind of thing years ago, when you had to do it all by hand. And now there’s so much that you can just leverage: take all the advantages of R and all the data manipulation and statistical tools, layer this web application framework on top of it, and in a day you can develop an entire application. It’s just amazing.
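
For a flavor of how little code a Shiny app requires, here’s a minimal sketch using the standard Shiny building blocks; the app itself (a slider driving a simulated trend plot) is invented for illustration.

```r
library(shiny)

# UI: one input control and one output plot
ui <- fluidPage(
  titlePanel("Daily views"),
  sliderInput("days", "Days to show:", min = 7, max = 90, value = 30),
  plotOutput("trend")
)

# Server: re-renders the plot whenever the slider moves
server <- function(input, output) {
  output$trend <- renderPlot({
    views <- cumsum(rnorm(input$days, mean = 5))  # simulated data
    plot(views, type = "l", xlab = "Day", ylab = "Views")
  })
}

shinyApp(ui, server)  # launches the app locally
```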

00:30:12.90 [Tim Wilson]: Sounds a lot like Tableau Server. You heard it here first. We’re going to be pointing back to episode 19, ages hence, saying fortified by a little bourbon, Gilligan called it.

00:30:25.76 [Michael Helbling]: It would be pretty cool if you could use R inside of Domo. So one other thing I wanted to talk about, and I’m asking for a friend: where else would you go, Eric, to find tips and tricks and things like that? You mentioned the Stack Exchange one, Cross Validated, which is a great resource. What other…

00:30:46.82 [Tim Wilson]: There’s tips and tricks, and there’s actually getting, like, across the starting line in the first place, right?

00:30:52.33 [Michael Helbling]: For the RStudio.com.

00:30:54.41 [Tim Wilson]: Where do you start? Well, I actually did that once. I downloaded RStudio and I’m like, what the hell do I do now? So I went the Coursera route, but I would love to hear Eric’s take.

00:31:05.68 [Eric Goldsmith]: I’ve been asked this by people who want to learn R, so I’ve looked into this a little bit. And I’m a big fan of The Data Scientist’s Toolbox on Coursera. It’s a Johns Hopkins program taught by three professors there. They’ve expanded it a great deal since I last looked at it.

00:31:23.61 [Tim Wilson]: I think there’s… As you say, it was one professor when I took it a year ago, but I know June Dershewitz is taking it right now.

00:31:29.47 [Eric Goldsmith]: There are three professors now, and I believe it’s spread out over multiple courses. It’s nine courses now; it used to be three or four, so they’ve really expanded it. I’ve heard really good things about that, and they cover everything from just learning R to the details of how to do exploratory data analysis, how to do data cleansing, how to make your work reproducible, and then they start getting into some of the more data-sciencey things like machine learning and regression models and getting into the statistics more. The other one that I’ve heard good things about is a statistics class. The focus is statistics, but it’s taught with R, so it’s here’s how you do these statistical things with R. It’s also a Coursera course, from Princeton University, taught by Andrew Conway. So I’ve heard good things about both of those. As far as where I get my information, I use Twitter to curate my news. Over the years, I’ve developed a list of people who I follow who feed me the information that I need to stay current in the industry. So if you follow me on Twitter, it’s @GoldsmithEric, then you can look at the people I follow, and you can develop a list of people who can help feed you information like they feed me.

00:32:50.21 [Jim Cain]: So you’re a creeper on Twitter? Pretty much. I’m the same.

00:32:55.20 [Eric Goldsmith]: Every once in a while, but not nearly as much as a lot of these folks. I would be remiss if I didn’t mention that, up till now, we’ve pretty much talked about R as this tool to facilitate data exploration, data manipulation, pulling data from multiple sources and combining it, doing the visualizations. But then there’s the whole statistical modeling and forecasting and machine learning and text processing, natural language processing. There are so many more packages and other things out there that we haven’t even touched on. So there’s so much that is available.

00:33:30.90 [Michael Helbling]: Well, there you have it: R. Those pirates really knew what they were doing. So yeah, honestly, I think we could keep going for a while. And it’s funny, because I don’t know about Tim and Jim, but I’m super excited to kind of dive into this again and take another crack at this whole R thing, which I’ve attempted a couple times in my career.

00:33:52.19 [Jim Cain]: To me, this is one of the funnest ones we’ve done in a while. And one of the things I really wanted to pick Eric’s brain on is, you know, we’ve talked before about how we’ve all ended up here from different backgrounds, and you come from a completely different one, as a software developer. And I wanted to start kind of hammering out some of our biggest agreements with you, like, the ideal analyst team should have these three people. And I’m kind of picturing you now saying, no, because you’re missing someone who can do this. So it was a really refreshing perspective. Frankly, you’ve got me turned around a little bit on my thoughts about where data science fits into the ongoing delivery of measurement into a business, like it’s not a periodic, as-needed thing. You really had me with the whole software developer’s approach. I really, really liked that, because for some reason it just made a whole bunch of things make sense in my head about where this fits into service delivery inside a business. So that was really cool, and actually I’d love to pick this one up again.

00:34:54.94 [Tim Wilson]: I’ll throw mine in. I do get kind of excited about this and have sort of dived in semi-successfully, but this whole discussion has me back thinking this is for the analysts. And whether it’s R or not, there are other tools out there. There was a pretty amusing, if brief, exchange between Michael Healy and Tom Miller on Measure Slack a few weeks ago, where I think Michael was kind of making a brief case for Python, and Tom Miller’s comment was that he was trying to bring a religious argument into the channel. So I don’t know that it’s necessarily R. But I do think, if you’re a whiz at Microsoft Excel and thinking, I’m done, I’m set for my career, you’re probably not, because of a lot of the things that we wound up touching on: the statistical modeling, you know, standing on the shoulders of others with true statistical stuff and saying with confidence, I can apply this use case. Shiny, you know, talking about a web app; I think with Microsoft there’s something through SharePoint where in theory you can web-enable an Excel spreadsheet, but that just feels like a bit of a joke, whereas R is actually giving you the potential to say, no, I am going to make an interactive thing where I’m going to allow people to refresh and interact with the data, which does, to me, start to converge a little bit with Tableau and Tableau Server potentially. We didn’t touch on some of the text mining stuff, and Eric shared several examples and didn’t share some of the other examples, but I know there are text mining or text analysis packages within R. So there’s that kind of exciting aspect: there’s Excel and there are Excel plugins, and I use an analysis engine and report builders almost daily, so plugins are great, but it just feels like when you’re going into the open source world, there’s just this abundance of plugins, and the plugins are iterating. So this has me kind of…

00:37:09.47 [Jim Cain]: If you don’t get enough exposed, it just gets gooey.

00:37:13.42 [Tim Wilson]: So that was going to be my brief wrap-up of my incoherent and rambling takeaways, but I think there was a lot here.

00:37:23.91 [Michael Helbling]: Well, Eric, I don’t know if you have any takeaways, because you’re kind of the single source of information tonight. So…

00:37:32.15 [Eric Goldsmith]: Well, I would add that for anybody who’s interested in learning R, it is a bit of a steep learning curve, just because of the different way of thinking and looking at the data. So it helps to have a use case. It helps to have something that you want to try, some data that you want to work with. Don’t just try to learn R just to learn the syntax. Really have a problem that you want to solve, and that will help you work through the details and get to the understanding quicker, I think.

00:38:00.35 [Tim Wilson]: So, given that, would you say a use case that is, I have to deliver something within the next two weeks for work, versus a use case of, I’m gonna dig into data.gov and download a data set and just answer a question that I’ve wanted to answer? Like, how risky is it to say, I’m gonna commit myself to delivering something for work on a given timeline? Do you have a take on that, one way or the other?

00:38:31.32 [Eric Goldsmith]: Well, I guess it depends on your personality and your risk tolerance. The first time I used R was for a work project that I was committed to deliver on, so I forced myself to get up to speed and learn it to deliver what I needed to deliver. But that’s just me.

00:38:51.93 [Michael Helbling]: Well, certainly this has been, I think, a really great show. Eric, thank you so much for enlightening us and totally saving Tim Wilson’s terrible show idea; he’s really redeemed it completely. So kudos to you. If you have questions or comments, we’d love to hear from you on our Facebook page, or on Twitter, or on the Measure Slack. And if you’re not part of Measure Slack, you can certainly find out how to get on it from our Facebook page, facebook.com slash analytics hour. Thanks, everyone. Thank you again, Eric; we loved having you. For Jim and Tim, my co-hosts: get out there and get you a new shiny R. You’ll love it.

00:39:42.25 [Announcer]: Thanks for listening and don’t forget to join the conversation on Facebook or Twitter. We welcome your comments and questions, facebook.com forward slash analytics hour or at analytics hour on Twitter.

00:40:03.97 [Jim Cain]: I need to grab another delicious beer and then we can get started.

00:40:07.02 [Michael Helbling]: Okay. I like it.

00:40:09.54 [Eric Goldsmith]: We can talk about R for 20 and we can talk about being old guys for another 20.

00:40:14.07 [Michael Helbling]: I like it.

00:40:15.49 [Jim Cain]: You’ll say, so, I’m a specialist in R, and I’ll go, I don’t get it, I’m waiting for S to come out. Now if you’re under 30, you’re an asshole, right? Oh, my joke is a real thing.

00:40:25.00 [Tim Wilson]: Well, it’s gotta be in reverse.

00:40:26.37 [Michael Helbling]: S predated R. See, now the truth comes out.

00:40:29.81 [Tim Wilson]: Animated bubble charts 99 times out of 100 are gonna be completely obfuscating and not helpful.

00:40:37.91 [Michael Helbling]: See now the truth comes out. Tim Wilson is not the boss of you. All right, I’m just making stuff up now.

00:40:45.37 [Jim Cain]: All of Canada shares one dial-up connection.

00:40:47.87 [Michael Helbling]: See, now the truth comes out. Tonight’s episode has been brought to you by the letter R. And the number 22 for all the millennials out there.

00:40:57.84 [Jim Cain]: Kids today, man.

00:40:59.24 [Michael Helbling]: I like it. It’s not about you, man.

00:41:02.15 [Jim Cain]: It’s all about you.

00:41:04.41 [Michael Helbling]: Yeah. See, now the truth comes out.

00:41:07.02 [Jim Cain]: and my three-year-old fell out of bed, so I was like, Daddy loves you. I’m recording. That’s like Ben back downstairs.

00:41:14.29 [Michael Helbling]: See, now the truth comes out.

00:41:16.63 [Jim Cain]: Insight.

00:41:18.83 [Michael Helbling]: Insight. See, now the truth comes out.

00:41:23.60 [Eric Goldsmith]: There, that’s a bit of a sore spot.

00:41:26.82 [Michael Helbling]: See, now the truth comes out. Tim was trying to say something, but we were out of time, so we’ll just never know what Tim had to say. I like it. Facebook app, you know, slash app, yeah. I don’t even, what’s our Facebook page somebody? It’s on the like, lose, whatever, air. There you go. We’ll fix that in post.

00:41:48.31 [Eric Goldsmith]: You guys sound like a bunch of guys I could just sit around and talk with for days.
