Have you ever noticed that 68.2% of the people who explain machine learning use a “this picture is a cat” example, and another 24.3% use “this picture is a dog”? Is there really a place for machine learning and the world of computer vision (or machine vision, which we have conclusively determined is a synonym) in the real world of digital analytics? The short answer is the go-to answer of every analyst: it depends. On this episode, we sat down with Ali Vanderveld, Director of Data Science at ShopRunner, to chat about some real-world applications of computer vision, as well as the many facets and considerations therein!
[music]
00:04 Announcer: Welcome to The Digital Analytics Power Hour. Tim, Michael, Moe and the occasional guest discussing digital analytics issues of the day. Find them on Facebook at facebook.com/analyticshour and their website analyticshour.io. And now, The Digital Analytics Power Hour.
[music]
00:27 Michael Helbling: Hi everyone, welcome to The Digital Analytics Power Hour. This is episode 124. You know, have you ever filled out one of those captchas that ask you to pick all the images with a stop sign or a storefront in them? Or have you laughed along at the show Silicon Valley when Jian Yang created his app that could tell you whether what you were looking at was a hot dog or not a hot dog? And what does this actually have to do with me, the analyst or data scientist? Well, it’s pretty cool, and you have to have a lot of data to do it, and it’s machine vision, and so it totally qualifies. And so we’re gonna talk about it. Moe, do you think this qualifies as a topic for the show?
01:09 Moe Kiss: Oh, I’m so freaking excited, I’m not gonna lie.
01:11 MH: Excellent. What about you Tim? As if anybody cared about your opinion.
01:15 Tim Wilson: Well, I’d like you to frame the question in the form of a picture and then I will draw a picture in response.
01:21 MH: Oh, Okay. Yeah.
01:23 TW: Yep.
01:24 MH: I just uploaded it to the Measure Slack. So you can respond to it there. Alright. But we needed someone to help us explore our vision, pun intended, for this topic in more detail. Ali Vanderveld is Director of Data Science at ShopRunner. Prior to ShopRunner, she held data science roles at Civis Analytics and Groupon. She’s an astrophysicist turned data scientist, and today she is our guest. Welcome to the show, Ali.
01:52 Ali Vanderveld: Thanks. It’s great to be here.
01:54 MH: It’s so great to have you. Okay, to get us kicked off, we need to level set or start out with something where everybody is here on the same page of what we’re talking about. So maybe, if you wouldn’t mind, kick us off with just a 101 on machine vision or computer vision, whatever it’s called, the future computers knowing what they’re looking at.
02:13 AV: Yeah, definitely. Okay, well, first of all, you mentioned that I have a background in astrophysics and fun fact, there are images there too so it all connects. But yeah, usually when we look at pictures, like say a picture of a galaxy or a picture of a hot dog, we don’t usually think of it as being data but all these pictures actually are data. So if you take an image and you take all the pixel values and every color band, so let’s say, RGB, you can put all of those numbers into a vector and that vector could be the features that go into a predictive model. And that’s basically all we’re doing with computer vision or machine vision, taking all the data that you have in the raw pixel values of an image and feeding them into a machine learning model to try to learn things about that image.
03:05 AV: So the standard classic problem that people talk about in this space is based on a database of black and white handwritten digits called MNIST. And what people do is they take these black and white images and then, from the data in each image, try to predict what the actual number is. A framework for doing this that we really love at my company is called deep learning; deep neural networks have been able to achieve human-level accuracy in this task of predicting the digits from these handwritten images. So fashion is an inherently visual thing, and so there’s been a big push in the tech scene lately to figure out what we can do with computer vision in the fashion space. And so my team at ShopRunner is working on a bunch of cool stuff in that area.
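To make the “pixels are just features” idea concrete, here is a minimal sketch (not ShopRunner’s code): it flattens scikit-learn’s small built-in 8x8 digit images, a stand-in for the MNIST dataset Ali mentions, into plain vectors and trains an ordinary classifier on them.

```python
# Flatten each image's pixel grid into a feature vector and feed it to a classifier.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()                              # 1,797 small 8x8 grayscale digit images
X = digits.images.reshape(len(digits.images), -1)   # flatten 8x8 grid -> 64-dim vector per image
y = digits.target                                   # the digit (0-9) shown in each image

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=2000)            # any classifier can consume the vectors
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```

In practice a deep neural network replaces the logistic regression, but the input is still just the raw pixel values arranged as a vector.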
03:52 MK: Okay, and so what’s with the computer vision and machine vision? Like, what… Is it just like got two names or?
04:00 TW: I was thinking the same thing. Is one thing like old and uncool…
04:03 MK: Is it like SEQUEL and SQL? I don’t know.
04:06 AV: Yes, it is. Yeah, it’s the same thing.
04:09 MK: Oh, okay. Just someone decided to have a slightly different variation in naming.
04:16 AV: Yeah. Yeah, yeah.
04:17 MK: Now to the fashion. What are some of the other use cases, I guess, that people in the industry at the moment are really looking towards?
04:23 AV: So yeah, there’s a ton of stuff. One example of something that we also participate in at my company is computer vision for wildlife conservation, if you can believe it. So the Lincoln Park Zoo in Chicago, actually, they work closely with Uptake and my company, ShopRunner, on this open source package called Autofocus, where they have camera traps set up throughout the city that take video of everything that’s passing by, and we use computer vision to figure out when in that video there is a human or an animal present, and then, when there is an animal, what kind of animal it is. And so the whole point here is to track animal populations throughout the urban environment to try to help preserve their populations.
05:11 AV: So that’s one specific thing you can use it for. It’s also being used in medicine, where people are trying to train the same kinds of deep learning models to look at various kinds of scans you could take of patients and figure out if they have various cancers or things like that. And then people are also using computer vision for things like self-driving cars, actually. So for a self-driving car, you’d have the scene in front of you, so that’d be an image, and then you can have some sort of lidar or something telling you how far away things are from you, and you use those different signals together to figure out: should I brake, should I turn right, what should I do. So those are some of the things that are most exciting to me that people are using computer vision for.
05:58 TW: I wonder if we should back up and have you quickly kinda recap how ShopRunner uses it because that’s like…
06:04 AV: Yeah.
06:05 TW: You guys are a kinda unique business model and have a very kind of direct and cool application.
06:12 AV: Yeah, let me just back up and just say what ShopRunner is, but…
06:18 AV: So, I’m sure you’ve all heard of something called Amazon Prime.
06:21 TW: What?
06:22 AV: Right, stop me. If you haven’t heard about this.
06:24 MK: Well in fairness it’s new to Australia.
06:28 MH: That’s true for European workers. There’s not as much today out there.
06:32 AV: Yeah, so Amazon Prime has… It’s completely changed the game for e-commerce. Their site just makes it so incredibly easy and convenient to shop across millions and millions of products. They’ve made it so easy and seamless that it’s kind of raised the bar for what customers expect in an e-commerce setting, so it’s become kind of difficult for traditional retailers to keep up. So, what ShopRunner is doing is we’re partnering with over 100 different retailers so far and providing them with a suite of tech tools that can plug natively into their website to basically Amazonify it a bit. So we give them tools like express checkout, out-of-stock notifications, fraud monitoring, and even a widget to tell you what’s trending right now in different cities. So we provide that to our retailers. And then on the other side, we have millions of customer members, and they get a bunch of benefits from their membership, including free two-day shipping and seamless checkout and exclusive deals, so we’re bringing all these new customers to the retailers as well. They tend to be very high engagement customers.
07:46 MK: Do you funnel them to the retailers, refer them? Or can people buy products on your site?
07:51 AV: So people can buy products through our site, but the main way that our customers buy products is through the retailer’s native site. We’ll send them marketing emails so that they’re aware of deals that are coming up and things like that, but it’s mostly based on helping the retailer create an easier shopping experience on their own native site. And then we’re also working on a mobile shopping app that’s called District, which brings everything together in one place. So you can shop across these 100-plus retailers with one universal cart and all this fancy shopping tech that we provide.
08:29 TW: And then the image stuff comes into play.
08:32 AV: Yeah, okay. So a lot of it has to do with helping our customers find what they wanna buy. At a very, very basic level, if you wanna take all the products from 100 retailers and put them into one catalog and one mobile shopping app, you have to first of all merge all the taxonomies together. So, a taxonomy is basically how you categorize the products. Two of our bigger retailers are Bloomingdale’s and Neiman Marcus, and if you go to their websites, they’ll each call the same shirt a different thing. So if you’re browsing through the website, the flow will be different as you drill down to what you want. And so if we wanna merge that all together into one experience, we have to have one master taxonomy that we use for everything. And then also, to help people find what they’re looking for, we can create filters. So, what color is this? Does it have a pattern? What length is this skirt? How long are these sleeves? Things like that. So if we get in a product catalog from two different retailers that have their own ways of categorizing things, we can use computer vision to just take the raw product data and categorize it however we want to on our mobile app, so that people can much more easily find what they’re looking for.
09:55 AV: And even beyond that, there’s a lot of cool stuff we can do with computer vision. One of the things I’m most excited about is visual similarity. So if you have one seed product someone’s interested in, but maybe it’s too expensive or is not the right color or something like that, but they want something with a similar shape, you can actually use a computer vision model to show them other things that they might be more interested in based on the images themselves.
10:21 MK: So are you using any of the metadata that the retailers provide, or are you essentially… I mentioned to you before we started that I worked in retail, and everything at my company was manually tagged in terms of category and color and size and all that stuff. And any time you have a human doing that stuff, there’s gonna be mistakes or differences in opinion, even just on colors. So are you using that metadata, or is it all just so rubbish you kinda need to throw it out with the bathwater?
10:49 TW: Well, let me ask: there’s metadata for the product that is in their feed, but then presumably, I think, there’s also metadata in the images that come with it. So, could you maybe cover both of those as well?
11:01 AV: Yeah, as an aside, you mentioned human labeling of all this stuff. That’s actually what we’re trying to avoid here. We’re trying to build an automated system to get labels for all these items, and that’s because we have millions of products, and manually labeling millions of products is super labor-intensive. So, we’ve been training this model to do it, and there’s a confidence threshold that we give. We say, if the model is above some level of confidence, then we can take that answer, and if it’s not, then a human labels it. So by doing that, we’re able to dramatically reduce the number of products that we have to send to a human. So in terms of data we use… You are right, we do get a lot more data from the retailer than just the image. And actually we can get multiple images. So we get one or more images, and then we also get the name of the product and a text description, and whatever category they call it on their native site. Yeah, there are lots of pieces of data that we get. So the way we are combining that all together is by using not just computer vision, but also natural language processing techniques. So we take the image and we run it through a computer vision model, but then we take all the text and we run that through an NLP model.
12:28 AV: And so each of those models is gonna have their own predictions for what is in that item. So if you’re looking at a model for color for example, the image is probably gonna give you a pretty good idea of what color it is, but that’s also gonna be represented in the text. They’re probably gonna say, “This is red,” or they might have some funky word for red but that’ll probably be in there too. So by combining the two models together, we can get predictions that are way more confident. And then also it helps us in cases where the image just might not be enough. So, one classic example I bring up is women’s versus men’s athletic shoes can tend to look really, really similar. So.
13:10 AV: The image model might not know, but the word “women” or “men” will tend to be in the text. And so the text model will do that discrimination. Or colored jeans versus pants. Also, the image model doesn’t know what material this is, or does the dress have pockets? There are some things where having the text data is super, super handy. So we combine them both together.
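As an illustration of the image-plus-text voting and the confidence threshold Ali describes, here is a toy sketch. The category list, the probabilities, the equal weighting, and the 0.5 cutoff are all made-up assumptions for illustration, not ShopRunner’s actual setup.

```python
import numpy as np

# Hypothetical per-category probabilities from two separate models for one product.
categories = ["red", "burgundy", "brown", "purple"]
image_probs = np.array([0.45, 0.30, 0.15, 0.10])   # from the computer vision model
text_probs  = np.array([0.70, 0.20, 0.05, 0.05])   # from the NLP model on name/description

# Simple ensemble: average the two distributions (real weights would be tuned per attribute).
combined = (image_probs + text_probs) / 2

CONFIDENCE_THRESHOLD = 0.5   # assumed cutoff; the real threshold is tuned per model
best = int(np.argmax(combined))
if combined[best] >= CONFIDENCE_THRESHOLD:
    print(f"auto-label: {categories[best]} ({combined[best]:.2f})")
else:
    print("low confidence -> route this product to a human annotator")
```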
13:32 TW: And do they also provide… Do they say, “Here’s our taxonomy,” in some sort of XML, or… Their categories may be completely unique ’cause they’re geared towards Brooklyn hipsters and think about things in a totally different…
13:48 AV: Exactly.
13:48 TW: Does that even kind of structured data get factored in as well?
13:52 AV: Oh yeah, totally. We have the native category. So that’ll be a series of the taxonomy tree, as you go down the tree, so this is like women’s clothing, pants, crops. It’ll go down the list. And so we break that out and then all of those words are fed into our natural language processing model. So, yeah, they do give us those categories and they’re usually useful, but sometimes it’s something like summer sale or something like that, where it’s actually not useful. But for the most part that data is also super useful. So, yeah, we use it as well.
14:28 MK: So this is a fairly new-ish space in terms of e-commerce. What kind of training data exists out there for you to use? Or is that part of the problem that you have to… What are people doing to solve that problem?
14:44 AV: Yeah, so there are actually now a lot of companies that are focusing purely on annotating data for tech companies. So yeah, it’s become a really big business, not just image annotation, but all sorts of data annotation. So we do use a couple of third-party companies for this kind of stuff. For crowdsourcing image labels we use one company called TaskUs, where you give them an actual detailed manual, and the people working on the images are trained in the fashion space. And then we also use Figure Eight, which is less specific but higher volume. So they’ve both been very good for getting us annotated data. But yeah, you’re right. That’s honestly the hardest part of this problem, getting labeled data. And then there are some of the things that we’re doing with computer vision where there isn’t even a right answer. So I mentioned before visually similar items. Someone has an item they’re interested in, but they wanna see what’s visually similar to it. We can give them a recommendation of a list of 10 items. But is that right? Or is it not? How do you test something like that? So…
16:00 MH: Yeah, what did your model say about blue dress versus gold dress? I’m just curious.
[chuckle]
16:07 MK: Yeah, but to be fair I had this the other day, I was shopping on an online retailer. Who shall remain nameless. And I hit, “Show me visually similar items.” And I went from a floor-length black dress to a bikini and I’m like, “Oh I can see, I can see how it happened ’cause the neckline was similar.” I’m definitely not shopping for a bikini right now, so you know…
16:29 AV: Yeah, the way we are doing that is pretty cool. So we have this model that tells us where something falls in the taxonomy, so what it is, and that’s based a lot on the shape of the object. And then we have a different model that tells us the color and pattern of the item, but it kind of ignores the shape. So we actually take the results of both of those models and we merge them together and use that in our visual similarity API. And so it tends to give us things that are the same shape and the same color and pattern. So far we’ve been testing it versus itself, different iterations of it. But at the end of the day, with something like that, you just have to put it in front of customers and see if they click on it.
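A rough sketch of that merging step: concatenate the embedding from a shape/taxonomy model with the embedding from a color/pattern model, then rank the catalog by cosine similarity to the seed product. The dimensions and random vectors below are placeholders; real embeddings would come out of the trained networks.

```python
import numpy as np

def cosine_sim(query, catalog):
    # Cosine similarity between one query vector and every row of a catalog matrix.
    query = query / np.linalg.norm(query)
    catalog = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    return catalog @ query

# Hypothetical embeddings: one vector per product from each model.
n_products = 1000
shape_emb = np.random.rand(n_products, 128)   # from the taxonomy/shape model
color_emb = np.random.rand(n_products, 64)    # from the color/pattern model

catalog = np.hstack([shape_emb, color_emb])   # merge the two representations
seed = catalog[0]                             # the product the shopper is looking at

scores = cosine_sim(seed, catalog)
top10 = np.argsort(-scores)[1:11]             # skip index 0, which is the seed itself
print("visually similar product indices:", top10)
```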
17:11 TW: Well, this is one of the… As we were kind of prepping, I had found a technology review article about the amount of training and tagging, and then you said, “Oh, look at this New York Times article.” I was kinda surprised, and I don’t know if you can explain it, but I feel like there’s this idea that, “Oh yeah, you need a training set. So give somebody a 1000 pictures, have them tag it, and now you’re off to the races.” But one of the examples was like, “Oh, tagging cancer cells.” And I think that’s one that I would have thought, “Oh, well, you know, you tag 1000 cancer cells and then you’re done.” So can you explain some of the mechanics as to what scale of training and tagging is needed?
17:55 AV: Well, okay, so the amount of training data you need depends on several different factors. But if you think of how many different pixels are in an image, and you think of how many different… If you take the same shirt and you take a bunch of different pictures of it, they can all look pretty different. So if you’re trying to build, say, a category model, I think a general rule of thumb is you need 100 to 1000 different examples for each category. Now, in our taxonomy, just in clothing, we have 400 different categories. So then, if for each of those we wanna have at least 100 images, you can see how you need more and more and more, and that’s just for getting the category. If we then want the color, and we want the pattern, and the season it’s good for, and the neckline, and things like that, for every aspect that we build a model for we need more training data for every category. So if you think of all the different taxonomy categories, plus all the different color categories, plus all the different patterns, and you multiply them all together to get the total number of possibilities, and then you multiply that by 100, it’s a lot. It’s a lot of annotated images you need.
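A quick back-of-the-envelope version of that multiplication, using the 400 categories and the 100-per-category rule of thumb from the conversation; the attribute counts below are invented purely for illustration.

```python
# Rough scaling of annotation needs, mirroring Ali's argument.
taxonomy_categories = 400
examples_per_category = 100

print(taxonomy_categories * examples_per_category)        # 40,000 images just for category labels

# Add a few attributes and the combinations multiply quickly (counts here are assumptions).
colors, patterns, sleeve_types = 20, 10, 30
print(taxonomy_categories * colors * patterns * sleeve_types * examples_per_category)  # 240,000,000
```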
19:14 MK: So yeah, we were talking about this quite a lot with retail and sizing as well. Like sizing is such a horrific problem for shoppers.
19:21 AV: It’s so hard.
19:22 MK: Finding what you need and sizing, those are the two problems that are hardest to solve. And it’s always like, all retailers in the world would need to combine their data so there could actually be enough for everyone to have really good models. But then you’re solving the problem for everyone, so there goes the whole competition thing. I don’t know.
19:37 AV: Yeah, yeah, yeah. That’s really hard for sizing. You don’t wanna just see what people… You could take every person and look at all the clothing they bought across a bunch of different brands and see how the sizes match up, but you wanna know, if they returned something, why did they return it? Did they return it ’cause it didn’t fit right? And getting that data, not just what they kept, but what didn’t work and why, is…
20:05 AV: Some companies are much better positioned to get that data in-house than other companies. Like companies like Stitch Fix and Trunk Club that give their customers a batch of new clothes very regularly, they get that feedback back where people are like, “Yeah, this didn’t fit me and it was too big or it was too small.” So companies like that definitely have a competitive advantage in terms of actually having the data they would need to build that kind of model. But you’re right, sizing is super hard and we have not attempted it yet.
20:35 TW: So… I have so many questions.
20:37 AV: Okay. [chuckle]
20:38 TW: I feel like you’re in the… And I’m kind of in the world of… It just seems like magic, and it seems like, “Oh, let’s just add a new category, or let’s just add this new thing.” The rational me imagines that there has to be some sort of crazy kind of intake and prioritization process. And obviously, without divulging anything proprietary, do you have a mountain of requests for, “Can we do this other thing?” that you guys have to prioritize? And what are the considerations? Presuming there’s a calculation saying, “What’s this gonna cost us in human tagging?”, how do you go about saying, “What’s a new capability that we wanna add to our system?”
21:27 AV: Yeah, so that’s a really good question. Well, we’re trying to get the basics done first. And so, figuring out the taxonomy was the first thing we wanted to do, and now we’re working on all these… Well, we’re calling them “attribute tags”. So all these different things that… I think the taxonomy is, like, if you’re browsing on a website, that’s how you find the category you want. But then attribute tags are good for building custom filters. So in terms of those, we definitely have a prioritized list, and some are a lot easier than others. So, we really just went for the low-hanging fruit first, and all the low-hanging fruit ended up being super useful. That’s why I’ve mentioned color and pattern and skirt or pants length and then sleeve length; those ended up being the easiest four attributes to build. And what we’ve tried really, really hard to do is build a system that’s as extensible as possible. So, whenever people ask for a new attribute, right now we’re working on season, is this good for winter, spring, summer or fall? When we get a new attribute request, we’re trying to minimize the amount of work that we have by reusing the existing architecture.
22:43 AV: So, this might be getting a little too into the weeds, but we’re doing something called “multitask learning”. So basically, we have one big deep neural network that we put our data into and it gives us what we call “embedding vectors”. So it’s just taking an image and representing it as a smaller vector representation of what’s in it, and then we take that vector and we can put it in any kind of model we want. So we put it in a color model, the season model. Because it’s this really well-formulated vector representation of an image, it is this multi-purpose thing that we can repurpose for different types of attributes. So because we’ve been trying to be really smart about how we build our architecture for our modeling, it’s made it much easier to incorporate new things. The hardest part is getting the annotated data. So, that’s another consideration is how much is it gonna cost to get the annotated data for this attribute? So getting it for seasons is actually not that hard relatively speaking, but let’s say we had something like… One example of a thing you could do is sleeve type. Apparently, there are 30 different kinds of sleeves you can have. And this is something…
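For the curious, a toy version of that multitask setup might look like the following: one shared backbone produces the embedding vector, and each attribute gets its own small head on top, so adding a new attribute is just adding a new head. The layer sizes, attribute names, and class counts are all assumptions for illustration, not ShopRunner’s architecture.

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Tiny stand-in for the big network that turns an image into an embedding vector."""
    def __init__(self, embedding_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, embedding_dim),
        )

    def forward(self, x):
        return self.features(x)           # the reusable embedding vector

backbone = SharedBackbone()
heads = nn.ModuleDict({
    "color":   nn.Linear(256, 20),        # 20 hypothetical color classes
    "pattern": nn.Linear(256, 10),
    "season":  nn.Linear(256, 4),         # winter / spring / summer / fall
})

images = torch.randn(8, 3, 224, 224)      # a fake batch of product images
embedding = backbone(images)              # computed once, shared by every head
predictions = {name: head(embedding) for name, head in heads.items()}
print({name: p.shape for name, p in predictions.items()})
```

A new attribute request becomes one more entry in `heads`, trained on its own labels while the shared embedding stays in place.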
24:00 MK: I believe it, I totally believe it.
24:02 TW: And Moe could probably list 17 of them without even blinking.
24:08 AV: There’s like yeah, so many sleeves. I don’t know why… I’ve learned so much about fashion since starting this job ’cause as you guys can see, I’m just wearing a hoodie right now and that’s all I wear. And it has one sleeve type, it’s the hoodie sleeve type. But apparently, there are a lot of others. And so that’s one… So there’s the ROI of the new attribute, right? So there’s a lot of market research that our product team does to look at other websites and see what they do and what creates a seamless shopping experience. And then that gets weighed against how hard it is to get the training data. And the more categories there are to an attribute, the harder it is to get the data for, and then the fuzzier the categories are, the harder. So something like sleeve type, it’s not a super hard and fast thing, but something like the color of an image, that’s a lot easier to define. But fun fact, color is actually a lot harder than you might think.
25:07 TW: Why is that?
25:08 AV: It’s so subjective. There was one time where two of my coworkers were arguing over what color a dress was, and so they brought me over and they’re like, “What color is this dress?” And I apparently said a third color, so one of them thought it was red, another one thought it was brown, and I thought it was purple.
25:27 MK: But it also depends, even just the photography. We used to have heaps of photos that were stripes, and the stripes just don’t show up on film for some reason. And you don’t have the physical product in front of you, so it’s not like you can just be like, “Oh, let me check that.”
25:41 AV: It’s that whole blue versus black dress thing all over again. It’s super subjective. So yeah, all the attributes are difficult in their own ways, but some of them are way, way harder to get the training data for than others, so for some of them we basically need a fashion expert or a fashion consultant.
25:58 MK: It’s funny that you say that on sleeve length, then pant length and all that sort of stuff, is like low-hanging fruit. I used to hear our customers complain about that all the time. “I want to know how long it is.” And it sounds like for you, that was one of the more simple things to do. So that’s pretty cool.
26:12 AV: Yeah, it was. One of the things that can be difficult with fashion images is sometimes they get cut off. So, there was this one issue we had where we’re trying to predict skirt length for a dress or a skirt itself. And there will be some pictures of long dresses where it gets cropped, and then the model would say it was a miniskirt. All these cropped images were put in the mini bucket.
26:34 AV: So it was wrong. But that’s, again, one of the reasons why the text is so useful. This is actually the subject of my most recent talk at ODSC East, how powerful it is to combine image and text data together, because the text will usually say it’s a longer dress, and then you can combine them together.
26:54 MK: But so that’s different to a GAN, right? ’cause they actually are intended to feed off each other, or…
27:00 TW: Okay, you just…
27:01 MK: Can you explain that?
27:02 TW: Dropped the GAN acronym, so I believe we’re gonna have to call a… We need an explanation there.
27:08 MK: Generative Adversarial Network? Yeah, almost got there.
27:14 AV: Mm-hmm. Yeah, so those are different. What I’m talking about is two separate models that take in different data and predict a thing, and then they kind of vote together. So your image model figures something out, your text model tries to figure it out from the text, and then they kinda convene together to figure out what’s the best thing to do. A guy on my team calls this ensemble model “Megazord” and I don’t really…
[laughter]
27:42 AV: Get the reference, but it’s like these two models are more powerful when they combine forces, and that’s actually the reasoning behind a lot of what we call ensemble models in machine learning. So, like, a random forest is powerful because you have all these individual decision trees that each have different subsets of the information, but then together they vote on the answer. And we’re kind of doing the same thing, only the subsets of information are just the image on one side and the text on the other. So they’re not adversarial at all, they are perfectly cooperating. But if you wanna talk about adversarial networks, we can.
28:21 MK: I’m just curious to hear, ’cause yeah, someone mentioned it to me the other day, I’m like, “Oh wow, that’s really cool.” So yeah, like I think it’d be awesome if you explained just broadly, the concept.
28:30 AV: Sure, yeah, so the concept here is that you have two different networks, so two different machine learning models, you have what’s called the generative model, and then you have a discrimitive one… Discrimititive one?
[laughter]
28:46 AV: Anyway, one of them tries to… You have a set of training data, so you have a set of known examples of a thing. A thing we’ve done on my team is images of dresses. So let’s say we have a whole bunch of images of different dresses. What the generative model does is it looks at all those images and it tries to create a fake dress, what it thinks might fool the other model, and then the other model tries to figure out if that dress is fake or real. And so, in this way, the generative model is trying to fool the other model by creating more and more realistic examples. And as you train it, there’s a loss function that determines that. But the idea here is to take the universe of dress images and map them onto a latent vector space. And then once you have that mapping, what you can do is create fake dresses: you can just pick points in that vector space and create fake dress images from nothing. You might have seen this in the news. People have done this with faces. So, I saw an article that was like “This face does not exist,” or “This person does not exist.”
29:58 MH: Yeah, it’s a great website.
30:00 MK: It’s super creepy.
30:01 MH: It also exists for cats too.
30:02 AV: There’s one for cats?
30:03 MH: Yeah, the cats one is not as far along honestly, and it’s scary honestly.
[laughter]
30:09 AV: When are we gonna put more resources into cat computer vision?
30:11 MH: Yeah.
30:12 AV: Like this is my dream.
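To make the generator-versus-discriminator tug-of-war concrete, here is a bare-bones training-step sketch. The “dresses” are stand-in random vectors and the tiny networks are nothing like a production image GAN; it only shows the adversarial structure Ali describes.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64

generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

real_batch = torch.randn(32, data_dim)            # placeholder for real dress images

for step in range(100):
    # 1) Train the discriminator to tell real from fake.
    noise = torch.randn(32, latent_dim)
    fake_batch = generator(noise).detach()        # detach: don't update the generator here
    d_loss = loss_fn(discriminator(real_batch), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake_batch), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Train the generator to fool the discriminator.
    fake_batch = generator(torch.randn(32, latent_dim))
    g_loss = loss_fn(discriminator(fake_batch), torch.ones(32, 1))  # "pretend these are real"
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

print("final losses:", float(d_loss), float(g_loss))
```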
30:14 MK: I mean from fashion though, there is a really good application, where you could potentially show dresses that you haven’t designed or production-ized yet, see whether or not people like them and then you actually create the dress, so to speak.
30:28 AV: Yeah, so there’s definitely a product use case there. So actually, my team has a blog post coming out soon, one of my senior data scientists wrote one called “This dress does not exist,”
30:40 MH: Nice.
30:40 AV: And it’s about this exact thing. What he based it on was StyleGAN, which is a generative adversarial network that was built by Nvidia in 2018, but he basically just re-purposed it for clothing and in particular for dresses, so he made all these fake dresses. But you’re right, that’s a thing that we’ve been thinking about is how do we harness this for e-commerce use case? And, yeah, we could have people basically design a custom dress like their dream dress, and then maybe try to see what’s visually similar to it that already exists in our catalog.
31:16 MH: Oh, wow.
31:17 MK: Ah, nice.
31:17 MH: That would be so cool.
31:19 AV: Yeah, so if you had some sort of dials or something that you could tune and say, like, “Oh, I wanna change this a little bit, or make it a little longer, or have a whatever sleeve type.”
[laughter]
31:30 MH: From whatever list of 17 you want. Yeah.
31:34 AV: 30.
31:34 MH: 30!
31:35 AV: 30 sleeve types.
31:35 MH: Oh, wow. Okay.
31:36 AV: Yeah, so many sleeve types.
31:38 MH: 30 sleeve types.
[laughter]
31:39 AV: But then… But yeah, then you can help them find their dream dress. Another thing that we’re thinking of using it for is, when we build these training data sets for these models, we have some categories that are pretty rare, so we don’t have a lot of examples of them. If you look at the full ShopRunner catalog of millions of products, we have a lot of dresses, so it’s not hard to get a lot of labeled examples of dresses. But we have some really rare categories; the canonical example we always give is beach bags. So I guess that’s a kind of bag, and we have three in our catalog.
[chuckle]
32:19 AV: So even if we labeled all three, that’s still not really enough labeled data to make a model, so we wanna use what machine learning people call data augmentation techniques, where we try to create more examples somehow, and they can be fake examples. So this kind of creating fake products can be really useful for filling out our training data set for rare categories.
32:46 TW: Interesting.
32:47 MK: Oh, that’s so cool.
32:49 AV: Yeah. So look out for that blog post in the next couple of weeks.
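For reference, the more classical flavor of data augmentation, stretching a handful of real images into many varied training examples with random crops, flips, and color jitter, looks roughly like this. The image here is a synthetic placeholder; GAN-generated fake products are the fancier version Ali describes above.

```python
from PIL import Image
import torchvision.transforms as T

# Random transforms that produce a different-looking training image each time.
augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    T.RandomRotation(degrees=10),
])

beach_bag = Image.new("RGB", (256, 256), color=(200, 170, 120))  # stand-in for a real product photo

augmented_examples = [augment(beach_bag) for _ in range(20)]
print(len(augmented_examples), "augmented variants generated from one labeled image")
```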
32:53 MH: Nice. Alright, we need to take a quick time-out, ’cause we wanna hear from one of our sponsors, here on the Multi-touch moment.
33:02 MH: Hey Josh, you know how it is out there now. You spend two years slaving away over placing some Facebook ads, and suddenly, you’re a director of digital marketing and you gotta be able to communicate results to the business. You know what I mean, right?
33:18 Josh Crowhurst: Definitely.
33:18 MH: Yeah, and there’s millennials out there and Moe is a Xennial. We need to be able to speak the language of the people we’re talking to. And I think this is where this product will really come in handy. You’ve probably seen it, ’cause it’s getting a lot of major attention right now, especially with social influencers. It’s called Emojilytics. You know that one?
33:41 JC: Yeah, it’s the millennial analyst’s best friend.
34:01 MH: It certainly is. And there’s so many to choose from, so you really get a chance to communicate what’s happening with all of your social media, buying and selling of advertising, in pictures and words that people of today understand and get.
34:01 JC: Yeah, when your follower count goes down, you get a big fat poomoji.
34:05 MH: Yeah, so it’s perfect. It’s like, the poop emoji shows you, like, “Oh, the whole thing is down.” Or, the big dollar sign eyes emoji is like, “Hey, we are crushing it out there and we’re getting tons of mentions on Twitter.” That’s what people need to understand, and it makes it so simple. Honestly, I don’t know how we got along before this tool came along. It will really help people understand what’s happening out there with their advertising dollars, that they are communicating the results in emoji.
34:36 Josh: Yeah, it’s pretty much the best thing since avocado on sliced toasted bread. And man, once you’ve used Emojilytics, it’s really hard to go back to the native platforms. They all just kind of feel like a barfing face emoji.
34:47 MH: Well, you know what they say: There’s lies, there’s damn lies, and there’s statistics. But you know what? When you want the truth, you go to Emojilytics.
[music]
34:57 TW: So, you mentioned architecture a couple of times?
35:00 AV: Oh, yeah.
35:00 TW: And we’ve talked about having people annotate data. If you go out, Google, Microsoft, IBM, Amazon, all offer… They’re all doing image recognition. So, how much of a balancing act, what do you decide to or have to do in-house, writing code, versus leveraging third-party developed sources?
35:24 AV: Yeah, so there are a few options here. AWS and Google and Microsoft all have fully out-of-the-box computer vision options that can do a variety of things. Those are good if you have really standard things that they provide; you have to have use cases that match up to what they’ve built. So for example, if you’re doing image classification, it has to match up to the preset categories that they already do. There’s also a lot of face detection out of the box, and then OCR, optical character recognition, so if you take a picture of text, it can transform it into actual text. So, if you’re doing some of these more standard computer vision-type things, an out-of-the-box solution really can work pretty well. Those are the main use cases, I think. But then, if you need something… Let’s say you have different categories than what you find in an out-of-the-box thing. Let’s say you have a very specific use case, like you’re trying to predict sleeve types. That’s not something you’ll find among the pretty well-defined categories in Google Cloud. So, there are also some solutions where you can input your own training data and you can get results without writing code.
36:48 AV: So, Google AutoML Vision, Clarifai, and then IBM Watson Visual Recognition are all good examples of that. But when it came right down to it, we really wanted more control over our model and our accuracy and the data set. And we had all these cool ideas for things we wanted to do that require not just the classification label, but the actual… We wanted the embedding vectors from the guts of the model, and that’s not something that you’ll get with most of these systems. So, we really built everything in-house. We ended up using PyTorch for most of it, and we built our own APIs using Flask. But yeah, we actually went through it and built stuff ourselves. But I should say that it’s a lot easier to build stuff yourself than it used to be because of a technique called transfer learning. And I’m not sure if any of you have heard of that before, but…
37:50 TW: Is that… Is that the thing where, in the language space, they somehow managed to have something that does Japanese and Korean, and somehow it blends together and now it can translate… I’m totally butchering it, but that’s my only reference, there was a language thing where it was learning to somehow… Is that possibly in the same ballpark?
38:11 AV: Yeah, so all these models, deep learning models, they’re all… It’s a network of a bunch of layers, and the lowest layers are learning very basic things about images or language or whatever the problem space is. So, just talking about images, the basic layers of one of these models are just picking out edges and shading, and there’s a round thing in that corner, stuff like that. And so, what we’ve realized you can do is… And it’s not just us in shopping, it’s something that people have known for a while and we’re just taking advantage of, but there are all these academics that have spent a lot of time painstakingly building really, really fancy deep neural networks to do image recognition. And they’ve done them on open data sets. For example, the most popular one is ImageNet, which has millions of images in thousands of categories, and it’s stuff like “Cat”, “Dog”, “Boat”, “Tree”.
39:14 TW: ImageNet? Are those all ones that somebody manually tagged along the way?
39:20 AV: Mm-hmm.
39:20 TW: ImageNet confuses me a little bit. Is that millions of images that have been manually annotated?
39:25 AV: Mm-hmm, I think so, yeah. And so, they have 20,000 categories, actually. So, what academics have done is they’ve built all these different architectures, basically competing on this dataset to try to see who can categorize it better and through that process, people have come up with some really, really nice architectures for these models. So, just like how you’re arranging the layers and what the parameters are. People have come up with some really really great models, but these models are built to categorize ImageNet. So cat versus dog versus boat versus tree. But what people have realized you can do is take that model… Seriously, this is all you do, you just chop off the last couple layers, put a couple of new ones on and then specialize it to your use case. And the way this works is because the lower layers were just learning general image stuff like where you have circles and lines and basic properties of images, and that stuff is very generalizable to other use cases.
40:29 AV: So all you really need to do is just take that end part that says, “When you have a circle here and a line here and a shade in here, it’s a dog.” Just take that last piece off, put a new piece on, put some new training data through it, and it’s way easier to train than trying to come up with something all by yourself. And so this is why a lot of people have been able to get really state-of-the-art results with these computer vision tasks without actually having to build the models from scratch. You’re taking things that are already trained on this really comprehensive data set and you’re just kinda tweaking it to your use case. And so, that’s a technique called transfer learning, and we’re using that heavily at ShopRunner, and we use it not just for images, but also for natural language processing. That’s actually a pretty new advance in NLP, being able to do transfer learning there too. So for images we use a lot of ResNet models, VGG models, and then for text we use Altnet and BERT.
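The “chop off the last layers” recipe is only a few lines in PyTorch. This sketch assumes torchvision’s ImageNet-pretrained ResNet-50 and reuses the 400 clothing categories mentioned earlier in the conversation; it is not ShopRunner’s code.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a network whose lower layers already know edges, shapes, and textures.
model = models.resnet50(pretrained=True)

for param in model.parameters():          # freeze the generic ImageNet-trained layers
    param.requires_grad = False

num_features = model.fc.in_features       # 2048 for ResNet-50
model.fc = nn.Linear(num_features, 400)   # new, trainable head for your own 400 categories

# During fine-tuning, only the new head's parameters get updated.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

dummy_batch = torch.randn(4, 3, 224, 224) # placeholder product images
logits = model(dummy_batch)
print(logits.shape)                       # torch.Size([4, 400])
```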
41:35 MH: So all you need to do is, all you need to do, all you need to do is get a PhD, learn VGG, ResNet, let’s not overly trivialise this.
41:46 AV: I was actually about to go there. You’re making it sound really easy.
41:53 MH: Don’t worry everyone. We’ll have it all on the show notes, so you’ll be up and running in no time.
42:00 AV: Yeah, I saw… I looked at previous episodes, and I saw you put links in there. But yeah. And there are tutorials, you can find on the internet where you can get up and running with this stuff decently fast. So yeah.
42:11 MK: I think the thing which we’re not giving enough accolades for is, you make it sound easy, but I think that’s ’cause you’re really good at explaining it, not because it’s remotely easy. I can’t even get my head around an image being flat lay versus 3D, and the amount of complexity that alone would add to the problem, let alone…
42:33 AV: We don’t do 3D stuff. I mean maybe some day we could do 3D stuff.
42:40 TW: No, no, you do pictures of 3D models.
42:43 MH: For apparel, it would be like on-figure versus lay-flat, that kind of a thing.
42:48 AV: Yeah, yeah, yeah, yeah.
42:51 MK: Yeah it’s not like actual 3D mapping.
42:52 AV: Well, you know, anything that’s 2D could be made 1D just by taking all the numbers and putting them in one line. And that’s seriously, what these models do, is you have this two-dimensional grid of numbers, you can seriously just take all those numbers and put them in one long vector and that’s what you pass through the model.
43:12 TW: But, so, ’cause I feel like something like 80% of the time when machine learning gets talked about, it is in the context of machine vision or computer vision. Like the cats, dogs, hot dogs that get used as the example. And then even, it was interesting, when Moe said other examples, not to try to trace this back to… Yeah, ShopRunner has this really cool reach where the sky’s the limit. There’s a million different ways you can pursue it. I can say at Search Discovery, we’ve had one example where we did machine vision-type stuff, that I don’t understand how we did it, but we did it around trying to predict social media performance for images. But it seems like…
44:03 AV: How did you do that?
44:04 MK: As in whether the post would be popular?
44:06 TW: Well, it was taking a huge volume of digital assets, and then taking the ones that had actually been used on social media, and then trying to figure out what features could be derived from those images so that they could be modeled, and predictively say, “Oh, these images appear that they will do better.” I’m not even sure how we defined better. Not a client I worked on.
44:26 MH: But Tim’s using it personally just to become a big time social media influencer.
44:31 TW: Yup. I’m gonna get that 14th Twitter follower any day now.
44:36 MH: #liveYourBestLife #Blessed #YogaInParadise.
44:42 TW: But I feel like there’s so much talk of machine learning. And then we get the image side of things, and trying to… I certainly know of brands that have a massive library of images, that they don’t… They have seven tags. So that seems like one that is in a digital asset management world, that, “Hey, figure out what else do you want to sort of featurize of these images, do some training and then cut that loose, and now, you can probably make a richer experience, on your site because you have more kind of structured data around your images.
45:23 AV: Yeah, definitely. We humans don’t just see a vector of numbers. We see these categories that people care about, and that’s the whole point, to create these filters that make it manageable to go through the catalog. People are not gonna flip through 2 million images to try to find the product they want. They want things in neat categories that they can search through, so that’s what we’re trying to do.
45:48 TW: But it’s interesting, it seems like if you’re an analyst, a traditional humble analyst, and you’re in an environment where you say, “We know we have a sea of images, whether they’re our images or ours and our partners’ images,” if no one’s thinking about it, then it makes sense to dig in a little bit and try to think about where’s the potential, and then go get a PhD or hire a couple of PhDs, learn ResNet, VGG, other acronyms, deep learning, and then it’s very easy.
46:25 MH: PyTorch, Flask, you know, all those things.
46:25 AV: You definitely don’t need a PhD to do this stuff. And I have people in my team that don’t have PhDs.
46:34 TW: It’s always… Just… Okay.
46:42 MK: Yeah.
[chuckle]
46:42 MH: Okay, okay. We’ve got… We’re at a very good point at which we could kinda wrap up. Ali, this has been tremendously informative, so thank you so much.
46:51 AV: Yeah, no problem.
46:51 MH: Actually, we do do a thing called the last call where we go around the horn and we share things that we’re interested in, that we found interesting, that we think our listeners might enjoy. Ali, you’re our guest, do you have a last call you wanna share?
47:04 AV: I do, yeah. So on the subject of learning about this stuff, I would highly recommend the fast.ai course. Well, I think they probably have multiple courses, but there’s one in particular that my team and I have loved, and it’s called Practical Deep Learning for Coders, and version three just came out. I have multiple people on my team who’ve actually learned deep learning from scratch with this class and are now doing this kind of work as their job.
47:34 TW: And they’re coders that means they’re well versed in Python? What is the “for coders”?
47:38 AV: Yep, it’s Python-based. So I love that class. It’s… Yeah, taught my team a lot. We’ve grown a lot of talent in-house using that class.
47:47 TW: Nice.
47:47 MH: Tim, what’s your last call?
47:49 TW: So, one, as we were talking, I did realize that… Like, the playing with the easy stuff, I had to go look it up, but it was actually over two years ago that I spent a period playing around with Google’s Vision API, with the RoogleVision package. So a last call from episode number 71 was me having fiddled around with pictures I had taken and throwing ’em up into the Google Vision API. And it was, to Ali’s point, that being moderately okay with R, I was able to do some interesting stuff with personal images, identifying various buildings and landmarks, and that was kinda interesting.
48:32 AV: Cool.
48:33 TW: And then my second one is gonna be a weird kinda throwback to the last episode ’cause it hadn’t come out when we chatted with him. Pixalate did their Q2 ad fraud survey or study. So when we had Augustine Fou on, he talked about his estimates of the scale of ad fraud, and then Pixalate came out with this study, and one of the stats that bubbled up was that their estimate is that 19% of programmatic traffic in the US is actually fraudulent, which is lower than what Dr. Fou would have estimated; the place I picked it up from also thought it was a fairly optimistic view. But it’s kind of a good read ’cause there’s a lot of slicing by country and by other characteristics. You have to put in your email address to get the PDF, but ad fraud, it’s a real thing, people. So there, that’s my twofer.
49:30 MH: All right, perfect. Alright, Moe what about you?
49:33 MK: I’ve been hanging out with Tim too much ’cause I’ve got a twofer as well. So we had someone present a couple of months ago, her name’s Claire Carroll, she’s the dbt Community Manager at Fishtown Analytics in Pennsylvania, but she’s actually an Aussie and she came out and presented to us. Anyway, she had talked about a post by Michael Kaminsky from Locally Optimistic, and we keep going through where, like, I occasionally have a meltdown, I’m like, “Maybe analytics people should just call themselves data scientists, then they’d get more money.” Well, he has a new term or phrase that he thinks analytics people should be calling themselves, and that’s analytics engineer, and it’s basically the fact that, like… And to be fair, my team are like this. My team probably does more data engineering than analytics work. And I don’t have a view on it yet, I’m still just like, “Oh, that’s weird. So maybe that’s gonna be the next new sexiest job of… ”
50:27 MH: Oh good lord.
50:27 MK: “… Of the year.”
50:29 MH: Well, I’m certain that… One job title that’s gonna finally make it complete.
50:34 MK: Yeah, it’s gonna make you happy.
50:34 TW: Where the analytics translator is gonna talk to the analytics engineer who’s gonna talk to the business analyst who’s gonna… It’s gonna be like a whole army of people trying to talk to one data scientist.
50:46 MK: Yeah.
50:47 AV: Well, I’ve actually… It’s been suggested to me that my team be called machine learning engineering and not data science. So maybe.
50:55 MK: We have them at work. There’s people at my company called machine learning engineers, so.
51:00 AV: Yeah.
51:00 MK: Yeah. But the other one, actually, I found via Adam Singer, and it’s not actually work-related, but I think that everyone in the world should read this article ’cause it really resonated with me. It’s called “How Millennials Became The Burnout Generation” and it’s a long read, but it just… I don’t know, something really clicked about what’s kind of going on with our generation and, like, I don’t know, approaches to work and…
51:25 MH: Wait. Whoa whoa whoa whoa whoa whoa whoa whoa whoa. Did you just say our? ‘Cause you said xen… You’re a xennial, you’re not a millennial, Moe. And for the… All of a sudden apparently when it serves your purposes, you will claim the millennial…
51:38 MK: I am technically a xennial. I’m technically a xennial.
51:42 MH: But Moe, you’ve gone to great lengths…
51:45 MK: Well, I don’t think xennial is like an actual definitional thing. I think people… Various labels made it up…
51:51 MH: I believe we have audio evidence where you tried to claim that it was, for the 17 minutes that you were born.
52:00 MK: Oh, anyway, so.
52:01 MH: But anyway, as we all here are millennials… Keep going, Moe.
[chuckle]
52:09 TW: Borderline the greatest generation.
52:12 MK: Anyway, I think it also really resonated for me because as a xennial, I’m sometimes working with millennials and it made me have, I don’t know, a different perspective. Particularly, it’s very focused on the American market. But I think it is an incredible read, and I would recommend anyone that works with or knows a millennial, a.k.a. the whole world, to read this article. So alright, I’m done. Off the soapbox.
52:37 TW: Nice. I like it. And Adam’s always a good read. So that’s awesome.
52:41 MH: There you go. Alright, so I have not… I have more of a one and a half-er. So we were talking about machine vision and it totally reminded me of sort of a funny thing I saw a little while back. It’s actually been out for a while but there’s this guy named Matt Parker who does comedy routines about spreadsheets and in one of them, he does this whole thing about how he loads an image into an Excel file basically recreating all the pixels of the image. It’s hilarious. Has nothing to do with computer vision necessarily, but it’s pretty funny. And…
53:17 AV: Oh, so you can actually build neural networks in Excel. Just putting that out there.
53:21 MH: Again, I think your use of the word “you” might be a little… All of us here are millennials, Tim, I think we really need to not limit ourselves.
53:34 AV: One might.
53:35 MH: One might. That’s… Yeah, that’s a topic for another show. A neural network in Excel. That’s really got me thinking.
53:45 AV: Yeah.
53:45 MH: Alright. So my actual last call… That is a funny video, watch if you have time. But my last call is also was why I thought you were about to steal mine Tim, because after our last episode Uber’s lawsuit against their mobile vendor Phunware got posted online and it’s a fascinating read about how ad fraud happens in the real world. And so the company was doing a lot of very interesting shenanigans to milk Uber out of money, which back at that time I think all of us would be okay with that happening. Just given the leadership with the company, and so on and so forth. But it’s probably happening to really great companies and so it’s really interesting to see what’s happening out there. Very interesting read if you’re interested in the ad space at all and how it intersects with people who are out there policing companies. That’s a very good lawsuit to read.
54:42 MH: Okay, you’ve probably been listening and you’ve been thinking to yourself “My gosh, why won’t they finish these last calls?”. Before that, you were probably thinking “Wow, this is super interesting and how can I learn more?” and there’s a lot of really cool stuff. So if people wanted to find you online or see you speak, Ali where could they go?
55:02 AV: Yes, so, well, people can check out the ShopRunner engineering blog. That’s pretty new, but there’s some content up there now, including a talk that I gave at PyData about using Flask to build web apps for data scientists. But also you can see me, if you happen to be in Seattle on October 17th, I’m speaking at the Data Science Salon out there.
55:26 MH: Excellent. Thank you very much. And if you wanna get in touch with us, you know how. It is through the Measure Slack, ’cause that’s where we always hang out, or on our Twitter account or on our LinkedIn group. Feel free to come aboard there. Alright, thank you, Ali. This has been super informative. I think as a topic, it’s been really cool to get a deeper understanding of it, or at least a glimpse into an understanding of it, which is not your fault. You did… On the neural network, we’re like the top layer. There are multiple layers of understanding. We’re in the first branch.
56:03 MH: I’m in that base layer of like line, circle, line. Yeah. Well I’m gonna do some transfer learning later and get up to speed here. But no, thank you again so much. Really appreciate you coming on the show.
56:17 AV: Yeah, thanks for having me.
56:19 MH: And for all of us here, we’ve got a great producer in Josh Crowhurst who helps us put the whole show together. So big shoutouts to Josh and for my two co-hosts, Tim Wilson, Moe Kiss. No matter what images your computer is trying to envision, your job is to keep analyzing.
56:41 Announcer: Thanks for listening and don’t forget to join the conversation on Facebook, Twitter, or Measure Slack Group. We welcome your comments and questions. Visit us on the web at analyticshour.io, Facebook.com/analyticshour or at Analytics Hour on Twitter.
57:00 Charles Barkley: So smart guys want to fit in. So they’ve made up a term called Analytic. Analytics don’t work.
57:07 Tom Hammerschmidt: Analytics. Oh my God. What the fuck does that even mean?
57:18 AV: Oh my gosh. Did I explain that okay?
57:21 MH: It’s okay. Even if people don’t get it, they’ll never admit it, because they don’t wanna look like they don’t know what’s going on. Why do you think I didn’t interrupt you 40 times on this recording and be like “Wait, you just said a word I don’t know”. I so was ready to make a vector joke and you just kinda teed it right up. You see the Mona Lisa. I see a string of zeroes and ones in a binary vector.
57:47 MK: Oh God.
[chuckle]
57:49 MH: How is that a joke, Tim?
57:52 TW: It amuses me.
[chuckle]
57:54 MH: Okay.
57:54 AV: That’s how the machine… That’s how Skynet goes to the loop.
58:01 MH: Are you serious right now? I think you’re in a positive mind.
[vocalization]
58:09 MH: I have important show business Moe just one more.
58:12 MK: And Tim was gonna yell at me anyway ’cause I was starting to talk about show content before the show. Sorry.
58:16 TW: That’s okay. It’s part of our whole thing.
[laughter]
58:20 AV: And save it.
58:20 MH: No, we have to do it that way. Moe has to get one or two really awesome questions in before we ever start recording. It’s part of our process.
58:27 MK: And then Tim wants to kill me. Yeah.
58:31 TW: I committed to doing one a day. It’s like songwriting. Songwriters come up with hundreds of songs they never record.
58:36 MH: Oh my gosh. You and Lady Gaga, Tim.
58:39 TW: That’s right.
[chuckle]
58:41 AV: He’s the Lady Gaga of fake analytics ads.
58:44 MH: That’s right.
58:45 TW: Hold on. Let me go put up my past mistake. I’m in the deep end…
[music]
58:53 MK: ‘Cause I feel like for lots of people, I’m not gonna lie, I didn’t know it was called computer vision. I just thought it was image stuff and then Tim then started to eye-roll at me. [chuckle]
59:04 MH: Rock flag and sleeve types.
[laughter]
[music]