Jonas Christensen 2:41
Ari Kaplan, welcome to Leaders of Analytics. It is so good to have you on the show. And you and I, we met probably two or three months ago in person. And since then, we've been dying to get this recording done, because there is so much to talk about in this world of analytics and sports that we're going to talk about today. Welcome to the show, and really happy to have you here today.
Ari Kaplan 3:04
Yeah, it's so nice to be here. It was great to have met you. Going to Melbourne, I hadn't been there in like 25 years. It was so much fun. I was there with DataRobot and McLaren for the Grand Prix. Daniel Ricciardo, one of the drivers: That's, kind of, his home country. But from there, there was a lot going on in terms of analytics, data science. There was a nice conference. It was a pleasure to meet you there. Also wanted to thank everyone who's listening to the podcast. Appreciate your time.
Jonas Christensen 3:35
Brilliant. Thank you for those kind words, Ari. We are in love with our city, here in Melbourne. And I'm glad you got to go to the Grand Prix, because that is a really special event and time of the year in the city. So let's get to you, Ari. Could you tell us a bit about yourself, your career background and what you do?
Ari Kaplan 3:53
Sure. So, I've had the privilege to work in, what you might call now, artificial intelligence, but really everything data, data analytics for 34 years. The last three plus I've been at DataRobot, which is really fascinating. My role is global AI evangelist. So talking around the world, with different companies: What they're doing, what their challenges are, what the potential is. Also host my own podcast, which is kind of futuristic looking. So, I just love talking to different people. I went to Caltech, which is the California Institute of Technology. They made that TV show ''Big Bang Theory'' based on it, but speaking of geeks, and in a good way,
Jonas Christensen 4:38
Are you one of the original guys from the show? Are you?
Ari Kaplan 4:40
It seems like uncannily all my friends were portrayed in the show. I think it's just a coincidence, but you have the spectrum of personalities. But I feel like that was a lot of my life. But super fun. There's another movie: ''Real Genius'' with Val Kilmer. If you haven't seen it: Huge inspiration of my life. But it's all about learning but, like, pranks and being creative. So, like, three in the morning, people are teaching you to count cards or pick a lock in the safe or superconducting heterodyne detectors. You know, all sorts of fun stuff. They run Jet Propulsion Lab, which just this week. Not fully JPL, but the Webb telescope coming out with, like, ''Look into the distant past of galaxies''. So, all the time humanity is learning about the universe. And I try to learn something new every day myself still.
Jonas Christensen 5:37
Yeah, good on you. Very fascinating and a long career in data science or analytics. 34 years before it was even a discipline, I'd say.
Ari Kaplan 5:46
Yeah, so I just started with DataRobot in the last three years. That's kind of what I've been doing. But most of my career has been in sports analytics. So at Caltech, I came up with better ways to evaluate player performance, like sports player performance. Realising that some of it could be luck. Some of it - the numbers are ascribed to how others perform, and not how the individual performs. And instead of just complaining about it - which everyone did - I came up with my own system. So to this day, anytime you see the letter X for expected, like, expected goals, or expected wins, it's kind of the paradigm that I set. But then I just had to keep reinventing myself every three or four years. It's kind of another theme of my life. It's like, once you're onto something that has value, a lot of people and companies either copy or duplicate or see the value and do it themselves. And you can stay with that and have, like, a steady life, but my personality is: I don't like being in the space that is kind of mature. I like being practical, but what's the next thing? So that's one thing. I had to reinvent myself every few years. And now in sports - really in every industry, but certainly in sports - it's ''How do you progress using artificial intelligence to complement what humans are doing on the field, off the field''. So that's kind of been my progression. I also spent time with Oracle Corporation and was president of the worldwide Oracle user group and dabbled in entrepreneurism and started some companies that got Series A and Series B, and one of the first mobile business software companies before the iPhone existed, which was pretty fun. But yeah, it all kind of had that central theme back to sports and sports analytics.
Jonas Christensen 7:44
And doing the research on your background, Ari, for this podcast, I can tell you it was very fascinating. And there's so many different directions we could go in, because you have done so many things. We have only about an hour to do this, though. So we won't be going through all 34 years of your career. But let's dive into the sports analytics because that is really fascinating and a bit of an unknown thing to many people, what actually sits underneath. The first question I have for you is: You're sort of known as the real Moneyball guy and Moneyball is a book and a movie with Brad Pitt and Jonah Hill, if I'm not mistaken. In this case, you would be not Brad Pitt, but Jonah Hill, I assume. Which is about baseball, and how one particular team used data to really go from bottom to top, in a smart way and on a low budget. You are known as the real Moneyball guy. Could you tell us where that nickname comes from and what sits behind all of that?
Ari Kaplan 8:43
Sure. So the movie and the book Moneyball. I would say the book, the events - this is now the 20th anniversary. So first of all, I'm impressed and a little surprised at how long-staying that book and that story is. Like, you still see it on aeroplanes when I fly and people say ''Oh, it was on television the other night''. So it's remarkable, of all the tons of great movies that have been produced, that it seems to resonate. And it resonates: People trying to change an old way of doing something, in this case, sports, and using data, using smarts to try to improve and then it's a combination of that. And then cultural resistance to change, which really - I think everyone can relate to in one way or another. Sometimes when you try to be progressive, or successful one way or another, there are people that are resistant to change. And it's been like that, like all the time, every single industry. One of my friends helped start the automated teller machine and he got a lot of resistance. You know, people want to go to the bank. They want to deal with humans. And you're like ''At midnight, the banks are closed''. And then with mobile software, I would talk with public companies and they would tell me ''Ari, we will never check our email from a mobile device. We'll always wait until we get home''. And I'm like ''You're going to be checking email, while you're waiting in line at the grocery store.'' and they just couldn't believe it. So those are reasons why Moneyball kind of resonated. I started out back in the 1980s, just as a teenager, doing, like, what you would call analytics. And from what I've been told - if people hear otherwise, let me know, but - I'm one of the first three people known to have been employed by any sports franchise, to do data analytics, to use data to do things like predicting how a player might perform. What players should they sign? How much money should they be valued at? What are the strengths and habits and weaknesses of the player in the game?
So that kind of was, like, the original Moneyball thought. And then the book was kind of based on, like, the Oakland A's. And there, there's this whole story of how the original person it was based on - like, had to be dropped out of the movie set. So the producer had heard of me, and, you know, had contacted me to see what it's like to be an analyst/data scientist in the world of sports. So, I had no idea how that movie would change my life. It's still pretty wild, having been in it. The people that are in the movie, the real life folks, like, I've worked with, in real life, and I just view it as kind of normal. It's almost like if your kid was in a movie. It's just like your normal life. So, I kind of find that fascinating. But I'm glad that it inspires people with that storyline, and I hope it continues to inspire. And one of the phrases ''Adapt or die'' might be a bit harsh but that's kind of what the world is like. Now, the world is changing so quickly, that businesses have to change their understanding of their data faster and faster. In some cases, to survive. But you know, certainly to be more profitable, or to be more relevant to your customers.
Jonas Christensen 12:08
And, Ari, the reason you got pulled into that movie was because you were the real Moneyball guy. I assume very much based on the fact that - and here is a very quick baseball lesson for all of those listeners who are not into baseball - you led the analytics function, if I may call it that. You can tell us a little bit more about what that function actually was at the Chicago Cubs when they won the World Series in 2016. And that was the first time in 108 years that that team won the World Series. So, if there ever was a drought in winning titles, that would be close to the record. And you were part of that with data and with analytics, achieving that result. Could you tell us about how you were using data at that time to become the real Moneyball team?
Ari Kaplan 13:00
Yeah. So, first of all, 108 years, I believe is the biggest drought. Like, lack of championship for any sport in the world. If not, you know, let me know. That's what I've been told. And then 108 years, coincidentally, is the number of seams or like number of stitches in the baseball. So that's kind of a cool symbolism there. So this is 2016. I had started consulting with the Chicago Cubs back in 1995. Like, automating their scouting analytics, and then helped the president, Andy MacPhail, and, in addition to analytics, became a major league scout. So the two lenses in the movie, I've kind of sat in both. The analytic: Like, what can numbers tell us about a player's habits. But then what can observing the player also tell you and where did they complement each other? Where did they contradict? So also had that role helping out with scouting, and then the fun part. So that was starting in 95 and then the team got a new ownership, the Ricketts family, around 2010. So they made two hires: myself, and then Theo Epstein, who was, kind of, portrayed in the opening scene of Moneyball. But I helped recruit him to become the next president. So those are the two hires in the organisation. You know, we had an open field, with an owner whose family started TD Ameritrade, which is like a financial company. It was kind of like you have an open realm to make recommendations. What data do you want to collect? What software do you want to implement? What insights do you want? What type of questions do you want to ask? It was also at a time when, like, pitch tracking and tracking of what players are doing started getting introduced, really around 2008 to 2010. So it was like the perfect storm. You had the opportunity to shape something that you liked, as well as new data coming in. So it was a process to - fortunately, we were in one of the big cities, so you have the opportunity to acquire good players, trade for good players, get equipment, to help with their development.
Like high speed cameras, and so on. So really, the goal was to foster a naturally curious environment, and try to avoid what's called groupthink. Where the head comes up with an idea and then ''Hey, I want this player. I want Jonas.'' and everyone else scrambles to come up with evidence of why the leader came up with the right idea. And then it just kind of grew from there. And the Cubs were a great success. I also helped scout some of the players. In fact, Anthony Rizzo, I did the main analytics. He ended up being the player that caught the final out of the final game. And as soon as the ball touched the glove, they became world champions. So I was the key analyst to recommend him back when he was a teenager. Gave a strong recommendation, saying he would be a consistent all star, which is a rare thing to do. I usually don't stick my neck out. But yeah, that's overall what the real life at a high level was like. And then I left the Cubs and joined one of my old friends, Dan Duquette, who also was part of the book and I think in the movie, but certainly in the book ''Moneyball''. We worked together with the Montreal Expos, which was kind of the opposite. We had no money. We had the lowest payroll in baseball. But we ended up with the best record in baseball, by finding completely undervalued high risk players. So again, our entire 40 player payroll was equivalent to one or two Yankee players. And we still had the best record and made the playoffs. So we reunited with the Orioles. I was his assistant. He was the general manager and had a lot of great experiences there, making the playoffs three of the first six years.
Jonas Christensen 17:19
It's fascinating when you tell the story. How much of a difference it actually can make, when you pick the right players. Not out of what I assume normally is what humans pick up on. Charisma, star factor, whatever that means. Things that they see with their eyes that are actually massive biases. And I would hazard a guess that a lot of people listening to this podcast are not hiring for baseball players, but they are hiring for other individuals. And we will have the same biases. In that case, it's a little bit harder to measure who's going to be the next CEO or your next general manager of such and such. Of course, because you don't have metrics of how good they are, in that sense. But it really highlights just how powerful analytics can be in an environment that is so competitive and where everything is: In order for you to enter the top, someone else has to not end up at the top. It's a zero-sum game in the league table, in the playoffs or whatever you have, sports finals, right? For there to be a winner, there has to be a loser. So, you have to be better than someone else, and it's very measurable. Yeah, that's really fascinating. So Ari, coming back to the gentleman you talked about earlier, who grabbed that last ball. And you had picked him out of the crowd of thousands, many, many years before that. You probably don't want to reveal your full strategy here on the show, of how you pick the best players. But what are the things that you look at? And you can see in someone like that, who is a teenager who is maybe 10 years away from that, and not physically developed or the mindset and all those things still not there. But you can see that they're going to be that person. That's really fascinating. What are the variables and metrics that you pick out to determine that this person is going to be there in the future, with a high probability?
Ari Kaplan 19:21
Yes, a great question. And that's kind of the key to every industry. It's at the point that you're making a decision: What variables or what criteria at that point, kind of, are indications for future success? Could be your marketing program. What price you want to set something at. It could be ''What's our demand forecasting for manufacturing some products, on a store by store basis?''. But looking backwards in time, the answer is right there. But at that moment, especially when you're dealing with a teenager, you have to fast forward. What will they be like when they're 30 years old and how might they perform? At a high level, you started talking about biases. You want to do analytics that are as little biassed as possible. So like, one bias could be like, if you've ever heard of auditions or you've ever done a job interview: the first person has an advantage. People come and they go ''Wow, Jonas was amazing. I'm not going to really select anyone else. He's our candidate.'' or like the last person or second to last. And most of the people in between, it's just human nature. So there's recency bias. There's bias, where if, like, a player or a person were to keep calling you. Like, persistence or confidence pays off. But that isn't necessarily evaluation. So, there's a saying that you select players, you pay for players, based on future performance, not the past. So, Anthony Rizzo was one famous one. Another was Hunter Pence, who was a minor leaguer. He's a skinny guy, not very athletic looking. So half the scouts said ''Believe me. He's amazing. We should recruit him''. And the other half said ''No way. He doesn't have the body of a player. And he's not going to develop''. So, of course, the owner looks at me and says ''Ari, you're the deciding vote''. And you know, my answer sounded like a lawyer or real estate agent. But it was ''It depends. Let me do the research and get back, since this is such an important decision''.
And I spent, like, 40 hours in a week researching everything I could. So back to, like, your criteria. And you know, every player is a little bit different. Like, for example, Hunter Pence and Anthony Rizzo, those two examples. When you're very young, and you're successful, it's about how do you adjust as a player in baseball, in cricket and other sports. It's so competitive, that people are going to find your weakness and attack your weakness. And if you can't adjust as a human, you're not going to last very long. Unless you're, like, so incredibly skilled that your strength is overcoming any attack that they have. So I look at, for example, every time they face the same opponent, who's getting better: the opponent or those players? And how does that compare to their peers? So both of them were incredibly adaptable. They would adapt their swing based on learning. They would listen to their coaches to make adaptations, and then other players that I wouldn't recommend, that weren't making adjustments, sometimes they would get called anyway. But after one or two years, they'd have success, but then just fail out of baseball altogether. Those are some things but then in terms of data science, one of the magic and beauty of data science is: You can try all different things. What colour eyes does the player have? And artificial intelligence will probably say ''You know what? The eye colour has no bearing on how successful they are''. Then you have a hunch ''Hey, did they grow up in a sunny state or a northern state where it's cold a lot''. And the data will reveal that overall, people that grew up in sunny states do better. And then you can try to understand: Hey, that's since they had more access to practice, all winter, compared to others. So that's the beauty of data science. You put in as many factors as you can collect. Features, variables, whatever you want to call it, and AI will help determine which factors are most indicative.
That's kind of what I would do with these players. It's put in as much information as I could. Look to see what other players did throughout history. And then see, at the point you're making that decision: They're 17 years old. What characteristics that you can collect are indications that they'll have success in the future? And then above and beyond that, you do have to look at the player as a human. One of my biggest mentors was a guy: Jerry Krause, who's most known for being Michael Jordan's boss, and Scottie Pippen and Dennis Rodman in the NBA. So, he lived in Chicago. We went to hundreds upon hundreds of games together and dinners, and he would teach me a lot about how you project a future human ability, based on what you see in the present. Which is hard for me to speak to, being a scientist, but as much as I can learn. You know, there are some patterns that you can see: if you adjust this thing in their delivery, they could be an even better player. Those are some of the techniques that I've used.
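Illustrative sketch (not from the conversation): the idea of putting in many candidate factors and letting the data show which ones matter can be shown in a very reduced form by ranking factors by how strongly they correlate with later success. The eight player records, the factor names and the use of plain Pearson correlation are all assumptions made for illustration. They are a simple stand-in for the AI models Ari refers to, and the numbers are invented toy values, not real player data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical factor names, in the same order as the columns below.
FEATURES = ["blue_eyes", "sunny_state", "adjusted_to_opponents"]

# Invented toy records (NOT real data). Columns:
# blue_eyes, sunny_state, adjusted_to_opponents, succeeded
PLAYERS = [
    (1, 1, 1, 1),
    (0, 1, 1, 1),
    (1, 0, 1, 1),
    (0, 1, 0, 1),
    (1, 1, 0, 0),
    (0, 0, 0, 0),
    (1, 0, 0, 0),
    (0, 0, 0, 0),
]


def rank_features(rows, names):
    """Return (name, correlation) pairs, strongest association first."""
    outcome = [row[-1] for row in rows]
    scored = []
    for index, name in enumerate(names):
        column = [row[index] for row in rows]
        scored.append((name, pearson(column, outcome)))
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


ranking = rank_features(PLAYERS, FEATURES)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

In this toy data, eye colour is split evenly across successful and unsuccessful players, so it lands at the bottom of the ranking, while the adaptability flag lands at the top. A real model would need far more records, and a correlation alone would not explain why a factor matters.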
Jonas Christensen 24:52
And that would be something that could be universally applied outside sports as well, I assume. As in this, sort of, the ability to pull levers in human ability and adaptability. Because some of the keywords I picked up on that you're talking about are the individual's ability to be moulded over time. It's really key, which is another way of saying that they're adaptable to their environment and the stimulus that comes in. You also talked about the externalities that are actually not to do with the person themselves, but just these ancillary factors, like where they live, whether they live near a good sports ground, or a bad sports ground. We're all in love with these stories of the people who defied the odds with poor training facilities. They didn't have a real coach. Their dad did it, like, the Serena Williams and Venus Williams movie with Will Smith. That's another sports movie, where they get trained in the local tennis court in Compton by him for years and years, and years and years and years. They do also get into a nice facility at some point, of course, and it really takes off. But all of these ancillary factors, it's also fascinating. With all that, Ari, two questions come to mind. Generally speaking, not just in sports, but generally speaking: What are some key indicators of this human potential factor, the adaptability factor that someone can look for when they look for talent in any sort of branch of human ability? Big question. And secondly, how much - you can put a rough number on it, if you're willing to - how much weight do you put on the individual versus all these externalities, typically, in decision-making?
Ari Kaplan 26:42
So yeah, great questions. Yeah, that does apply to every industry. You know, adaptability of people, human behaviour. Like, when you look at retail or manufacturing or even marketing, people act differently when they're hungry. So, pricing a product for people in the store is not always a math formula. Or it could be generally applied to a math formula, but accelerated when people seem to be hungry. Or like band aids. Just in general, you're going to buy a band aid, just you know, to stock in your house. But if your kid fell off a bike and is bleeding, you're gonna pay 2, 3, 4 times as much money. So human nature is one big factor. And the other thing that makes it tricky is: Data science could be very good for coming up with, like, generalisations. It's generally better if you're a taller pitcher, since you're going to get more torque, and throw a pitch faster, or be able to start at a high plane and throw downward. So there were teams, when this information came out, that told their scouts, like, ''Don't scout anyone that's under 5'10"''. But the problem with that is, there are outliers. There are shorter players, that - you know, like you mentioned, the Williams sisters - doesn't matter how tall or whatever you have, what background they're from, they're outliers. They're great. The first guideline is not to generalise. Some metrics are better for some people than others. In baseball, there's some players that stand like this. I know it's a podcast, I don't know if you can see. Some put their hands upraised, others with their hands down, others with their hands back. And so, there it's like really a combination of characteristics. It's good if you can stand with your hand backward, if you have quick bat speed. It's better if your muscles are quick, but not as strong. It might be better to start with your hand closer to the plate, if you're really strong, but don't have as good speed. So, you start to develop, in addition to models, what's called an expert system.
It's kind of like business logic. If this is the situation, you're this age, this height, then start doing this model. You're in this age and height, do another model. And then you keep refining it. If you're left handed, right handed? Do you have so much experience or not? What type of position you have. And then by then it's like less a math formula and more like an expert system. It's business rules. That's more finely tuned. So those are some things and then, you know, there is something to be said, if you're selecting players, to understand who they are as a person. Though, the one thing that is really hard to do is - I've talked to such a range of players. Some love analytics, some love analytics if it's visual. Some love analytics if it's like a word or phrase. Some love analytics, if it's like raw numbers, and others don't even like research. They don't even watch a video of their opponent. They just go up and swing and they're still very successful. It's kind of hard but what you could do is make a model based on comparative players. And then you can kind of say ''This is the type of player...'' - You have the 10,000 high schoolers. You could make a prediction. And then, you know, ''This is the type of player we believe they'll be. Here are the reasons for or against it, based on the data''. And then ''Here's like the risk. Here's the expected accuracy. We think Jonas is going to be an elite player with, like, a 50% chance, or he's going to be an elite player with a 1% chance''. And then that just helps drive your decisions. The other thing that's fascinating is like a newer trend. So you have metrics. How quickly did somebody run? How fast are they swinging a bat or throwing a ball? But then you have scouts that are humans, evaluating the player, and writing in text, a scouting report. And now AI could take the actual words, that's the subjective information that scouts have, and use that as a data source. It's called Text Analytics.
So if enough scouts say this player is deceptive, and the metrics show they can throw a pitch 98 miles an hour, that combination is, like, the best combination for success. And you wouldn't get that information if you don't have humans evaluating it using words. So I find that fascinating. You know, based on scouting reports, depending if you're high school, college or below. But that is, like - what the scouts say tends to be one of the top three criteria to predict success, which is fascinating.
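Illustrative sketch (not from the conversation): the combination Ari describes, scouts' words plus a measured metric, can be shown with a very crude keyword count. Real text analytics would go well beyond keyword matching. The function names, the sample reports and the threshold of two mentions are hypothetical. Only the word ''deceptive'' and the 98 miles an hour figure come from the discussion.

```python
import re


def count_mentions(reports, keyword):
    """Count reports containing the keyword as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return sum(1 for text in reports if pattern.search(text))


def deceptive_and_fast(reports, fastball_mph, min_mentions=2, min_mph=98.0):
    """Flag a pitcher when enough scouts call him deceptive AND the
    measured fastball velocity reaches the threshold.

    min_mentions is an assumed cut-off for "enough scouts".
    """
    enough_scouts = count_mentions(reports, "deceptive") >= min_mentions
    fast_enough = fastball_mph >= min_mph
    return enough_scouts and fast_enough


# Hypothetical scouting reports, written for this example only.
sample_reports = [
    "Deceptive delivery, hides the ball well.",
    "Fastball plays up. Very deceptive arm slot.",
    "Needs to work on command of the changeup.",
]

print(deceptive_and_fast(sample_reports, fastball_mph=98.0))
print(deceptive_and_fast(sample_reports, fastball_mph=91.0))
```

The point of the sketch is only that the text signal and the numeric signal are evaluated together: neither the scouts' wording nor the radar reading triggers the flag on its own.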
Jonas Christensen 31:44
So there we go. No one's getting replaced by AI yet. Not scouts, either. That's the usual fear. That all this comes in, and we don't need humans anymore. We absolutely need them. That example there is really interesting, because the way I think about that is you have content and you have context. The raw metrics: the ball flew at this speed, they hit it 80% of the time, they stood like this, etc, etc. That's factual. That is what happened. The context around that situation, what you're seeing, is what the scout actually adds. And they can be biassed, of course, and hopefully they're not too biassed. But that's something you have to deal with as well, I assume. The context around that situation: What are the typical, sort of, important contextual, additional observations that a person watching this scenario play out can add on top of the raw metrics?
Ari Kaplan 32:39
Yeah, that's great. Putting things in context is great in every industry. So, like, in terms of marketing. Sometimes you do a marketing campaign, but people are going to buy the product anyway. But you might incorrectly attribute the sales to the marketing campaign. They would buy it anyway, or close the sale. And similarly, you know, in sports, sometimes, like, the context is everything. You know, an example is: if the game is out of hand, like you're obviously losing or you're obviously winning, people play differently. Like, they're not trying as hard or they're trying too hard. It's kind of like, throw away. They just go through the motions. So, you probably shouldn't evaluate the statistics from those scenarios. It's called high leverage. How does somebody perform if, like, the game is close and on the line? Are they able to still concentrate? If a player made an error, do they bounce back and continue to do well? Or do they continue to kind of hurt themselves? There's this whole book called ''The Hot Hand''. It's this whole concept, philosophical and physical in basketball, where if you have a hot hand, you make baskets. In that game, you continue to make really good shots. And some analysts say that that is a thing and other analysts say it's not. And it all depends on the context of how you do the analytics. In reality, you know, the conclusion is: It does exist. The human body could get tired and their arm could be worn down. But yeah, those are some of the contexts: how high leverage is it? Sometimes, you know, there's the scientific thing with light as a particle or a wave and it depends how you observe it. Sometimes players know they're being watched by a scout. And so they perform differently. Like, what you said, it could be showy. They could dive for a ball, even though they didn't have to, and they catch it and they pump their fists, knowing they're being watched by scouts.
So yeah, you want to just be unbiased and see what are their skills compared to, you know, what a typical player would do in that situation?
Jonas Christensen 34:49
Yeah, really, really interesting and fascinating. So, Ari, I actually want to just take a step back a bit because we've gone into detail here, which is wonderful, and I think it would actually be helpful for listeners to get a bit of a history lesson because you've been in this industry for 34 years. And you talked a little bit about the technological evolution that's enabled a lot of this stuff over time. Could you talk us through what types of analytics you could do in the beginning to now and sort of the technology that's come along, and the data that's come along, and the maturity of that space? Because you see now, so much data. When you're watching sports on TV, you see all this stuff getting shown to you, which shows you that a lot of stuff's getting measured. There's even more than that, of course, but also the proliferation now of almost every professional sports team having a head of analytics, or data science, or whatever they call it. It's so common now. So this must really be something that's driving a competitive edge for a lot of teams. Could you tell us about that evolution over the last three, four decades?
Ari Kaplan 35:58
Yeah. So it's remarkable. I don't feel that old, but been in it over a third of a century. And yeah, when I started in the 80s, there was no Internet. People really didn't have emails. It existed, but most people didn't. But the sports data was what you call box scores. It was basically what was printed in the newspaper. It's kind of a summary of what events each player did. Like, one line, couple data points. So I would have to go to the library and look up things on microfilm and put it into a database I created, to find information. So there was like, within a game, what happened and then sometimes, it would be the sequence of how things happen, that would help me explain the information better. So this is like the late 80s to early 90s. It was just event information. One row or one record per player per game. And that's kind of where things stood. I'd worked for Oracle. I'd worked with the teams. So I would say the next progression was in the early 90s, where there were scouting reports that were all written on a piece of paper. You know, at the end of it, every year, there'd be thousands of pieces of paper. And if a general manager wanted to say ''Who's the best shortstop in the minor leagues?'', they would have an intern look through thousands of records, pull out shortstops and then order it by the scouting records. It would take a week or so. So, ripe for making a database. So, the scouting reports can be electronically entered. And then I made an interface for a non technical person, like a general manager, to just say ''Shortstop. Minor league. Sort by this criteria''. And the answer comes back in seconds. So that takes us through the 90s. And then, you know, the next revolution was like in the late 90s, early 2000s, where volunteers started making game logs. So before you only had one record per game. How many hits did they get? How many runs did they get? And now a game log would be: What was the sequence of events?
And then after that, you started getting what's called a Pitch Log. For each event, there could be 5,10,20 pitches. And then the first true data revolution started in 2007, where this company called Sport Vision, invented technology to use cameras and sensors in the stadium to record everything happening on the field.
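The ''Shortstop. Minor league. Sort by this criteria'' lookup described above can be sketched as a small database query. This is only an illustration, assuming a hypothetical `scouting_reports` table with made-up players and grades; the actual schema and interface built at the time are not described in the conversation.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scouting_reports "
    "(player TEXT, position TEXT, level TEXT, grade INTEGER)"
)
conn.executemany(
    "INSERT INTO scouting_reports VALUES (?, ?, ?, ?)",
    [
        ("Player A", "SS", "minor", 55),
        ("Player B", "SS", "minor", 70),
        ("Player C", "SS", "major", 80),
        ("Player D", "2B", "minor", 65),
    ],
)


def top_players(conn, position, level):
    """Return (player, grade) rows for one position and level, best grade first."""
    query = (
        "SELECT player, grade FROM scouting_reports "
        "WHERE position = ? AND level = ? "
        "ORDER BY grade DESC"
    )
    return conn.execute(query, (position, level)).fetchall()


# The question an intern once answered by hand in about a week.
for player, grade in top_players(conn, "SS", "minor"):
    print(player, grade)
```

The filter and sort are the whole point: once the paper reports are rows in a table, the week of manual sorting becomes a single query.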
Jonas Christensen 38:30
Sorry, question here. So until 2007, is this all manually collected? As in a human is sitting there typing in?
Ari Kaplan 38:38
Yep.
Jonas Christensen 38:39
So many pitches, and etc, etc, so many runs. I mean, you track that anyway, as part of keeping your score, but this sort of extra detail?
Ari Kaplan 38:48
Exactly, extra detail. So each team would hire scouts to go to every game. So you'd have 20-30 scouts at every game, recording redundant information. ''Here's each pitch. Here's where the pitch was. Here's the speed of the pitch''. Humans capturing that. And now in 2007, that was the first automation. So, like in that movie ''Moneyball'', scouts, some of them would say ''Wait a minute. It's my job to record every velocity of a pitch, every speed. You can't automate that''. And so the bad scouts would resist, but the good scouts would love it. They said ''This is great. This recording of velocity is kind of beneath me. I don't want to do that. I want to evaluate players based on their human aspect''. That was the first thing. Now you have every pitch of every game. Not just that, but things that humans couldn't record, like how quickly the pitch was spinning. A human can't detect that, but the cameras can. So that was a huge revolution. I realised, you know, I only have a few years to have a huge advantage. Started this company, Scoutible, with Fred Claire and others, the general managers of the Dodgers at the time. And yeah, we kind of helped revolutionise that whole analysis. Finding habits in players, that put teams that used it, years ahead. And so this pitch tracking was about the same until like five years later, when they started recording similar things for hitters, and for fielders. So, around 2010, when I was with the Cubs, one of the big advantages was being able to position where the fielders were to anticipate, like, ''What's the probability of a ball being hit?'' In this area, it's 20%. But another area, it's 80%. So you're gonna have your fielders stand where it's an 80% chance versus 20%. Did not cost any money, it was pure data. And we were using it and we estimated that it would save 80 to 90 runs per year, which was about eight or nine games per year, which is about $80 to $90 million of value per year. Significant amount.
That's like, make the playoffs or not, without costing anything else, without getting any other players. It's like, Darwin Barney won. It's called the Gold Glove of the year. Think he's Australian, by the way. But you know, for nothing else, but having data to help inform where to stand.
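The arithmetic in that answer can be written out as a rough sketch. The conversion rates below (about 10 runs per win and about $10 million per win) are assumptions inferred from the quoted figures, not numbers stated in the conversation, and the zone probabilities simply reuse the 20% and 80% from the example.

```python
# Assumed conversion rates, inferred from the figures quoted in the conversation.
RUNS_PER_WIN = 10             # assumption: roughly 10 runs saved is worth 1 win
DOLLARS_PER_WIN = 10_000_000  # assumption: roughly $10M of value per win


def best_zone(hit_probability_by_zone):
    """Pick the zone where a ball in play is most likely to land."""
    return max(hit_probability_by_zone, key=hit_probability_by_zone.get)


def value_of_runs_saved(runs_saved):
    """Convert runs saved per season into (extra wins, dollar value)."""
    wins = runs_saved / RUNS_PER_WIN
    return wins, wins * DOLLARS_PER_WIN


# Hypothetical zones using the 20% / 80% split mentioned above.
print(best_zone({"zone_a": 0.20, "zone_b": 0.80}))

for runs in (80, 90):
    wins, dollars = value_of_runs_saved(runs)
    print(f"{runs} runs saved ~ {wins:.0f} wins ~ ${dollars / 1e6:.0f} million")
```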
Jonas Christensen 39:08
Fascinating.
Ari Kaplan 39:09
And then I would say the final two phases was just a few years ago. Baseball introduced new technology called Hawk-Eye. Which if you're a global sports fan, that's being installed in the Premier League, across Europe, and in other sports. And that really measures everything going on in the field, including the limbs of each player. So if you're talking about football, soccer: What is their dribble to their footwork like? Are they using their head? How are they diving and striking? And that is only a few years old. So now we have this whole host of new data, that's going to take a decade to fully interpret and understand. So that's where we are now. Using AI and cameras to interpret, like, where limbs are on the field? And how can you better make a performance? Faster pitch, faster throw or faster kick.
Jonas Christensen 42:17
There's so much in this that we can transfer to the business world as well. And one of the things I'm sitting here thinking, Ari, is - if I may sort of go across a little bit to a more high level philosophical comment. So in the data science literature at the moment, there's a lot of talk about data centricity, data centric AI and machine learning, which is basically saying that you have more opportunity in collecting better data than you have in trying to make the model, the algorithm more accurate. And when I hear you talk about this, the evolution and revolution of this is really happening when you start collecting some really incredible, accurate, precise, but also very particular, very insightful types of data, like the spin of the ball and those sorts of things. How limbs move. It's a completely different metrification - if that's a word - of human behaviour, that just opens up the door to such a different fundamental understanding of how we work as individuals and how we fit into a larger group, which is actually quite complex. Humans, we think we're smart at that stuff. But we're actually probably not wired to look at 10-15 people, 2x, running around in one big mix and picking up exactly what's going on. And your example of ''Let's make $80 million by taking one step to the left'' is also fascinating, because there are business examples of that in almost every business. There's sort of hidden value somewhere that you just don't know until you start measuring. So the data collection and spending the money to collect the right data, basically, which a lot of organisations outside sports probably don't do, is really key to opening up all this stuff. That's my philosophical reflection on that. So you've talked a bit about what kind of data is collected and how it's used. You started talking here about this new set of data that will take 10 years to really, fully utilise. Where do you see sports analytics being in say, 10 years?
What are the things we'll be able to do and measure and therefore what are the outcomes that that will drive? And I know every sport is different and I'm sorry if it's a very generalised comment. So, you feel free to answer it however you want. With one sport or many sports or whatever you choose.
Ari Kaplan 44:51
Yeah, there's so many different ways to look at it. One is: Traditional analytics was business intelligence. It was looking at data that was largely like numerical and then, like, categorical. Like, the colour is red, white or blue. The price of your product is two Australian dollars or 2.50. But now with AI, you can do more unstructured data. So the text analysis, like scouting reports or injury information or online, like, sentiment analysis. That's a new type of data. Using Visual Analytics to look at images or video. To look at time series for, like, demand forecasting. To look at geospatial to see, like, are there some geographies better than others? Or in sports, looking at the field as a geospatial unit: Like, how do formations of humans, running after this ball made out of pigskin - What is the strategy of that? So, like the types of data are key. Another evolution is the data being collected are XYZ points, like of limbs and where a ball is. And that's kind of cold and clinical. So having humans part of the conversation, to take these dots and put context and meaning around it. To say ''If these dots are in this pattern, it means a player's leaning back or leaning forward''. So, that's called Feature engineering. A shape of dots can be translated to a new variable. Then you can take multiple variables and make a new variable. Like, an open formation, where the player is open to pass or very open to pass, or closed to pass, or aggressive dribbling or defensive dribbling. And then you can take those and make new features like building blocks. Make an entire play like a right striking, you know, tail goal attempt. And then you can just keep building upon that. In what context? When it's a tie game. When you're down by a point, and you're in a desperation shot. So these are all like, for me the fun creative part of what the next decade will be, which is feature engineering or making common sense ways to interpret the data, based on raw data.
So those are some of the things. Then when I said, like, this Hawk-Eye, this tracking data will be helpful for 10 years, in that some of the questions are very immediate. You can answer. What are tendencies players do? That can be answered now. But since this data is only one or two years old, we don't know yet how a 17 year old, like, how quickly they run or with this data, like, how their age will peak or not, until we get 5, 10, 15 years of data. So it's almost like a lifecycle. From the moment you're being scouted, which is a teenager, until the moment that you retire, like 15 years. You need years of data to answer those types of questions, like: How do people age? What types of performance skills are retained and what less and by what amount? And then, you know, the world tends to take different turns. So in baseball, that example of defensive positioning. When I started, less than 2% of all balls in play incorporated that: Moving the player. And in just a dozen years now, more than half of all balls in play incorporate it. So the game has changed. And now you have to make new data science to adapt to the changing rules, the changing way the game is. In basketball, the game has totally changed from two point shots, where you're right next to the basket to three point shots. So these models are going to have to change again and again. The rules are going to change. And then who knows there might be more types of data that can be collected, that we're not even thinking about yet.
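The ''dots to meaning'' idea of feature engineering described above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical hip and head keypoints, an invented axis convention, and made-up thresholds; none of these come from the tracking systems mentioned in the conversation.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    x: float  # metres along the direction the player faces (assumed convention)
    y: float  # metres sideways
    z: float  # metres above the ground


def lean_angle_degrees(hip: Point, head: Point) -> float:
    """Torso angle from vertical; positive means the head is ahead of the hips."""
    return math.degrees(math.atan2(head.x - hip.x, head.z - hip.z))


def lean_label(angle: float, threshold: float = 10.0) -> str:
    """First-level feature: turn a raw angle into a readable category."""
    if angle > threshold:
        return "leaning forward"
    if angle < -threshold:
        return "leaning back"
    return "upright"


def openness_label(player: Point, nearest_defender: Point) -> str:
    """Another feature: how open a player is, from ground-plane distance."""
    distance = math.dist((player.x, player.y), (nearest_defender.x, nearest_defender.y))
    if distance >= 5.0:  # hypothetical threshold in metres
        return "very open to pass"
    if distance >= 2.0:  # hypothetical threshold in metres
        return "open to pass"
    return "closed to pass"


hip, head = Point(0.0, 0.0, 1.0), Point(0.3, 0.0, 1.6)
print(lean_label(lean_angle_degrees(hip, head)))
print(openness_label(Point(0.0, 0.0, 1.0), Point(3.0, 4.0, 1.0)))
```

Each label is a new variable built from raw coordinates, and labels like these can in turn be combined into higher-level features such as a whole play, which is the building-block pattern described in the answer.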
Jonas Christensen 48:47
Yeah, the two to three point shot element of basketball was actually - I think I've read about that somewhere - that was a data analysis, data scientist sort of led outcome: that there is some probabilistic relationship between taking three point shots versus twos. The game outcome is more likely to be in your favour if you have more of those, with the right shooters, of course. Is that right?
Ari Kaplan 49:11
100%. And, you know, it's all within context. Some players like Shaquille O'Neal. His best strategy is: Stand under the basket, pass it to him and he just slams it in. But for most players, yeah, that is the case. So the whole game has changed to like either a three point shot game, or they run and charge the net and do a layup, right under the net. And so that totally changed the way players perform in high school and college, knowing that they have to improve their three point shot. And now there's even talk of possibly making an extra layer, so there's now a four point shot range. Hasn't happened yet. But that's how much the game has changed. They may even change the rules so dramatically like that.
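The probabilistic relationship discussed above comes down to expected points per shot. The sketch below uses made-up shooting percentages purely for illustration; they are not figures from the conversation or from any real player.

```python
def expected_points(make_probability: float, points: int) -> float:
    """Average points scored per attempt for a given shot type."""
    return make_probability * points


# Hypothetical percentages for a typical shooter and a dominant big man.
mid_range_two = expected_points(0.50, 2)
three_pointer = expected_points(0.36, 3)
dunk_under_basket = expected_points(0.75, 2)

print(f"two-pointer:   {mid_range_two:.2f} points per attempt")
print(f"three-pointer: {three_pointer:.2f} points per attempt")
print(f"dunk:          {dunk_under_basket:.2f} points per attempt")
```

With these assumed numbers the three-pointer beats the ordinary two, while a player who converts very often right under the basket is still better off there, which matches the ''it's all within context'' point.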
Jonas Christensen 49:57
Wow. That is fascinating because the selection criteria becomes different. And a lot of variables that might have mattered in the past, I can imagine, will become less important. I'm here picking at something that I don't know at all is a driver. But I would imagine it is someone like Shaquille O'Neal: very tall guy, even for basketball. So if he's close to the basket, then he can sort of put it in. It probably matters a lot more than the guy who's a little bit shorter, but can get it in from far away. So, the height advantage becomes very different in that scenario, as an example. Fascinating, Ari, all this stuff. There is so much stuff here. We could keep going on different sports and different lenses to all this. And it is fascinating, in part because it is measuring humans and their ability. But also because it is actually so much more advanced than almost any other field of analytics. Because of this proliferation of high quality data. This is my opinion, at least; that's the reason in part. There's also lots of money involved in gaining that extra inch of performance that makes all the difference. Whether you win by 10 points, or 1 point, the outcome's the same, in essence. So, that competitive edge is really important.
Hi there, dear listener. I just want to quickly let you know that I have recently published a book with six other authors, called ''Demystifying AI for the Enterprise, A Playbook for Digital Transformation''. If you'd like to learn more about the book, then head over to www.leadersofanalytics.com/ai. Now back to the show.
And on that, I'm interested in your viewpoint on what can the business world learn from the sports world in terms of using analytics data science to get a competitive edge?
Ari Kaplan 51:52
Well, yeah, there's so many things that could be learned. To operate better, to serve customers better, and then ultimately, to get a competitive edge. And at the highest level, it's cultural. Like, a lot of companies have been around 100 years. Kind of like ''Moneyball'', everyone's like ''We've been doing it this way forever. Why change?'' and a lot of businesses have that mentality. And I'm not advocating change, since it's cool and different to use AI. Like, you should really only change if the new way is better than what you're doing in the past. So, just want to make that clear. But AI can help you determine - no, here's an insight. If you do AI with transparency, you could explain ''Here's a recommendation. Stop selling your product in Walmart''. And if you tell that to a retailer, or manufacturer, they may say ''I'm not going to change my strategy, just since a model told me''. But if there's transparency to say ''Here's the 10 reasons why we recommend it. You know, the smaller shop has the same volume for more profit, and it's in the general same neighbourhood'', then they might make the switch and make more profit. Similar in baseball or sports. If a model says we recommend signing Michael Jordan, or like another player, the general manager won't just take it and sign the player. They'll ask ''What are the reasons for or against it?''. So transparency, and being able to interpret and understand what data science models are, really translates to every single industry. I know myself, I'm very data intensive. But even so, I'm not going to change what I'm doing unless there's like some good evidence, like in the model. Like if the model tells me to change the price of DataRobot software, probably not going to unless I can see the data to back it up. So transparency is a good one. Understanding where humans and automation augment or help each other out. So what can human evaluation tell you? Versus data evaluation? And how do they collaborate together?
So probably, like the best explanation is, when you make a data science initiative, is to have the data scientists and the business people constantly talk and communicate. People that know the data and people that know the business, combined with the data scientists who can kind of help the magic, help make it happen, or become a reality. So, you know, like in that retail example: If the model says ''Stop selling in Walmart'', but the business people say ''Wait a minute. We have a five year contract with Walmart. We can't stop selling there''. That's something that may not be in the model. Another example in sports. I once had somebody work for me that recommended the team sign a player, and it turned out that that player was injured. They broke their leg. They're out for a year, and luckily I caught it. Since if you recommend to sign a player with a broken leg, that's not the best recommendation. And the data scientist just didn't think to check, like, add injury information as part of the model. Things like that, you really need to have that collaboration of business people, data people, and data scientists. There's probably 100 other points from the sports world that translate to business. You know, there's the technical. The other question is like paralysis by analysis, and there's the saying that ''No model is perfect, but some are more useful than others or better than others''. So at some point, you have to decide ''This model is good enough. It's 90% accurate. I'm just gonna run with it'' or other cases: You know, you're launching a spaceship. It has to be close to 100% accurate. Will the spaceship get into orbit or not? So at some point, you also have to decide when is it good enough. And then I guess the final advice for now - we can keep going - is to understand what's called data drift. And that in the real world, things are changing, the interest rates are fluctuating, prices of commodities and oil and gas are changing. Supply chains being disrupted.
The economy of what people could spend is changing. So you want to constantly monitor your data science models, and recalibrate them, when the underlying data, the world's changed enough. And that's one thing I learned with sports as well is: You can have a great model but then a month later or a year later, players change their behaviour. The market changes. There might be more left handed pitchers available than last year. So the price of them might come down. So the bottom line is: Be able to detect where's your data drifting or changing? How's that affecting your data science model? And how can you make a repeatable process to, like, recalibrate, or remodel your algorithms?
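One simple way to picture the drift monitoring described above is to compare the distribution of a feature at training time with its distribution today. The sketch below uses the Population Stability Index (PSI), which is a technique swapped in here for illustration and not something named in the conversation; the 0.2 alert threshold is a common rule of thumb treated as an assumption, and the data is synthetic.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a recent sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def bin_shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor each share so empty bins do not cause log(0) or division by zero.
        return [max(count / len(values), 1e-6) for count in counts]

    baseline, recent = bin_shares(expected), bin_shares(actual)
    return sum((r - b) * math.log(r / b) for b, r in zip(baseline, recent))


DRIFT_THRESHOLD = 0.2  # assumption: common rule-of-thumb cutoff for "significant" drift

training_values = list(range(100))      # synthetic baseline data
recent_values = list(range(50, 150))    # synthetic data after the world changed

psi = population_stability_index(training_values, recent_values)
if psi > DRIFT_THRESHOLD:
    print(f"PSI = {psi:.2f}: drift detected, consider recalibrating the model")
else:
    print(f"PSI = {psi:.2f}: no significant drift")
```

Running a check like this on a schedule for each important input is one way to make the ''repeatable process'' concrete: the check decides when to recalibrate, and people decide how.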
Jonas Christensen
There is a lot for the business world to learn here. And they're all the same problems in a different version, really. What I take away, Ari, is two things that really are often showstoppers for a lot of businesses that the sports world has overcome, in many cases, which is quality of data. And I've already talked about that at length, but also the culture around experimentation and wanting to use these insights and trusting them. That is one of the biggest challenges that many businesses have. This cultural change around this stuff, being actually a silver bullet in many cases, and being able and willing to play around with it and experiment. And see how it works. We're almost at the end. I've got three questions left for you. Two short ones, as long as you want to make the answer. Because if you are a follower of Ari on LinkedIn, which I am, you will see that he has been travelling the world with the DataRobot team, looking at almost every Formula 1 race this year, because you and the DataRobot team have been doing some pretty interesting things in that space. Helping Formula 1 teams and one in particular become better. Again, competitive edge. Could you sort of high level talk about what you're doing in that space? And maybe what sort of results that's yielded?
Ari Kaplan
Yeah, absolutely. So working with McLaren, which is one of the like, better, like, respected Formula 1 teams. Been doing that almost a year, now. It's been incredibly exciting, getting to travel around the world. We're both a sponsor of McLaren. So, you know, seeing our logo on the car is amazing. You know, being part of some of their social media, where they have 10 million Instagram followers and 10 million Twitter followers and half a million on LinkedIn is great. But the other fun thing is that they leverage DataRobot, and I'm one of the folks, you know, helping them with, for example, race strategy. When do you change your tires and pit, which is like the key point to the race? What is the weather going to be like? The surface temperature of the track, the air temperature of the track to, like, a five minute increment. And, you know, as long as you're better than what the weather station is - and we have been - that gives better insights. So I'll quote Ed Green - I don't like to come up with my own metrics, I'd rather have them - who said, like, in a recent race, it helped the team stay out on the track three additional laps on the same tires, which is a huge difference. And they ended up getting what you call points, which helped them tremendously. So being better able to predict things like that and the race strategy has been tremendous. In Silverstone, in England the other week, before the race, getting into their motor trailer and going to their technology centre, which is this huge, futuristic building in Woking, England, where they have dozens of people every race day. Like NASA spaceships. Like, they're in this control room. One person for just the front left tire. Another for, like, the rear aerodynamics. They have wind tunnels, and they use AI to help decide what parts should be tested in the wind tunnel. You have 80,000 car parts. And you can only test a few per day. So what objects do you want to test to give the most likely improvement in aerodynamics?
I'm looking at past information, like of the prior race, and seeing: Did the car perform as you had hoped or not? Why or why not? And you only have a few days to make tweaks to the car before you're racing again. So, so data intensive, where milliseconds count. Millions of simulations. Tens of thousands of car parts. 300 sensors a thousand times a second. So it's so data intensive. It's so highly visible. Just from that TV show ''Drive to Survive'' on Netflix, they gained 73 million fans over the last couple of years. So high stakes, high visibility, and with that, great people to work with at McLaren.
Jonas Christensen 1:01:16
Very fascinating and in that sport, literally every second counts, so any millisecond you can gain is a big difference. I do also recommend that everyone watches that ''Drive to Survive''. It's a fascinating behind the scenes of Formula 1, which shows some of this complexity that's being put together. Ari, last two questions for you. So the first one is one that I always ask people on the show, which is to pay it forward. I want to know from you, who would you like to see as the next guest on Leaders of Analytics and why?
Ari Kaplan 1:01:52
Oh, that's a great question. By the way to answer that, I want to congratulate you and all the great guests that you've had before. So the folks I know from Bill Inmon, Tom Davenport, Ravit Jain, Kate Strachnyi, Dr. Kirk Borne, who also went to Caltech, but some folks I didn't see on there. Number one, I would say Michael Wimmer is an up and coming. He's super young. I don't know if he's like 13 or 14, but, like, super creative. Has helped NASA. Big Formula 1 fan, but he is like a child prodigy. That's the next generation. I always want to hear from the next generation. Hannah Fry, out in London. She's done a TED talk on the mathematics of love. Kind of a fascinating popular data personality. Michael Kanaan, who wrote this book ''T-Minus AI'' which is kind of a futuristic, like: What is the end of civilization? Are we going to survive as humans with, you know, the clock's ticking on the negative and positive. And then probably Ikechi Okoronkwo, who is with Mindshare. I just recently spoke with him. But it's kind of similar subjects, but more in marketing analytics of similar subject of how humans behave to advertisements. How they resonate to personalities and hyper customised marketing. So probably, you know, those four would be good to start with.
Jonas Christensen 1:03:25
Fantastic recommendations. And everyone on that list will be getting an email from me very shortly. Thank you for that, Ari. Lastly, where can people find out more about you and get a hold of your content?
Ari Kaplan 1:03:38
Sure. So from the content, I am a co-host of what's called ''More Intelligent Tomorrow'' podcast. So that's moreintelligent.ai. It's the website. It's tangentially related with DataRobot. It has like a lot of thought pieces, but you know, good podcast as well, to complement Leaders of Analytics. I also love connecting through LinkedIn. So like what Jonas was saying, if you want to follow along, especially world travels, and McLaren, I have all sorts of weird insights that pop into my brain. So if you want to follow along what my line of thinking is on LinkedIn. A-r-i is how you spell my first name and K-a-p-l-a-n, that's how you spell my last name. But that's a great way to follow me there or connect and message me. I love connecting with people too.
Jonas Christensen 1:04:28
Listeners, do take up that opportunity because you won't regret it. Ari is a very fascinating person with lots going on. Ari Kaplan, thank you so much for being on Leaders of Analytics today. Really appreciate you sharing the knowledge with the world and our audience here and everything you've done for the data analytics community over the years. There's much more than we've covered in the show, actually. We haven't even scratched the surface of what you've contributed. And thank you for also making sports more interesting over the last 34 years. All the best and enjoy the rest of your fascinating career in sports analytics.
Ari Kaplan 1:05:04
Well thank you for having me. Thank you to the audience for listening and hope I can enjoy the wonderful cultural and food scene in Melbourne and see you in person again.