Jonas Christensen 2:54
John Hawkins, welcome to Leaders of Analytics.
John Hawkins 2:59
Thank you, Jonas. Great to see you.
Jonas Christensen 3:01
Yeah, I'm very excited to have you on the show. And listeners, John and I have known each other for a few years, and I know that John is a guarantee for a very interesting episode today. And John, I won't steal your thunder, because we want to hear from you, not me. So let's just kick off, and maybe you can tell us a little bit about yourself, your career background and what you do.
John Hawkins 3:23
Wonderful, Jonas. Yeah, thank you so much for having me here. It's great to talk to you. And as I said, we've always gotten along well and had good conversations, so I'm really looking forward to this. My background is probably a little bit unusual compared to a lot of people in data science, although perhaps nothing is really usual there. I started in computer engineering, but I actually dropped out, because at the time I didn't know that that's what I wanted. I wanted to play in bands and do other things in life. And during that period, I ended up studying philosophy at the University of Newcastle. I was lucky enough to study under a couple of professors who both came from physical science: one of them had been a quantum physicist, the other had been an engineer. And that got me really interested - you get this chance to think about how science works, how it relates to the world, how we know the world, how it is that knowledge is possible. And because later parts of that coursework were focused on, say, chaos theory and complex systems and artificial intelligence, it laid the groundwork for me to go, "Okay, yes, computer science is really what I want to do". Because when you're a generalist like me, you're interested in lots of stuff, and computer science, and what has become data science, is a great way to contribute and participate in lots of different kinds of projects. So that's my background, which is why I say it's perhaps a bit unusual.
Jonas Christensen 4:40
Yeah. And you're highlighting something interesting there, which is that you've probably got the programme running in your head, in the sense that you've got a curious mind, but where it gets pointed in life has a lot to do with the teachers you meet on your path. You can have good ones and bad ones that either attract you to a topic or push you away from it. So we think we're always rational human beings all the way through, but a lot of it comes down to those encounters along the way.
John Hawkins 5:02
A lot of happy accidents.
Jonas Christensen 5:04
Absolutely. And as a happy accident, you ended up in data science. Perhaps tell us a little bit about how that happened and about your career in data science to date.
John Hawkins 5:15
Yeah, 100%. So after that, I eventually went and focused purely on machine learning and AI, and I did a PhD in applied AI, where I was building models to predict where proteins move around the cell. Essentially, when proteins are produced in the cell, they have a thing called a targeting peptide, which is a little part of the protein that tells the cell you have to move it over here, or you have to move it over there, and it's an indicator of the function of the protein. And luckily for me, as part of that process, I got to work with wet lab biologists. So often we'd be building models: they'd sequenced thousands of proteins, we'd build models to say, "Okay, we think these ones are likely to be mitochondrial proteins, or these ones are likely to be embedded in the cell membrane", and then they might go and do experiments to elucidate whether that was the case. So it gave me this experience not only with building models, but also with working with people for whom those models were critical to discovering new knowledge, or at the very least to reducing the cost of running those wet lab experiments. The reason I say that is because that's one of the critical parts of doing data science: it's not just the analytical component, it's about how that interfaces with something people care about and gives real-world results, in terms of either knowledge discovery or efficiency. So when I spilled out of academia, I ended up, like a lot of graduates at the time, in AdTech - predicting who's going to click on what. But I had a sort of circular move through Commonwealth Bank and a bunch of other companies. Really, it was spending a bit of time in universities doing this machine learning work in different cases, and then coming out into industry and saying, "Okay, who needs stuff predicted?", and advertising happened to be the most common place for that at the time.
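To make the kind of prediction task John describes a little more concrete, here is a minimal sketch of a protein localisation classifier. It is purely illustrative and not his PhD model: the feature choice (amino-acid composition of the N-terminal region, where targeting peptides typically sit), the toy sequences, the labels and the scikit-learn classifier are all assumptions for demonstration.

```python
# Illustrative sketch only: a toy protein-localisation classifier in the spirit
# of what John describes (predicting compartments such as "mitochondrial" or
# "membrane" from sequence features). Sequences and labels are invented.
from collections import Counter

from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def n_terminal_composition(sequence, length=30):
    """Fraction of each amino acid in the first `length` residues,
    where targeting peptides typically sit."""
    prefix = sequence[:length]
    counts = Counter(prefix)
    return [counts.get(aa, 0) / len(prefix) for aa in AMINO_ACIDS]

# Hypothetical labelled examples (real work would use thousands of sequenced proteins).
train_sequences = [
    ("MLRTSSLFTRRVQPSLFRNILRLQST", "mitochondrial"),
    ("MKAILVVLLYTFATANADTLCIGYHA", "membrane"),
    ("MLSLRQSIRFFKPATRTLCSSRYLL", "mitochondrial"),
    ("MKTIIALSYIFCLVFADYKDDDDK", "membrane"),
]

X = [n_terminal_composition(seq) for seq, _ in train_sequences]
y = [label for _, label in train_sequences]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Predict the compartment for a new, unlabelled protein.
print(model.predict([n_terminal_composition("MLRNTFRLASRSLSRSFSTSAQNNAK")]))
```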
Jonas Christensen 7:01
Yeah. And since then, you've done that in quite a few organisations. But today, you're the Chief Scientist at Playground XYZ. Right? So this is a company that maybe isn't that well known to a lot of people, but it's a reasonably sizable company, actually, I'd say. Could you tell us about the company, what it does, and the problems you solve for your customers there?
John Hawkins 7:24
Yeah. So, Playground XYZ was an Australian startup. We've actually been acquired by an American company called GumGum, who do many, many different interesting things in the advertising world. But Playground started as what's called a rich media company. The founders were interested in building what are predominantly JavaScript frameworks for building interesting advertising. And the reason for doing this is that premium publishers need to differentiate themselves from the long tail of cheap advertising. If you advertise on the Sydney Morning Herald or news.com.au, they want to demonstrate that you're getting good value for money, and rich media is one of the ways they do this. And then at some point in the company's evolution, they realised, "Well, the reason we sell these rich media ads is really because we say they get more attention. We probably need to start proving this". So that started them down the track of, "How can we measure the attention on our ads and demonstrate that they get more attention than other ads?". And that spawned this whole other arm of the company that I work for, where we build a stack for measuring the amount of attention on ads, and then doing other things like optimising attention: helping companies either design their creatives better or buy better inventory, to get the amount of attention on their ads they need to get what they're looking for, which is usually sales, conversions, sign-ups, that kind of thing.
Jonas Christensen 8:45
Yeah, and I encourage listeners to go to Playground's website and actually look at some of the quite interactive content that's on there, as examples of how advertising can be displayed online. Rather than a flat image, it really interacts with you when you scroll and when you're using a web page, so it is actually quite engaging. It's interesting that you say that came before the machine learning; it explains a lot to me. We'll dig into that as we go through the episode. Because John, we're not quite done learning about you, because you are the Chief Scientist at Playground. Could you tell us about that role? What does a Chief Scientist do? What are your goals, your responsibilities and your regular activities in a week?
John Hawkins 9:30
Yeah, so I don't know exactly how representative it would be of the same role in other organisations, but I can give you a sense of why Playground felt they needed it. There are two arms to it. One was that they'd started building a part of the organisation which was essentially machine learning led - basically a bunch of different models for making these predictions about attention on advertising. Initially, they'd been doing a lot of work with external consultants, but they wanted someone who had been spending a lot of time with machine learning, as they started building up that team and expanding out the product suite, to have oversight of all those machine learning models. And the second arm is more about the research component, because one of the difficulties of building a business on top of pure analytics, in a sense, is that you need to be able to justify it. So you need to be publishing research, writing papers, helping your clients design studies, so that they're convinced that your technology does what you say it's going to do. That's the second arm of it. It reuses some of the skills I got when I used to work as an academic: having to write project proposals, outline a set of experiments and why we did them, and justify everything we do when we're building that entire machine learning stack. So responsibilities - sorry, I didn't answer the whole question. Responsibilities are things like - I mean, I'm still on the tools. Even though we've grown a lot, there's still a lot of work that needs to be done, so I still do machine learning and actively build models. But it's also about hiring people, so I spend a lot of time interviewing people, mentoring people, making sure projects are on track, and then, more and more, writing these papers and research pieces. There's also an element of something that's quite new to me: writing North Star papers. Because, as I'm sure you're familiar with, the larger an organisation gets, the harder it is to make sure everyone's aligned on how all the little subtle technical decisions might influence other parts of the organisation. And the potential impact that small engineering decisions might have on the quality of our attention models is something we need to keep an eye on. So we've tried to solve that not through autocracy but through these North Star papers, where people get a sense of, "Oh, this is how the work that I do influences the greater goal".
Jonas Christensen 11:53
Yeah, interesting. And I think the complexity of system interaction is really amplified when we're talking about machine learning, because it's a moving object all the time and it's really the data that is the code, in a sense. So I can imagine it being even more difficult when you have a product or platform like yours that is highly machine learning driven. Now, John, let's learn a bit about what you sell, what your solution is at Playground, because you've already alluded to it a little bit: there's this interactive creative that you can create using your platform, but you also have this attention measurement platform. Could you tell us how that works in practice? What's under the hood of it? What can users get from it?
John Hawkins 12:41
Yeah. So I can tell a little story about it too, because I think the origin is somewhat interesting - before I joined, when they were first thinking about, "We should be measuring attention on ads, because that would help us build a case for the quality of our rich media". Actually, if you don't mind, there's not too much I can say about the rich media component, because I'm not a creative technologist myself. I think the best explanation is for listeners to go to any major Australian publisher website, and you'll encounter these kinds of ads that are not like the standard ads: if you hover over them they expand, or there's some kind of interesting animation, or they might occupy the sides of the screen if you're on a desktop, for example. Those are what we typically mean by rich media, and it's an ongoing race to build better ways of building those ads. But on the attention measurement front, they initially were thinking about getting headsets. So we'd have people come into the office, put on a headset, and we'd measure attention that way. That's how it's traditionally done in a lot of academic studies. The problem, of course, is that it's expensive to run and typically the sample sizes will be small, but also you're asking people to consume media, and coming to conclusions, in a somewhat artificial environment. And luckily for us, at around that time there was an ongoing massive explosion of different kinds of computer vision models, including some really nice research papers showing that you could build convolutional neural network based models for predicting the gaze fixation position on the screen, just from photos of people's faces as they're using a mobile application. So the team was able to reuse that open source work and those papers, and build their own version and their own application. It was originally an iOS app that could run these panels. There's a bunch of these digital worker sites like Clickworker and others - people would be familiar with Amazon's Mechanical Turk - where people get remunerated for doing some kind of digital task. In our case, it's: download our app onto your iOS device, give permission for us to track your eyes for 20 minutes or half an hour, read these newspapers from our list, and then we'll pay you, and then you delete the app and turn off the eye tracking. So it's basically consent-driven, remunerated eye-tracking media studies, but done on people's own devices in their own homes, in a more natural environment. And we can do it at scale: we can instantly turn it on and measure how much attention ads get in different countries, on different websites, etc. So the scale advantage is massive, and that part of our stack has continued to expand - we've got other eye tracking technologies on different platforms. But one of the other key insights that Graham Burton, who's our Head of Data and Product, had was that at the same time as you're collecting this eye tracking data, you could also collect all these other sorts of behavioural signals about how people are interacting with a page, which then allow you to potentially predict the amount of attention an ad got later on, in the wild, when you don't have the eye tracking turned on. So this is where, as I mentioned to you previously, there are multiple stages in the machine learning: we use the survey panels to collect real eye tracking data.
And we do that at scale, in as many different environments as we can. Then we collect other behavioural and contextual information about how the user interacts with the webpage, which allows us to predict the amount of attention that happens on real ads in the wild. And that's effectively our measurement platform. Using our tag, clients can attach that tag to their own ads when they're running inventory everywhere, get an indication of how much attention their ads are getting, and then we can start to optimise it. We can choose to tune which particular media we purchase to optimise attention, or even see which creatives are working better and optimise that way. So that's the final stack: you build the dataset to understand how attention works, you build a model that allows you to do it at scale, and then you get value from it by improving your decision making.
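As a rough illustration of the two-stage idea John outlines - panel data with measured gaze providing labels, then a second model predicting attention from behavioural signals alone - here is a minimal sketch. The feature names, the synthetic numbers and the choice of model are assumptions for demonstration; the first stage (the CNN that estimates gaze from face images) is treated as already done and is not shown.

```python
# Minimal sketch of the panel-to-wild transfer step, with synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Panel stage: opted-in panellists run the eye-tracking app, so every panel
# impression has a measured attention label (seconds of gaze on the ad).
n_panel = 1000
panel_behaviour = rng.random((n_panel, 4))          # e.g. scroll speed, time in view,
                                                    # viewport share, interaction count
panel_gaze_seconds = rng.gamma(2.0, 1.5, n_panel)   # stand-in for measured gaze time

# Second stage: learn to predict attention from the behavioural/contextual
# signals that are still available "in the wild", where eye tracking is off.
attention_model = GradientBoostingRegressor().fit(panel_behaviour, panel_gaze_seconds)

# In production, the ad tag collects only the behavioural signals and the
# model infers the attention each impression likely received.
wild_impressions = rng.random((5, 4))
print(attention_model.predict(wild_impressions))
```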
Jonas Christensen 16:51
Yeah, it's pretty clever, really. And if I paraphrase, to make sure that I understand and hopefully that also helps listeners: you collect data on this group of opt-in subjects from Mechanical Turk or whatever.
John Hawkins 17:05
Yep.
Jonas Christensen 17:06
You collect eye tracking, but also 39 other variables. And then when you are in the wild, you can collect those 39 other variables, or close to that number, and you take out the eye tracking, but all the other confounding factors will interact in a way that lets you say that when people are doing this, they're probably looking over here, with some level of statistical significance and probability. And therefore you can infer what kind of eye contact they're having with the screen. Is that correct?
John Hawkins 17:37
100%. Exactly.
Jonas Christensen 17:39
So, pretty clever. And it really shows, I think, a very practical use of machine learning in the day-to-day business world. It's not just about talking robots and all this stuff; this is actually a very practical optimization exercise that nevertheless has that scientific approach behind it, but also a fair bit of grunt work from you and the team. So you use this then to also optimise your actual creative? I assume you can tinker with the things that are displayed on an ad to measure the difference and so on. Is that right?
John Hawkins 18:14
Yes. So, what we call our activations team - the people who do creative execution and really work at that front edge of building new rich media formats - can use the attention measurement. One of the things they built themselves is a tool to look at the points in, say, a video ad where it's getting more attention or not. So if you look across hundreds or thousands of impressions of that video ad, and look at whether the mean attention on each frame is going up or down, you have a fingerprint, potentially, of what it is that's drawing viewers back to look at the ad when they've maybe been looking away at some other content. And there's many ways to analyse that. So they're always coming up with ways to use our attention data to think about and analyse creative execution, which is the advertising term for the content of the ad.
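To show what that kind of per-frame analysis might look like in its simplest form, here is a hedged sketch: averaging attention across many impressions of one video ad to see which moments pull viewers back. The column names and the tiny example table are assumptions, not Playground's tool.

```python
# Illustrative sketch: build a per-frame "attention fingerprint" for one video ad.
import pandas as pd

# One row per (impression, frame): was the viewer's gaze on the ad at that moment?
impressions = pd.DataFrame({
    "impression_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "frame":         [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "gazing_at_ad":  [1, 0, 1, 1, 1, 1, 0, 0, 1],
})

# Mean attention per frame across impressions.
fingerprint = impressions.groupby("frame")["gazing_at_ad"].mean()
print(fingerprint)
# Frames where the curve turns upward are candidates for what draws viewers back.
```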
Jonas Christensen 19:08
Yep. So John, million dollar question.
John Hawkins 19:12
Yeah.
Jonas Christensen 19:13
Why should we care about attention? Why is that such an important metric compared to lots of other metrics that we use - click-through rates, all that sort of thing?
John Hawkins 19:23
Yeah. So, there's a couple of things to think about. Advertising is broadly broken into what we call performance advertising and brand-based advertising. In performance advertising, you're typically looking to get a direct impact, which could be a conversion or a sign-up to your newsletter, etc. And that's where click-through rates really, really matter, because there's a sort of direct causal pathway from the ad to the thing you're trying to achieve, and a lot of ad technology has been focused on optimising that performance funnel. And even though we have seen an ability to have an impact on that, it's less of our concern. We're more focused on brand advertising, and brand advertising is more subtle. Particularly for the large-scale decisions that people make - say, purchasing a new car, or which bank to get a mortgage from - but even for irregular ones, like purchasing a new computer, which are more common but still less frequent than standard purchases, it's much more important to get what is often called mental availability. So, another kind of sidetrack here: the world of marketing science has had many fads over the years, but we at Playground have been heavily influenced by the Ehrenberg-Bass Institute, which is an Australian institute for studying marketing science. They are quite famous for upsetting the applecart of marketing science somewhat, frequently saying that a lot of the things that have been said in this discipline are essentially rubbish, and that really, brand advertising is in some sense fairly simple. You're basically trying to make sure that you're building what they call mental availability, so that people are aware of your brand, and physical availability, so that your product is available, so that when it comes time for them to purchase, there is a good chance that your brand will be in their mind and they'll choose to purchase it. They're famous for debunking notions like brand loyalty and all these kinds of things, saying that it's mostly nonsense, if not completely nonsense, and that really it's about just building that mental availability. So from that launching pad, our thinking is that there's also a bunch of research looking at how you really get mental availability: people have to be paying attention to ads. There's a pathway from someone either looking at or listening to an ad, then cognitively processing it, to building that mental model of not just what the brand is, but what sector it fits in, so that the brand can be called to mind when they want to make that purchase. So there's a fundamental marketing reason why attention matters: to get to mental availability, you have to pay attention. Now, there's a lot of different opinions, as you might imagine, about what attention means, how much attention you need, and how that works. Our focus is on direct visual attention, because we think that being able to measure where someone's pupils are fixated is the strongest possible indication that they're mentally paying attention to that thing. There are people who do different kinds of studies for working out when someone's thinking about something, but that's much harder; direct visual attention is much easier to measure.
And because the vast majority of advertising is visual, it gives us a really, really strong signal as to whether or not someone is actually likely to remember that brand. And that comes up because, as you might imagine, building and releasing analytics products, we have to do a lot of work with clients to convince them that it's not just smoke and mirrors - they need to believe that it actually works. So there are two pathways. We often run brand survey studies, where after running advertising we'll survey a certain number of participants to get a sense of how their opinion about a brand, or even just their ability to remember the brand, has changed. But we also encourage clients to look at things like: can they get a sense of whether conversions, or some other kind of metric they look at, have been impacted, by looking at campaigns that get higher attention than others? Does that make sense?
Jonas Christensen 23:40
It does. It does to me. And I'm sitting here thinking, John, that you're actually doing the marketing industry a huge favour. It sounds very obvious, because you've come up with something that's useful there. But in the last 25 years of digitisation, where advertising has in large part moved online, and where it's therefore in a digital format, we can measure click-through rates. We can measure clicks on pixels, or ads, or images, or whatever it is, and also website interactions and so on. It has become more interesting to play with all that data, see what happens and actually track people through, as opposed to the bit of marketing that we can't really measure that well, which is "Let's spend $10 million on an ad campaign that's on TV and billboards and other places, and just see what happens". We don't really know, and we have these further-away, almost pseudo-vanity metrics of brand awareness and consideration as our vague measures of whether that campaign actually worked.
John Hawkins 24:53
Yeah.
Jonas Christensen 24:54
But you're bringing some of that element of analysis and analytics and data points into the actual advertising itself. That means that, at an analytical level, it can compete with the stuff that has clicks and links sitting underneath it that you can measure. And I've seen it many times in boardrooms where marketing executives are trying to sell to the CEO or CFO that we should do this campaign that's really important for our brand. But it's a bit vague for people who are very concrete and numbers driven: you can't tell me exactly what my ROI on that campaign will be, or what the cost per acquisition of each customer is when I put a TV ad on. That seems very expensive; I've got to spend 10 million on this. So you're bringing that into play here, I think. Quite interesting. And for someone like me who's analytically driven, it makes brand advertising a whole lot more interesting all of a sudden. So that's my observation, and it did make sense, to answer your question. Let's dig into it a little bit more, because I'm interested in this in a world where we have cookies going away, where privacy and consent are increasingly mandated and increasingly important, and where some organisations, like Apple for instance, are being proactive about protecting consumers from being tracked around the internet, basically. What you're putting out into the world here, your solution - how critical is it around this change? How important is it for advertisers, and how are they responding to these challenges today?
John Hawkins 26:37
So, it's a great question. And it's an interesting thing that in the industry it's called "the death of the cookie": the fact that cookies and tracking people are going to go away. Various vendors have been postponing it more and more, because they recognise that a large amount of the digital advertising economy has been built on top of the cookie. For those of you who are not familiar with it, it's this idea that you can place a small bit of code or a text file on a person's computer when they browse a website, which allows you to know things about them when they come back. But there are various tricky ways people have come up with to use that to track people across the internet. Say you're a large media conglomerate and you run many, many different websites: you can collate that data and have a sense of how the same person navigates around the internet. And the reason that's important is that large amounts of the analytics used for targeting people have been built on following them around the internet and saying, "Okay, this person is interested in, say, mixed martial arts and sports cars. We know that we can target them with, say, ASX ads, for example. Perhaps that's a good fit there". So they'll build up profiles of people based on this behavioural data, and that's built up this entire technology suite of how you target people and get effective advertising. The problem is, of course, when you take away all of the tracking, that whole house of cards collapses, because the entire way you think about who to market to is built on top of following people around. So our way of thinking about this is that ultimately, it gives us a way of building a new framework for advertising, where when you run an ad, you go through an initial measurement phase where you get a sense of where your ad is getting more attention: when you put it on the car sites, or the fashion sites, or the sports sites, where is it getting more attention? Which of those formats is attracting more attention? And that allows you to fine-tune where your media spend goes. We've spent a lot of time doing research on this, and one of the things that's interesting about it is that we sort of expected - I guess a general assumption in the advertising industry had been that the contexts in which certain brands do well should be kind of obvious. That if you're Nike, you advertise in sport, and if you're Estée Lauder, you advertise in beauty and fashion, for example. But one of the things that we see is that actually, that's not the case most of the time. There are subtleties to it: certain ads will do better in different contexts - maybe not entirely surprising contexts, but you could say there's a suite of contexts, and where your creative is performing best will change over time. And as we've been digging into some of the research around this, one of the things we see is various kinds of trade-offs, because you have to remember that your advertising exists in an ecosystem of all the other competitors' ads, which are all appearing on the same kind of media. So one of the things that can often be a driver of getting attention is simply whether your ad is unexpected.
So if Nike runs an ad on a sports site, and people have already seen three sneaker ads, it's not going to drive that much attention, whereas if Estée Lauder suddenly pops up on that site, it's unexpected and maybe it'll get more attention. And then of course, there has to be some kind of synergy between what that brand is and what that site is, in some sense, for it to make sense. But it's not always very intuitive what that's going to be. So we've found that you can run large-scale experiments just to try and understand that relationship between the brand, the execution of the creative and the context. I'm not sure if I answered the question, actually.
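One simple way to picture the kind of large-scale context experiment John mentions is to compare measured attention by site category, rather than assuming the "obvious" context. The sketch below is illustrative only; the context labels and numbers are invented, and a real analysis would work at far larger volumes with per-brand and per-creative breakdowns.

```python
# Illustrative sketch: compare measured attention across site contexts.
import pandas as pd

impressions = pd.DataFrame({
    "context":           ["sport", "sport", "fashion", "fashion", "news", "news"],
    "attention_seconds": [1.2, 0.8, 2.4, 1.9, 1.1, 1.6],
})

summary = (impressions
           .groupby("context")["attention_seconds"]
           .agg(["mean", "sem", "count"])   # mean, standard error, sample size
           .sort_values("mean", ascending=False))
print(summary)
# The best-performing context is often not the intuitively "matching" one,
# and it shifts over time against competitors' ads.
```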
Jonas Christensen 30:22
I liked the content nevertheless. It was interesting, John, and I think it highlights something subtle here, because what you're describing, to a large extent, is the outcome of experimentation. You mentioned experimentation itself, but the importance of it is that, as human beings, we tend to start with logical reasoning: Nike should go to a sports site when they sell shoes, because people there are looking for shoes. So wouldn't it be most likely that if you go on the Runner's World website, people are going to be really interested in that ad? Guess what: those Runner's World people are also looking at other websites, and maybe you can get them there. But you only find out which ones actually do the trick when you experiment. So businesses really need to think long and hard about how to create this experimental culture in their organisation. And, John, I imagine that's something that - maybe not you personally, but your organisation and your team - also have to really try to coach your clients and customers on, so they're comfortable experimenting a bit. Is that something that you actively do as an organisation? To convince clients to step a little bit outside their - you can almost call it a comfort zone - their logical line in the sand for where they can advertise?
John Hawkins 31:49
Yeah, absolutely. So we're definitely engaged in that, but as you might imagine, it's not easy. In part because we're not always seen as an impartial third party, because experimentation for us would mean purchasing fairly wide amounts of inventory, which is, of course, good for us. So it's a slow process. We have to encourage experimentation within boundaries the client is familiar and comfortable with, so that they can see that the optimization process actually works for them, and then they can gradually get more and more comfortable with testing more broadly.
Jonas Christensen 32:27
Yeah. So, small experiments, but you probably make the money back in the long run by knowing this information.
John Hawkins 32:34
Yeah.
Jonas Christensen 32:35
So, John, one of the things that you alluded to earlier was that you actually have a machine learning pipeline sitting underneath all this?
John Hawkins 32:46
Yep.
Jonas Christensen 32:47
That is running multiple models that are chained together and depend on each other, so one model's output is another model's input. You're doing eye tracking first, and you're using that as a variable in subsequent machine learning models. What kind of additional challenges does this cause for you as an organisation and for you as an individual, and how do you deal with those challenges?
John Hawkins 33:13
Yeah, it's really hard. I think it would be fair to say that a lot of data scientists I've known over the years are really uncomfortable with these kinds of situations, the main reason being that they feel comfortable talking about how the error in one particular model has an impact on the business, and they can quantify that. But as soon as you have the potential for one model's error feeding into another model, and then into another model, it becomes very, very difficult to track and feel comfortable with. So I think it's certainly challenging, but there are a couple of strategies around it. The ideal strategy is that you have labelled data for the full end-to-end. Say, for example, the first model is trained to predict X from some set of inputs, and then that X is an input to predict Y in the second model, but you've got labelled data for each step as you go down the chain. You may also have a dataset you've labelled that has the full input and what the final output is going to be. That at least allows you to - well, of course, it raises the question, "Why don't you just predict that to begin with?". There can be other reasons, such as the cost of labelling that data, for example. The challenging one for us is that, in some sense, we are relying on - the process by which we validate the eye tracking is that we ask the panellists to fixate on certain regions of the screen for a certain period of time, and we have ways of testing whether they've done that, partly statistical ways. That allows us to see if the eye tracking is either just slightly off, in which case we can recalibrate it, or just kind of random - something about their lighting conditions or their device means it's not working, so we can throw that data away. So we go through this process of understanding, cleansing and calibrating that data, and one of the things that gives us is an understanding of the error distribution for the eye tracking. So we're able to use simulation as a way of understanding it, because we don't have labelled data saying, "We know this person looked at this ad for exactly 3 seconds, or 3.2 seconds", for example. There's no way to label that data; we can try and generate it ourselves, and we will do that. But also, given that we have a strong understanding of the error distribution on the direct eye-tracking component, we can simulate the expected error on the gaze duration - the amount of attention an ad gets. So we can quantify the expected error through simulation. That obviously is a lot harder to do, and as I'm sure you're also aware, it's hard enough getting organisations to trust machine learning models, let alone getting them to trust simulations. But I think we will get there, because ultimately, simulation is a great way - as long as you're upfront with the assumptions you've made - to understand and test the limitations of a system.
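As a minimal sketch of the simulation idea John describes - using a calibration-derived error distribution for the raw gaze points to estimate the error on the derived gaze duration - here is one way it could look. Every number, the Gaussian noise model and the function name are assumptions for illustration, not Playground's method.

```python
# Monte Carlo sketch: propagate gaze-point error into gaze-duration error.
import numpy as np

rng = np.random.default_rng(42)

def simulate_duration_error(true_gaze_x, ad_left, ad_right,
                            gaze_noise_std=40.0, n_sims=10_000,
                            frame_seconds=1 / 30):
    """true_gaze_x: per-frame horizontal gaze positions (pixels) for one session.
    Returns the simulated distribution of error in estimated gaze duration."""
    true_on_ad = (true_gaze_x >= ad_left) & (true_gaze_x <= ad_right)
    true_duration = true_on_ad.sum() * frame_seconds

    errors = np.empty(n_sims)
    for i in range(n_sims):
        # Perturb each gaze point with the calibration-derived noise model
        # (assumed Gaussian here purely for illustration).
        noisy_x = true_gaze_x + rng.normal(0.0, gaze_noise_std, true_gaze_x.shape)
        noisy_on_ad = (noisy_x >= ad_left) & (noisy_x <= ad_right)
        errors[i] = noisy_on_ad.sum() * frame_seconds - true_duration
    return errors

# Hypothetical session: 300 frames (~10 seconds) of gaze drifting across an
# 800px-wide screen, with the ad occupying the right-hand 300px.
gaze_x = rng.uniform(0, 800, 300)
err = simulate_duration_error(gaze_x, ad_left=500, ad_right=800)
print(f"expected bias: {err.mean():+.3f}s, spread (std): {err.std():.3f}s")
```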
Jonas Christensen 36:19
Yeah, it sounds like, if nothing else, you've been very, very thorough about the approach. And there is no easy way to tackle those challenges, so thoroughness is kind of the way to do it. John, the other thing that comes to mind for me here is that what you're doing is actually creating a product out of machine learning models. There is an interface for this, a software tool, that has all this stuff running underneath. And if we step back a little bit from your organisation to the broader data science community, there's this increasing need for data scientists to play a really active role in user experience design and product design, which I imagine is also a very large part of your remit by default, because you're building something that will be interacted with by users. So what is your advice to listeners here on how to foster the right non-data-science skills that will make data scientists true unicorns, in the sense of being able to piece together a whole process end-to-end?
John Hawkins 37:38
Yeah, that's a great question. I think the first thing to always keep in mind is to maintain humility. I think striving to be a unicorn is, of course, a great goal. But you should also remember that the title is not intended to mean that you're an expert in everything.
Jonas Christensen 37:57
I use that term, because that is what all the job ads are looking for. They don't use the word, but it's pretty much it.
John Hawkins 38:03
Yeah.
Jonas Christensen 38:04
Yes, so take the term loosely.
John Hawkins 38:06
So, I think humility: just always recognising that no matter how fast you've learnt these things, you won't be an expert in them, so it's always good to have people around to consult with. But I would say that the core - perhaps the underemphasised core - skill of a data scientist should be the art of both asking questions in a dialogue and then building a mental model of the system that you're trying to work with. It's not like you're going to build a perfect mental model, because you haven't had that deep industry experience. But there is a kind of art to being able to mentally sketch a schematic, so you understand where all the crucial weight-bearing joists are, for example. And I think that applies more broadly. You can be effective on many different problems in many, many different domains if you actively cultivate that skill of asking questions and thinking about what's been said to you. Don't be afraid to say you don't understand if you don't understand, and ask questions back for clarification. As you ask the question and hear the response, you build a mental model, and if something doesn't gel with what you've seen before, or with your understanding of other things, ask questions to get more clarifying information. I think that applies to getting good, or getting the fundamentals, in any new skill: asking questions of people who know how to do it well is a really good way to get there.
Jonas Christensen 39:36
Yeah, and I asked you this question, John, because when I look at your career, you have started out by doing very complex things in different industries. Right? You talked to us about the protein prediction work. Again, I assume that you're not a biologist or an expert in how proteins interact inside the body, and there are probably very few people who really are experts like that, so it's a hugely, hugely complex topic in itself. But being able to build models there all the same really requires that cross-functional collaboration, as we call it in the corporate world, where we're trying to connect different skills. Your ability to step into someone else's shoes, and try to understand how they think and think like them, is really critical. And you've then done that in other industries. When we met, you were also in somewhat of a consulting role, where you were probably doing that in lots of different domains, industries and businesses, and now in this role. I think, for listeners, John is not saying this without having lots of experience applying exactly that himself, and I couldn't agree with you more. It's that natural curiosity, and wanting to step into the problem and really think about it from many angles, that makes someone more unicorny than the person who doesn't do that. Now, John, we've started talking a little bit about product design around machine learning models, so why don't we unpack that a little bit more, because Playground's products are actually software solutions built on top of machine learning models, as we just agreed. And with that probably come some very unique challenges, because this is a very different business to others in this world. I think there's almost a machine learning ecosystem where different business models exist. Right? There are more generic platforms; there are software solutions you can purchase, where they produce generic models and you can customise them to your needs; you can get consulting services and so on. You're actually creating this somewhat black box and then selling it to someone else and saying, "This is really good, with our stamp of approval on it". You've already alluded to some of the challenges around convincing people that it is not just smoke and mirrors, but what other challenges are unique to this sort of business?
John Hawkins 42:08
So there are many different challenges. I mean, one of the challenges that we face - and in fact, one of the things some of our competitors would immediately latch on to - is the fact that we've historically done rich media as well as measured it. So if you're not seen as independent, that's an immediate avenue people can attack you from. You have to make a massive effort culturally inside the organisation to split those parts of the organisation. And you see this in the big players. Google has been very famous for this, for making sure the advertising component is separate from the search component, so that there's no bleed-over there. All of the big tech companies who've faced any of these kinds of potential criticisms have done it. And you partly want to create that so you can fend off any criticisms from potential competitors, but also because you can't build great products unless your own people believe in them. Your own people have to see, "This is what we're doing, and we're committed to doing this thing well, and we're not going to allow the way we go about it to be swayed by other business interests". So I think building that internal belief is going to be a challenge, because in these analytics products you're building something that is, in some sense - I don't know if it's exactly larger than the sum of its parts, but it's a hard thing for everyone in the organisation to completely grasp how the whole thing works, and really the process behind it. So constant, clear communication about what we're doing, why we're running experiments, the decisions we've made that may not seem optimal but are the best of the options in front of us, and how that's taking us where we want to go - constant communication about that kind of thing helps to build that internal culture, so everyone can say, "Yes, things are not always perfect, but we're constantly heading towards, in our case, a better measurement system".
Jonas Christensen 44:16
Hi there, dear listener. I just want to quickly let you know that I have recently published a book with six other authors, called ''Demystifying AI For The Enterprise: A Playbook For Digital Transformation''. If you'd like to learn more about the book, then head over to www.leadersofanalytics.com/ai. Now back to the show.
John Hawkins 44:39
And then the second set of challenges are the more external ones, because you'll find - and this is something our sales team encounters on a daily basis - what is the knowledge level of the people you're dealing with? What are their expectations and/or opinions and/or biases about these kinds of things? And that can be all over the shop. So you have to have a suite of strategies that will help convince someone that your analytics product is actually going to fulfil their needs. There are a couple of extremes. For some people, it's going to have to be an independent test where they're in control of the testing outcome, so they can see that your black box, as you say, actually does what you say it's going to do. And ultimately, that's always the best to some extent, because the alternative is starting to open the black box, and any analytics company is not going to completely open it, because there's too much proprietary stuff in there. So the next question is: how much do you open the box? The amount you do that depends on who you're dealing with on the client side. If they're really sophisticated, then you can open it quite a long way, though you'll have some baseline of things you know you can't reveal, because they're too proprietary. But you want to reveal enough to demonstrate to the most sophisticated clients, "Oh, these guys are actually running the right experiments. They're validating everything, every decision in this process is well thought out, and I can trust them". So those are the two broad personas you might deal with: people who need to see inside the box to understand that it actually works, versus people who don't care what's in the box - they just want to see the sausage that comes out the other end.
Jonas Christensen 46:25
I think I've met both types and personalities in my career, so that's clear. John, I've got a couple of questions left for you. This next one is a little bit of a big one, because you're a "machine learning as a product" business. But we also typically have these very data-heavy businesses in traditional industries - banks, insurance, telecoms, utilities, manufacturing, health care, professional services - stuff that was around before the internet, basically, but nevertheless is digitised and very data heavy. They use data with varying success to try and optimise their existing operations, but we don't really see very often the creation of whole new business models or products in these industries. Because they're so data heavy, and because the industries I mentioned are so omnipresent in our lives, the potential has to be there. What do you think is required for these businesses to actually break that seal and be truly innovative with machine learning?
John Hawkins 47:36
It's a difficult one. In fact, you could almost summarise the modern history of, I guess, corporate capitalism as: innovative young companies get bigger and bigger and stagnate, until someone comes along and takes their market from underneath them. If you had to do a crayon sketch of how capitalism functions at a corporate level, that's it. There's a bunch of interesting anecdotes about leaders who've been aware of that and have taken actions to try and mitigate it. I think there's an IBM story about trying to start a separate division around desktop machines; I forget most of the details off the top of my head. But I think the core idea is recognising that the larger and more complicated an organisation is, the harder it is to be agile the way startups are going to be. So having the ability to either fund those startups completely externally, or have them somewhat segregated from your core business, to build technologies that you can partially own and that are a potential threat to your organisation, seems to be probably the most tried and true approach. Anyone in innovation and strategy in a big corporate - I'm not telling them anything they don't know when I say that - but I think it's a lesson many companies seem not to know, so it's something that needs to be said more often. The other thing that you see sometimes is that there are organisations that are able to do things orthogonal to their main business in a way that can be interesting and innovative. There's an example from Germany, I believe. This is not a perfect example for what you're asking, because it was an online site that was doing, I think, auto sales. But someone inside that organisation realised that the data they collected about the sales of cars and their various details would actually be invaluable to insurance companies trying to value cars based on a bunch of details about them - their age, mileage, that kind of thing. So they started a whole separate branch of the organisation, building reporting analytics for insurance companies, which was effectively a new cash flow built off their existing datasets. Now, that's pretty hard for most large corporates, mostly because of privacy concerns; it's hard for banks or insurance companies to do that. But there have been examples where other big corporates have done it. I think it was Singtel in Singapore that managed to create an additional startup that used their aggregated cellular data to provide mapping information about the movement of people on, say, roads and the subway system, etc. They could use that potentially for urban planning; I'm not sure who the other downstream customers were. But in that case, it's large-scale data that isn't particularly privacy sensitive, at least at an aggregated level, and that has other utility beyond the core business.
Jonas Christensen 50:29
Yeah, and you're highlighting something which is potentially a flaw in my question: that when you actually dig under the hood, there is more there than meets the eye in terms of productising data. It's not necessarily as consumer friendly, if I can call it that, as your product - it's not neatly tied up with a bow on it in a software solution - but it's nevertheless using that data for a secondary purpose that can be quite valuable. And I know that a lot of banks have been doing this over the last 10 to 15 years; it's been a growing value proposition for corporate and institutional banking to help with benchmarking and comparison of your business versus nondescript competitors. So let's say that McDonald's wants to know where they can put their next restaurant. Well, where do we find customers that are like the ones around the really successful restaurants, and how many competitors do we have in an area, and so on. Right? That sort of geographical analysis can be really powerful when you can combine the revenue data from the corporate and institutional arm of a bank with the spend data from the consumer bank. This is for organisations that are large enough to be a representation of the market as a whole. So stuff like that is going on. But we're not - again, I have to be careful what I say, because I'm sure someone will find a great example of the opposite being true - but we're not seeing lots of AI solutions that are business-to-consumer in those industries, necessarily, although there are bits and pieces under the hood that we don't see. John, we are towards the end. I have two questions for you before we finish up. The first one, which I always ask of guests on the show, is to pay it forward: who would you like to see as the next guest on Leaders of Analytics, and why?
John Hawkins 52:31
Yeah, absolutely. So I would say Luiz Pizzato. I should have checked every single one of your uploads to make sure that he hadn't already appeared on here; I don't think he has, though. He is a data scientist who leads teams and mentors a bunch of people inside the Commonwealth Bank. He's done many, many different things in his career, he's a great public speaker, and he's got a lot of interesting anecdotes about real human-centric problems that banks are helping to solve. So I think he could be a good segue, actually, from your last question, as to interesting, innovative things a bank can do with their data that are good for consumers.
Jonas Christensen 53:10
Yeah. And if there were one organisation to mention that actually is doing something, it would be Commonwealth Bank. In Australia, they are definitely the leaders in using AI and machine learning for customer centricity. So, great suggestion, John, and I will check out Luiz's profile and get in touch with him. Lastly, John, where can people find out more about you, get a hold of your content and connect with you?
John Hawkins 53:33
Yeah, so a couple of things. Connecting with me on LinkedIn is always great; I'm always having a chat with people about things there, and as I'm interviewing and hiring, it's a great place to connect with me. You can also go to "Getting Data Science Done", which is a website for promoting my book. The book is about some of the themes we've spoken about here today: predominantly the art of asking questions and drilling into problems, and the kinds of things that can go wrong with data science projects that are not necessarily about the technology.
Jonas Christensen 54:10
Yeah. So listeners, go and connect with John on LinkedIn, and do check out his book ''Getting Data Science Done''. It's one of those books that you can judge by its cover, because it does what it says inside.
John Hawkins 54:23
Thank you.
Jonas Christensen 54:24
Do you want to learn how to get data science done, then that's one to check out. John Hawkins, thank you so much for being on Leaders of Analytics today. It's been a real pleasure to learn more about you, more about ad tech and just more about how we can get more out of data science in today's world of business.
John Hawkins 54:41
Awesome. Thank you, Jonas. It's been great.
Jonas Christensen 54:43
Hi, dear listener, just a quick note from me before you go. If you enjoyed the show, then please don't forget to subscribe to future episodes via your favourite podcast app. I have loads more great stuff coming your way. Also, I'd love some feedback from you on the show, so please, please leave a review on Apple Podcasts, iTunes, Spotify or wherever you listen to podcasts. Thanks for listening and catch you soon.