Jonas Christensen 2:56
Bill Inmon, welcome to Leaders of Analytics. It is fantastic to have you on the show.
Bill Inmon 3:05
Thank you, Jonas. It's a pleasure to be here.
Jonas Christensen 3:08
This is a very special moment in my life, Bill, because I have known about you and about your name for pretty much the whole time I have worked in data and data science, which is probably 16 - 17 years by now. And here we are today, having this conversation. We'll get into why your name is so well known out there, as we go through. The first thing I think we should do is for you to tell us a little bit about yourself, your career background and what you do.
Bill Inmon 3:36
Surely. I live in Denver, Colorado. I have a wife and a dog. My dog is a very demanding dog. I have a company, Forest Rim Technology. Forest Rim Technology takes text and turns text into a database. Ironically, the people that work for Forest Rim are scattered throughout the world. We have people in Mexico. We have people in India. We have people all over the United States that are part of Forest Rim. So that's a little bit about me.
Jonas Christensen 4:06
Yeah. Wow. And it sounds like Forest Rim is not a small operation. We'll hear a lot more about Forest Rim Technology as we go through. Bill, I think you're being a bit modest there. You have worked in this field of data for quite a long time. And could you tell us a bit about how you got into this field and how your career has played out all those years?
Bill Inmon 4:24
I'll make a long story short. When I was in school, I took a course on computer science and I learned how to programme. Now, this was a long time ago. Today, they have all kinds of courses. Back then, this was the only course in college. When I graduated from college, I tried the professional golf circuit. And I found out that even though I was a good golfer, I was not a great golfer. And so, I was starving to death. I happened to be in a place called Shreveport, Louisiana and there was an ad in the paper about wanting programmers. And so, I went and applied for the job and got it. And so, that's where I first started to learn about computers, database design, database administration, and things like that. And then from being a failure at professional golf, I hope I've done better in the world of computers.
Jonas Christensen 5:19
I think you certainly have. The name ''Bill Inmon'' is synonymous with something that you hear often, which is ''The Father of Data Warehousing''. That is what you're often described as. Could you tell us where that label comes from?
Bill Inmon 5:33
Well, a long time ago, I had a business partner, who's now deceased, named Arnie Barnett, and Arnie and I started the first conference on data warehousing. And so, that's where it came from. Arnie started calling me that and it's stuck with me ever since.
Jonas Christensen 5:50
Yeah. And in that time, Bill, you graduated from college in 1967, from Yale. I think that - Is that correct?
Bill Inmon 5:57
That's correct.
Jonas Christensen 5:58
Other than a bit of golf, since then you've pretty much been in this space. You've authored a number of books. How many specifically on this topic?
Bill Inmon 6:06
Well, this is kind of embarrassing. I've actually written 63 books on data warehouse, and other technologies. I've written two books that are fictional. I come from a family of writers. My sister has written 28 books. My father wrote 10 books and one of our family forefathers years ago, was a gentleman named Edgar Allan Poe. And my niece was a screenwriter in Hollywood. So we are all a bunch of writers.
Jonas Christensen 6:36
Wow. That's rare that you can almost fill a library, just from one family. That's very impressive. So Bill, we can sort of start to see here that you are really a pioneer in this field of data warehousing and I think we're gonna get more into the pioneering that you're doing at the moment with textual analytics later on. But given your background in data warehousing and sort of being the father of it, I'm really interested in your take on the, may I call it, the history of data warehousing. So sort of where we've gone from the very nascent beginnings to how it's evolved and where we are now. What are the things that were important throughout that journey?
Bill Inmon 7:13
Certainly,
Jonas Christensen 7:14
For businesses and how we actually then technologically solve those things.
Bill Inmon 7:18
Years ago, the only style of processing that people paid any attention to was transaction based processing. Processing for ATMs, bank tellers, reservation systems, and the like. And I worked for several organisations that did that kind of processing. And it was then that I started to realise that there's different kinds of data that's out there. There's more to life than just transaction processing. Now, when I started to tell people that, they didn't like me at all for saying that. I have a whole file of people telling me what a horrible person I was. I have one letter that says I should never be allowed to speak in public. That I was setting our industry back 25 years by mentioning the fact that there's more to computing than just transaction processing. And so with the recognition that there's different kinds of data that's out there, that really led to the concept of data warehouse. Now when data warehouse first came out, data warehouse was not supported by the IT organisation at all. The people in IT, for whatever reason, were turned off by data warehousing. So instead, we and the people that were the early pioneers started talking to business people. We started talking to marketing people, finance people, salespeople, and the business people were very interested in data warehousing. In fact, where data warehousing really got its start, the very first data warehouse that ever existed, was at a company in California called PacTel Cellular. And back in that day and age, the cellular companies were in a battle for market share. As many customers as they could bring in. And PacTel Cellular - God love them - found out that they could use a data warehouse to attract and keep new customers. And so, what happened was the other cellular companies came along and said ''Oh, my gosh, PacTel got a data warehouse. We have to have one too''. And so management went down to the IT department and said ''You're going to build a data warehouse''.
And of course, the people in IT had no idea what a data warehouse was. If it hadn't been for PacTel Cellular, somebody else would have come along. But God bless PacTel Cellular. I still look fondly upon working with them because they were like the match that set the fuse off. And then once the cellular companies figured out they could use data warehouse, then the retailers came along: Walmart, Kmart, Target. And then the next thing you know, everybody discovered that they needed a data warehouse. That's a little bit about how data warehousing came into this world. Today, you find that the IT people get along with data warehouse. I wouldn't say that they understand it or accept it very well. They at least co-exist with data warehouse. But it's still the marketing people, sales and finance people where you find the strong advocacy for data warehousing.
Jonas Christensen 10:33
Yeah, and data warehousing today is definitely not what it was, even 10 years ago. I still remember, maybe 15 years ago, that you could bump into reasonably large organisations that still had the data warehouse sitting on someone's server under their desk. That was not an uncommon scenario back then. But today, we have all the cloud providers really taking data warehousing very seriously and building massive infrastructure for that sake. And also the type of data that we have, that we're trying to structure, is very different. Bill, what are the things that you're most excited about in this field today?
Bill Inmon 11:12
Well, when I look at the future, I have a different viewpoint than most people have. Then again, you asked me the question, so I'll give you an answer. When I look at the world in front of us, it's like the state of California in 1848. We're told by the historians that in California in 1848, you could walk down to the stream and pick up gold. It was that easy and you could become rich by picking up the gold. Well, sure enough, gold was discovered in California in 1849. And then you had the land rush across the United States, of everybody wanting to get to California to go find the gold. Well, I think that's where we're at today. I think that people are going to discover that there is a tremendous amount of value in terms of looking at text. If you take a look at what goes on in corporations today, 10% is of the structured transaction variety and 90% is of the textual variety. And so, there's this world of opportunity, waiting for people who are willing to understand what the issues of text are. And so I think that we are like California in 1848. Once the world starts to discover the value in text, there's going to be a gold rush to go in and find that value.
Jonas Christensen 12:47
You're not going to be called out as a heathen here with that statement, Bill, because I couldn't agree more with you. I actually, I work as a head of data science in a big legal services firm. And guess what our data is? It's mostly text. And I often get asked by people in my network, typically people I worked with in banking, where a lot of it is not text. A lot of the data is digital transactions and they say ''Why would you be a data scientist in a law firm and what data science can you do in legal services?'' And I say ''It's all about the text''. There is so much rich information. It's more difficult to deal with but there is so much rich information there that is untapped. So you're preaching to the choir of one here at least. And Bill, this is a really, really interesting topic that I think we should dig deeper into: This textual analytics space. You've mentioned already that you are the founder, the chairman and the CEO of Forest Rim Technology, which is focused on this very problem.
Bill Inmon 13:43
That is very correct.
Jonas Christensen 13:44
Could you tell us what the company does and what problems you solve for your customers?
Bill Inmon 13:50
Surely. We solve a simple sounding problem, but it's terribly complex. And that's the problem of reading raw text and turning raw text into a database. Now, the first question people ask is ''Well, why would you want to do that?''. So let me describe a little scenario to you. The other day, I was in my doctor's office and my doctor asked me what I did for a living and I told him ''I read text and turn text into a database''. And the doctor said ''Why would you want to do that?''. He says ''I've never heard of anybody doing that before''. And I said ''Well, you're probably right. People don't do that''. However, I said ''Let me ask you a question, sir''. I said ''When you have a patient with an unusual set of conditions, what do you do?'' And he says ''Well, I try to find other patients and other doctors that have had similar conditions and try to find out what works and what doesn't work''. And then I said to him ''And in your career as a doctor, how many patient records have you ever read to try to service one of your patients?'' and he says ''I don't know''. He says ''In the worst case, maybe 20 records''. And I said ''Well, how would you like to be able to read a million records?'' and he said ''But I can't read a million records''. And I said ''I know that. I know you can't, but the computer can read a million records. And don't you think your patient is going to stand a better chance of a good outcome by looking at a million people that have had similar conditions than 20 or 30?'' and he said ''Ah, now I understand why looking at a database is much more powerful than trying to manually read all of these documents''. Now, there's more to it than that. Because it's not just a matter of reading the documents. One of the things that confronts people and confounds people when they do this off the bat is: When you're dealing with text, it's not enough just to look at text. You've got to also look at context.
Because if you don't get the context of the text right, then you can't interpret what's being said properly. We have context in structured systems. When you take a look at transaction based systems, the context is buried in the Attribute Definition. We have an attribute for name. We have an attribute for address. We have an attribute for the dollar value or something like that. And so when we look at a structured record, we immediately know what the context is of what we're dealing with. But when you go to a textual record, you've got to go in and dig out that context because that context can mean pretty much anything. And again, you are not doing anybody any favours by looking just at text. You've also got to look at context. So the first issue is that database management systems have a hard time dealing with the unstructured format of text. The second thing is that your database management systems are not used to dealing with text that's got to be pulled from somewhere else. The database management system thinks that that text, that context already exists somewhere, and it doesn't. And so there's a lot of other complicating factors. Another complicating factor is that there are lots of languages around this earth. As important as the English language is, it's hardly the only language out there. There's Spanish, Portuguese, French, German, Russian. You name it and they're out there. That's another complicating factor. I can go on and on about complicating factors there. Let me just give you one more complicating factor. Another complicating factor is that when you're dealing with text, there's oftentimes much text that is of what you would call spam or blather. And so we did an analysis of an email stream for a company the other day and probably 40% of what people talk about on email is conversations about the local football team or the weather. And there's no business value whatsoever in talking about the football team. 
It's interesting talk, it's nice conversation, but you're not going to get anything from a business standpoint, out of that conversation. So there's a lot of complicating factors when you deal with text, but the lack of structure, the necessity for dealing with context are the two biggest factors.
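Editor's note: the point about ''blather'' in an email stream can be made concrete with a small sketch. This is not Forest Rim's method. It is a deliberately naive keyword filter, and the term list, function names and sample emails are all invented for illustration, on the assumption that business relevance can be approximated by the presence of known business terms.

```python
# Hypothetical list of business terms; a real deployment would need a far
# richer, domain-specific vocabulary than this invented sample.
BUSINESS_TERMS = {"invoice", "contract", "shipment", "refund", "order"}


def tokenize(sentence):
    """Lowercase a sentence and split it into bare words."""
    return [word.strip(".,!?;:'\"").lower() for word in sentence.split()]


def is_business_relevant(sentence):
    """Keep a sentence only if it mentions at least one business term."""
    return bool(set(tokenize(sentence)) & BUSINESS_TERMS)


def filter_blather(sentences):
    """Return only the sentences worth loading into a database."""
    return [s for s in sentences if is_business_relevant(s)]


emails = [
    "Did you see the football game last night?",
    "The refund for order 1182 was approved.",
    "Looks like the weather will hold for the weekend.",
    "Please send the signed contract by Friday.",
]
kept = filter_blather(emails)
```

Here `kept` holds only the refund and contract sentences; the football and weather chatter is dropped before anything reaches a database.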
Jonas Christensen 18:46
Yeah, so we can see how difficult this topic is to, sort of, attack, right? And that's why it's getting left alone a lot of the time.
Bill Inmon 18:55
Yep.
Jonas Christensen 18:55
So you said 90% of the data is actually text, but 90% of the usage probably goes to the non-text or even more, maybe it's 95 or 99%, when I think about it. Bill, could you give us some examples of the value that we're missing out on by not analysing our text? What are some of the typical examples and use cases and outcomes that you see when people do get good at this stuff?
Bill Inmon 19:18
Well, let me give you a couple of examples. Let's go back to medical records. Medical records are designed for one doctor and one patient to treat one set of circumstances. And as far as that's concerned, they're all good. I mean, the medical record does a good job of that. But what a medical record doesn't do a good job of, because it's in the form of text, is allowing a company or organisation to look at, not one patient, but 10,000 patients. So when it comes to medical research, you can't do effective medical research by sitting down and manually reading text. You simply can't do it. And yet, in this day and age of COVID, I think everybody appreciates the necessity and value of being able to look at 10,000 patients and say: Okay, for the disease of COVID, what role does smoking play? What role does cancer play? What role does heart disease play? How about age and gender? What about race? All these things are really interesting things that are important to doing good research on medicine. So, improving the health of the world: I'm not sure you can put a dollar value on that. But I think that everybody recognises the need and necessity of improving the health in the world. Now, another one is hearing the voice of the customer. I can't tell you how many organisations I've talked to that never look out onto the internet and find out what people are saying about their company or their organisation. They say ''Oh, okay'' and I'm going to tell you right now, there's a wealth of information that is as simple to get to as going to the internet, reading the internet and taking that information off of the internet. And yet companies don't do it. So that's just another case in point about the value of text that's going unrecognised and unlooked at today.
Jonas Christensen 21:31
So similarly, because it's a little bit harder to get to some of this information, we miss out on that aggregation. And I think that the healthcare system is a brilliant example of that. And you can actually see it in the way that we do research and medical trials. It's typically based on relatively small sample sizes, not huge populations. It's never millions. It's maybe thousands. And sometimes hundreds of people that go through trials first and then we experiment. But there's so much we don't know about broader impacts. And when you mentioned something like the COVID pandemic that we've had, where we do have millions and millions of observations, it's just a different ballgame in terms of what we can do, if we could get to that medical record, that information. I was sitting next to someone from a university at a dinner, a couple of weeks ago, and she's a medical researcher in this field and one of her complaints was that most of the data underlying medical research is actually based on the US population only, because that's where data gets collected most robustly. In other countries around the world, that's not the case. But there are so many factors that we're not capturing: Their genetics, and you mentioned race and social background and what foods do you eat and what's the weather and all those sorts of things that could impact your health. So it's a broad topic now. I think I'm actually digressing a bit from the main topic here. Nevertheless, I thought it was an interesting point to make. Bill, so what are some of these typical challenges that are actually holding us back from getting value from textual data and how do we solve them? What are the more technical aspects of that?
Bill Inmon 23:11
It's taken me 22 years from the time that I started on this to get to where I'm at today. And it would be difficult for me to compress 22 years worth of knowledge into a few minutes here, but I'm going to try. What are some of the challenges? One of the challenges when you deal with medical records, and this is actually a pretty simple challenge, is de-identification. Countries around the world take draconian measures against organisations that don't protect the privacy of medical records. I don't know if you know the story about how all that started. But let me tell you very quickly how this whole movement towards protecting medical records started. There was a company in the United States, a large insurance company, that had a rogue employee that, for reasons known only to the rogue employee, went out and got the list of people that had HIV. And this was in the early days of AIDS. And this person published the list of people that had AIDS, in a public fashion. That was a terrible thing to do. Listen, if you ask me, was that a terrible thing to do? You'd better believe that that was a horrible thing to do to people that should not have ever had that happen to them. However, the politicians in the United States saw it and they said ''Okay, we're going to fix that. We will make it so that people have to have their medical records de-identified''. And in doing so, yes, they did protect people from the malfeasance that had been done, but they also made medical research infinitely more difficult. And then once the United States did it, then around the world, we have everybody getting up in arms about medical records. And again, I'm not saying that what was done in that insurance company was the right thing. Heaven forbid. That's not at all what I'm saying. But I'm saying that the politicians enacted policies that did great harm to doing medical research.
So one of the challenges is how do you de-identify data and if you can't meet that challenge, then you shouldn't be doing what we're doing. But that's hardly the only challenge. The second challenge, and this is the biggest challenge that we faced, is when you go into a medical record, how do you know what to look for? And when we first started, we thought ''Well, people will give us some medical records. We'll look at them. We'll find out what's important and we'll then create a database with that kind of information in it''. Well, we found out very quickly that organisations will not give you any kind of medical records to look at. So instead, we had to go to another source of information and we actually collected all of the medical records information that a person would be interested in. And I have my little list here. The medical specialties that we had to look at were: Pulmonary, gastroenterology, orthopaedic, nephrology, neurological, haematology, dermatology, ophthalmology, ear nose throat, endocrinology, immunology, cardiovascular, rheumatism, obstetrics, gynaecology, urology, anesthesiology, paediatrics, geriatrics, oncology, psychiatry, sonography, radiology and diabetes. We went through the trouble of building a glossary of terms: if you were a doctor, writing the notes in any of those professions, what would you look at in terms of medicine, in terms of symptoms, in terms of history, in terms of treatment, in terms of medications, in terms of anything that was relevant to the medical record. So by creating this comprehensive ontology, as it's called, of medical terminology, it means that we can now sit down and take anybody's medical record and read it. And that project took a long time. I'm gonna get off on a quick tangent, but it's kind of an interesting tangent. I don't know if you've ever looked at how many words are in the English language, but there are estimated to be 660,000 words in the English language.
Nobody knows for sure. I've never heard anybody claim they actually know for sure, but they estimate 660,000. It may interest you to note that of those 660,000 words in the English language, 600,000 of them are in the terms of medicine. The vast majority of words in the English language deal with medicine. My wife happens to be a doctor and she gave me the other day a medical dictionary. Don't know if you've ever seen a medical dictionary, but it was this thick. And you start to thumb through it and there are words in there. I mean, you're recognising these words, but you have no idea. God help us. Doctors speak a different language. So when we went after building the ontology for the medical profession, we didn't attach 600,000 words. We only attached the words that were commonly used. Because, you see, doctors have at their disposal 600,000 words, but as best as we can determine from our sources, doctors use only about 20,000 - 25,000 words on a regular basis. And that in itself is a large obstacle. 600,000 words: We would not have been able to do that. 25,000: It took a lot of work, but we had that. So what that means is that now we can sit down and read a medical record and if it's in any of the medical specialties that we have, we know what the doctor is going to be saying. That's a brief rendition of some of the things. I'll tell you another little joyous thing that happens. The medical profession - and this includes everybody but especially the medical professionals - has this habit of calling the same thing different names. I happen to take a medicine called Furosemide but Furosemide is also known as Lasix. So when I go to my family doctor, he talks about Furosemide. When I go to the hospital, they talk about Lasix. But they're talking about the same thing. It's physically the same thing. It turns out that Furosemide has got about 20 names. You can look it up on the internet and you'll find - I don't even know what the other names are.
But that's another thing about medical profession. We were talking with the World Health Organisation just the other day. And they said ''Gee, Bill! When we look at how doctors in the UK talk, they have a different vocabulary than doctors in Germany''. Not language difference. I mean in terms of medication, and even procedures. So these different countries around the world have different vocabularies for the same thing and that makes medical research much more difficult to do than if they had a standard name for things. That's a little bit of why it's taking 22 years to get through these issues.
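Editor's note: the glossary and ''same thing, different names'' ideas described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the ontology Forest Rim built: the `ONTOLOGY` dictionary, its four entries, the `extract_rows` function and the two sample notes are all hypothetical. It maps each recognised term to a canonical name and a category, so that Furosemide and Lasix land in the database as the same medication.

```python
# Hypothetical mini-ontology: surface term -> (canonical name, category).
# Entries are illustrative only; a real medical ontology holds thousands of terms.
ONTOLOGY = {
    "furosemide": ("furosemide", "medication"),
    "lasix": ("furosemide", "medication"),  # brand name for the same drug
    "headache": ("headache", "symptom"),
    "diabetes": ("diabetes", "history"),
}


def extract_rows(record_id, text):
    """Turn free text into (record_id, word position, canonical term, category) rows."""
    rows = []
    for position, raw in enumerate(text.split()):
        word = raw.strip(".,;:()").lower()
        if word in ONTOLOGY:
            canonical, category = ONTOLOGY[word]
            rows.append((record_id, position, canonical, category))
    return rows


gp_note = "Patient reports headache. Continue Furosemide daily."
hospital_note = "History of diabetes; Lasix given on admission."

rows = extract_rows("gp-1", gp_note) + extract_rows("hosp-1", hospital_note)
```

Both notes produce a row with the canonical term `furosemide`, even though one doctor wrote Furosemide and the other wrote Lasix. That is the property that lets records from different sources be queried together.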
Jonas Christensen
And there's so many subtleties in this textual analytics. You mentioned the context, not just the content earlier. And this is the problem that I and my colleagues face every day. Because you know, the simple examples are: You can have a client that comes in with a knee injury, but they can also be labelled as knee damage or there could be a medical term for that or it could be phrased as leg injury or what have you. So there's so many hierarchies of these important labels and they can be used interchangeably. It's an incredibly rich source of information but also incredibly challenging to get to. And I think one of the problems, Bill, is that if we're used to using structured data, often in digits and numbers, you can sum them up, you can add them up. There are only the numbers from zero to nine that can be combined in different ways and structured in different ways, but they sort of have a system and a logic that doesn't get too out of hand. When we get to textual analytics, we run into all these limitations. Do we actually need to set a different benchmark for accuracy or what we're actually able to do in terms of getting things factually right? It's not like an accounting system, where you can say $10 went in and $5 went out. There's much more scope and room for error and variability.
Bill Inmon
Let me go back into my own background. I came from a world of structured systems and data modelling. Even wrote a few books on data modelling. So I understand data modelling quite well. When I first got into the world of text, I was like a fish out of water. I was like a fish flopping on the bank, wondering ''Where's my stream? Where's my water? I'm not used to this land stuff here''. And when you get into the world of text, the world of text operates differently than the world of structured information. There is no data model in text. There's something called a taxonomy, which is similar to a data model. But a taxonomy is not the same thing as a data model. So I'll tell you what it's like. It's like playing golf and trying to apply the rules of golf to soccer. It's like ''Okay, I'm a wonderful golfer. I know when I hit my driver. I know how to tee the ball up''. That doesn't do me any good at all, when I go on the pitch and try to make a goal. Whether I use the 9 iron or the 5 iron or whatever iron I use, that doesn't mean anything when I go to put my soccer shoes on. So the world of text is fundamentally different than the world of structured systems. And I think one of the things that happens is people that come from a background of data modelling and structured systems and numbers are befuddled by what they encounter when they get into text, because the things they've been used to are not there and they're not going to be there. But there are other things that are there that are important. So anyway, I understand what you're saying.
Jonas Christensen
Yeah, I think there's almost a need to accept a higher degree of uncertainty in the textual data, in terms of what we typically would call accuracy that we've always got the true interpretation of the record. That's probably a fallacy in itself. But there is an easier path to verifying that when you have more structured data than this sort of very unstructured data. Bill, let's get into the detail of what you call the textual ETL because this is a finger bit of your invention at Forest Rim Technology. Could you tell us what textual ETL is, what its benefits are and how it differs from other text storage and analytical approaches?
Bill Inmon
Surely. Textual ETL is technology that reads text and turns text into a database. Textual ETL can read standard electronic text. It can read text from documents (OCR). It can take text from voice (VTT: voice to text transcription). So we don't care where your text comes from. It also can take text in multiple languages. It can take text in English, German, French, Spanish, etc, and so forth. We're just getting through adding Arabic to our list as well. So that's a little bit about textual ETL. Now what textual ETL does is it reads the text. And I can tell you, unequivocally, that working with text is 10% of the battle. Finding and managing context is 90% of the battle. Context is so much harder to find than text is. So let me give you a couple of examples of context. Context is like an orchestra. In an orchestra, you've got flutes, you've got cellos, you've got violins, you've got piano, you've got drums. You've got all of these instruments. But in the very centre, you've got this conductor and the conductor is waving his hands and telling people, signalling to people ''Okay, it's your turn to play. Play louder, or do this and that and the other''. And context is the same way. Because the last time we counted, there are at least 67 different ways to infer context from text. Let me give you a couple of quick examples. If I were to say to you ''the Dallas Cowboys'', in America people would immediately think of a professional football team. That's what they think of the Dallas Cowboys. But if I were to have a document that had the word Dallas in it, just by itself, people would think of a city and three pages later, if I had the word cowboys in it, they would think of a person riding a horse on the prairie with a rope in his hand and a six-gun and a hat on. That's what they think of as a cowboy. And so, the proximity of words together changes the meaning of the word.
Let me give you another example of what I mean by flutes playing and then violins playing. There's something called homographic resolution. If you were looking at doctor's notes, for example, and you were to see the term H/A, what does the term H/A mean? Well, if you were dealing with a cardiologist, H/A would refer to heart attack. If you were dealing with a general practitioner, H/A would refer to headache. If you were dealing with an endocrinologist, H/A would be Hepatitis A. So in order for you to make an appropriate interpretation of H/A, you've got to know who wrote the document, because who wrote the document greatly influences your understanding of what's being said. And that's just the tip of the iceberg for understanding context. And what textual ETL does is: Textual ETL is like the orchestra conductor. Your textual ETL knows that there are these many different ways that have got to be combined together in order to understand context and so textual ETL says ''Okay, look at this kind of data. It's your turn to play. And then look at this kind of data. No, it's your turn to play''. And so that's what textual ETL does.
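Editor's note: two of the context techniques Bill describes, word proximity and homographic resolution, lend themselves to a small sketch. The code below is a toy illustration and makes no claim about how textual ETL is implemented. The `HOMOGRAPHS` table uses only the three H/A meanings given in the conversation; the function names and the adjacency rule for ''Dallas Cowboys'' are assumptions made for the example.

```python
# Meanings of "H/A" by author specialty, as given in the conversation.
HOMOGRAPHS = {
    "H/A": {
        "cardiologist": "heart attack",
        "general practitioner": "headache",
        "endocrinologist": "hepatitis A",
    }
}


def resolve_homograph(term, author_specialty):
    """Expand an ambiguous abbreviation using who wrote the note."""
    meanings = HOMOGRAPHS.get(term, {})
    # Return None rather than guess when the author gives no usable context.
    return meanings.get(author_specialty)


def classify_dallas_cowboys(text):
    """Use proximity: adjacent words mean the team, separated words do not."""
    words = [w.strip(".,").lower() for w in text.split()]
    for first, second in zip(words, words[1:]):
        if first == "dallas" and second == "cowboys":
            return "football team"
    found = []
    if "dallas" in words:
        found.append("city")
    if "cowboys" in words:
        found.append("horse riders")
    return ", ".join(found) or "no match"


heart = resolve_homograph("H/A", "cardiologist")
head = resolve_homograph("H/A", "general practitioner")
team = classify_dallas_cowboys("The Dallas Cowboys won.")
apart = classify_dallas_cowboys("She flew to Dallas. Later, cowboys rode past.")
```

The same token resolves differently depending on the author (`heart` versus `head`), and the same two words resolve differently depending on whether they sit next to each other (`team` versus `apart`). A ''conductor'' in Bill's sense would decide which of many such rules applies to a given piece of text.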
Jonas Christensen 38:19
So really, would it be fair to say, Bill, that there is not a one size fits all approach to text? It's almost always very topic specific. Let me elaborate on what I mean. So if you interpret structured data in a bank or a utility company or US Postal Service or Walmart or a supermarket chain, there's a lot of commonality in how we take that information. There's transactions. There's customers and so on. We structure it in a certain way and it's sort of standardised. But here, you almost have to go and really understand the subject matter as well as the data itself and interpret that relative to the subject matter. So you cannot go and say ''The supermarket textual ETL is the same as the banking textual ETL'' for instance.
Bill Inmon 39:10
Absolutely. The first step that you do is take a sweeping look at the world that you're dealing with. Because if you're talking about medical records, doctors talk one way. If you're talking about legal records, lawyers talk another way. If you're talking about retailing, retailers talk another way. So the first thing that you do is you say ''What ballpark am I playing in?''. Now there's more to it than that. But that's the first major step that you take.
Jonas Christensen 39:41
Yeah, so this is really such a complex topic. We knew that already but we're starting to really sort of highlight that, with concrete examples here. Bill, one of the things that you see often, one of the approaches you see often is that we tend to go ''Oh, this is too hard. Let's solve it by putting all this text into a data lake. Then the data scientists, they have it all there. The information is all there. It's unstructured and they can have a go at using natural language processing and other techniques to find the needles in the haystack in this data. Go ahead''. What are the challenges with that approach versus what you are doing? And why is this textual ETL approach different and better?
Bill Inmon 40:22
Well, NLP was never designed as a commercial product. NLP was designed to study language and trying to make NLP into a commercial product is like trying to take a racehorse and run it in the Indianapolis 500 with other automobiles. The racehorse is never going to beat an automobile, unless, of course, the automobile dies. But so, NLP is very expensive, very cumbersome, very complex and doesn't produce the results that you need. It was never designed as a commercial product. Textual ETL, on the other hand, was designed as a commercial product. Textual ETL is simple to use, inexpensive and fast. And NLP is none of those things. NLP is not simple. It is complex and it's very slow. And by the way, data lakes: I just have to say a word about data lakes. Data lakes are the vendors' creation to tell people they don't need a data warehouse. And I have to say, I get personally upset when a vendor says that because I know that they do need one. The thing about putting data into a data lake is you can put it in there but you can't do anything with it. It requires this enormous god-awful effort to try to pull something out of the data lake. So the data lake is a poor excuse for a vendor just wanting to sell their technology.
Jonas Christensen 42:03
So when we have textual ETL, you have actually turned this unstructured data into structured data.
Bill Inmon 42:12
Yes.
Jonas Christensen 42:13
So you've taken the document, found all the keywords that are relevant in there and then structured them, labelled them. So we've got columns and rows, like we would have in any other table. Is that pretty much how it works?
Bill Inmon 42:26
Yes.
Jonas Christensen 42:28
Hi, there, dear listener. I just want to quickly let you know that I have recently published a book with six other authors, called ''Demystifying AI for The Enterprise, A Playbook for Digital Transformation''. If you'd like to learn more about the book, then head over to www.leadersofanalytics.com/ai. Now back to the show.
This is a very technical question or it's a simple question with a very technical answer. But could you sort of give us a picture of how you decide what goes into that table? How do you identify that this is the context that's important here? Of course, there must be some stuff you don't bring in, that's kind of irrelevant, but nevertheless, is needed for the semantic meaning of a sentence. That's what you're trying to convert into rows and columns of labels.
Bill Inmon 43:20
I wish that I could show you because I could show you a picture of what it looks like. And the picture would explain more than I can explain. But let me at least try to explain. When people are used to structured data, they're used to having one definition of the data. And that definition means that all data is going to fit into that contextual structure you define, when you define a standard transactional structured database. We don't do that with textual ETL. With textual ETL, we have a single structure, but within the structure, you can create whatever relationships that you want. So you can say that aspirin is a medicine. You can say that the abdomen is part of the body. So there is a variable structuring of data inside the database. And for the world of text, as far as I can tell, it's got to be that way. That's not optional. It has to be that way. So the ability to create variable structures inside the text itself is a feature of textual ETL. And again, if I could show this to you, you would immediately say ''Ah, I understand what you're talking about now''. It's difficult for me to describe it.
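Since no picture is available, here is a rough Python sketch of the idea of one fixed table that holds variable relationships. It is an assumption made for illustration and not Forest Rim's actual schema: the `TAXONOMY` dictionary, the `findings` table, its column names and the relation labels `is_a` and `part_of` are all hypothetical.

```python
import sqlite3

# Hypothetical taxonomy: word -> (relationship, target), from the examples above.
TAXONOMY = {
    "aspirin": ("is_a", "medicine"),
    "abdomen": ("part_of", "body"),
}

def text_to_rows(doc_id, text):
    """Turn recognised words into rows of one fixed shape."""
    rows = []
    for position, raw in enumerate(text.split()):
        word = raw.strip(".,;:!?").lower()
        if word in TAXONOMY:
            relation, target = TAXONOMY[word]
            rows.append((doc_id, position, word, relation, target))
    return rows

def load(rows):
    """Load the rows into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE findings "
        "(doc_id TEXT, position INTEGER, word TEXT, relation TEXT, target TEXT)"
    )
    conn.executemany("INSERT INTO findings VALUES (?, ?, ?, ?, ?)", rows)
    return conn

rows = text_to_rows("note-1", "Patient took aspirin for pain in the abdomen.")
conn = load(rows)
medicines = conn.execute(
    "SELECT word FROM findings WHERE relation = 'is_a' AND target = 'medicine'"
).fetchall()
```

The table has a single structure, but the `relation` and `target` columns let each row express a different kind of relationship, which is one way to read the ''variable structuring'' described above.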
Jonas Christensen 44:46
Yep. Fair enough. It is super complex. To finish off, before we get to the last few questions: What are the biggest opportunities you see for textual analytics?
Bill Inmon 44:57
Well, the biggest opportunities: It's kind of interesting, because we've had experience with different industries. Certain industries are interested in what their customer is saying. Other industries are not. And for example, banking and finance. I can't tell you how many bankers and financial people we've talked to. We have yet to find a banker that's actually interested in a customer. Bankers are only interested in money and getting more money. On the other hand...
Jonas Christensen 45:30
Hang on, that can't be right. That's not what they're saying, though.
Bill Inmon 45:36
On TV, they have these glorious ads and they are absolutely - I'm not going to use the word that comes to mind - they're absolutely full of baloney is what they are. The other industry that couldn't care less about their customers is the airline industry. Airlines, it's a wonder that anybody ever flies on an airline. On the other hand, there are other industries that do care about their customers. One of them is the restaurant business. If a restaurant doesn't care about their customers, they go out of business. And so restaurants have an appreciation for customers, that is very good. The other one that we found is the world of medicine, at least the doctors that we talk with and I talk to doctors almost every day. But the doctors we talk with, they do care about the health of the people that they're serving. And so some industries are very sensitive to what their customers are saying and other industries are not. But people ask me ''Gee, Bill, what industry would you recommend that I go into?''. I said ''Look for this. Look for an industry that's competitive. Don't look for an industry that a person signs up in the bank, and 40 years later, they're still at the bank. Go look for an industry where there's genuine competition in the industry. And when you find genuine competition, then you're going to find people that do pay attention to their customers''.
Jonas Christensen 47:07
Very good.
Bill Inmon 47:08
I didn't mean to insult you. I'm talking about banking and finance. But we've had a fair amount of experience in banking and finance. And I have yet to find a banker that truly cares about their customer. And I don't care what they say on TV.
Jonas Christensen 47:23
Yeah, that's right. I don't think you'll insult anyone there, Bill. I think it's good to hear that side of the coin. If there are any bankers listening out there, you can reflect on that comment. And Bill, I think we're towards the end of the show here. I have a couple of questions left for you.
Bill Inmon 47:39
Sure.
Jonas Christensen 47:39
One is one I always ask the guests on the show, which is to pay it forward to the next person. And we do that by suggesting guests for the show. So who would you like to see as the next guest on Leaders of Analytics and why?
Bill Inmon 47:54
Well, there are a couple of people. There is an Italian gentleman, named Francesco Puppini. I did a book with him last year. He's a most interesting fellow. He's one of the brightest people I've ever known. And I would recommend Francesco. He's an interesting fellow. He's Italian. But he lives in Poland. Except that, with the events happening in Europe, he's migrated at least temporarily from Poland back to Italy, because Poland is kind of a bad place to be right now. But Francesco is someone I would greatly recommend. In addition to being very smart, he's also very humorous and a good person to talk with.
Jonas Christensen 48:35
Well, he sounds like a brilliant guest for this show. So thank you very much, Bill. I will definitely be looking at Francesco. Lastly, Bill, where can people find out more about you and get a hold of your content?
Bill Inmon 48:47
Well, I have books that are out there. My books recently are published by a publisher called Technics Publications. Just look up Technics Publications and you'll find that. I also am very active on LinkedIn and you can find me on LinkedIn. And then I speak at conferences. I speak at probably two to three conferences a week over the Internet. In fact, about two months ago, on Monday, I spoke in China. On Wednesday, I spoke in Peru, and on Friday, I spoke in India. If I'd had to do that flying, I could never have done it. But through the magic of the internet, I can do that.
Jonas Christensen 49:29
The whole moving online during the pandemic must have changed your life quite a bit in terms of travelling, I guess.
Bill Inmon 49:35
I'll be honest. Heaven forbid that somebody should say something good about the pandemic. But it meant that I haven't gotten on an aeroplane now in three or four years. Man, it's so wonderful to not fly. It's the best thing that ever happened to me.
Jonas Christensen 49:52
Yes, the pandemic was very tough and is still tough on us. But the good thing that came out of it was this newfound flexibility in how we do things, which I'm personally very grateful for.
Bill Inmon 50:02
Yeah.
Jonas Christensen 50:03
Bill Inmon, thank you so much for being on Leaders of Analytics today. It's been really wonderful to learn more about you and also to sort of have a primer on textual analytics. And I call it a primer, because I think we actually got into quite a little bit of detail, but it is such a huge area that you could probably talk about it for days and not run out of material.
Bill Inmon 50:26
Yes.
Jonas Christensen 50:27
I have personally gotten a lot out of it, because this is my daily challenge. So I'll have to go back and think twice about a few things that I'm doing. And I'm sure listeners have learned a lot too. So thank you so much for contributing to the show and all the best to you and Forest Rim Technology in the future.
Bill Inmon 50:44
Thank you, Jonas. I appreciate it.