Project Gutenberg is the world’s first digital library. For over 50 years, it has been steadfast in its commitment to provide free, unfettered access to digitized literature for everyone.
Run entirely by volunteers and driven purely by its mission to share public domain literature and information, Project Gutenberg stands out for how closely it hews to its altruistic vision, from creating e-books to audiobooks.
Now, they are using AI text-to-speech capabilities from Microsoft to accelerate the progress of creating an extensive audiobook library. They've already created over 5,000 audiobooks using AI, and through this process, they've demonstrated how cost-effective AI can be to help build a world where literature can easily be shared.
Produced by Larj Media.
Speaker 0
Welcome to Pivotal. I'm Hayete Gallot, corporate vice president for commercial solution areas at Microsoft. I work with customers around the globe to transform their business through technology. At the center of every transformation are people who give technology its purpose. And that doesn't change with the advent of AI. It's actually being accelerated. People spark visionary ideas for leveraging technology. The release of AI technology like ChatGPT this year is exciting, but it has led to big questions as we grapple with the best way to harness those tools to enhance and support the people behind the work. We like to talk about technology. I love to talk about it. But we often forget that technology is most effective when it supports people with purpose. This season will demystify AI by talking to the innovators using new AI technology to uplift their industries and augment their people, from education to journalism to surfing. And it just illustrates what AI is about. Everybody thinks it's about tech. No. Everybody's using AI. And that's what we're gonna show you on this season. Speaker 1
Project Gutenberg is the world's first digital library. For over fifty years, it's been steadfast in its commitment to providing free, unfettered access to ebooks. Run entirely by volunteers, driven purely by its mission to share public domain literature and information, Project Gutenberg stands out for the clarity and consistency of its altruistic vision. New AI text to speech capabilities have helped Project Gutenberg accelerate progress and broaden its reach with the easy and relatively inexpensive creation of over five thousand audiobooks. It is an innovation very much aligned with their founding principle that a better world is one where literature can be shared. Speaker 2
Hello. My name is Greg Newby. I am director and CEO of the Project Gutenberg Literary Archive Foundation. That is the organization that operates Project Gutenberg. Speaker 1
Greg has been involved with Project Gutenberg for over thirty years. Speaker 2
I actually first learned about it when I was a student. It was a book. It was Alice's Adventures in Wonderland. It was called the Millennium Fulcrum edition of Alice's Adventures in Wonderland, and I immediately got it. The Internet is a place where you can do all kinds of things, and part of that is to share content. And computers are fantastic devices for a variety of things. And part of what you can do on a computer is read. And if you have some content like a book, you can read the book, but also you can more easily search through the book or you can share the book. You could print the book if you wanted to. So right away, when I'd never even heard of Project Gutenberg, but I got this copy of the ebook, I got it. You know, I understood what a transformative thing this was to have electronic books, to have books on your computer. Speaker 1
It may seem obvious to us now in twenty twenty four. But looking back, I'm not sure people realize how transformational this would be, this information sharing through the Internet. Back then, they didn't have mobile phone e readers. I remember growing up, I was looking for books that may be out of print, and I had to go to a library and ask them to order it for me. And it would take months. And sometimes they would just say, sorry, you're out of luck. There is no printing this book. Can you imagine with e readers and the ability to now, with text to speech to get any literature to be actually digitized? This is so powerful. I would have loved that growing up. So Greg really understood the potential of the Internet earlier than a lot of people. Speaker 2
Fast forward a few years, and I was at the University of Illinois in my first faculty position, a young PhD graduate teaching in the School of Information and Library Science there. So dealing a lot with books and content and thinking about the early ways that people could use the Internet for all kinds of things back in the nineties. This is before DSL, home cable, stuff like that. It was all in primitive days. It was actually even before Windows ninety five, which was what made the Internet much easier to use for many people. So I was, you know, an early user of the Internet, an early adopter. Speaker 1
Not only was he an early adopter of the Internet, he was also ahead of the curve in thinking about AI. He actually wrote his master's thesis on the subject. Speaker 2
What I was thinking was, how could we use this communication model in an artificial intelligence environment so that the AI program would have sort of a human like understanding of the relationships among concepts. And I thought that this could be done, and I thought that one thing that would be transitional for artificial intelligence would be not for the AI only to be able to understand relationships among things, but to be able to understand the relationship between itself and those things. So my master's thesis dealt with a self-concept of artificial intelligence. In other words, a self-concept is some understanding of yourself and understanding of how yourself relates to other stuff out there in the universe. Speaker 1
It was during this time that Greg was introduced to Michael Hart, the founder of Project Gutenberg. Speaker 2
I was in my office, and my faculty colleague came in and showed me a, article in the newspaper that featured Michael Hart. And, it was just a local article that that talked about how he was trying to give away literature and how some people thought this was the greatest thing ever, and some people thought it was just totally wacky. And even in the library community, which I was part of, they're saying, wow. I don't know if this is a great idea to have free unfettered access to literature. Maybe they should have intermediaries. Maybe people should have librarians to guide them to the right book. Maybe, maybe not every book is for every person. Maybe there are some issues with having, things be free, and it will, you know, undervalue. And, of course, there are a lot of people saying, wow. You know, I look at my computer, and it's this green text on a black background, you know, a fixed font, eighty characters, pretty small screen, and what a crummy way to read a book. It's not like sitting in the tub or going to the beach. Computers and books just shouldn't match. Speaker 1
I have to laugh. It's just a good reminder of the skepticism and anxiety that come with new technology. There was a huge resistance to ebooks in the early two thousands. And now we don't even think about it. We just pull out our ereaders. It's a great example of what happens with any new technology. There are always skeptical communities, even librarians. Speaker 2
When I learned that Michael Hart was the founder of Project Gutenberg, I called him up right away, introduced myself as a faculty member in the university, and we lived in the same town in Urbana, Illinois. And we met up, and we became friends. Speaker 1
Greg's friend, Michael Hart, was not only the founder of Project Gutenberg, but the inventor of ebooks. His email signature read Internet user number one hundred. Michael had many interests from frisbees to hi fi music to operating a bike shop. He had one interest above the rest. Speaker 2
His real passion was giving away literature, making books digitized so that not only could you have a digital experience with advantages like being able to search the book and be able to print the book, but also the books could be given away infinitely. Speaker 1
I want to underline this because it's a radical shift in information sharing. Can you imagine how powerful this is? It was very funny, actually. I remember twenty years ago, I read a lot, and I didn't have enough space in my little house. So I kept fighting with my husband. Each time I bought a new book, it was like, we don't have the space. The e reader was the savior for me. Speaker 2
When you have a digital file, it's not like when you have a pie at your dinner table and you slice the pie and whoever gets a slice, well, that's a slice that someone else can't have. And if you have too many people and not enough pie, then everyone gets a smaller slice or maybe someone gets no pie at all. Digital files aren't like that at all. You can copy them infinitely, and they're exactly the same. They're exactly as good. They cost, of course, almost nothing to copy and distribute. And so Michael had this idea of unlimited access to the world's knowledge, and he was gonna do it through literature, make it so everybody could have their own library. It would take just a corner of your disk drive, or you could put it on a floppy disk or, you know, a thumb drive. You could print it, put it on a CD, but everybody could have whatever literature they needed, unlimited access, whatever they wanted to do, and you can share that with the next person. So it's not just that there's someone that's enriched with all this digital literature, but they can enrich others and share it. And it's a pathway to literacy. It's a pathway to education. It's a pathway to opportunity. Speaker 1
When Greg met Michael in the early nineties, they immediately understood each other. They were kindred spirits. When Michael decided to formalize Project Gutenberg into an organization, he, of course, tapped Greg as the CEO, a role in which he has remained for the last twenty four years. Speaker 2
We both had a major mission, a major drive to give free, unfettered access to the world's information to people everywhere. Speaker 1
Greg's first introduction to ebooks was Alice in Wonderland. But the first ebook invented by Michael was the Declaration of Independence. It was produced on July fourth, nineteen seventy one, was five kilobytes, and was downloaded by six users. Speaker 2
When he gained access to a massive mainframe at the University of Illinois back in around nineteen seventy one, he took that seriously. It was sort of a freebie account that his friend gave him access to, and he said, wow. This is amazing. This is something that almost no one in the world has access to, but he knew how transformative computers were gonna be. And he said, what can I do with this super valuable computing resource? Maybe what I can do is I can leverage the power of the computer and the power of the network that the computer is connected to, to enhance the human situation, to give people better opportunity to have access to information, to learn, to grow, to hear stories, and so he hit upon the idea of literature. This actually occurred serendipitously. It was July fourth. He had just been given his computer access. He was trying to figure out what to do with it and was hanging around with his friends, you know, biking and playing Frisbee and that sort of thing. And in the afternoon or evening, he went out to the local quickie mart, and it was the fourth of July. So along with his bag of snacks, they put in a facsimile copy of the United States Declaration of Independence. And for him, that was the moment where Michael said, wow. I know what I'm gonna do. I'm gonna take this document, this really important document, the Declaration of Independence. I'm gonna type it into the computer, and what that'll do is it'll make it so anybody can have their own copy of the Declaration of Independence. And because computers are really good at copying files and the networks are really good at getting them around, everyone can have their own copy. He was thinking, wow. I can have a book that you just press a button and it just appears. There it is. And in this case, it's in digital form.
But, of course, he was also thinking forward to things like print on demand and fax machines and other technologies that would make this literature instantly available to anybody who wants it, at no or very little cost, and infinitely able to be redistributed without deteriorating the original. Speaker 1
Greg paints a loving portrait of his longtime friend and colleague who passed away in twenty eleven. Greg is confident that Michael, a lifelong technophile full of curiosity and idealism, would be excited by the technological progress we've made in just the last few years. Speaker 2
The idea of artificial intelligence is one that he put a lot of faith in back in the early part of the current millennium. He was very interested in machine translation. He was also very interested in the idea that computers would be able to do a lot of the sort of grunt work of scanning and typing in books themselves so that it wouldn't be as labor intensive to go from a printed book to an electronic version of that printed book. Speaker 1
In fact, he and Michael did work on audiobooks in two thousand four, but stopped after just a couple of years because the quality of the reading performance wasn't good enough. Speaker 2
Michael and I worked on audiobooks starting in the early two thousands. It was around two thousand two to two thousand four when we first started thinking of how we could do things other than just electronic books where it's just text. And what else can we do? We said, well, you know, audiobooks are popular. People are listening to audiobooks on, you know, their m p three players and other devices of the day. Let's see if we can make audiobook versions of the Project Gutenberg literature and store those on our website and, you know, make them available for free download, unlimited redistribution just like the text. So we worked with a couple of different people to eventually do several hundred of the audiobooks. And some of those are human readings, where one or more people read it into a microphone. And, of course, when people read, they can emote. They can do different voices. They can understand the book quite well, what they're reading. We also did several hundred that were text to speech audio. And this, in the day, was considered to be pretty good, but it wasn't really all that good. The voices in that automated text to speech, when the computer is reading the book and speaking it with sort of a mechanical voice, were not really all that great, you know, not nearly as good as a human reading the same book. Speaker 1
Greg and Michael understood then, but more people have come to understand now, that audiobooks are enormously valuable for accessibility for those with physical or learning disabilities and those who want to multitask. Since their early experimentation, the technology has improved. Speaker 2
When Microsoft came along with their partners at MIT and elsewhere and said, hey. We'd like to show you our cool new automated text to speech processing and see if we can get some Project Gutenberg content created for free unlimited redistribution as audio files, of course, immediately, I said, yes. This is great for us. No question about it. Speaker 1
So text to speech capabilities have existed for years. A lot of people are familiar with the famous physicist, Stephen Hawking, who used an automated text to speech system to communicate in his later years. But these new AI tools have been a game changer. Speaker 2
The new technologies that Microsoft brought to the table do a couple of different things, and we actually worked through multiple steps with the partners to eventually end up with around five thousand new computer generated audiobooks. The thing that is the most obvious when you listen to a modern book, like we just produced with Microsoft, versus the ones we did back in the early two thousands, is that the voice is so much more natural in how it reads. So you can still tell it's not a human. It makes a few errors around punctuation and proper nouns, and sometimes the pacing isn't quite right. But you can hear that the pacing gets faster when it's an exciting section. You can hear when there's a quote, when there's dialogue, because the nature of the voice changes a little bit. You can hear very obviously when there's an end of a paragraph or the end of a chapter because the computer knows to sort of pause and then go on. And these are all characteristics that the earlier text to speech botched. It didn't get it right at all. So you would have a very mechanical sounding voice that was not doing a good job of inflection, that was not really telling the story. Speaker 1
Listen to these three different clips from the audiobook of The Call of the Wild by Jack London. The first clip is from the state of the art text to speech capabilities in two thousand four. Speaker 3
Buck did not read the newspapers, or he would have known that trouble was brewing, not alone for himself, but for every Tide dash water dog. Speaker 1
The second is a human reading from twenty twenty three. Speaker 0
Buck did not read the newspapers, or he would have known that trouble was brewing. Not alone for himself, but for every tidewater dog, strong of muscle with warm long hair from Puget Sound to San Diego. Speaker 1
The third is the AI text to speech from today. Speaker 4
Buck did not read the newspapers, or he would have known that trouble was brewing, not alone for himself, but for every Tidewater dog, strong of muscle and with warm, long hair, from Puget Sound to San Diego. Speaker 1
Pretty striking. Right? Well, I don't know about you guys, but I couldn't understand any of the two thousand four clip. I just couldn't. And I know I'm French, but I don't think it's my French. And I thought it was interesting when you contrast clip number two from twenty twenty three in a real human and the clip from twenty twenty four with AI text to speech. I actually thought the AI version was more human, which is very strange when you think about it. Can you imagine in two thousand and four, with the technology we had, it was just not possible to get the right quality? Speaker 2
So the thing that's most obvious these days is that the Microsoft technology combined with the training that they did, they had it trained automatically on a lot of books to try to understand the structure of the books and the nature of the dialogue. All of this artificial intelligence work that went into the parsing and understanding of the books is easily identifiable when you listen. Like, as soon as you listen, you say, oh, wow. This is much better. And it's better than, like, Siri or better than when you call into the airline. There, you can tell that it's a computer talking to you. But I think more importantly, you can tell that the computer has no understanding of what it's reading, you know, whether it's numbers or airline itineraries or, you know, car rental agreements or reports of blood work from your doctor. Like, you know, it's just reading the words one after the other. With the new Microsoft technology, you can tell that the computer has been trained enough to recognize the structure of the literary content. So that's what's really obvious and really, really revolutionary from my point of view. Speaker 1
In order to get the full project up and running, there were a number of components that had to be built, beginning with selection. Project Gutenberg had about seventy thousand books, and they had to select a small sample to train the model. Speaker 2
In training, it's called unsupervised learning. What you're doing is you're saying basically to the computer, here's a whole bunch of stuff. Iterate through that. Read it over and over again. Look at it from different ways. Look at it at sentence level, word level, character level, paragraph level, look at similarities across different books, because there are a bunch of things in the Project Gutenberg database that maybe would confuse the model. So you're asking the computer to sort of bootstrap itself, to train itself. So we wanted to eliminate things that weren't English. Project Gutenberg has sixty languages, so we just worked with English. We decided to work only with literature. So we eliminated most children's stories, picture books, things like that. We also eliminated, sort of naturally, the previous text to speech audio files. We eliminated the couple of movies that Project Gutenberg has. We also have some content like the human genome project, you know, all those GATCs. So that wasn't gonna be very good at training the computer to read literature. And the process for that was what's called a cluster analysis. And, again, this is a pretty well established statistical technique to look at how statistically similar different books are. Speaker 1
So they fed the seventy thousand books into the model to try and find those that are kind of similar using statistical clustering methods. And they identified around five thousand that were quite similar. They ended up with more traditional works, like Pride and Prejudice, Frankenstein, and The Importance of Being Earnest. And once the books were selected, they had to be parsed. Speaker 2
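Editor's sketch: the episode does not say which clustering method or features were used, so the following is only a minimal illustration of the idea of grouping books by statistical similarity. It assumes a bag-of-words representation, cosine similarity, and a greedy threshold grouping, which is a crude stand-in for a real cluster analysis. The function names, the threshold, and the toy documents are all hypothetical.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercased bag-of-words vector for one document."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_similar(docs, threshold=0.5):
    """Greedy grouping (an assumption, not the project's method): each
    document joins the first group whose seed document is at least
    `threshold` similar, otherwise it starts a new group."""
    groups = []  # list of (seed_vector, [titles])
    for title, text in docs.items():
        vec = word_counts(text)
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(title)
                break
        else:
            groups.append((vec, [title]))
    return [members for _, members in groups]

# Toy, made-up inputs: two prose-like texts and one genome-like text.
docs = {
    "novel_a": "she said he walked to the house and she smiled at him",
    "novel_b": "he said she walked to the garden and he smiled at her",
    "genome": "gatc gatt ccga tagc gatc ccga ttag gatc",
}
clusters = group_similar(docs)
```

In this toy run the two prose texts land in one group and the genome-like text is left on its own, which mirrors the kind of outlier the selection step was meant to set aside.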
The files had to be read, and then they had to be regularized in their format so that the computer didn't have to have a lot of uncertainty. We have plain text. We have HTML. We have things like PDF and some other formats as well. So they took in the HTML books and looked at how the HTML markup helped to understand the structure of the books. So the computer can pick up, for example, pretty easily from the HTML where a paragraph starts and ends because HTML has these little p markers, the letter p that says, here's a paragraph. And what are the headings? Well, there's h one for level one headings and h two for level two headings. So the computer could identify the structure of the book fairly easily. And then, of course, there's quotation marks. Sometimes there are curly quotation marks. Sometimes there are straight quotation marks like you have on your computer keyboard. The curly ones are the ones that curve in a little to the left or to the right that you often see in books. But the computer can understand that those are quotation marks and that there's some dialogue going on. So the coarse grained structure of these five thousand or so books that were selected was similar and could be used, following some parsing and a little bit of massaging to get the books in similar shape, to identify the different components. Speaker 1
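Editor's sketch: the project's actual parsing code is not described in the episode, so this is only a minimal illustration of reading structure from HTML markup. It uses Python's standard html.parser module to collect h1, h2, and p blocks and flags paragraphs that contain straight or curly double quotation marks. The class name, helper name, and sample markup are made up for the example.

```python
from html.parser import HTMLParser

class BookStructure(HTMLParser):
    """Collects (tag, text) pairs for headings and paragraphs."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._kind = None  # tag currently being collected, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "p"):
            self._kind = tag
            self._buf = []

    def handle_data(self, data):
        if self._kind:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._kind:
            # Collapse runs of whitespace so blocks have a regular shape.
            text = " ".join("".join(self._buf).split())
            self.blocks.append((tag, text))
            self._kind = None

# Straight and curly double quotation marks.
QUOTES = '"\u201c\u201d'

def has_dialogue(text):
    """True if the text contains any double quotation mark."""
    return any(q in text for q in QUOTES)

# Made-up sample markup, not taken from a Project Gutenberg file.
sample = (
    "<h1>Chapter I</h1>"
    "<p>The road was long and empty.</p>"
    "<p>\u201cHello,\u201d she said.</p>"
)
parser = BookStructure()
parser.feed(sample)
parser.close()
blocks = parser.blocks
dialogue_flags = [has_dialogue(text) for kind, text in blocks if kind == "p"]
```

A real pipeline would also have to handle nested tags, apostrophes that look like single quotes, and inconsistent markup across books, which is the "massaging" mentioned above.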
Once the books were parsed, they moved on to training, the unsupervised learning where the supercomputer is iterating. Speaker 2
And here, what you're doing is you're telling the computer to tune its speech model over and over again inductively. In other words, based on the content, not based deductively on someone saying, okay. Here's the way you pronounce Huckleberry. Here's the way you pronounce blueberry. Here's the way you pronounce sparrow. Right? So you can imagine training the computer model to speak one word at a time and then saying, oh, by the way, if it's part of dialogue, maybe sparrow is said a little different way. Or if it's at the end of a sentence with a period after it, then it's sparrow. And if it's in the middle of a sentence, it's sparrow. And if it's someone's name, maybe it's mister sparrow. Right? So you don't do it that way because if you did it that way, you'd never be done. Right? There are too many combinations, even within the limited scope of five thousand books. So you let the computer bootstrap. You let it train itself. So it's identifying inductively and iteratively that the structure of the documents and the wording of the documents have these types of characteristics. Speaker 1
The next step was to fine tune the text to speech model and give it some instruction. Speaker 2
So there is a point at which you say, okay. If there's something that's exciting, then we wanna go a little faster. If there's a question, then we wanna raise our voice at the end because it's a question. Is it? Yes. If it's a period or an exclamation point, how are we gonna handle that? So there is a little bit of training that went into the speech model about how to pace, how to read. Some of this came with the the software originally, and some of it was tuned based on the literature. So at this point, we can produce a book. And how long does it take to produce a book? Well, because computers are pretty fast and the the model is pretty fast, once we tell the computer to get to work and create a book, it might take just a few minutes to go from the printed book in HTML format to an m p three or similar audio file that you could put on your, you know, on your computer, stream over the Internet, you know, listen to on Spotify, something like that. Speaker 1
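Editor's sketch: the episode does not say how pacing instructions were expressed, so this example is an assumption. It shows one common way to hand such hints to a speech engine, using W3C Speech Synthesis Markup Language (SSML) elements: prosody with a faster rate for passages marked as exciting, and a break after each paragraph. The function name, the 700 millisecond pause, and the sample text are hypothetical, and the rule that decides which paragraphs are exciting is left to the caller.

```python
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"

def to_ssml(paragraphs, exciting=()):
    """Wrap paragraphs in SSML, speeding up those whose index is in
    `exciting` and pausing after every paragraph."""
    parts = [f'<speak version="1.1" xmlns="{SSML_NS}" xml:lang="en-US">']
    for index, text in enumerate(paragraphs):
        body = escape(text)  # escape &, <, > so the markup stays well formed
        if index in exciting:
            body = f'<prosody rate="fast">{body}</prosody>'
        parts.append(f"<p>{body}</p>")
        parts.append('<break time="700ms"/>')
    parts.append("</speak>")
    return "".join(parts)

# Made-up sample text; paragraph 1 is marked as the exciting one.
ssml = to_ssml(["The door creaked open.", "Run & hide!"], exciting={1})
```

Question intonation is usually left to the speech engine itself, which tends to raise pitch at a question mark without extra markup.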
They were even able to experiment with different voices within the books by dividing each book into components and allowing computations to run simultaneously. Speaker 2
Microsoft Fabric technology was used to speed up this process. So what happens here is each book is divided into different components so that multiple compute jobs can each do one component at the same time. So you might have one computer doing chapter one and one computer doing chapter two. And if there are ten chapters, you're gonna finish the book ten times faster. Right? Because you had ten jobs at the same time. Well, what if you do that at the same time for ten books? Then you have ten chapters at a time for ten books, and you're gonna be done a hundred times faster. So this is an approach where Microsoft Fabric was used to speed up the whole process, to do a divide and conquer of the content and then apply basically the same procedures, the same computer programs, against different chunks of the content. And then, of course, after that's done, you have to reassemble the content. So you have ten jobs doing ten chapters of the book, then you have to have another job afterwards that reassembles those ten chapters. So this was where the scaling up really worked very well and made us not only able to very quickly generate the five thousand, but also iterate. So they could say, okay. Let's try it this way. Let's try it that way. Let's change this rule. Let's try it with a woman's voice. Let's try it with a man's voice. Let's try it with someone with an English accent as the voice. So you can do a couple of different variations, or really a number of different variations, in fairly short time. Speaker 1
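Editor's sketch: this is not Microsoft Fabric code. It is a minimal single-machine illustration of the divide and conquer pattern described above, using Python's standard concurrent.futures module. Each chapter is handed to a worker, and the results are reassembled in their original order. The synthesize function is a hypothetical stand-in for a real text to speech call and only returns placeholder bytes.

```python
from concurrent.futures import ThreadPoolExecutor

def synthesize(chapter_text):
    """Hypothetical stand-in for a text-to-speech call.
    Returns placeholder 'audio' bytes instead of real audio."""
    return chapter_text.upper().encode("utf-8")

def make_audiobook(chapters, workers=4):
    """Process chapters concurrently, then reassemble them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so chapter two can never
        # end up ahead of chapter one even if it finishes first.
        parts = list(pool.map(synthesize, chapters))
    return b"".join(parts)

# Made-up three-chapter "book".
book = ["chapter one. ", "chapter two. ", "chapter three."]
audio = make_audiobook(book)
```

The ten times and hundred times figures in the conversation are the ideal case: ten equal chapters on ten workers, repeated across ten books. In practice the speedup is bounded by the longest chapter and by the reassembly step.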
When all is said and done, the compute time for each book takes just a few minutes, and the cost is quite low. Speaker 2
The relative cost of doing all this processing is pretty low, on the order of dollars for the whole collection as opposed to thousands or hundreds of thousands of dollars. And, again, it gets done in, you know, at the most a few hours, as opposed to an individual reading that same book. Well, they take fifteen, twenty hours just to do one book, let alone doing a whole library. So the speed up was very impressive. Speaker 1
How cool is this? Now we're talking minutes versus hours. You don't need a human to go narrate the book. And this is just amazing because you'll be able to spread the world's knowledge much faster. Running a podcast myself, I know how much effort it takes to record a podcast. So I'm actually really impressed with this because it really gives us the ability to get books in minutes being accessible to all. And if we had to really have humans do all of those, I'm not sure we'll get many books. That would be a shame. Then we arrived at the last part of the process, which is to make those audiobooks available to all. Speaker 2
How are people gonna read it or listen to it? And the answer there was, again, a little software, a little research to integrate with the major cloud service providers for audio services. So getting the books out there was, you know, an additional important step, because why are we doing this? We're doing it so people can listen. So how are people gonna listen? Well, they're gonna discover it in a variety of ways. And we don't wanna say, you know, come just to our website or fill out a form or we'll send you a CD in the mail. Like, you know, that wouldn't be too satisfying. So we wanted to make it very easy for people to get these audiobook files and download them. And the one thing that people might notice is if you're on Spotify, that's a tool that's not designed to let you redistribute. You can't actually take your audiobook and give it to someone next to you in Spotify like you can with Project Gutenberg books. However, you can go back to one of the other publication sites like Internet Archive and say, hey, you know, friend, follow this link, and you can get your own copy of that book. So Spotify, as an example, ends up being a great way of enabling the streaming and the nice features that Spotify offers. So we had to have multiple platforms so that these audiobooks could be redistributed freely without any hindrance. The Microsoft folks working with MIT and other partners did a great job at making sure that these five thousand audiobooks would be readily available on all the major sound platforms. Speaker 1
People can listen to five thousand new audiobooks available for free on major streaming platforms like Spotify and Apple Podcasts, then share them with friends on publication sites like Internet Archive. Now Greg is thinking ahead to the next challenge. Speaker 2
We hope that we'll have some future work together with Microsoft and MIT and partners that would improve some types of literature. The example that I like the most is a play. We have a whole bunch of Shakespeare in Project Gutenberg. What if we could use the computer text to speech processing to have different voices for the different characters? Wouldn't that be super cool? And the answer is yes, it's super cool and they demonstrated it, but we didn't actually do it for what's out there in the public because it wasn't working quite well enough. So there's opportunity for improvement generally, of course. Like, you know, you can always do better, but also, you can have improvement for particular types of literary works, and that's something I'm pretty enthusiastic about. Not only things like Shakespeare and plays, but also I really would love to do the children's books. And one of the great examples that we had, we actually demonstrated this at a conference at the Internet Archive. You can choose a voice. So with the standard voices, there's someone with a British accent. There's a woman's voice. There's a man's voice. There are deeper voices or higher voices. You can choose one of twelve or so voices. You can also speak into a microphone, and then your voice will be used to make the books. How cool is that? So the idea there among other things was, well, what if we could have a parent speaking their voice into the computer and then we generate a children's book, and they can use that audiobook recording of the children's book as, like, a bedtime type of thing. It's like, well, dad's still at work, but you can listen to his voice reading Beatrix Potter, something like that. So that's an example of some of the future work that we hope to do. Speaker 1
So how long does Greg think this work will take? Speaker 2
It's really hard to predict how long things will take. We've had such a rapid step forward in artificial intelligence in the last few years, and suddenly, we're talking about AI, which in the olden days, we would call strong AI. Strong AI is general purpose artificial intelligence that can do a bunch of different tasks and, to some extent, simulate how people would perform on those tasks. Until a few years ago, most AI was what we called weak AI, which meant that it was good in just a very narrow subject domain. And good, right? Good, effective. We had things called expert systems. We had things called recommender systems. Obviously, we had things called search engines. These are all examples of weak AI. But these days, we're dealing with strong AI, and I think that strong AI was necessary to have the emotiveness for reading back electronic books with really good text to speech. You know, to understand how to do inflection better, to understand how to do dialogue better. So I think that that becomes kind of a solved problem, and we're looking at tuning that so that the text to speech understands better the structure of literature, understands better whether something exciting is happening or something sad is happening, and changes the vocal inflection appropriately. So I think we're probably looking at an asymptotic pathway. In other words, you get closer and closer to perfection, which would be a human reading in our view of the world. But you can only get so close. You're always improving. So we've had these tremendous improvements recently, and there's still improvements to be made, but it's probably gonna be an ongoing iterative process of getting better and better. Speaker 1
Greg is already imagining other applications of AI for text to speech in the near future. Speaker 2
Maybe we'll see a big quantum leap like we have in other areas of artificial intelligence. So for example, a recommender system or a system to find the book that you really wanna find and then maybe instantly reads it for you based on criteria that you provide. Right? So we can look at places where the domain of artificial intelligence would be not just for creating a great new text to speech audiobook, but also finding the right book or also maybe extracting from that book. So you can ask a question like, what was it that Alice said to the queen in Alice's Adventures in Wonderland? And it could speak back to you and maybe even, a little like our current AI tools, maybe rephrase that, maybe explain it. Maybe you could ask. You're reading a more difficult book. You're reading James Joyce's Ulysses, and you say, hey, I'm reading this passage, and I'm having difficulty understanding it. And the AI will say, okay, let's read the passage with inflection that might help you to understand what's going on, and pacing to help you understand what's going on. And, also, maybe we can bring in some literary analysis of James Joyce's Ulysses. Maybe we can point you to a YouTube video that talks about this passage or this book. So I think we can see where the bundle of technology could work together in ways that go well beyond simply processing the text into audiobooks.
Speaker 1
Greg sees the advancement of text to speech technologies extending well beyond ebooks and audiobooks.
Speaker 2
One of the main things that the audiobooks that we're talking about today have as an advantage over previous generations of text to speech, whether it's audiobooks or other purposes, is it's much more interesting to listen to. Like, the pacing, the emotiveness is engaging in ways that a very flat, uninflected voice is not. So, really, anything that you might have that you might wanna communicate to people, where it's in written form, but you wanna communicate to people, get them to understand it, get them to engage with it. Well, audio might be a very good way of doing that. So if you have just a regular website or you have a document, maybe it's a, I don't know, a sales proposal or presentation, but something that people would normally interact with maybe visually with words, you can say, let's add the voice to this. Something really boring like an end user license agreement that you're trying to communicate. You could imagine pairing that with a little artificial intelligence assistant that reads certain passages or responds to your questions. A contracting department needs to understand an offered contract or license for a product or service they're considering. Well, could the AI assistant, again, using inflection, but also using things that are outside of the text to speech, so looking for the right passages to respond to your questions or to align with your corporate policies, sort of flag and then read back to you what's relevant? So I can imagine where the text to speech is a way of engaging the auditory sense as a way of enhancing your ability to absorb and understand the content. Not necessarily a way of entertaining, not necessarily being the only way that you're interacting with the content, but a way of increasing understanding.
Speaker 1
On a personal basis, I know for a fact that I prefer audio and video to actually reading a document when I'm trying to learn or when I'm trying to digest new information. As an example, in my job, I get to work with customers around the world. And before I meet them, of course, the account teams are supposed to share their perspective. And what I learned is if I ask my teams to actually create a video and speak for five minutes, just five minutes, I get much more insight than if I ask them to actually write it down. It's actually interesting to see how for people, if they have to speak about something, the level of texture you get is so much more than if they just wrote it. For Project Gutenberg, utilizing these new text to speech AI tools aligns closely with their mission, to share and distribute for free as much of the world's knowledge as they can. Adding five thousand audiobooks to their collection boosts accessibility to a broader community, including people who have visual impairments and those without access to traditional libraries. So imagine the possibilities. Right? Think about schools. How many times have we read about schools who couldn't afford to get new textbooks and were actually teaching things that were no longer true in this world? I remember seeing a show on TV where they were showing schools in New York where the kids still had textbooks with the USSR being mentioned. And the USSR has been gone for years. So I think it's a great way to make the new textbooks, the new way of learning accessible to all those kids. And that's the power of this technology. And then the other thing to think about is a lot of content is out of print, literally. But it doesn't mean that things that were written thirty, a hundred years ago are not relevant. They are. And, actually, it's not because we can't make money out of it that it means it's not relevant to all of us. People are really taking advantage of this technology. Think about Le Monde.
It's a very well known publication in France. What they've been doing is turning every one of their articles into podcasts because people want to consume it listening instead of reading. And they're doing that at scale, and they're doing that cost effectively because of the technology. There's another startup called Natural Reader, and their job is actually to think about how they make the voice more human so that people can actually connect to it. There is so much possibility here, and people are not waiting. They're doing it. That's why I'm super excited about this technology. Everyone wins.
Speaker 2
The fact is that without literature, without reading material, you cannot become literate. In the case of Project Gutenberg, you're getting that literary experience without necessarily needing to read, without needing to be literate in written form. What I think we can imagine here is people without great access to literacy tools, teachers and tutors and textbooks and schools, or that have whatever limitation, whether it's an access limitation, disability, or just that they're poor, they have to work, you know, all these reasons why people have difficulty getting themselves into learning situations and thriving there. Well, audio is one additional way. Right? And it's one additional way that has, I think, proven value. So anything that we do to either make some of that reading outcome available just in audio form or use the audio to ignite or encourage or support. We don't wanna say, alright, the answer to literacy is everyone must read Pride and Prejudice. Because if you don't read Pride and Prejudice, you're just not literate and you don't belong in society and you're not gonna be able to go to college and all that stuff. Like, we'd never say anything like that. What we wanna do is find a book. Enjoy a book. You don't like that book? Find a different book. You really like that book? Share that book. You're not too serious about that book? Skim it. Read the, you know, read the chapter heading and skip ahead and find out how it ends, or follow the link in our catalog, and you can read the Wikipedia version of that book or about that author. Right? It's all good. It's all good. And so what the audio does is it gives just another opportunity for people to experience the literature and hopefully to become more literate, a little bit more educated, and we hope that that leads to opportunity for individuals as well.
Speaker 1
It seems obvious that we would wanna go after free literature and advancing literacy, but Greg has encountered his share of resistance.
Speaker 2
Project Gutenberg has a history of skepticism. Way back in the seventies and when I met Michael in the early nineties, there were a lot of people that were saying, not only is this a stupid idea, but it's a bad idea. In other words, it's stupid because no one's gonna wanna read books on computers. You know, you can't read them in the bathtub. There are all these challenges. And it's a bad idea because literature should be experienced purely. It should be experienced in English eleventh grade class or sitting on a couch, maybe with a pipe and a little whiskey snifter or something like that. The snobbishness that Michael encountered was just incredible. And even more incredibly, a lot of that came from the library community. In other words, the same people that you would think were gonna be all about content and freedom of information. But, you know, this was the seventies, eighties, nineties. We didn't have the concepts we have today about unlimited access to information. We didn't have the idea that redistribution in itself was a good thing. These were concepts that really emerged later. Speaker 1
Although Project Gutenberg had a history of skepticism, most people today welcome this expansion, from project volunteers to users far and wide. It's a worthy cause, facilitating free access to a digital library of information, literature, research, and plays which are not under copyright. And because Project Gutenberg is staffed entirely by volunteers and these new AI text to speech ebooks align with their mission, they don't face the same internal headwinds that larger companies may be facing as they introduce AI. The main source of anxiety comes from those who worry that this AI tool will take their jobs. Especially in the wake of last year's writers' strike in the entertainment industry, there is particular anxiety from actors and writers about how AI may disrupt their industry. Speaker 2
Statistically, very few books make money. Very few books stay in print for more than a few years. And commensurately, very few books are going to get the level of human effort involved to make an audiobook. There's a lot of books out there that don't have audiobooks available. So there's certainly a chance that the companies producing commercial books will decide to use text to speech technology rather than, you know, rather than human actors. But I don't think they're gonna do that for the most popular stuff because it's gonna be a while, if ever, before the automated text to speech is as good as that human. And even if it sounds as good, maybe people wanna hear the author reading their own words, or they wanna hear some famous actor reading those words, or they wanna have all the additional sound effects and other things that come into, you know, play for some audiobooks. So I think there's always gonna be an advantage or an opportunity to have the humans doing the work, and the computers can do the work that the humans probably weren't gonna do anyway and do that at quite an affordable cost. So I think there's still a niche. We're not having the computer do work that a person would have otherwise done as their job. Certainly, in the case of Project Gutenberg, most of the works that we did with those five thousand audiobooks, they were probably never gonna be read by humans, you know, put through a human performance process. So in other words, we get something at relatively low effort. And the alternative was to say, well, we'll get nothing because we don't want the computers reading the books because maybe some human will read it eventually. There's a lot of expediency and a lot of opportunity that would be lost if we said, let's wait for a human to come along and read that audiobook. Speaker 1
This comes up a lot when people express anxieties around using AI and this fear that it will take their jobs. But what Greg is saying is so important. There is always a nuance. And in the case of Project Gutenberg, most of those books would have never received human narration. This is actually filling an unmet need. Greg has gone from writing a master's thesis on the self-concept of AI to leading a nonprofit dedicated to literacy and free information. Harnessing these latest innovations to spread knowledge and access feels like a full circle moment. In this conversation, it's almost like Greg, as a master's student, is the third voice in the room, as he reflects on the last couple of years. Speaker 2
Self-concept is some understanding of yourself and understanding of how yourself relates to other stuff out there in the universe. Fast forward decades to today, and you can interact with ChatGPT and the other tools along those lines. And they have somewhat of a self-concept built in. So you can say, well, tell me about yourself. And they'll say, well, I'm a large language model and designed by such and such and trained on such and such. But you can't ask them a lot of detailed questions. So you can't really say, tell me your opinion of armadillos. You know, tell me what you think of spiders. You know, are you afraid of the dark? Right? I mean, you'll get answers because you always get answers from these AI systems, but they're not gonna be anything like an answer that a person would give, not only because they don't have those experiences, but because they're not people. The AIs of today, you know, don't have that self-concept relationship to the world. So I think we still have a way to go. So back in the eighties, strong AI, or even AI in general, was always impossible. It was always out of reach. It was always around the corner. You know, we saw movies. We didn't know if they would be like in the movies or not, but we had a notion as to what an AI was gonna look like, and they were always, you know, in the future. These days, those have arrived. So the type of AI that we can interact with in the twenty twenties, they're pretty good, but they're not like people. Like a C-3PO or a HAL 9000, they actually did have more of a self-concept than you'll see in today's AI. So I think this is still a frontier. You're not gonna have a substitute for the latest author or for a great songwriter. They're approximating it. They're mimicking it.
In the case of Project Gutenberg, what we like to see is where AI does what it actually really is good at, which is replacing some of the hard work of people that they aren't necessarily that good at and maybe don't enjoy and are inaccurate at, having the computers do that instead. So I think we see the opportunity here where using current day technology, not thinking some science fiction future, but using current day technology, the artificial intelligence tools can take on some of the hard and exacting work that people don't generally enjoy doing or wouldn't enjoy doing at scale, you know, over and over again, and do that job at less cost and more efficiently. Speaker 1
I just love this story. You know, everybody is afraid of AI and how it's gonna take away jobs, take maybe the human side of things away from whatever it is. And when you think about this story, it's actually technology bringing back to us things that we couldn't get if it didn't exist. In two thousand four, they couldn't build those audiobooks. Now text to speech technology comes together with AI, and now we can create those amazing books with the right tone that feels very human, and we can get access to things that we can't get access to today because those things are out of print. Nobody's printing them, and now everybody can get them. I just love it because it's showing that technology can do good for us if applied to the right problems and drive the right outcomes. I love this story because in this case, technology is really applied for good. It's actually addressing an unmet need. So it's very exciting. If you think about it, in two thousand four, they wanted to work on audiobooks. They just couldn't get it done. And now we can, and we can do it very fast. Thirty seconds per book. We could have never found humans to go narrate those books, and now we could do it with a model in thirty seconds. They just started. They have five thousand books, and I'm sure soon we'll be at seventy thousand books. Can you imagine? We're just making all this content accessible. And if you think about it, a lot of things are out of publication. And we need this. Like, why wouldn't you want to have access to things that were written ten thousand years ago, a thousand years ago, a hundred years ago? You wanna access all this content. And because it's out of print, you actually can't access it. I don't accept that. So I'm very excited about Project Gutenberg because it's bringing that back. And when I think about my own kids, I'd love for them to access that content. Can you imagine? When I grew up, I could see some content because it was in print.
Thirty years later, your same kid cannot access that content. Why is that acceptable? It's not. That's why this project is so important. We need many more of them. And imagine the possibilities now that we have these books digitized. What's the next step? What could we do? Imagine students being able to actually ask questions as they're reading the book, making this completely interactive. My son was just studying a play from Shakespeare. And half of the time, he was like, what are they talking about? So we had to go back, talk to the teacher, or we had to go online and try to find it. Imagine if you could have that embedded within the experience, within the book. And actually ask, what do we think they're saying? I love that. When I'm reading, at times, I read a book and I'm thinking, are they saying this or that? And having that dialogue as you're reading this book, would it not make your experience much richer? I think so. It would for me. Speaker 0
Thank you for listening to Pivotal. I'd love to hear your story and your pivotal moment. So don't hesitate to follow me and share on LinkedIn. All this information is also available in the show notes. Our show is produced by LARJ Media. That's L A R J Media. Special thanks to Lin Yang and our partners at We Communications.