Motor neuron diseases took their voices. AI is bringing them back.
www.technologyreview.com
Jules Rodriguez lost his voice in October of last year. His speech had been deteriorating since a diagnosis of amyotrophic lateral sclerosis (ALS) in 2020, as the muscles in his head and neck progressively weakened along with those in the rest of his body. By 2024, doctors were worried that he might not be able to breathe on his own for much longer. So Rodriguez opted to have a small tube inserted into his windpipe to help him breathe. The tracheostomy would extend his life, but it also brought an end to his ability to speak.

"A tracheostomy is a scary endeavor for people living with ALS, because it signifies crossing a new stage in life, a stage that is close to the end," Rodriguez tells me using a communication device. "Before the procedure I still had some independence, and I could still speak somewhat, but now I am permanently connected to a machine that breathes for me."

Rodriguez and his wife, Maria Fernandez, who live in Miami, thought they would never hear his voice again. Then they re-created it using AI. After feeding old recordings of Rodriguez's voice into a tool trained on voices from film, television, radio, and podcasts, the couple were able to generate a voice clone: a way for Jules to communicate in his "old" voice.

"Hearing my voice again, after I hadn't heard it for some time, lifted my spirits," says Rodriguez, who today communicates by typing sentences using a device that tracks his eye movements, which can then be spoken aloud in the cloned voice. The clone has enhanced his ability to interact and connect with other people, he says. He has even used it to perform comedy sets on stage.

Rodriguez is one of over a thousand people with speech difficulties who have used the voice cloning tool since ElevenLabs, the company that developed it, made it available to them for free. Like many new technologies, the AI voice clones aren't perfect, and some people find them impractical in day-to-day life.
But the voices represent a vast improvement on previous communication technologies and are already improving the lives of people with motor neuron diseases, says Richard Cave, a speech and language therapist at the Motor Neuron Disease Association in the UK. "This is genuinely AI for good," he says.

Cloning a voice

Motor neuron diseases are a group of disorders in which the neurons that control muscles and movement are progressively destroyed. They can be difficult to diagnose, but typically, people with these disorders start to lose the ability to move various muscles. Eventually, they can struggle to breathe, too. There is no cure.

Rodriguez started showing symptoms of ALS in the summer of 2019. "He started losing some strength in his left shoulder," says Fernandez, who sat next to him during our video call. "We thought it was just an old sports injury." His arm started to get thinner, too. In November, his right thumb stopped working while he was playing video games. It wasn't until February 2020, when Rodriguez saw a hand specialist, that he was told he might have ALS. He was 35 years old.

"It was really, really shocking to hear from somebody you see about your hand," says Fernandez. "That was a really big blow."

Like others with ALS, Rodriguez was advised to "bank" his voice: to tape recordings of himself saying hundreds of phrases. These recordings can be used to create a "banked voice" for use in communication devices. The result was jerky and robotic.

It's a common experience, says Cave, who has helped 50 people with motor neuron diseases bank their voices. "When I first started at the MND Association [around seven years ago], people had to read out 1,500 phrases," he says. It was an arduous task that would take months. And there was no way to predict how lifelike the resulting voice would be; often it ended up sounding quite artificial. "It might sound a bit like them, but it certainly couldn't be confused for them," he says.
Since then, the technology has improved, and for the last year or two the people Cave has worked with have only needed to spend around half an hour recording their voices. But though the process was quicker, he says, the resulting synthetic voice was no more lifelike.

Then came the voice clones. ElevenLabs has been developing AI-generated voices for use in film, television, and podcasts since it was founded three years ago, says Sophia Noel, who oversees partnerships between the company and nonprofits. The company's original goal was to improve dubbing, making voice-overs in a new language seem more natural and less obvious. But then the technical lead of Bridging Voice, an organization that works to help people with ALS communicate, told ElevenLabs that its voice clones were useful to that group, says Noel. Last August, ElevenLabs launched a program to make the technology freely available to people with speech difficulties.

Suddenly, it became much faster and easier to create a voice clone, says Cave. Instead of having to record phrases, users can upload existing voice recordings, from past WhatsApp voice messages or wedding videos, for example. "You need a minimum of a minute to make anything, but ideally you want around 30 minutes," says Noel. "You upload it into ElevenLabs. It takes about a week, and then it comes out with this voice."

Rodriguez played me a statement using both his banked voice and his voice clone. The difference was stark: the banked voice was distinctly unnatural, but the voice clone sounded like a person. It wasn't entirely natural; the words came a little fast, and the emotive quality was slightly lacking. But it was a huge improvement. The difference between the two is, as Fernandez puts it, "like night and day."

The ums and ers

Cave started introducing the technology to people with MND a few months ago. Since then, 130 of them have started using it, and the feedback has been unremittingly good, he says.
The voice clones sound far more lifelike than the results of voice banking. "They [include] pauses for breath, the ums, the ers, and sometimes there are stammers," says Cave, who himself has a subtle stammer. "That feels very real to me, because actually I would rather have a synthetic voice representing me that stammered, because that's just who I am."

Joyce Esser is one of the 130 people Cave has introduced to voice cloning. Esser, who is 65 years old and lives in Southend-on-Sea in the UK, was diagnosed with bulbar MND in May last year. Bulbar MND is a form of the disease that first affects muscles in the face, throat, and mouth, which can make speaking and swallowing difficult.

Esser can still talk, but slowly and with difficulty. She's a chatty person, but she says her speech has deteriorated quite quickly since January. We communicated via a combination of email, video call, speaking, a writing board, and text-to-speech tools. "To say this diagnosis has been devastating is an understatement," she tells me. "Losing my voice has been a massive deal for me, because it's such a big part of who I am."

Joyce Esser and her husband Paul on holiday in the Maldives. COURTESY OF JOYCE ESSER

Esser has lots of friends all over the country, Paul Esser, her husband of 38 years, tells me. But when they get together, they have a rule: "Don't talk about it," he says. Talking about her MND can leave Joyce sobbing uncontrollably. She had prepared a box of tissues for our conversation.

Voice banking wasn't an option for Esser. By the time her MND was diagnosed, she was already losing her ability to speak. Then Cave introduced her to the ElevenLabs offering. Esser had a four-and-a-half-minute recording of her voice from a recent local radio interview and sent it to Cave to create her voice clone. "When he played me my AI voice, I just burst into tears," she says. "I'D GOT MY VOICE BACK!!!! Yippeeeee!"

"We were just beside ourselves," adds Paul. "We thought we'd lost [her voice] forever."
Hearing a lost voice can be an incredibly emotional experience for everyone involved. "It was bittersweet," says Fernandez, recalling the first time she heard Rodriguez's voice clone. "At the time, I felt sorrow, because [hearing the voice clone] reminds you of who he was and what we've lost," she says. "But overwhelmingly, I was just so thrilled. It was so miraculous."

Rodriguez says he uses the voice clone as much as he can. "I feel people understand me better compared to my banked voice," he says. "People are wowed when they first hear it. As I speak to friends and family, I do get a sense of normalcy compared to when I just had my banked voice."

Cave has heard similar sentiments from other people with motor neuron disease. "Some [of the people with MND I've been working with] have told me that once they started using ElevenLabs voices, people started to talk to them more, and that people would pop by more and feel more comfortable talking to them," he says.

That's important, he stresses. Social isolation is common for people with MND, especially for those with advanced cases, he says, and anything that can make social interactions easier stands to improve the well-being of people with these disorders: "This is something that [could] help make lives better in what is the hardest time for them."

"I don't think I would speak or interact with others as much as I do without it," says Rodriguez.

A very slow game of Ping-Pong

But the tool is not a perfect speech aid. In order to create text for the voice clone, words must be typed out. There are lots of devices that help people with MND type, using their fingers, eyes, or tongue movements, for example. The setup works fine for prepared sentences, and Rodriguez has used his voice clone to deliver a comedy routine, something he had started to do before his ALS diagnosis. "As time passed and I began to lose my voice and my ability to walk, I thought that was it," he says.
"But when I heard my voice for the first time, I knew this tool could be used to tell jokes again. Being on stage was awesome and invigorating," he adds.

Jules Rodriguez performs his comedy set on stage. DAN MONO FROM DART VISION

But typing isn't instant, and any conversations will include silent pauses. "Our arguments are very slow paced," says Fernandez. Conversations are "like a very slow game of Ping-Pong," she says.

Joyce Esser loves being able to re-create her old voice. But she finds the technology impractical. "It's good for pre-prepared statements, but not for conversation," she says. She has her voice clone loaded onto a phone app designed for people with little or no speech, which works with ElevenLabs. But it doesn't allow her to use swipe typing, a form of typing she finds quicker and easier. And the app requires her to type sections of text and then upload them one at a time, she says, adding: "I'd just like a simple device with my voice installed onto it that I can swipe type into and have my words spoken instantly." For the time being, her first-choice communication device is a simple writing board. "It's quick and the listener can engage by reading as I write, so it's as instant and inclusive as can be," she says.

Esser also finds that when she uses the voice clone, the volume is too low for people to hear, and it speaks too quickly and isn't expressive enough. She says she'd like to be able to use emojis to signal when she's excited or angry, for example. Rodriguez would like that option too. The voice clone can sound a bit emotionally flat, and it can be difficult to convey various sentiments. "The issue I have is that when you write something long, the AI voice almost seems to get tired," he says.

"We appear to have the authenticity of voice," says Cave. "What we need now is the authenticity of delivery." Other groups are working on that part of the equation.
The Scott-Morgan Foundation, a charity with the goal of making new technologies available to improve the well-being of people with disorders like MND, is working with technology companies to develop custom-made systems for 10 individuals, says executive director LaVonne Roberts. The charity is investigating pairing ElevenLabs voice clones with an additional technology: hyperrealistic avatars for people with motor neuron disease. These "twins" look and sound like a person and can speak from a screen. Several companies are working on AI-generated avatars; the Scott-Morgan Foundation is working with D-ID.

Creating the avatar isn't an easy process. To create hers, Erin Taylor, who was diagnosed with ALS when she was 23, had to speak 500 sentences into a camera and stand for five hours, says Roberts. "We were worried it was going to be impossible," she says. The result is impressive. "Her mom told me, 'You're starting to capture [Erin's] smile,'" says Roberts. "That really hit me deeper and heavier than anything." Taylor showcased her avatar at a technology conference in January with a pre-typed speech.

It's not clear how avatars like these might be useful on a day-to-day basis, says Cave: "The technology is so new that we're still trying to come up with use cases that work for people with MND. The question is: how do we want to be represented?" Cave says he has seen people advocate for a system in which a hyperrealistic avatar of a person with MND is displayed on a screen in front of the person's real face. "I would question that right from the start," he says.

Both Rodriguez and Esser can see how avatars might help people with MND communicate. "Facial expressions are a massive part of communication, so the idea of an avatar sounds like a good idea," says Esser. But not one that covers the user's face: "You still need to be able to look into their eyes and their souls."
The Scott-Morgan Foundation will continue to work with technology companies to develop more communication tools for people who need them, says Roberts. And ElevenLabs plans to partner with other organizations that work with people with speech difficulties so that more of them can access the technology. "Our goal is to give the power of voice to 1 million people," says Noel.

"It really does change the game for us," says Fernandez. "It doesn't take away most of the things we are dealing with, but it really enhances the connection we can have together as a family."