In context: Some of the implications of today's AI models are startling enough without adding a hyperrealistic human voice to them. We have seen several impressive examples over the last 10 years, but they seem to fall silent until a new one emerges. Enter Miles and Maya from Sesame AI, a company co-founded by Brendan Iribe, the former co-founder and CEO of Oculus.

Researchers at Sesame AI have launched a new Conversational Speech Model (CSM). This advanced voice AI has phenomenal human-like qualities reminiscent of what we have seen from companies like Google (Duplex) and OpenAI (Omni). The demo showcases two AI voices named "Miles" (male) and "Maya" (female), and its realism has captivated some users. However, good luck trying the tech yourself. We tried and could only get as far as a message saying Sesame is trying to scale to capacity. For now, we'll have to settle for a nice 30-minute demo by the YouTube channel Creator Magic (below).

Sesame's technology uses a multimodal approach that processes text and audio in a single model, enabling more natural speech synthesis. The method is similar to OpenAI's voice models, and the similarities are apparent. Despite its near-human quality in isolated tests, the system still struggles with conversational context, pacing, and flow, areas Sesame acknowledges as limitations. Iribe admits the tech is "firmly in the valley," but he remains optimistic that improvements will close the gap.

While groundbreaking, the technology has raised significant questions about its societal impact. Reactions have ranged from amazed and excited to disturbed and concerned. The CSM creates dynamic, natural conversations by incorporating subtle imperfections, like breath sounds, chuckles, and occasional self-corrections. These subtleties add to the realism and could help the tech bridge the uncanny valley in future iterations.

Users have praised the system for its expressiveness, often feeling like they're talking to a real person. Some even mentioned forming emotional connections. However, not everyone has reacted positively to the demo. PCWorld's Mark Hachman noted that the female version reminded him of an ex-girlfriend. The chatbot asked him questions as if trying to establish "intimacy," which made him extremely uncomfortable.

"That's not what I wanted, at all. Maya already had Kim's mannerisms down scarily well: the hesitations, lowering "her" voice when she confided in me, that sort of thing," Hachman related. "It wasn't exactly like [my ex], but close enough. I was so freaked out by talking to this AI that I had to leave."

Many people share Hachman's mixed emotions. The natural-sounding voices cause a discomfort we have seen with similar efforts before. After Google unveiled Duplex, the public reaction was strong enough that the company felt it had to add guardrails forcing the AI to admit it was not human at the beginning of a conversation. We will continue seeing such reactions as AI technology becomes more personal and realistic. While we may trust publicly traded companies creating these types of assistants to build safeguards similar to those we saw with Duplex, we cannot say the same for potential bad actors creating scambots. Adversarial researchers claim they have already jailbroken Sesame's AI, prompting it to lie, scheme, and even harm humans. The claims seem dubious, but you can judge for yourself (below).

As with any powerful technology, the benefits come with risks. The ability to generate hyper-realistic voices could supercharge voice-phishing scams, in which criminals impersonate loved ones or authority figures. Scammers could exploit Sesame's technology to pull off elaborate social-engineering attacks and run far more effective scam campaigns. Even though Sesame's current demo doesn't clone voices, that technology is well advanced, too.

Voice cloning has become so good that some people have already adopted secret phrases shared with family members for identity verification. The widespread concern is that distinguishing between humans and AI could become increasingly difficult as voice synthesis and large language models evolve.

Sesame's future open-source releases could make it easy for cybercriminals to bundle both technologies into a highly accessible and convincing scambot. And that does not even consider the more legitimate implications for the labor market, especially in sectors like customer service and tech support.