In recent years, the amount of audio an artificial intelligence system needs to clone someone’s voice has become shorter and shorter.
Once it was minutes, now it’s just seconds.
OpenAI, the Microsoft-backed company behind the viral generative AI chatbot ChatGPT, recently revealed that its own voice cloning technology requires just 15 seconds of audio to reproduce someone’s voice.
In a post on its website, OpenAI shared a small-scale preview of a model called Voice Engine, which it has been developing since late 2022.
Voice Engine works from a sample of at least 15 seconds of spoken audio. The user can then input text to generate what OpenAI describes as “emotional and realistic” speech that “closely resembles the original speaker.”
OpenAI insists it is taking “a cautious and informed approach to a wider release due to the potential for synthetic voice misuse,” adding that it wants to “start a dialogue about the responsible development of synthetic voices and how society can adapt to these new capabilities.”
The company added: “Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to develop this technology at scale.”
One of the abuses that OpenAI refers to is a scam that some criminals are already running using similar technology that has been publicly available for some time. It involves cloning a person’s voice and then calling a friend or relative of that person to trick them into handing over cash via wire transfer. There are also fears about how such technology might be used in the upcoming presidential election, an issue highlighted by a recent high-profile incident in which a robocall using a clone of President Joe Biden’s voice told people not to vote in the January primary in New Hampshire.
Another concern is how the rapidly improving technology will affect the livelihoods of voice actors, who fear that they will increasingly be asked to sign over the rights to their voices so that artificial intelligence can be used to create a synthetic version, with compensation for such a contract likely to be much lower than if the actor were asked to perform the work in person.
Turning to the technology’s more positive uses, OpenAI suggests that it could provide reading assistance to non-readers and children using natural, emotional voices that “represent a wider range of speakers than is possible with predefined voices,” as well as instant translation of videos and podcasts, something Spotify is already testing.
It could also be used to help patients who are gradually losing their voice due to illness to continue to communicate using what sounds like their own voice.
OpenAI has a few examples of AI-generated audio and reference audio on its site, and we’re sure you’ll agree that they’re pretty remarkable.