Microsoft AI tool can mimic voice from 3-second audio [Video]
WHAT YOU NEED TO KNOW:
- Microsoft researchers showcased Vall-E, a new text-to-speech AI tool that can mimic any voice using a three-second sample recording.
- The “neural codec language model” not only mimics the speaker’s voice but also their tone and emotions.
- Researchers warned that there should be a protocol ensuring there is consent to use a speaker’s voice.
Vall-E, a new text-to-speech AI tool by Microsoft, has the ability to mimic any voice using a three-second sample recording.
The “neural codec language model” was trained on 60,000 hours of English speech data.
According to the paper, which Microsoft researchers posted to the arXiv preprint server (operated by Cornell University), the tool not only mimics the speaker’s voice but also their tone and emotions.
Audio samples shared on GitHub revealed the eerie similarities between the original speaker’s recording and Vall-E’s recreation, even when the original speaker never actually said Vall-E’s text prompt.
The study authors wrote, “Experiment results show that Vall-E significantly outperforms the state-of-the-art zero-shot [text to speech] system in terms of speech naturalness and speaker similarity. In addition, we find Vall-E could preserve the speaker’s emotion and acoustic environment.”
While the audio samples vary in quality, and some still sound robotic, most eerily captured the original voices.
Critics of AI warned of the potential for misuse of text-to-speech technology, risks the researchers themselves acknowledged.
They wrote, “Since Vall-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.”
The researchers clarified that the voice experiments were conducted “under the assumption that the user agrees to be the target speaker in speech synthesis,” since most were taken from the Emotional Voices Database.
Vall-E is not yet available to the public.
The researchers warned that if the model is to ever be released to the general public, it “should include a protocol to ensure that the speaker approves the use of their voice and a synthesized speech detection model.”
Source: Fox News
January 16, 2023 at 6:45 pm
Don’t let it hear Biden!
January 17, 2023 at 8:01 pm
You may be assimilated by the FBI confessing, in your voice using new AI software, to many acts of crime, sedition and treason. To avoid this Never Speak Again and converse only in recyclable clay tablets. Leave no fingerprints, hair, dandruff and blood anywhere to avoid DNA SWATTING. A thick body coating of wet mud à la THE PREDATOR will help too.
January 16, 2023 at 8:27 pm
Are these idiots serious? There’s a lot more than just potential for abuse with this thing. It most certainly WILL be used for evil purposes, probably by our own government among others. Modern technology has twisted reality to the point where it’s impossible to determine what’s real and what isn’t. This is science that’s working AGAINST the human race, and yet no one seems to care. All kinds of praise is heaped upon the people who develop this stuff, as if the end results will be the key to a better world for us all. The more I hear of these technological “advancements”, the more I realize how treacherous these times really are. Those who blindly choose to follow “the science” will follow it to their graves. Unfortunately, they’ll likely drag the rest of us there as well.
January 16, 2023 at 11:20 pm
Reckon everyone is going to prison after angering a savvy enemy, be it law enforcement or a simple barroom discussion.
January 17, 2023 at 12:07 am
Yeah, like that’s not gonna be used by criminals in their phone frauds, convincing family members to send them money because they broke down or lost their wallet, or whatever scam they’re running at this time…
January 17, 2023 at 3:53 pm
I don’t see what the value of this is, but I’m not interested in robots. There must be a reason Microsoft is experimenting with this. From what I’ve read, it isn’t making waves, just worries.