Science & Tech

Microsoft AI tool can mimic voice from 3-second audio [Video]

Published January 13, 2023

Lorena

voice recording — Photo Credit: CoWomen/Unsplash (CC0 Public Domain)

WHAT YOU NEED TO KNOW:

Microsoft researchers showcased Vall-E, a new text-to-speech AI tool that can mimic a person's voice from a three-second sample recording.
The “neural codec language model” not only mimics the speaker’s voice but also their tone and emotions.
Researchers warned that any public release should include a protocol confirming that speakers consent to the use of their voices.

Vall-E, a new text-to-speech AI tool by Microsoft, can mimic a voice from just a three-second sample recording.

The “neural codec language model” was trained on 60,000 hours of English speech data.

According to the researchers' paper, posted on arXiv (the preprint server operated by Cornell University), the tool mimics not only the speaker's voice but also their tone and emotions.

Audio samples shared on GitHub show how closely Vall-E's output resembles the original speaker's recording, even when the speaker never said the words in Vall-E's text prompt.

Surprised there isn't more chatter around VALL-E

This new model by @Microsoft can generate speech in any voice after only hearing a 3s sample of that voice 🤯

Demo → https://t.co/GgFO6kWKha pic.twitter.com/JY88vf4lYc
— Steven Tey (@steventey) January 9, 2023

The study authors wrote, “Experiment results show that Vall-E significantly outperforms the state-of-the-art zero-shot [text to speech] system in terms of speech naturalness and speaker similarity. In addition, we find Vall-E could preserve the speaker’s emotion and acoustic environment.”

The samples vary in quality, and some still sound robotic, but most eerily capture the original voices.

Here are comparisons between the 3 second speaker audio, the ground truth, baseline, and VALL-E generated audio.

Paper website: https://t.co/bBSKenv00w pic.twitter.com/ScXyjzDAf7
— bleedingedge.ai (@bleedingedgeai) January 11, 2023

Critics of AI have warned about the risks of text-to-speech technology, risks the researchers themselves acknowledged.

They wrote, “Since Vall-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.”

The researchers noted that the voice experiments were conducted "under the assumption that the user agrees to be the target speaker in speech synthesis," and most of the voice samples came from the Emotional Voices Database.

Vall-E is not yet available to the public.

The researchers warned that if the model is to ever be released to the general public, it “should include a protocol to ensure that the speaker approves the use of their voice and a synthesized speech detection model.”

Source: Fox News

Related Topics: Microsoft AI voice simulator

8 Comments

CPO Bill

January 16, 2023 at 6:45 pm

Don’t let it hear biden!

Guest

January 16, 2023 at 7:13 pm

Who cares?

- CharlieSeattle
  
  January 17, 2023 at 8:01 pm
  
  You may be assimilated by the FBI confessing, in your voice using new AI software, to many acts of crime, sedition and treason. To avoid this Never Speak Again and converse only in recyclable clay tablets. Leave no finger prints, hair, dandruff and blood anywhere to avoid DNA SWATING. A thick body coating of wet mud ala THE PREDATOR will help too.
  
RobertC

January 16, 2023 at 8:27 pm

Are these idiots serious? There’s a lot more than just potential for abuse with this thing. It most certainly WILL be used for evil purposes, probably by our own government among others. Modern technology has twisted reality to the point where it’s impossible to determine what’s real and what isn’t. This is science that’s working AGAINST the human race, and yet no one seems to care. All kinds of praise is heaped upon the people who develop this stuff, as if the end results will be the key to a better world for us all. The more I hear of these technological “advancements”, the more I realize how treacherous these times really are. Those who blindly choose to follow “the science” will follow it to their graves. Unfortunately, they’ll likely drag the rest of us there as well.

Manuel

January 16, 2023 at 11:20 pm

Recon everyone is going to prison after angering a savvy enemy be it law enforcement or a simple barroom discussion

PJ413

January 17, 2023 at 12:07 am

Yeah like that’s not gonna be used by the criminals in their phone frauds convincing family members to send them money because they broke down or lost their wallet or whatever scam they are using at this time…

Mary Geiger

January 17, 2023 at 3:53 pm

I don’t see what the value of this is, but I’m not interested in robots. There must be a reason Microsoft is experimenting with this. From what I’ve read it isn’t making waves, just worries.

robert

August 17, 2023 at 7:23 pm

AI NEEDS TO BE TERMINATED FOREVER.


Crystal Clear News