Microsoft AI can now quickly clone voices that sound just like real people
Microsoft has developed VALL-E 2, an AI tool that nearly perfectly mimics human speech, but the public won’t have access to it due to abuse concerns.
Microsoft has developed an artificial intelligence tool that can accurately mimic human speech. Because it is so convincing, the tech giant won’t share it with the public, citing “potential abuse risks.”
VALL-E 2 is the name of the tool that turns text into speech. It can mimic a person's voice from only a few seconds of audio, thanks to "zero-shot learning," which lets it handle voices it was never explicitly trained on.
The tech giant says VALL-E 2 is the first system of its kind to reach "human parity," meaning its output meets or beats benchmarks for how closely it resembles human speech. It follows the announcement of the first VALL-E system in January 2023.
The researchers behind VALL-E 2 say it can produce "accurate, natural speech in the exact voice of the original speaker, comparable to human performance." It can handle both long and short phrases, as well as simple sentences.
Two features help achieve this: grouped code modeling and repetition-aware sampling. Repetition-aware sampling fixes the problems that arise when the same tokens are used over and over. (Tokens are the smallest pieces of data a language model can handle; in this case, words or parts of words.) It prevents sounds or phrases from repeating during the decoding process, which makes the system's speech sound more natural by varying the sounds and phrases it uses.
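The repetition-aware sampling idea described above can be sketched in code. This is a minimal, hypothetical illustration, not Microsoft's implementation: all function and parameter names here are invented for the example. It assumes the common formulation of drawing a token with nucleus (top-p) sampling, then falling back to sampling from the full distribution if that token has already repeated too often in the recent output.

```python
import random
from collections import Counter

def repetition_aware_sample(probs, history, window=10, max_repeats=3, top_p=0.9):
    """Illustrative sketch of repetition-aware sampling (names are invented).

    Draw a token with nucleus (top-p) sampling; if the drawn token already
    appears too often in the recent history, resample from the full
    distribution to break the repetition loop.
    """
    # Nucleus sampling: keep the smallest set of highest-probability
    # tokens whose cumulative probability reaches top_p, sample from it.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cum = [], 0.0
    for tok, p in ranked:
        nucleus.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    toks, weights = zip(*nucleus)
    drawn = random.choices(toks, weights=weights, k=1)[0]

    # Fallback: if the drawn token dominates the recent window,
    # resample from the full distribution instead.
    recent = Counter(history[-window:])
    if recent[drawn] >= max_repeats:
        all_toks, all_weights = zip(*probs.items())
        drawn = random.choices(all_toks, weights=all_weights, k=1)[0]
    return drawn
```

In this toy version, the fallback simply widens the candidate pool when repetition is detected, which is the intuition behind keeping generated speech from looping on the same sound.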
Grouped code modeling, meanwhile, caps the number of tokens the model handles at once, which helps it work faster. The researchers pitted VALL-E 2 against audio samples from two English-language databases, LibriSpeech and VCTK.
To see how well VALL-E 2 handled more difficult tasks, they also used ELLA-V, a framework for evaluating zero-shot text-to-speech synthesis. According to a paper summarizing the results, published on June 17, the system ultimately prevailed "in speech robustness, naturalness, and speaker similarity."
VALL-E 2 will not be available to the public any time soon, according to Microsoft, because it is “purely a research project.” The company said on its website, “Right now, we have no plans to add VALL-E 2 to a product or make it easier for more people to get it.”
Misusing the model, such as spoofing voice identification or impersonating a specific speaker, could pose risks. The tech giant says users can report suspected abuse of the tool through an online form.
Microsoft has good reason to be concerned. This year alone, cybersecurity experts have seen a sharp rise in malicious actors' use of AI tools, including voice-cloning tools. One tactic is "vishing," a blend of "voice" and "phishing," in which con artists call victims while posing as friends, family, or other trustworthy individuals.
In the wrong hands, voice spoofing could even threaten national security. In January, a robocall using President Joe Biden's voice warned Democrats not to vote in the New Hampshire primaries. The scheme's planner was later charged with voter fraud and impersonating a candidate.
Microsoft Faces Scrutiny Over AI Usage
Microsoft is facing growing scrutiny over how it uses AI, on both antitrust and data-privacy fronts. Regulators worry that the tech giant's $13 billion deal with OpenAI gives it too much power over the startup.
Users have criticized the company as well. Last month, it indefinitely postponed the release of Recall, an "AI assistant" that takes screenshots of a device every few seconds.
Individuals and data-privacy experts, including the Information Commissioner's Office in the UK, harshly criticized Microsoft over the feature.
A company spokesperson told The U.S. Sun that Recall would shift "from a preview experience broadly available for Copilot+ PCs to a preview available first in the Windows Insider Programme."