Google’s DeepMind AI Conveys Human-Like Speech

It may only be a matter of time before you can no longer tell the difference between speaking with a human and speaking with a robot. Or perhaps that time is now. Google’s DeepMind team recently announced a new AI called WaveNet. This is the same group that created AlphaGo, which defeated one of the world’s best Go players.

The WaveNet team fed the neural network raw audio waveforms recorded from real human speakers. Most current text-to-speech (TTS) systems use an approach called concatenative TTS, in which audio is generated by recombining short fragments of recorded speech. WaveNet, by contrast, models the raw waveform directly, generating audio one sample at a time at around 16,000 samples per second.

In other words, WaveNet is a neural network trained on real waveforms. It then uses the statistics it has learned to predict which audio sample to produce next when “speaking,” piece by piece. In a recent post, DeepMind’s researchers wrote:

Building up samples one step at a time like this is computationally expensive, but we have found it essential for generating complex, realistic-sounding audio.
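That one-sample-at-a-time process can be sketched in a few lines of code. This is only an illustrative toy, not DeepMind’s implementation: the real WaveNet uses a deep stack of dilated causal convolutions to predict a probability distribution over 256 quantized amplitude values, whereas the stand-in function here returns a uniform distribution purely to show the shape of the autoregressive loop.

```python
import numpy as np

SAMPLE_RATE = 16000  # WaveNet operates on roughly 16,000 samples per second


def toy_next_sample_distribution(history):
    # Stand-in for the trained network. In the real WaveNet, dilated causal
    # convolutions over the history produce a softmax over 256 quantized
    # amplitude levels; here we simply return a uniform distribution.
    return np.full(256, 1.0 / 256)


def generate(n_samples, seed=0):
    """Autoregressive generation: each new sample is drawn conditioned on
    all the samples produced so far, one step at a time."""
    rng = np.random.default_rng(seed)
    audio = []
    for _ in range(n_samples):
        probs = toy_next_sample_distribution(audio)
        sample = rng.choice(256, p=probs)  # pick one of 256 amplitude levels
        audio.append(int(sample))
    return audio


waveform = generate(100)  # 100 samples is about 6 ms of audio at 16 kHz
```

Because every sample depends on all the samples before it, the loop cannot be parallelized across time steps, which is exactly why the researchers describe the approach as computationally expensive.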

In blind tests with human listeners, DeepMind reports that WaveNet’s audio was rated around 50% closer to real human speech than the best existing TTS systems.



Written by Katrina Manning

Katrina Manning is the Editor In Chief for . In addition, she is the author of “Marmalade’s Exciting Tail,” “Lupus Obscurus,” and “Under the Monastery.” Her writing and editing services have been in demand over the last seven years, and she has contributed to a variety of websites and publications. She enjoys covering tech, business, and lifestyle. Her objective is to provide a newsworthy, informative, and enjoyable read.