Twenty Thousand Hertz, a podcast created by Dallas Taylor, uncovers the backstory behind some of the most recognizable (and fascinating) sounds in the world with special guests featured in most episodes.
“We live in a universe governed by rational laws that, through science, we can discover and understand.” – Stephen Hawking
If it hasn’t been said already, let us remember it once more. The world around us has changed drastically in ways that are (paradoxically enough) fathomable and unfathomable.
In this ever-changing world, the one constant is how fast technology changes. Can we mere mortals, who do not belong to big tech, even understand the larger ramifications? And while the technology changes rapidly, its backstory is not always known.
Take Siri, for example, the virtual assistant Apple introduced on its products. It inspired other tech companies to follow suit, and its voice has been heard around the world. But where did that voice come from?
The question has value because Siri, as the first (but certainly not the last) in the field, makes us curious. This is where the Twenty Thousand Hertz podcast is so informative and entertaining.
The Voice of Siri, hosted by Dallas Taylor, is the first episode of the podcast, which has released more than 100 episodes since it began in 2016. You can take baby steps and begin with this first episode, where you will discover answers to questions you didn’t know you had tucked away in a corner of your mind.
Voice of Siri, written and produced by Miellyn Fitzwater Barrows, is approximately 12 minutes long, and it tells you what you need to know without an hour-long meandering tale.
So, what do we know? It is always better to go to the source, which in this case is the episodes themselves. To give you an idea, though, the first episode reveals how a voice actor named Susan Bennett read strange phrases for an untitled project in 2005, years before Siri came into existence.
Six years later, Bennett learned that the unnamed project, for which she had spent hours in the voice booth reading those phrases, had led to the voice of Siri.
In other words, her voice was being heard around the world, wherever young kids and even adults use Siri or ask it amusing questions (as I do) and get odd answers.
The episode also features Dr. Andrew Breen, Director of Speech-to-Text Technology at Nuance, a company that had supposedly worked on Siri in the beginning.
In the 12-minute episode, Dr. Breen explains the model behind Siri.
He noted, “In principle, it’s very simple. Just record a phrase, then extract the individual sounds. We’ll do that laboriously for several thousands of phrases.
We then go and search in the database and pull out those sounds and then stick them together using very basic simple processing to smooth out the joints.”
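For the technically curious, the process Dr. Breen describes can be imitated in a few lines of Python. This is only a toy sketch, not how Nuance or Apple built anything: short sine tones stand in for recorded speech sounds, every name in it (`tone`, `build_database`, `crossfade`, `synthesize`) is made up for illustration, and a linear crossfade is just one assumed way to “smooth out the joints.”

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumed value for this toy example)


def tone(freq_hz, n_samples=400):
    """Stand-in for one recorded speech sound: a short sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def build_database(recorded_phrases):
    """Extract the individual sounds from each labelled recording.

    Each phrase is a list of (label, samples) pairs. The first recording
    seen for a label is the one that is kept.
    """
    database = {}
    for phrase in recorded_phrases:
        for label, samples in phrase:
            database.setdefault(label, samples)
    return database


def crossfade(left, right, overlap):
    """Join two sample lists, blending `overlap` samples linearly."""
    overlap = min(overlap, len(left), len(right))
    if overlap == 0:
        return left + right
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight shifts from left toward right
        blended.append((1 - w) * left[len(left) - overlap + i] + w * right[i])
    return left[:len(left) - overlap] + blended + right[overlap:]


def synthesize(labels, database, overlap=40):
    """Look up each sound in the database and stick the sounds together."""
    output = []
    for label in labels:
        output = crossfade(output, database[label], overlap)
    return output


# Two "recorded phrases", each already split into labelled sounds.
phrases = [
    [("h", tone(300)), ("e", tone(500))],
    [("l", tone(400)), ("o", tone(600))],
]
db = build_database(phrases)

# A new "word" assembled from sounds that were never recorded together.
audio = synthesize(["h", "e", "l", "l", "o"], db)
print(len(audio), "samples")
```

The point of the sketch is the shape of the idea: the output is assembled entirely from stored recordings, which is exactly what Dr. Breen says the field wants to move away from.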
Siri, therefore, was just the start. According to Dr. Breen, giving a synthetic voice some sort of expression is on the table.
“The nuances of emotions that we are able to present to somebody on a phone is incredible — a pause of the right duration on a phone line and you’ll get the message that I’m not happy or a subtle expression in my voice will give you an indication of the meaning and emotion that’s behind it. We want to move away from recordings and move to the generation of sound. That’s where we want to be.”
They may get it right or wrong, but now you know that a voice artist, and the strange phrases she read, led to the original Siri voice. And if you feel it sounds somewhat automated, almost mechanical, worry not, because a change will come.
I wonder, though, whether we will miss the Siri voice we’ve heard since it was first introduced in 2011, or prefer the “generation of sound” idea that might make it sound less mechanical. Can this idea be refined enough to fool us into hearing a voice that sounds completely human? Tech companies will certainly tell us.
To learn more about the global map of changing sounds, find an episode of Twenty Thousand Hertz. You will not be disappointed.
– Find the series online or on a phonetic device or a computer near you.