Voice Tech, Long in Development, Is Coming Together All at Once
New competitors to Apple's Siri and Amazon's Alexa are pushing voice technology forward — but what does that mean for music?
It's the cynical view of the American dream: the ability to control everything in a home without ever getting up from the recliner -- or even, God forbid, lifting a finger. This is the practical commercial terminus of voice recognition technology, and 10 years' worth of research, testing and innovation from a slew of major tech conglomerates is paying off pretty much simultaneously as the goalposts shift from giving commands to having full, contextual conversations with our... well, hopefully, for tech companies, our every item, from car to phone to reading light.
The past few weeks in particular have increased voice technology's visibility among consumers and raised new possibilities about how far it can go. "We want users to have an ongoing, two-way dialogue," Google CEO Sundar Pichai said yesterday during the keynote presentation at the company's I/O developer conference, as he introduced Google's new Assistant. "We think of the Assistant as an ambient experience that extends across all devices." Everywhere.
Five years ago Apple gave us Siri, which at release was more a game of getting her to say something ridiculous than an integral tool for productivity and commerce. Amazon's Echo introduced Alexa last year, broadening the scope of functionality beyond a web search. Spike Jonze's Oscar-winning film Her, starring Scarlett Johansson and Joaquin Phoenix, took the concept to its logical and final destination. (Not to mention a scene of Magic Leap-ish augmented reality gaming.)
But updates from Apple and Amazon, as well as an influx of new players in the space (including Google's Assistant, SoundHound's Houndify platform and former Siri programmers' brand-new Viv), have foisted voice recognition tech on us in a sweeping tide, rushing us into a fundamental change in how we interact with, and think about, everyday technology.
Much of that change is happening within the home. Already, Siri is compatible with Apple TV, for instance, and the company has been adding features to make it work better with Apple Music, including the ability to shuffle or add tracks to playlists or skip songs during playback. Amazon unbundled Alexa from its Echo speaker last year, opening up its API so developers could incorporate it into their own products. And Google's Assistant -- also available for third-party integrations -- will live within Google Home, a Wi-Fi speaker with streaming capability, similar to the Echo, that is scheduled to make its debut later this year.
But the biggest innovation is happening in the shift away from voice recognition and towards voice understanding, which removes the middleman, not only minimizing the time between speech and action but also allowing for more complex "conversation" rather than rote command. Instead of "Play my rainy day playlist," for example, this technology can, or will, customize preferences within playlists -- "but not the really slow ones" -- or find a song or artist based on mood.
"Because we're able to understand complex conversations, users can be a lot more specific," says SoundHound founder/CEO Keyvan Mohajer, whose Houndify platform (a licensable API) has drawn dozens of integrations, from Uber to Expedia. "We can make it easier for people to get exactly what they want instead of being at the mercy of A/B recommendation engines... We expect it will heavily increase music consumption." Like everyone else, Mohajer spent 10 years developing Houndify -- operating SoundHound as something like a revenue-generating smokescreen to hide his true goal -- and hit a particularly useful innovation during that time. Instead of our devices taking our speech, converting it into text, scanning that text for keywords, and then reporting their results back to us, Houndify skips the middle part, going straight from our voice to the computer's analysis. The approach speeds up results significantly, and allows the platform to be more intelligent in its interpretation of our intent.
During a demonstration, Mohajer asked Houndify a dizzying question that went something like "show me a hotel in San Francisco between such-and-such dates for less than this amount and more than this amount that allows pets and has a gym and is no more than x blocks away from the convention center." Houndify gamely complied. "Book it."
Another technology with this level of understanding is Viv, founded by the original creators of Siri, which was introduced at TechCrunch Disrupt earlier this month. Its technology operates on a plane of sophistication similar to Houndify's, and the company emphasizes personalization and its open platform. According to the Washington Post, Google and Facebook have both approached the company about purchasing Viv.
Other companies are getting involved in the space. Sonos, which is already compatible with every major streaming service, recently announced an abrupt shift towards prioritizing voice recognition, which involved a round of layoffs in March. At the time it was unclear whether that meant developing its own voice technology or licensing it from another company; a rep confirmed to Billboard that the latter approach is now more likely.
If the relatively prehistoric Bob Lefsetz is singing its praises -- in a column about voice technology: "Boomers think music is made on guitars, millennials think it’s made on laptops" -- then you can bet this is a tool that can adapt to any sort of user, whether coding wunderkind or Eagles-blasting Rip Van Winkle.
Conversational interaction with the things around us will all but zero out the friction between humans and technology. It's sort of a reset button, really. (At least, once people actually feel comfortable using it -- remember when it was verboten to pull an iPhone out at dinner?) We won't have to remember where we put that playlist inside Spotify, or which folder that app was in. Viv or Hound or Siri or Alexa or Assistant will. This is no small whoop. Ceding our physical engagement with technology won't make anyone very sad -- and it leaves a lot more room to think.