From Mood Playlists to Metadata: How Smart Speakers Are the Next Frontier — And Challenge — For the Music Business

As you read this, the mass adoption of voice-enabled devices and experiences around the world is accelerating — with music taking center stage as a growth driver and guinea pig.

Overall smart-speaker sales more than tripled year-over-year in 2017, and 60 percent of smart-speaker owners regularly use their devices to play music, according to the latest Smart Audio Report from NPR and Edison Research (the next most popular use cases — answering “general questions” and getting info about the weather — trail far behind at 30 percent and 28 percent, respectively). Consulting firm FutureSource recently found that smart-speaker owners are four times more likely to pay for a music subscription than non-owners, drawing a direct line to economic gains for the more streaming-minded execs in the industry.

Even more compelling is the extent to which voice experiences are outpacing and cannibalizing distribution channels the music industry has long taken for granted. Today, Amazon Music — reportedly the third-largest music streaming service globally by subscriber volume — is seeing more consumption through Alexa-enabled devices than across both iOS and Android combined. Thirty-four percent of respondents to the Smart Audio Report said that their time spent listening to music on their smart speakers was replacing analogous time spent on smartphones.

Such figures paint a stark contrast to the mobile-first growth narrative that the music business is used to telling, even this year. For instance, in Spotify’s SEC filing, the company refers to itself as a “mobile-first platform” and cites further penetration into existing markets as one of its core goals for the coming years, claiming that only 13 percent of payment-enabled smartphone users in Spotify’s active markets currently use the service.

Hence, music execs are investing heavily in voice not just because their financial future depends on it, but also because they learned the hard way from Napster about the benefits of preempting their own disruption.


In the tech world, it’s not just Amazon battling to dominate the voice-driven music listening experience: Spotify is testing a proprietary voice-control interface that may appear in its rumored hardware product line, while Sonos is focusing on collaborating more directly with artists and producers in marketing its own smart speakers. At SXSW, Bose premiered its open augmented-reality platform Bose AR and announced a whole slate of partnerships with premier audio, radio and podcast companies including TuneIn, RadioPublic, Anchor, Audioburst and Aaptiv.

Below are three reasons why the music industry is leaning further into voice this year than ever before — and why the journey will be as challenging as it is rewarding.

Voice-enabled music discovery is more frictionless on the front end, which supposedly helps everyone…

The last few technological disruptions in music history — which can be roughly bookmarked by Napster, iTunes and Spotify — all “unbundled” a traditional music format at the time. iTunes unbundled albums into individual tracks for sale; Napster did the same in a networked sharing environment that, needless to say, did not pay as well. Spotify unbundled music in a different fashion by “rebundling” isolated songs from otherwise disconnected artists and albums into the format of a playlist, which has now become the primary currency driving artists’ exposure and audience development in the digital age.

On Mar. 20, at a voice-focused industry event in London co-hosted by Music Ally, the British Phonographic Industry (BPI) and the U.K.-based Entertainment Retailers Association (ERA), the conversation pointed to voice as the next step in music-tech’s continuous unbundling cycle.

“What I see is the radio industry going through the iTunes moment,” Pete Downton, deputy CEO of 7digital, said at the event. “Suddenly you unbundle the experience, and allow consumers in a frictionless way to listen to what they want, when they want it.”


Indeed, according to FutureSource, 30 percent of smart-speaker owners are already using their devices to discover new music, and 78 percent of that cohort find new music on a daily basis. Recent data from Amazon Music also confirms that consumers are turning to voice as a more convenient music discovery option over screen-based channels. For instance, during the 2018 Winter Olympics, Amazon Music users in the U.S. queried songs used in the background of various competitions (especially figure skating) five times more often through voice than through simple tap/click requests on the music app or web browser.

Amazon argues that these trends will benefit not only mainstream and lean-back music offerings, but also niche genres and listeners who might not have otherwise been first movers into streaming.

“Whether it’s a lean-back experience or a specific artist or genre, it becomes much easier to find any type of content, and removing friction drives more music consumption across the board,” Ryan Redington, director of Amazon Music, tells Billboard. “For those artists who feel like they haven’t been benefiting as much from the streaming era yet, I think bringing music into the home helps everyone in the industry out.”

… but effective music delivery is still convoluted on the back end (read: metadata), which could potentially hurt smaller artists

A common obstacle for music-tech entrepreneurs is the contradiction between a seamless, inexpensive user experience and a costly, fragmented backend, especially when it comes to content licensing. Voice experiences present a similar conundrum, except the source of complexity is more in the metadata surrounding content delivery than it is in the licensing itself.

One example of this complexity is pronunciation: “There are so many artists with dollar signs in their names, or with different grammatical elements, who cannot be found within the current smart-speaker environment,” Kara Mukerjee, head of digital at RCA Label Group U.K., said during the Music Ally event in London.

As an example, Mukerjee listed two artist names that were easily distinguishable on paper — Billboard chart-topping electropop artist MØ and Polydor Records-signed girl group M.O — but impossible to differentiate when spoken. In such scenarios, the voice assistant would likely have to ask the user for additional information, either around the spelling of the name or around titular or lyrical content (the latter of which is not yet a requirement for normal delivery to the major streaming platforms).
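A minimal sketch of why such names collide for a voice assistant: once punctuation and special characters are reduced to their spoken sounds, distinct written names can map to the same key. The normalization rules below are illustrative assumptions for a toy example, not any platform's actual speech pipeline.

```python
import re

# Toy spoken-form normalization: map special characters to the sound a
# listener would say, then keep only lowercase letters. (Illustrative
# rules only -- a real ASR/entity-resolution stack is far richer.)
SPOKEN_SOUNDS = {"ø": "o", "$": "s"}  # e.g. a dollar sign read as "s"

def spoken_key(name: str) -> str:
    lowered = name.lower()
    for char, sound in SPOKEN_SOUNDS.items():
        lowered = lowered.replace(char, sound)
    return re.sub(r"[^a-z]", "", lowered)  # drop dots, spaces, symbols

# Distinct on paper, identical when spoken:
print(spoken_key("MØ"))   # -> "mo"
print(spoken_key("M.O"))  # -> "mo"
```

When two catalog entries share a spoken key like this, the assistant has no choice but to fall back on a clarifying question, which is exactly the extra metadata burden described above.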


One pain point discussed during the Music Ally event in London was whether rights holders and platforms could arrive at a peaceful agreement about what voice metadata standards should look like under the hood. Amazon Music is currently taking a hybrid approach of working with and updating existing delivery standards — particularly with DDEX — while hiring in-house curators to tag individual songs with certain moods and contexts, which machine learning can then scale across larger catalogs.

Amazon also tells Billboard that it updates labels regularly on the user behavior they are seeing, and on subsequent opportunities to enhance and improve the metadata delivery process — e.g. suggesting new “critical metadata fields” around lyric queries, radio impact date and other relevant marketing terms.

“We had never thought about radio impact date as a critical metadata field in the past, but in voice-enabled environments, customers no longer have visual access to ‘Chart,’ ‘Browse’ or ‘New Release’ tabs and campaigns,” says Redington. “When customers ask for the ‘latest song by Justin Timberlake,’ we want to pull up the song that RCA is working on the radio circuit, without the end users needing to know the title of the song.”
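A hedged sketch of how a field like "radio impact date" could drive that resolution: prefer the track currently being worked at radio, and fall back to plain release recency when no impact date applies. The field names, dates and titles here are invented for illustration and do not reflect Amazon's actual schema.

```python
from datetime import date

# Toy catalog rows; "radio_impact" is the hypothetical critical
# metadata field discussed above, alongside the standard release date.
catalog = [
    {"title": "Older Single", "release": date(2017, 9, 1), "radio_impact": None},
    {"title": "Current Radio Single", "release": date(2017, 11, 3),
     "radio_impact": date(2018, 2, 12)},
    {"title": "Newest Album Cut", "release": date(2018, 1, 19), "radio_impact": None},
]

def latest_song(tracks, today=date(2018, 3, 1)):
    """Resolve a 'latest song by <artist>' voice query.

    Prefer the track most recently sent to radio (if any impact date
    has passed); otherwise fall back to the newest release.
    """
    at_radio = [t for t in tracks if t["radio_impact"] and t["radio_impact"] <= today]
    if at_radio:
        return max(at_radio, key=lambda t: t["radio_impact"])["title"]
    return max(tracks, key=lambda t: t["release"])["title"]

print(latest_song(catalog))  # -> "Current Radio Single"
```

Note the trade-off the sketch makes visible: without the impact-date field, the query would return the newest album cut rather than the single the label is actively promoting.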

Critics will likely call out this example for potentially putting smaller, more niche acts without radio budgets behind them at a disadvantage. In addition, new platforms always induce new aesthetics — and just as many songwriters are working to “make something that sounds like Spotify” to increase their chances of racking up playlist placements and streams, artists in the future may feel pressured to conform to an “SEO for a voice interface,” and to make “changes to artist names and track names purely because they have to respond” to a voice-first environment, said Mukerjee.

From a major-label perspective, however, the purpose of adding metadata like “radio impact date” is not necessarily to preserve commercial dominance, but rather to ensure consistency of the user experience across multiple devices and platforms, and to deliver the best answers to a small set of unique cases. “If Siri, Alexa and Google all respond with different songs [to the same request], it reflects poorly on both the labels and the tech,” a label source tells Billboard.

Redington also insists that “artists should be creative, and keep doing the work that they love. It’s incumbent on us at Amazon Music to solve the backend complications, and work with labels to make sure we have the right metadata, such that we serve up the right requests based on what customers want. We wouldn’t recommend any changes in the music itself.”

Voice experiences lead to new, hyper-personalized, context-driven behaviors around music consumption that we are only now starting to understand

As with any new technology, voice-enabled devices will engender new niches of human behavior and interaction that are nearly impossible to predict today, but early moves from Amazon and other market leaders lay out a potential blueprint for the future.

One such possible niche is hyper-personalized contextual playlisting. This functionality is not currently possible with Spotify’s signature mood playlists — such as “Good Vibes” and “Songs to Sing in the Car” — which serve the same tracks to all of the service’s 159 million users, regardless of those users’ individual music tastes.

Voice has the potential to make this type of contextual curation much more dynamic. “If you listen to a lot of country music and ask Alexa for a dinner playlist, we’ll serve you a country-related answer to that request,” says Redington. “The more we know about the customer, the smarter we can be with those layers of personalization. When you get both the context and the genre preferences right, that’s a really exciting opportunity.”
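The idea Redington describes can be sketched as a two-step filter: select tracks tagged for the requested context, then rank them by the listener's genre affinity. The catalog, tags and affinity scores below are invented for illustration, not drawn from any real recommendation system.

```python
# Toy catalog: each track carries curator-style context tags and a genre,
# echoing the mood/context tagging described earlier in the piece.
catalog = [
    {"title": "Porch Light", "genre": "country", "contexts": {"dinner"}},
    {"title": "Neon Nights", "genre": "electronic", "contexts": {"dinner"}},
    {"title": "Truck Stop Blues", "genre": "country", "contexts": {"workout"}},
]

def contextual_playlist(tracks, context, genre_affinity):
    """Filter by requested context, then rank by the listener's
    per-genre affinity (0.0 to 1.0), so a country fan asking for a
    dinner playlist hears country-leaning picks first."""
    in_context = [t for t in tracks if context in t["contexts"]]
    ranked = sorted(in_context,
                    key=lambda t: genre_affinity.get(t["genre"], 0.0),
                    reverse=True)
    return [t["title"] for t in ranked]

country_fan = {"country": 0.9, "electronic": 0.2}
print(contextual_playlist(catalog, "dinner", country_fan))
# -> ['Porch Light', 'Neon Nights']
```

The same context request returns a different ordering for each listener, which is precisely what distinguishes this approach from a one-size-fits-all mood playlist.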

Another possible behavioral change is more conversation-based discovery — and not just when the metadata on the backend falls short, as previously discussed. While mass audiences have yet to adapt meaningfully to this conversational listening behavior, the idea already sounds attractive to music marketers challenged to craft platform-native stories around artists that are interesting enough to keep both new and existing fans engaged.

One example of a voice-driven music campaign is “Paloma’s Bedtime,” an Alexa Skill that RCA U.K. shipped in Jan. 2018 that offers lullabies and bedtime stories alongside a cappella tracks from Paloma Faith’s latest album The Architect. While the skill has only a handful of reviews to date, Mukerjee reinforced how “it was very cheap to build, [with] really short development times. We turned the whole build around in about three weeks.”

Amazon Music Unlimited has also worked with the likes of U2 and OneRepublic on the “Side-by-Side” Alexa Skill, which syncs artists’ exclusive commentary with selected tracks from their catalog — similar to how Spotify weaves together music and artist interviews for its Secret Genius playlists, albeit only on the visual, on-screen platform for now.

Beyond the home, the in-car music experience is still up for grabs, and competition is heating up. Spotify recently announced an integration with Cadillac’s in-car entertainment system, but has yet to enable voice control in any of its auto integrations. Amazon is taking an open-source approach to scaling voice experiences across its auto partnerships, recently signing on as a member of the Automotive Grade Linux platform. Shazam competitor SoundHound has already integrated its proprietary voice platform Houndify with Hyundai’s in-car “Intelligent Personal Agent” and with infotainment platform Nvidia Drive.

“We know from users of our SoundHound music search and discovery app that the car is the second most common place where people discover new songs, with the home being the first,” Katie McMahon, vp & gm of SoundHound, tells Billboard. “Adding voice search capabilities in-car opens up a world of new, and safer, use cases for users and drivers.”

Finally, both smart-speaker manufacturers and music marketers are betting on music consumption becoming more communal and social with the adoption of voice. This would perhaps be an ironic throwback to the early 20th century — when families gathered around a living-room radio or television for the latest entertainment — but it would also be a real opportunity for streaming services to nail a coherent social strategy after repeated failures over the last several years.

“Music is a social experience, and at Sonos we’ll be focused on keeping it as social as we can,” Mieko Kusano, senior director of product management, voice at Sonos, said at a press conference in Oct. 2017. “The home of the future will be much like the home of the past.”