The Echo Nest's Paul Lamere on the Dangers and Solutions to Streaming Discovery (Q&A)

Paul Lamere, director of Developer Platform at The Echo Nest, a Spotify subsidiary

Courtesy of Paul Lamere

It's become fairly obvious to anyone paying half-attention that music streaming services are the future of consumption for the average listener, offering fans tens of millions of songs practically wherever they go. Because of the number of people using them and the amount of content available within, these services rely heavily on finely tuned algorithms to work out what their users like and what they may want. These formulae, however, aren't intuitive, and they present problems for people like Paul Lamere, a data scientist at The Echo Nest (recently acquired by Spotify, you may remember), whose job it is to give listeners exactly what they want, even if they've never heard of the artist.

Imagine a square that contains all the music on Spotify. What Lamere and his colleagues attempt to do is round off the edges of that square in order to focus on what each user may like, but an unfortunate side effect of this pruning is that some of the music in the corners gets cut. Much of that music is niche, maybe too out-there for some, but maybe exactly the tonic for others. We spoke to Lamere about the issue of those lost corners, and how he and his team try to keep them there.

Billboard: I wanted to speak to you ever since your talk at SXSW, where you discussed how you approach and process data. In it you briefly mentioned the homogenization, or rounding off of the edges, that occurs as a result. Can you walk me through that process?
Paul Lamere: I think at SXSW I was talking a lot about how to deal with users that we know very little about, and how we could reduce some of the on-boarding issues these people have by restricting the music to, for lack of a better word, the 'least offensive' or 'low risk' music. If you don't know the gender of somebody, you don't play One Direction, since they skew heavily female. For 50% of people that's likely to be a bad choice. So then the question becomes: 'Doesn't this turn into a McDonald's of music?'

There's a lot of approaches here. After a listener is on a music service for a little bit of time, we start to know their taste a lot better. If they're always listening to The Black Keys and The White Stripes versus T.I. and Kanye West or Katy Perry and Kesha, we get a sense of what they like. And there's a lot of signals that come almost right away. If they're listening to a lot of longer-tail artists, like The Decemberists, you probably wouldn't recommend The Beatles, because you'd assume they're aware.

So some of the things about homogenization -- it's a real problem with these collaborative filtering systems, these horrible feedback loops. Over the years, lots of music services end up gravitating around certain artists because of these feedback loops. One of the things is that in some ways homogenization -- though that's not the best term...

What would be a better term?
I guess it's okay, but I think it has some negative connotations. But the point is that it's not always a bad thing. When people listen to music, they don't always like to listen to new music. There's a strong bias toward listening to familiar music, especially in certain contexts, like exercising. It may not be a time for a lot of music discovery. Similarly, if you have a social circle -- like everyone might be talking about a TV show the next morning -- it's good to have some similarity across listening, so we can have our shared music tastes. So that's one reason why I think having some homogenization, for lack of a better term, isn't a bad thing.

Another thing too is that human curators are a great source for new music, in bringing totally new perspectives in. I think most music services have a pretty strong human curator aspect as well.

How useful are users' playlists in avoiding losing artists to the rounded edges, if that makes sense?
They're extremely important. There's some other folks on the Spotify analytics team who look pretty hard at this stuff. There are some playlists where the curators are leading-edge indicators for popularity. For instance, before Lorde broke out she appeared in a few Spotify playlists that were put together by these leading-edge tastemakers, and those are critical for her getting picked up by the more mainstream vehicles that drove her into the stratosphere.

I understand that you wouldn't want to serve up a One Direction fan The Jesus Lizard, but I'm wondering about the difficulties in exposing new artists. It's a problem that gets hinted at a lot.
That's why we don't rely on any one approach. Talking about the new listener we know nothing about, I think the rounded corners are okay, but we don't want to lose the artists that are looking to break out. If an artist can get people to listen to them -- by touring, getting reviewed -- there are lots of ways into the system.

A human curator may find out about them, or a group of fans may start to listen to them in a small region, like Memphis or Boston or something, that would be noticed and serviced. On Spotify you can listen to artists who are popular nearby, so even an artist who doesn't have a global reach can still find fans. A band that's on tour may become popular in a town as their date approaches, and that helps them get into the system. So between human curators and all the data we have about who's listening, I think there's more ways [these days] for artists to find their way into someone's regular rotation.

Back in the '70s and '80s you had these gatekeepers, the DJs, the music producers. If you didn't have an entry into those you had little other opportunity. But now the artist can self-promote online and through tours, and if they can start to get fans and get noticed by curators, they'll start to get listens.

I suppose the hermetic scene is in its twilight.
Let me point out a few other tools that we have. One of the things that we look at a lot to surface this activity is regular play data: who's playing what. One way you can get into these feedback loops is this: if you start recommending music, and people play that music, and you look at those plays to generate those recommendations, you start to get these nasty feedback loops. You're essentially eating your tail. Whenever anybody plays a song, we also note how they got to that song: through a recommendation, a playlist, a share on social media. We can track how fans find music, and that tells us a whole lot of things. If they get to a song through a recommendation, then we may not want to use that data to drive other recommendations. Also if we see that lots of fans are finding new music through certain playlists, then we can start to surface those playlists as good discovery playlists, and give the authors of those playlists wider visibility.
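
As a rough illustration of the source-tracking idea Lamere describes here, the following is a minimal Python sketch. It is not Spotify's or The Echo Nest's code: the Play record, the source labels ("recommendation", "playlist", "social", "search"), the sample data, and both function names are assumptions invented for this example. It shows the two uses he mentions: leaving recommendation-driven plays out of the counts that feed future recommendations, and counting how many first-time listens each playlist drives.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Play:
    """One play event. All field names and labels here are hypothetical."""
    user: str
    track: str
    source: str                     # how the listener reached the track
    playlist: Optional[str] = None  # set only when source == "playlist"


def organic_play_counts(plays: Iterable[Play]) -> Counter:
    """Count plays per track, skipping plays that a recommendation caused.

    Excluding those plays keeps the recommender from feeding on its own
    output, which is the feedback loop described in the interview.
    """
    return Counter(p.track for p in plays if p.source != "recommendation")


def discovery_counts(plays: Iterable[Play]) -> Counter:
    """Count, per playlist, first-time listens that arrived via that playlist.

    Assumes `plays` is in chronological order, so the first event seen for
    a (user, track) pair is that user's first listen to the track.
    """
    seen = set()
    firsts = Counter()
    for p in plays:
        key = (p.user, p.track)
        if key in seen:
            continue
        seen.add(key)
        if p.source == "playlist" and p.playlist is not None:
            firsts[p.playlist] += 1
    return firsts


# Made-up sample data, in chronological order.
sample_plays = [
    Play("ann", "song_a", "playlist", "late_night_finds"),
    Play("bob", "song_a", "recommendation"),
    Play("cat", "song_a", "recommendation"),
    Play("bob", "song_b", "search"),
    Play("cat", "song_b", "playlist", "late_night_finds"),
    Play("ann", "song_a", "playlist", "late_night_finds"),  # repeat listen
]

organic = organic_play_counts(sample_plays)
discovery = discovery_counts(sample_plays)
```

In this toy data, the two recommendation-driven plays of song_a are ignored by organic_play_counts, and the repeat listen by "ann" does not count as a second discovery. A real system would presumably weight sources rather than drop them outright; the hard cutoff is used here only to keep the sketch short.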

Have you run into any concern or questions on data over the last year, since Snowden's bombshells?
It's on everybody's mind, how to help people find music without being creepy about it. There's a 'creepy line' that you have to stay on the right side of. If your music player is asking you where you are and who your friends are and what's on your calendar, you've probably crossed the creepy line. It's funny though, we've had ten or twelve years with -- among the more engaged music listeners, they're used to sharing their music data. I think there's some acceptance, at least for the more engaged listeners. But it's certainly something we're very cognizant about. Especially because we're big in Europe and they have very, very strong data privacy laws.


One thing that stuck out recently was Pandora touting how they could determine your political affiliations.
It's funny, my experience has been that if you look at data and make aggregate predictions you're 'safe.' Like: 'People who listen to death metal tend to be Republican or Democrat.'

Versus 'Debra is a Republican.'
Or for that matter, 'Chris is female.' If you remember 15 years ago, the Wall Street Journal ran an article "My TiVo Thinks I'm Gay." If your music player is starting to predict your lifestyle choices, you may be crossing that line. We think a lot about that line. It's going to be different for different people, but we always stay on the right side of it, by default.

Probably way over the creepy line, but what about artificial intelligence? A SkyNet for music? What would that look like?
Are we building the SkyNet for music, is that your question?

That's half my question. [Laughs]
There have been a lot of folks out there who have pushed out these 'hit predictors.' One of the more famous ones was Hit Song Science. My personal opinion is that it's all total snake oil. There's so many things about music that are hard to predict; a song is on a TV show, or got tweeted by another artist. So many things that are unpredictable. Trying to say that you can have something that predicts hits, that sort of thing, is wrong.

But when you have music services like Spotify or iTunes or Pandora, where you can see a huge swath of how people are listening to music, you get a pretty interesting perspective about what's popular, how people are using music. You may have seen, I did a blog post on the skipping behavior in music? Just by looking at that kind of data, you start to realize people are skipping songs not because they don't like the songs, but because that's how it's easiest to browse through music. Having access to all that data starts to give us a better understanding of how people experience music, and hopefully you can then use that data to improve the listening experience.