Big Data: The New Oil, or the New Snake Oil?

Guest post by Will Mills (@will_mills_) for, a music and tech think tank. Mills is the Director of Music and Content at Shazam.

Big data played a pivotal role in this year’s U.S. election, with both candidates attempting to harness the power of information to win. Barack Obama’s campaign in particular mined deep metrics of the electorate, with a large team of data scientists poring over polls and trends.

These teams sliced and diced the information, developing their own algorithms to see what the numbers told them and how they could best focus their fundraising efforts, mobilize voters, and maximize their marketing dollars. In fact, blogger Nate Silver received almost as much media attention for his predictions based on his own algorithms of the polls as the candidates themselves. In the end, Obama remained president and Silver became recognized as a genius.

For some time, people have talked about “Big Data” as if it is the be-all and end-all of business that will transform every industry. There is an element of truth to that thinking; however, the key is knowing what data is vital to the success of your business, and where to get it. Otherwise, the promise of demographics and other important marketing metrics will fall far short of being realized.

The technology and startup space is bustling in this area, with funded players like Next Big Sound, MusicMetric, Buzzdeck, and others aggregating many different data points and sorting them in new ways to see what additional knowledge they can gain about consumers and the evolving music marketplace. Sometimes dubbed, rather wearily, “fanalytics,” this space is growing to cater to the other emerging market trend of direct-to-fan, where we see new and even established artists cutting out the middleman and marketing and selling directly to their audience.

Last year, the concert promoter Live Nation acquired one of the earliest players in this music data space, BigChampagne, which initially focused on traffic from the nascent P2P networks. This move demonstrated that music companies are now making an investment in data.

Deep data analysis is becoming more critical when deciding how to generate profit. In the live music industry, which is still the most traditional part of the music market, tour advance fees and physical sales such as merchandise and bar revenues are the metrics that matter. For example, this year Songkick launched its new Detour feature, which uses data to help acts plan shows or even tours. Detour applies a model similar to Kickstarter to live shows, crowdsourcing both demand and investment for gigs. Its debut successes include hosting a Hot Chip show in a UK town where the band had never previously played, and even helping plan a whole tour for the indie artist Andrew Bird.

But data isn’t just being used for tour planning. Weeks ago, Lady Gaga’s manager, Troy Carter, revealed at Wired 2012 how they can discover which of her songs are most popular in a particular region (via Spotify play data) and use that to tailor her live set lists for specific shows. The major labels have also moved on from the days of mailer cards in CDs, and now have focused teams working on data and CRM across their millions of fans and customers worldwide.

The Million Song Data Set is an established public domain assembly of multiple metadata fields covering one million contemporary songs. It was put together by the Echo Nest in conjunction with Columbia University’s LabROSA (the snappily titled “Laboratory for the Recognition and Organization of Speech and Audio”). It has served as the basis of many Music Hack Day creations and mash-ups, and has provided numerous insights into the relations between songs, genres, user habits, and sales.

EMI’s million-interview dataset takes a qualitative approach to music data, drawing on the results of millions of interviews conducted globally with music fans about their attitudes, behavior, and appreciation around music. It was also the focus of a recent hack day on data.

At Shazam, where I have worked for the past eight years, we’ve seen how our own data can reliably identify hit records months, or even years, before the tracks are signed to larger labels. One reason is our very large user base: more than 250 million people have Shazam on their phones, recognizing over three billion songs annually worldwide and driving over $300 million in music sales during that period. The Shazam Tag Charts, published weekly in Music Week, Record of the Day, and other leading trade outlets, are used by a large number of music and media companies, A&Rs, brands, and others to accurately gauge the success and potential of records.

However, it is much deeper in the tail of Shazam’s data that you find an incredible amount of valuable information for surfacing the next Gotye or Avicii ahead of anyone else. For instance, if you were to use the Shazam tagging data to track a hit record, you would observe that from the earliest stages in its life cycle, such as the first club and tastemaker plays, the most popular tracks are outliers when compared to average tracks getting exposure.
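
To make the "outlier" idea concrete, here is a minimal Python sketch of one common way to flag tracks whose early tag counts sit far above their peers: a modified z-score based on the median and the median absolute deviation. This is an illustration only, not Shazam's actual method. The track names, the tag counts, and the 3.5 threshold are all assumptions chosen for the example.

```python
from statistics import median


def robust_z_scores(counts):
    """Modified z-scores using the median and median absolute deviation (MAD)."""
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return [0.0 for _ in counts]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [0.6745 * (c - med) / mad for c in counts]


def flag_outliers(tags_by_track, threshold=3.5):
    """Return the names of tracks whose tag count is unusually HIGH."""
    names = list(tags_by_track)
    scores = robust_z_scores([tags_by_track[name] for name in names])
    return [name for name, score in zip(names, scores) if score > threshold]


# Hypothetical early-stage tag counts; NOT real Shazam data.
early_tags = {
    "track_a": 40,
    "track_b": 55,
    "track_c": 38,
    "track_d": 61,
    "track_e": 47,
    "track_f": 900,
}

print(flag_outliers(early_tags))
```

The median-based score is used here instead of a plain mean and standard deviation because a single runaway track would otherwise inflate the very average it is being compared against.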

Two examples of this are from the hip-hop world, where both Rick Ross’s “Stay Schemin’” and Juicy J’s “Bandz A Make Her Dance” received more than 200,000 tags just from being on mixtapes before being released digitally by a label. When tracks are out in the wild, it becomes even easier to match this to sales. Alex Clare’s “Too Close” has received nearly 5 million tags on Shazam, and when you plot the number of tags on Shazam against its iTunes chart position, there is an extremely tight correlation.
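
For readers who want to check a relationship like this on their own numbers, the following Python sketch computes a Spearman rank correlation between weekly tag counts and chart position. It is a rough illustration under stated assumptions: the weekly figures are invented, and rank correlation is one reasonable choice here, not necessarily the measure behind the plot described above. Because a lower chart position is better, a tight relationship shows up as a value close to -1.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up weekly figures for one track; NOT real Shazam or iTunes data.
weekly_tags = [12_000, 30_000, 85_000, 140_000, 210_000, 190_000]
chart_position = [95, 60, 22, 9, 3, 4]  # 1 is the top of the chart

rho = spearman(weekly_tags, chart_position)
print(f"Spearman rank correlation: {rho:.2f}")
```

Ranks are used because chart position is ordinal: the gap between positions 1 and 2 does not mean the same thing as the gap between 91 and 92.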

So when plotting your assault on marketing data, be sure to look at the success stories in the marketplace and see how they access the information generated by their own user base and how they leverage the power of third-party information. Otherwise, you’re just a person drilling “dry holes” into the ground, i.e., paying for empty promises and daydreams.

Editor’s Note: At SXSW in March 2013, Will Mills will moderate a panel on this issue, also titled “Big Data: The New Oil, Or the New Snake Oil,” featuring leaders from the field. They’ll discuss how artists and labels can best work with data to build fan bases and sell music, as well as “Data-tainment” and more. If you’re in Austin next year, come along and heckle them! is founded and edited by Kyle Bylin of Live Nation Labs. If you would like to contribute a post to be featured on the site, please reach out.


  1. there is some very valuable data out there, but a lot of people are also letting themselves be dazzled by it, or drawing false conclusions. you know the saying, “you can’t polish a turd”? that seems to be untrue for too many people if you can come up with a nifty looking graph of some sort. i also see a lot of people wanting to apply results to only semi-related things across the board.
