Another issue lies in the very nature of music as a medium. There is often a one-to-many relationship between a musical work (the written composition) and its sound recordings (the recorded performances). This is practically the norm in classical music, but it is also common with jazz standards (e.g. George Gershwin’s “Summertime”) and older pop songs, including those written for films and musicals (e.g. Harold Arlen’s “Over the Rainbow”). Additionally, artists often go by aliases or perform as part of larger groups or ensembles: How do we know with certainty that Jay-Z, Jaÿ-Z, Jay Z and JAY:Z are all in fact Shawn Carter? And for a less famous artist, will anyone make the effort to reconcile those differing names?
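As a rough illustration of the first, purely mechanical step in reconciling name variants, the Python sketch below folds accents, case and punctuation so that all four credits above collapse to one comparison key. The function names are hypothetical and this is not a description of how RAI works. A shared key only flags candidates for review: two different artists can share a name, and no string key can establish that the person behind it is Shawn Carter.

```python
import unicodedata


def normalize_artist_name(name: str) -> str:
    """Reduce an artist credit to a crude comparison key (illustrative only)."""
    # Decompose accented characters, e.g. "ÿ" -> "y" + combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks left over from decomposition.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, then keep only letters and digits (in any script).
    return "".join(ch for ch in without_marks.casefold() if ch.isalnum())


def group_by_key(names):
    """Group raw credits that collapse to the same normalized key."""
    groups = {}
    for name in names:
        groups.setdefault(normalize_artist_name(name), []).append(name)
    return groups


if __name__ == "__main__":
    credits = ["Jay-Z", "Jaÿ-Z", "Jay Z", "JAY:Z"]
    print(group_by_key(credits))
```

Running the module prints a single group keyed `jayz` containing all four spellings. Deciding whether that group, or one keyed on a lesser-known name, really refers to a single person is still a judgment call for someone who knows the catalog.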
Fixing these problems at the scale the MLC database requires demands advanced technology, specifically artificial intelligence (AI), that can quickly make accurate connections between compositions and recordings. At Exactuals, we’ve created a machine learning tool called RAI that does this by mapping out all the connections that exist between data points on musical releases and using those relationships to link two or more seemingly disparate pieces of data. This process typically works well, but in some cases potential matches fall just short of the similarity threshold needed to create a link. In those situations, human musicologists and other subject matter experts step in to supervise the process. Although our aim is to minimize the hours humans spend curating this data by hand, there are still a number of areas where AI falls short when left entirely to its own devices. Here are some reasons why human experts are necessary to the successful implementation of any AI solution the MLC adopts.
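The threshold idea can be sketched in a few lines of Python. This is a hypothetical illustration, not RAI’s actual model: the record fields, the scoring (a blend of title similarity from the standard library’s `difflib` and writer-name overlap), the weights and both cutoffs are assumptions chosen for readability. Scores at or above the upper cutoff link automatically, scores in the middle band go to a human reviewer, and anything lower is left unlinked.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical cutoffs; a real system would tune these on labeled data.
AUTO_LINK_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.60


@dataclass
class Recording:
    rec_id: str
    title: str
    writers: frozenset


@dataclass
class Work:
    work_id: str
    title: str
    writers: frozenset


def similarity(rec: Recording, work: Work) -> float:
    """Blend title similarity with writer overlap into one score in [0, 1]."""
    title_score = SequenceMatcher(None, rec.title.casefold(), work.title.casefold()).ratio()
    union = rec.writers | work.writers
    # Jaccard overlap of writer credits (0.0 if neither side lists writers).
    writer_score = len(rec.writers & work.writers) / len(union) if union else 0.0
    return 0.6 * title_score + 0.4 * writer_score


def triage(rec, works):
    """Return ('link', work), ('review', work) or ('no_match', None)."""
    scored = [(similarity(rec, w), w) for w in works]
    if not scored:
        return "no_match", None
    score, best = max(scored, key=lambda pair: pair[0])
    if score >= AUTO_LINK_THRESHOLD:
        return "link", best
    if score >= REVIEW_THRESHOLD:
        return "review", best  # routed to a human expert's queue
    return "no_match", None


if __name__ == "__main__":
    rec = Recording("R1", "Over the Rainbow", frozenset({"Harold Arlen", "Yip Harburg"}))
    work = Work("W1", "Over the Rainbow", frozenset({"Harold Arlen", "E.Y. Harburg"}))
    print(round(similarity(rec, work), 2), triage(rec, [work])[0])
```

With these assumed settings, the recording scores about 0.73 against the work record and lands in the review band: the titles match exactly, but only a person (or a curated alias table) knows that both lyricist credits refer to the same man.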
AI needs help recognizing new content.
New songs are always being written, recorded and distributed, so any database meant to keep comprehensive records of the world’s musical works must scale at the pace at which new compositions are created. One challenge for AI will be determining whether a new song should be linked to an existing work or instead represents a completely new musical work. A number of data points, such as the date of publication or even audio fingerprints, can help in this effort, but many edge cases will need to be flagged for human attention to help minimize the number of false positives that inevitably slip through the cracks.
AI needs help updating established records with new information.
As new work records flow into the MLC database, it’s imperative that older records be well maintained and updated with new or corrected information as needed. While AI can be very good at recognizing patterns and making them more transparent, it is less skilled at discerning which data points are the most accurate. Because content can be distributed to many platforms and services with just a few clicks, bad music metadata spreads easily. That metadata then feeds into the body of material that algorithms draw on, which often leads to outdated or otherwise incorrect data being weighted more heavily simply because it is more common. As a result, AI may treat the most accurate data entries as noise to be ignored. With continual human feedback, however, systems can learn when to treat those outliers as authoritative.
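The frequency trap, and how expert feedback can counter it, can be shown with a small weighted-vote sketch in Python. The source names, the misspelled credit “Harold Allen” and the fixed trust step are all hypothetical, and production systems would estimate source reliability in more sophisticated ways. Each source’s vote is weighted by a trust score; whenever an expert verifies a value, sources that agreed with it gain trust and the rest lose some.

```python
from collections import defaultdict


def weighted_vote(reports, trust):
    """Return the value whose reporting sources carry the most total trust.

    reports: source name -> reported value (must be non-empty)
    trust:   source name -> weight (sources not yet scored default to 1.0)
    """
    totals = defaultdict(float)
    for source, value in reports.items():
        totals[value] += trust.get(source, 1.0)
    return max(totals, key=totals.get)


def apply_human_feedback(reports, verified_value, trust, step=0.5):
    """Adjust trust in place after an expert verifies the correct value."""
    for source, value in reports.items():
        current = trust.get(source, 1.0)
        if value == verified_value:
            trust[source] = current + step
        else:
            # Keep a small floor so no source is silenced entirely.
            trust[source] = max(0.1, current - step)


if __name__ == "__main__":
    # Hypothetical sources: two services copied a misspelled writer credit.
    reports = {
        "service_a": "Harold Allen",
        "service_b": "Harold Allen",
        "publisher": "Harold Arlen",
    }
    trust = {}
    print(weighted_vote(reports, trust))  # majority wins: Harold Allen
    apply_human_feedback(reports, "Harold Arlen", trust)
    print(weighted_vote(reports, trust))  # after feedback: Harold Arlen
```

Before any feedback, every source counts equally and the repeated misspelling wins. After one round of expert correction, the publisher’s lone record outweighs the copies, and because `trust` persists, that adjustment also applies to the next record those sources touch.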
AI needs contributions from many disciplines.
Perfecting the MLC database demands expertise from many disciplines, from music and software development to copyright law and statistics. Individual contributors should ideally also have some cross-functional knowledge: Programmers must understand the idiosyncrasies of music metadata so they build the right tools; musicologists should be familiar enough with machine learning technologies and techniques to help developers improve automation; and legal experts need a keen ear to assist in settling copyright disputes. Additionally, the problems the music industry faces may not be exclusive to it, and relevant insights may also come from fields such as healthcare, finance and even insurance. Artists and rights holders themselves should start collecting all the relevant data well before their first recording sessions. Ensuring that humans are better able to “get it right the first time” will make things a lot easier for everyone downstream in this process.
While the music industry at large still has some catching up to do in adopting more advanced technologies, many within the industry recognize that machine learning and AI will play a critical part in building a strong, scalable solution to the royalty problem. Even so, it is important to acknowledge the distinct possibility that these advancements may only get us 80% of the way. Because machine learning still has much to learn, continual direction and feedback from musicologists and other experts will be integral to the pursuit of a lasting solution.
E’Narda McCalister is a musicologist and Product Quality Analyst for Exactuals’ RAI product, which uses machine learning to clean and enhance music metadata.