Spotify is a company that everybody knows and loves. It brings you music all day long, and it has changed the way we enjoy music. If you are a developer, chances are you’re listening to it right now!
But Spotify is also a powerhouse of Machine Learning and Big Data. All that music is data and metadata that can be used to personalize your experience. What is obvious now was quite a revelation in 1998, when I, a grad student from UPenn, was an intern at the NEC Research Institute in Princeton. One of my fellow interns was Brian Whitman, a grad student from Columbia. He was working on MinnowMatch, a project to search music by its deep features, such as its spectral characteristics. Since digital music was not freely or easily available then, and the Internet tubes were puny and slow, Brian and his friends used their research budget to go on a CD shopping spree, buying enormous numbers of discs covering all of popular American music. It was a great way for me to learn the names of bands and genres. The MinnowMatch team built a custom Linux box with numerous CD drives to rip the discs as fast as they could.
Fast-forward sixteen years, and Brian’s company, EchoNest, is acquired by Spotify, with Brian becoming Spotify’s Chief Music Scientist. If you follow Brian’s blog, you can see that he is a hands-on scientist: he tested his algorithms by coding them up and spinning up AWS clusters to run them, and he experimented with all kinds of data, e.g., using NLP to enrich the metadata. EchoNest was the richest repository of music metadata. Brian’s engineering focus and his end-to-end data pipelines for music data align both with Spotify’s approach (which probably contributed to the acquisition) and with the core themes of Scale By the Bay.
Spotify uses Scala and Google Cloud to scale its Machine Learning. It cares about infrastructure design as much as it cares about AI. Over the years, we’ve met great engineers and data scientists from Spotify at SF Scala, NY Scala, and Scale By the Bay. Spotify hosted our cognifest.org at its New York office and our meetups at its San Francisco office. All the offices have stages with drums, ready to rock!
Spotify comes from Sweden, and the Swedish touch is felt in its thoughtful design, its approach to life, its hospitality, and a calm, peaceful feeling that is hard to describe but makes both listening to music and appreciating the platform quite personal.
Here’s a quick overview of the talks we hosted with Spotify over the years.
Neville Li is a principal engineer at Spotify who pioneered Scala as the language of big data manipulation and feature engineering for Machine Learning. Neville was one of the early users of Scala macros, which were controversial at the time, using them to simplify access to the Parquet files storing music data. Neville also leads the development of Scio, Spotify’s Scala API for Apache Beam and Google Cloud Dataflow. Scio is now used across the company and serves as a gateway to the strongly typed data science built on top of it.
Neville presented uses of Scala macros for data at SF Scala in 2015, and Scala Data Pipelines at Big Data Scala 2015.
He unveiled Scio at SF Scala and at SBTB 2016, and presented Featran, a generic feature transformer for ML pipelines, at SBTB 2017.
In 2017, we put together Cognifest NYC, an event on applied AI, and Spotify was one of the core companies that hosted us. Samantha Hansen and Fallon Chen presented Featran in depth: the system itself and its use in practice. This is exactly the kind of connection between engineering and data science that we love!
Nikhil Tibrewal showed how playlist recommendations are made at Spotify.
And most recently, Julien Tournay and Bram Leenders presented “Scio data processing nirvana” at SBTB 2018, showing how Scio is used across Spotify.
This year, Spotify is back at Scale By the Bay with a talk, “How to Eliminate Surprises In Your Data,” by Anne DeCusatis, Data Infrastructure Engineer, and Idrees Khan, Senior Data Engineer. Spotify is also our one and only Fika Sponsor (more about it in future posts), and we cannot wait for you to experience this traditional Swedish ritual with us at Scale By the Bay.
From the cultural changes Spotify makes to give data engineers a quality mindset to the specific tools the team has developed, Anne and Idrees will explain how they increase confidence in the contents of their data and eliminate surprises, and how they approach problems in the wide space of ‘data quality.’ You’ll learn about a few key moments in the pipeline lifecycle when data quality might be compromised, and the approach Spotify took to safeguarding it at each of them.
Join Spotify at Scale By the Bay: book your ticket now.