- Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it. And such files are often poorly seeded.
- We primarily used Spotify’s “popularity” metric to prioritize tracks. View the top 10,000 most popular songs in this HTML file (13.8MB gzipped).
- For popularity>0, we got close to all tracks on the platform. The quality is the original OGG Vorbis at 160kbit/s. Metadata was added without reencoding the audio (and an archive of diff files is available to reconstruct the original files from Spotify, as well as a metadata file with original hashes and checksums).
- For popularity=0, we got files representing about half the number of listens (either original or a copy with the same ISRC). The audio is reencoded to OGG Opus at 75kbit/s — sounding the same to most people, but noticeable to an expert.
Perhaps I’m reading this wrong, but is this not a little backwards? Since unpopular music is poorly preserved, shouldn’t the focus be on getting the least popular music first?



I mean, they say earlier that music is actually well-preserved, but it’s disproportionately popular music. If the goal is then to preserve everything, I’d expect them to go for stuff that isn’t likely to be in some random audiophile’s collection or whatever then.