Well, sort of. He’s been ranting a bit again recently. He’s always hated MP3 and CD as media. He was a big proponent of DVD-A (96kHz, 24 bit) and now claims he was working on another format with Steve Jobs to return to the actual potential of digital music. It’s true that a lot of modern music does sound terrible. But although MP3, AAC and other lossy compression formats do throw away a lot of the information in the track, the bigger problem with modern music is still the loudness war and the excessive dynamic range compression applied when the track is mastered. The YouTube clip cited in the NPR piece is here (YT: The Loudness War)
(In fairness to Neil, higher bit depths would also alleviate some of the dynamic compression issues, since there’s more headroom available in mastering.)
So what’s wrong with the MP3? Well, not a lot really. Across the web, multiple audio forums have run some form of wav vs MP3 blind test or ABX test to see if listeners can tell the difference. While it’s certainly possible to tell the difference between a wav and a 128kbps MP3 (the most popular format for many years), the consensus you’ll most likely find on any forum is that above 192kbps most people can’t tell. Throw in some improvements in encoding or higher quality formats such as Apple’s AAC and, in reality, unless you’re listening on some very high end audio equipment (bear in mind that about 80% of music nowadays is listened to on laptop speakers or cheap earbuds), the iTunes store default setting (256kbps AAC) is good enough for the vast majority of the population.
What’s that? You could totally tell, right? You can try right now to see if you notice the difference between 320kbps and 128kbps files. (Looking around the web, this was the best test out there that didn’t involve much prep work on the part of the user.) Unfortunately the linked test only has three songs in its catalogue and compares two MP3 bitrates against each other rather than an MP3 against the uncompressed original, so it’s not exactly what Neil Young is getting at, since at 320kbps most of the information has already been thrown out. I’ll be posting my own version in the next few days, which you can try out, and I will post the results.
Why is it that most people can’t tell the difference when the file sizes are so much smaller? MP3 is definitely a lossy compression algorithm and there is a lot of information not being encoded. Here’s a good demonstration of just how different the information contained in wavs and MP3s is (YT: Mp3 vs. WAV – Music Quality and Mp3 Artifacts). As the commenters point out, however, the difference signal isn’t a good representation of how different the two files sound to the human ear, and this is down to the way MP3s work. You can also see the difference in the spectrograms below (this is from the first 15 seconds of The Beach Boys – Wouldn’t It Be N.).
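For anyone curious what a "difference signal" actually is, here is a minimal Python sketch. It assumes you already have the original and the decoded MP3 as two lists of samples that are perfectly aligned (real decoders add a small delay, which you would have to trim first). The function names and the sample values are made up for illustration.

```python
import math

def difference_signal(original, decoded):
    """Subtract the decoded samples from the original, sample by sample."""
    return [a - b for a, b in zip(original, decoded)]

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Made-up sample values, purely to show the mechanics.
original = [0.5, -0.25, 0.75]
decoded = [0.5, -0.5, 0.5]

diff = difference_signal(original, decoded)
print(diff)       # what the encoder changed or threw away
print(rms(diff))  # how much energy that is, not how audible it is
```

The RMS number only measures how much the waveform changed. It says nothing about whether the ear would have noticed, which is exactly the commenters' objection.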
MP3 and related compression algorithms are efficient because they make use of perceptual coding. In determining the information to encode, the MP3 algorithm makes use of the phenomenon of auditory masking within the critical bands of human hearing. Essentially, if you have two sounds of almost the same frequency, the louder sound will mask the quieter one so that it is not perceived. MP3 encoders take this psychoacoustic knowledge and determine whether a given frequency can be perceived by a human; if not, it doesn’t get encoded. You can see in the spectrograms above which frequencies are ignored, with the 40kbps file being more dramatic. Even in the 128kbps file, you can still see a hard cutoff at about 16kHz. Since most adults have difficulty hearing above these frequencies, this information tends to be dropped first, with more and more frequencies being sacrificed as the bitrate is lowered.
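To make the masking idea a bit more concrete, here is a toy sketch in Python. This is not the psychoacoustic model a real MP3 encoder uses. The rule (drop a tone if a neighbour within 10% of its frequency is at least 20 dB louder) and both thresholds are assumptions picked purely for illustration.

```python
def audible_components(components, band_fraction=0.1, mask_margin_db=20.0):
    """Toy masking rule: keep only tones that no louder neighbour covers up.

    components is a list of (frequency_hz, level_db) pairs. Both thresholds
    are illustrative guesses, not values taken from the MP3 standard.
    """
    kept = []
    for freq, level in components:
        masked = any(
            other_freq != freq
            and abs(other_freq - freq) <= band_fraction * freq
            and other_level - level >= mask_margin_db
            for other_freq, other_level in components
        )
        if not masked:
            kept.append((freq, level))
    return kept

# A loud 1 kHz tone, a quiet tone right next to it, and a quiet tone far away.
tones = [(1000.0, 70.0), (1050.0, 40.0), (4000.0, 45.0)]
print(audible_components(tones))
```

The quiet 1,050 Hz tone sits right beside the much louder 1 kHz tone, so it gets dropped. The 4 kHz tone is nearly as quiet but has no loud neighbour, so it stays.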
A CD quality track is encoded at 44,100 samples per second, with each sample being 16 bits. Every single sample is the same size, so a single second of audio will be about 705 kbits per channel, or about 1,411 kbits for a stereo track. This is not true of MP3, which works to a bitrate instead of a fixed bit depth. A 128kbps bitrate therefore gives a compression ratio of roughly 11:1 against stereo CD audio. The algorithm is constantly making decisions on how to most efficiently encode the audio, frame by frame, to allocate its ‘budget of bits’. In a frame where you might expect a lot of masking (certain frequencies are louder or dominant), the algorithm can assign fewer bits to encode that frame. If less masking is likely (lots of very different frequencies), more bits are allocated to that section.
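If you want to check the arithmetic yourself, here is a quick Python sketch. It assumes 1 kbps means 1,000 bits per second and that a CD track is stereo (two channels), which is why the ratio depends on whether you count one channel or both. The function names are just for illustration.

```python
def pcm_bitrate(sample_rate_hz, bit_depth, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels

def compression_ratio(pcm_bps, encoded_bps):
    """How many times smaller the encoded stream is than the PCM stream."""
    return pcm_bps / encoded_bps

cd_mono = pcm_bitrate(44_100, 16, 1)    # 705,600 bits per second
cd_stereo = pcm_bitrate(44_100, 16, 2)  # 1,411,200 bits per second

for kbps in (128, 192, 256, 320):
    ratio = compression_ratio(cd_stereo, kbps * 1000)
    print(f"{kbps} kbps: about {ratio:.1f}:1 versus stereo CD audio")
```

Against a single channel the 128kbps figure works out to about 5.5:1, and against a full stereo track it is about 11:1.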
Looking ahead, now that storage is cheap and bandwidth is still increasing, we can expect ‘some rich guy’, as Neil Young put it, to come up with a scheme for delivering higher quality audio to the end-user. For now, though, Neil will just have to live with MP3 and AAC, which for the most part are good enough for the rest of us.