Following on from the previous post, I wanted to take a look at how you can isolate individual parts from a stereo audio track. Even without the official a cappella studio track, it is possible to recover the vocals from the stereo mix. For the cleanest results, unfortunately, you’ll need the official instrumental version of the track in question, that is, everything except the vocals. Studios are more likely to release instrumentals, for example as karaoke versions.
Extracting the vocals from a stereo audio track is done by making use of the principles of superposition and wave interference. If you superimpose two identical sine waves (equal frequency, amplitude and phase) you will double the amplitude, since the resulting output is a simple algebraic sum of the instantaneous amplitudes. If you invert one of the waves so that they are 180° out of phase (i.e. when wave A has an amplitude of 1, wave B has an amplitude of -1, and so on) you are adding the direct negative and the waves will cancel each other out.
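To make the arithmetic concrete, here is a minimal NumPy sketch of both cases. The 440 Hz tone and 44.1 kHz sample rate are arbitrary values chosen for illustration.

```python
import numpy as np

sample_rate = 44100                        # samples per second (assumed value)
t = np.arange(sample_rate) / sample_rate   # one second of time stamps
wave_a = np.sin(2 * np.pi * 440.0 * t)     # 440 Hz sine, peak amplitude close to 1
wave_b = wave_a.copy()                     # identical wave, in phase with wave_a

in_phase_sum = wave_a + wave_b             # constructive interference
inverted_sum = wave_a + (-wave_b)          # wave_b flipped by 180 degrees, then added

peak_in_phase = float(np.max(np.abs(in_phase_sum)))
peak_inverted = float(np.max(np.abs(inverted_sum)))
print(peak_in_phase, peak_inverted)        # roughly 2, and exactly 0
```

The in-phase sum peaks at about twice the original amplitude, while the inverted sum is silent at every sample.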
Superposition is also the principle underlying Fourier analysis, in which a complex waveform is broken down into a sum of simple sinusoids. By subtracting the instrumental version of a track from the full audio (really inverting one wave and adding) you will be left with just the vocals. The video below demonstrates this process in Audacity. The two tracks must be aligned as closely as possible, ideally to the sample, to get proper cancellation without excessive phase artifacts.
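The same invert-and-add step can be sketched in code. This is only an illustration on synthetic signals: the function names, the 37-sample offset and the brute-force alignment search are my own assumptions, not how Audacity does it. Real releases also tend to differ in gain, mastering and lossy compression, so expect partial cancellation rather than the perfect result shown here.

```python
import numpy as np

def estimate_offset(full_mix, instrumental, max_lag):
    """Return the shift (in samples) that best lines instrumental up with full_mix."""
    lags = range(-max_lag, max_lag + 1)
    # Score each candidate shift by how strongly the two signals correlate.
    scores = [np.dot(full_mix, np.roll(instrumental, lag)) for lag in lags]
    return lags[int(np.argmax(scores))]

def subtract_instrumental(full_mix, instrumental, max_lag=200):
    """Align the instrumental, invert it, and add it to the full mix."""
    lag = estimate_offset(full_mix, instrumental, max_lag)
    aligned = np.roll(instrumental, lag)   # circular shift, good enough for a sketch
    return full_mix + (-aligned), lag

# Synthetic stand-ins for real audio (hypothetical data, not a real song).
rng = np.random.default_rng(0)
n = 8000
t = np.arange(n) / 8000.0
backing = rng.standard_normal(n)               # plays the role of the instrumental
vocals = 0.5 * np.sin(2 * np.pi * 220.0 * t)   # plays the role of the vocal
full_mix = backing + vocals
offset_instrumental = np.roll(backing, -37)    # same track, 37 samples out of line

recovered, lag = subtract_instrumental(full_mix, offset_instrumental)
misaligned = full_mix - offset_instrumental    # what you get with no alignment

recovered_error = float(np.max(np.abs(recovered - vocals)))
misaligned_error = float(np.max(np.abs(misaligned - vocals)))
print(lag, recovered_error, misaligned_error)
```

With the offset corrected the residual is essentially the vocal alone. Without it, the backing track barely cancels at all, which is why alignment matters so much.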
Of course getting your hands on the instrumental version isn’t always possible, but there are options for working with just the original stereo file. One of the standard effects available in Audacity (and other wave editors/DAWs) is the vocal remover. This effect assumes the vocals are panned dead centre of the stereo image. The left channel is subtracted from the right channel (again, inverted and added), leaving a single mono track with the common elements (i.e. those with equal amplitude in left and right) removed. The assumption has its limits: in most modern recordings you usually find bass and kick/snare drums panned centre along with the vocals, so they are removed too, and the process can leave some vocal artifacts, for example if the producer used a stereo reverb.
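As a rough sketch of what such an effect does under the hood (the function name and the three synthetic "instruments" are assumptions for illustration, not Audacity's actual implementation):

```python
import numpy as np

def remove_centre(stereo):
    """Take an (n_samples, 2) array and return right + inverted left as mono."""
    left, right = stereo[:, 0], stereo[:, 1]
    return right + (-left)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
vocal = np.sin(2 * np.pi * 220.0 * t)          # panned dead centre
guitar = 0.6 * np.sin(2 * np.pi * 330.0 * t)   # hard left
keys = 0.6 * np.sin(2 * np.pi * 550.0 * t)     # hard right

stereo = np.column_stack([vocal + guitar, vocal + keys])
mono = remove_centre(stereo)

# Whatever was identical in both channels is gone; only keys - guitar remains.
leftover = float(np.max(np.abs(mono - (keys - guitar))))
print(mono.shape, leftover)
```

Note that the output is mono and that the left-only part comes out inverted, which is inaudible on its own but matters for what follows.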
You might suspect that the reverse process (leaving only the centre-panned elements) should be simple, but this is unfortunately not the case. The problem is that generating the vocal-extracted version of the track leaves you with a single mono track containing the left-only and right-only material combined, so a simple phase inversion/addition against either original channel will not result in cancellation.
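A quick numerical check shows why. In this sketch (random noise standing in for the three parts, all names hypothetical) neither adding nor subtracting the vocal-removed track from the left channel gets you back to the centre signal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
centre = rng.standard_normal(n)       # stands in for the centre-panned vocal
left_only = rng.standard_normal(n)    # material that is only in the left channel
right_only = rng.standard_normal(n)   # material that is only in the right channel

left = centre + left_only
right = centre + right_only
sides = right - left                  # vocal-removed mono: right_only - left_only

attempt_a = left + sides              # algebraically this is just the right channel
attempt_b = left - sides              # centre + 2 * left_only - right_only

def residual_energy(attempt):
    """Energy of whatever is left over besides the centre signal."""
    return float(np.sum((attempt - centre) ** 2))

matches_right = bool(np.allclose(attempt_a, right))
print(matches_right, residual_energy(attempt_a), residual_energy(attempt_b))
```

The first attempt simply reproduces the right channel and the second makes things worse, because the left-only and right-only parts are mixed into one signal with opposite signs.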
The solution therefore is to work in the frequency domain. A nice VST plugin called kn0ck0ut does just this. From the FAQ:
What does ‘extract centre’ do?
– If this is ‘on’, kn0ck0ut first subtracts the R input amplitudes from the L (SOP to remove common ie centre-panned material) then spectrally subtracts this result from the initial L input, leaving the centre of the stereo image on the L output.
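The idea described in that FAQ answer can be approximated with a short-time Fourier transform. To be clear, this is my own minimal sketch of spectral subtraction and not kn0ck0ut's actual code: the frame size, hop, Hann window and the choice to reuse the left channel's phase are all assumptions.

```python
import numpy as np

def stft(signal, frame_size, hop):
    """Split the signal into overlapping Hann-windowed frames and FFT each one."""
    window = np.hanning(frame_size)
    starts = range(0, len(signal) - frame_size + 1, hop)
    return np.array([np.fft.rfft(window * signal[s:s + frame_size]) for s in starts])

def istft(frames, frame_size, hop, length):
    """Overlap-add the frames back into a signal of the given length."""
    window = np.hanning(frame_size)
    output = np.zeros(length)
    norm = np.zeros(length)
    for i, spectrum in enumerate(frames):
        s = i * hop
        output[s:s + frame_size] += window * np.fft.irfft(spectrum, frame_size)
        norm[s:s + frame_size] += window ** 2
    # Edges are not fully overlapped, so the first and last frame are less reliable.
    return output / np.maximum(norm, 1e-8)

def extract_centre(left, right, frame_size=1024, hop=256):
    """Spectrally subtract the side signal (L - R) from L, keeping L's phase."""
    sides = left - right                         # centre-panned material cancels here
    left_spec = stft(left, frame_size, hop)
    sides_spec = stft(sides, frame_size, hop)
    centre_mag = np.maximum(np.abs(left_spec) - np.abs(sides_spec), 0.0)
    centre_spec = centre_mag * np.exp(1j * np.angle(left_spec))
    return istft(centre_spec, frame_size, hop, len(left))

# Synthetic test tones (hypothetical data): one centre part, one part per side.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
centre = np.sin(2 * np.pi * 440.0 * t)
left_only = 0.8 * np.sin(2 * np.pi * 2000.0 * t)
right_only = 0.8 * np.sin(2 * np.pi * 5000.0 * t)
left, right = centre + left_only, centre + right_only

estimate = extract_centre(left, right)
inner = slice(2048, -2048)                       # ignore the unreliable edges

def relative_error(signal):
    """Error energy against the true centre, as a fraction of the centre's energy."""
    return float(np.sum((signal[inner] - centre[inner]) ** 2) / np.sum(centre[inner] ** 2))

error_before = relative_error(left)
error_after = relative_error(estimate)
print(error_before, error_after)
```

Clean, well-separated tones like these are the easy case. On real music the centre and side material overlap in frequency, which is where the characteristic watery artifacts of spectral subtraction come from.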
Working in the frequency domain gives us more flexibility, without needing to line samples up perfectly. The results of either process are of course not perfect (as the author St3pan0va is happy to point out), and should not be used for high quality audio projects, but they do reveal details that are not readily apparent in the main mix, such as breaths, harmonies, guitar parts etc. You can hear the results below. (The first segment is the original “Molly’s Chambers” by Kings of Leon, the second segment is the L/R phase cancellation, and the third segment is the kn0ck0ut centre extraction.)
With a bit more effort it is possible to get more usable results. By using the instrumental-only sections of a track for some pre-processing, it is also possible to remove the bass and drums, which are usually panned centre as well. Other effects, such as the noise reduction in Audacity or Adobe Audition, can help to remove some additional noise or artifacts.