An astonishing piece of research from the folks at MIT CSAIL was announced for SIGGRAPH 2014. The paper from Davis et al. (Abe Davis, Michael Rubinstein, Neal Wadhwa, Gautham Mysore, Frédo Durand, and William T. Freeman), entitled The Visual Microphone: Passive Recovery of Sound from Video, shows how they were able to recover intelligible speech from video of a plant's leaves, a bag of chips, and other objects that vibrate in response to sound pressure.
The technique is similar to laser microphones (which perform active recovery) but doesn't rely on any external illumination or lasers; instead, it recovers the audio from video recorded under ordinary lighting. For the best results they used a high-speed camera, since the sampling rate of the recovered audio is directly tied to the frame rate. Building on their earlier motion-analysis research, they were able to extract the object's minute motions (almost imperceptible, as small as 1/1000th of a pixel) and combine the recorded vibrations into a single 1D signal: an audio signal. Further filtering and noise reduction are then carried out to clean up the recovered signal.
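As a rough illustration only (the authors' actual pipeline uses phase-based motion analysis in a complex steerable pyramid, not a plain spatial average), collapsing per-frame motion estimates into one audio sample per frame might be sketched like this; the function name and the toy 440 Hz input are invented for the example:

```python
import numpy as np

def frames_to_audio(displacements, fps):
    """Collapse per-pixel motion estimates into a 1D audio signal.

    displacements: array of shape (n_frames, height, width) holding the
    estimated sub-pixel motion of each pixel relative to a reference frame.
    Returns (audio, sample_rate): one sample per frame, so the audio
    sample rate equals the camera frame rate.
    """
    # Spatially average each frame's motion field into one scalar sample.
    audio = displacements.mean(axis=(1, 2))
    # Remove the DC offset and normalize to [-1, 1] for playback.
    audio = audio - audio.mean()
    peak = np.abs(audio).max()
    if peak > 0:
        audio = audio / peak
    return audio, fps

# Toy input: a 440 Hz vibration "seen" by a 2200 fps high-speed camera,
# with a peak displacement of about 1/1000th of a pixel.
fps = 2200
t = np.arange(fps) / fps                      # one second of frames
motion = 0.001 * np.sin(2 * np.pi * 440 * t)  # sub-pixel vibration
frames = motion[:, None, None] + np.zeros((fps, 4, 4))
audio, rate = frames_to_audio(frames, fps)
```

Note how the recovered sample rate is simply the frame rate, which is why a high-speed camera matters: a 2200 fps recording can capture frequencies only up to 1100 Hz.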
The results are featured on their project page, with code to be added soon, and they are nothing short of unbelievable. They even went one step further and devised a technique that takes advantage of the rolling shutter effect of standard cellphone cameras to compute the vibration per row of each frame rather than per frame, effectively increasing the sampling rate and recovering intelligible audio from a standard consumer-grade camera.
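A back-of-the-envelope calculation shows why the rolling shutter helps so much. Assuming an idealized sensor that reads its rows sequentially and uniformly across each frame (real sensors also have blanking time between frames, which the paper has to account for), the effective sampling rate is the frame rate multiplied by the row count:

```python
# Idealized rolling-shutter sampling rate (illustrative numbers, not
# the paper's exact figures).
fps = 30             # ordinary cellphone video
rows = 1080          # rows per frame, each exposed at a slightly later time

per_frame_rate = fps               # global shutter: one sample per frame, 30 Hz
rolling_shutter_rate = fps * rows  # one sample per row: 32,400 Hz
```

Even with real-world losses from blanking time, sampling per row instead of per frame turns a 30 Hz measurement into one fast enough to cover the audible speech range.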
Check out the introduction video below!