IAWOAP – Project Setup and Processing Audio

Screenshot of the iOS Simulator: IASAR’s simple UI

I’m up and running with the Alvin Lucier-inspired project I mentioned in the previous post. It’s in a pretty basic state at this point, but the good news is I can take in an audio file and process (convolve) it with an impulse response to get a first pass at the room acoustics.

For this project I’m going to follow a quasi-Scrum methodology – essentially adding features as we go, with each post documenting a somewhat finished state and deciding at the end what the next feature to implement will be. If you’re familiar with Scrum, you can think of each post as a sprint review with a backlog grooming and sprint planning session tacked on at the end. At the end of each sprint (usually a fixed period of one or two weeks) you’re supposed to deliver a ‘potentially shippable product’, and a sprint review is held to show your work and let the product owner accept (or reject) each feature. Backlog grooming is the process of prioritizing features, and sprint planning is where you commit to the features to be implemented in the sprint ahead.

All of that is just to say that this project will be documented as a continuous work-in-progress. As a one-person team doing this in my spare time I won’t actually follow these rules, but it’ll be a guiding principle.

So down to the actual code and stuff –

One of the first steps was to set up a good old version control system. I use Git/GitHub for this, on their free account tier, so you can view all the code I write along the way. I spent a bit of time configuring everything to track just the files I’m modifying and ignoring all the auto-generated stuff as well as the framework files, which are a little larger. This is done via the .gitignore file, which tells Git which types of files to ignore. I used the Swift.gitignore template for this and added the AudioKit.framework files to the ignore list, since you can download those from AudioKit directly instead.
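For reference, the addition to the stock Swift.gitignore template amounts to something like this (the exact path depends on where the framework is embedded in your project):

```
# AudioKit framework – download from audiokit.io rather than tracking in git
AudioKit.framework
```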

So as mentioned I’m using AudioKit as the framework for processing the audio in this app.

AudioKit is an audio synthesis, processing, and analysis platform for iOS, macOS, and tvOS.

It’s an open-source project – so you can contribute if you feel like it. It both wraps and simplifies some elements of Core Audio, and extends it, allowing you to create complex processing graphs by chaining nodes together. The current version is based on Swift 3, which requires Xcode 8 to build.

For this project, the processing needs are pretty simple. I just need to be able to get audio input, process it via convolution, and output the result over multiple iterations. I decided as a first step that I could hardcode the audio input files, process a single iteration, and play the resulting output. I dragged and dropped two files into the project – sitting.wav (an excerpt from the original recording for now, which I’ll replace soon) and an impulse response from a medium-sized church hall – so pretty reverby, to make the effect more noticeable.

Screenshot of the Xcode sidebar: the Xcode project bundle

First, we load the files into the project as an AKAudioFile and a file URL respectively, as these are the parameters AKConvolve() looks for:

let sourceFile = try? AKAudioFile(readFileName: "Sitting.wav", baseDir: .resources)
let urlOfIR = Bundle.main.url(forResource: "IR", withExtension: "wav")!

Next, I created an AKAudioPlayer from my source file:

let player = sourceFile?.player

I then hooked up the player node to the convolution process, attached the output of the convolution to the engine output (which will eventually play the audio), and started the engine:

AudioKit.output = convolvedOutput!

Then we simply need to start both the convolution and the player to hear the audio output.
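Putting the snippets together, the whole chain looks roughly like this. This is a sketch, assuming AudioKit 3’s convolution node is `AKConvolution` and that its initializer takes the input node plus the impulse response file URL – check the framework headers for the exact signature:

```swift
import AudioKit

// Load the source file and the impulse response (as above).
let sourceFile = try? AKAudioFile(readFileName: "Sitting.wav", baseDir: .resources)
let urlOfIR = Bundle.main.url(forResource: "IR", withExtension: "wav")!

// Player node for the dry signal.
let player = sourceFile?.player

// Convolve the player's output with the impulse response.
let convolvedOutput = AKConvolution(player!, impulseResponseFileURL: urlOfIR)

// Route the wet signal to the output and start everything.
AudioKit.output = convolvedOutput
AudioKit.start()
convolvedOutput.start()
player?.play()
```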


The UI for now is as basic as it gets: a single button to toggle the convolution on/off. To achieve this, the turnOffConvolution() function is attached to the UIButton, with a simple if/else to toggle between start and stop. I had to declare convolvedOutput as a global before initializing it later so this function could access the start/stop functions. I’m sure I’ll change that when I update the app to use the MVC (Model-View-Controller) design, but for now it works well enough.
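The toggle itself can be as simple as the sketch below. The button wiring and the state flag are my assumptions; `convolvedOutput` is the globally declared convolution node described above:

```swift
import AudioKit
import UIKit

// Declared globally (for now) so the button action can reach it.
var convolvedOutput: AKConvolution?
var convolutionOn = true

class ViewController: UIViewController {
    @IBAction func turnOffConvolution(_ sender: UIButton) {
        if convolutionOn {
            convolvedOutput?.stop()   // bypass the effect
        } else {
            convolvedOutput?.start()  // re-enable the room effect
        }
        convolutionOn = !convolutionOn
    }
}
```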

So there it is: a very basic app that hooks our input audio up to the convolution node and outputs the result, with a button to toggle processing on/off. Up next we’ll add the processing iterations so the room resonances can completely destroy the actual speech, mimicking the original recording.

How VR Audio works

Here’s a great post from Enda Bates of the Trinity 360 project about 360-degree audio, the oft-forgotten half of the VR experience. Now that VR is going mainstream – with the Oculus Rift finally shipping, and Samsung, HTC, and Sony all releasing their own headsets alongside cheaper alternatives like Google Cardboard – we’re starting to see a shift towards better audio for VR. Companies like Google are focusing on spatial audio as one of the key components of the VR experience.

I’m lucky enough to work often with VR audio as part of my job, so it’s exciting to see it getting more of the attention it deserves as the VR market explodes. Enda’s post is a great rundown of how audio can be captured and rendered for VR, and well worth checking out. I’m looking forward to catching the upcoming performance in April to see what Enda and crew have been working towards.

Boring Blackalicious

I recently took the MIR Workshop at CCRMA, Stanford (which I highly recommend) and got a chance to play around with Python signal processing libraries, including librosa. During the week, one of the guest presenters used ‘Alphabet Aerobics’ by Blackalicious to demonstrate his source separation algorithm. It was a challenging piece of material because the track famously does not have a constant tempo, speeding up considerably throughout the song.

The thought struck me that it’d be way less interesting if the tempo were constant throughout, so this weekend I put my newly developed Python processing skills to work and created ‘Boring Blackalicious’ – the constant-tempo version of Alphabet Aerobics.

It uses librosa’s onset detection, beat tracking and tempo estimation functions to create a tempo map (with the help of some manual tweaking to correct the estimated tempos of the later segments, where the librosa functions had more difficulty keeping up).

The tempo map was used to calculate the correct rate to slow down each segment using librosa’s phase vocoder function. I used the phase vocoder over librosa.effects.time_stretch so I could tweak the FFT length directly and get a better-sounding result.
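The per-segment rate calculation itself is simple. Here’s a minimal sketch; the function name and the example tempo values are mine, not from the actual script:

```python
def stretch_rates(segment_tempos, target_tempo):
    """Playback rate per segment so every segment lands at target_tempo.

    For a phase-vocoder time stretch, rate > 1 speeds playback up and
    rate < 1 slows it down, so a segment estimated at 140 BPM needs a
    rate of 100/140 to be slowed to a 100 BPM target.
    """
    return [target_tempo / tempo for tempo in segment_tempos]

# Hypothetical tempo map: the track starts at 100 BPM and accelerates.
rates = stretch_rates([100.0, 120.0, 140.0], target_tempo=100.0)
# Each rate would then be passed to librosa's phase vocoder for that segment.
```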

The python source is available here and you can listen to the original for comparison below!