IAWOAP – Project Setup and Processing Audio

Screenshot of iOS Simulator showing app

IASAR’s Simple UI

I’m up and running with the Alvin Lucier-inspired project I mentioned in the previous post. It’s in a pretty basic state at this point, but the good news is I can take in an audio file and process (convolve) it with an impulse response to get a first pass of the room acoustics.

For this project I’m going to follow a quasi-Scrum methodology: adding features as I go, with each post documenting a somewhat finished state and ending with a decision about which feature to implement next. If you’re familiar with Scrum, you can think of each post as a sprint review with a backlog grooming and sprint planning session tacked on at the end. At the end of each sprint (usually a fixed period such as one or two weeks) you’re supposed to deliver a ‘potentially shippable product’, and a sprint review is held to show your work and let the product owner accept (or reject) the feature. Backlog grooming is the process of prioritizing features, and sprint planning is where you commit to the features to be implemented in the sprint ahead.

All of that is just to say that this project will be documented as a continuous work-in-progress. As a one-person team doing this in my spare time I won’t actually follow these rules, but it’ll be a guiding principle.

So down to the actual code and stuff –

One of the first steps was to set up a good old version control system. I use Git and GitHub for this, on the free account tier, so you can view all the code I write along the way. I spent a bit of time configuring everything to track just the files I’m modifying and to ignore all the auto-generated stuff as well as the framework files, which are a little larger. This is done via the .gitignore file, which tells Git which files to ignore. I used the Swift.gitignore template for this and added the AudioKit.framework files to the ignore list, since you can download those from AudioKit directly instead.

So as mentioned I’m using AudioKit as the framework for processing the audio in this app.

AudioKit is an audio synthesis, processing, and analysis platform for iOS, macOS, and tvOS.

It’s an open-source project, so you can contribute if you feel like it. It wraps and simplifies some elements of Core Audio and extends them, allowing you to create complex processing graphs by chaining nodes together. The current version is based on Swift 3, which requires Xcode 8 to build.

For this project, the processing needs are pretty simple. I just need to be able to get audio input, process it via convolution and output the result with multiple iterations. I decided as a first step that I could hardcode the audio input files to process a single iteration and play the resulting output. I dragged and dropped two files into the project: sitting.wav (an excerpt from the original recording for now, which I’ll replace soon) and an impulse response file from a medium church hall, which is pretty reverby and so makes the effect more noticeable.

Screenshot of Xcode sidebar

Xcode Project Bundle

First we load the files into the project as an AKAudioFile and a file URL respectively, as these are the parameters AKConvolution() looks for.

let sourceFile = try? AKAudioFile(readFileName: "Sitting.wav", baseDir: .resources)
let urlOfIR = Bundle.main.url(forResource: "IR", withExtension: "wav")!

Next I created an AKAudioPlayer from my source file:

let player = sourceFile?.player

I then hooked up the player node to the convolution process, attached the output of the convolution to the engine output (which will eventually play the audio) and started the engine.

AudioKit.output = convolvedOutput!

Then we simply need to start both the convolution and player to hear the audio output.


The UI for now is as basic as it gets. There is a single button to toggle the convolution ON/OFF. To achieve this, the turnOffConvolution() function is attached to the UIButton, with a simple if/else statement to toggle between start and stop. I had to declare convolvedOutput as a global and initialize it later so that this function could access its start/stop functions. I’m sure I’ll change that when I update the app to use the MVC (Model-View-Controller) design, but for now it works well enough.

So there it is, a very basic app that hooks up our input audio to the convolution node and outputs the result, with a button to toggle processing on/off. Up next we’ll add the processing iterations so the room resonances can completely destroy the actual speech, mimicking the original recording.
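To make the idea of repeated passes concrete, here is a minimal sketch of iterated convolution. It is plain Python rather than Swift, purely to illustrate the signal processing, and it is not AudioKit code: the function names, the toy sample values and the normalize-after-every-pass step are all my own assumptions, not what the app does.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def normalize(samples, peak=1.0):
    """Scale samples so the largest absolute value equals `peak`."""
    largest = max(abs(x) for x in samples)
    if largest == 0:
        return list(samples)
    return [x * peak / largest for x in samples]


def iterate_room(signal, impulse_response, passes):
    """Feed the output of each pass back in as the input of the next."""
    current = list(signal)
    for _ in range(passes):
        # Normalizing keeps the level from growing with every pass
        # (an assumption of this sketch, not something the app does yet).
        current = normalize(convolve(current, impulse_response))
    return current


# Toy data (hypothetical): a short "speech" burst and a decaying "room" response.
speech = [1.0, 0.5, -0.25]
room = [1.0, 0.6, 0.36, 0.216]

one_pass = iterate_room(speech, room, passes=1)
four_passes = iterate_room(speech, room, passes=4)
```

Each pass smears the signal further with the same room response, which is the digital stand-in for playing the recording back into the room and re-recording it.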

I am working on a project, one different than the one you are working on now…

This is one of those times when I actually try and follow through on random project ideas I post to Twitter. (I have a mixed record in that regard). I’ve been taking a course in Swift for iOS programming, and playing around with AudioKit to build some simple audio apps. I’d like to flex my programming chops a little more so I’ve decided to work on these projects. I’ll probably do the Lucier-inspired app first and follow it up with the Steve Reich ones since they’ll probably be a little more involved (though it’s quite likely I’m underestimating the workload here).

In order to keep myself accountable, and actually build something I’m not ashamed to show people, I’ll be writing up ‘status reports’ of a kind on here: sharing progress as I make it, explaining how I handled certain things and generally describing how it all works as I go. I’ll probably only start seriously working on it in the new year.

So here we go with a basic functional outline for the ‘I Am Sitting in a Room’ inspired project, aka ‘Project: iOS am convolving in a Room’

We are going to try and match the basic process used by Alvin Lucier to record the original piece (replacing the re-recording via analogue tape with convolution).

  • The user starts by either recording a short sample of audio or selecting one from their library. It’d be nice to use the original Lucier audio as a default too, but I’d need to check if it’s public domain or otherwise get permission to use it
  • The user records a simple impulse response either by clapping or turning the volume up and playing the impulse from the app through the speakers
  • The user selects either a total length or number of loops – Default will match the Lucier original
  • The user can play back the processed audio, trim it, and export the saved track
  • I’ll probably make use of AudioKit to capture the user inputs and use their AKConvolution node to do the processing, so most of the heavy lifting for me will be getting audio in and out of the app itself. Depending on the length of the input sample, though, I may have to do some memory optimization or, more likely, limit the maximum input sample length.

    If you’re interested in following along, you’ll want an iOS device of some sort, and a Mac running Xcode 8. You’ll also want to download and install AudioKit into your Xcode project to access their functions.

    This should be fun and sure to keep the @sittinginaroom bot happy
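On the memory concern in the outline above: convolving N samples with an M-sample impulse response gives N + M - 1 samples, so if every loop is rendered in full and nothing is trimmed, the audio grows by M - 1 samples per loop. Here is a rough back-of-the-envelope sketch in Python. The sample rate, clip lengths, loop count and 4-bytes-per-sample figure are assumptions picked for illustration, not measurements from the app.

```python
def output_length(input_samples, ir_samples, loops):
    """Samples after `loops` convolutions: each pass adds (ir_samples - 1)."""
    return input_samples + loops * (ir_samples - 1)


def megabytes(samples, bytes_per_sample=4, channels=1):
    """Rough size of an uncompressed buffer (32-bit float, mono by default)."""
    return samples * bytes_per_sample * channels / 1_000_000


SAMPLE_RATE = 44_100               # assumed sample rate in Hz
input_samples = 10 * SAMPLE_RATE   # hypothetical 10 second recording
ir_samples = 2 * SAMPLE_RATE       # hypothetical 2 second impulse response
loops = 20                         # hypothetical loop count

final_samples = output_length(input_samples, ir_samples, loops)
final_seconds = final_samples / SAMPLE_RATE
final_size_mb = megabytes(final_samples)
```

With these made-up numbers a 10 second clip ends up close to 50 seconds long, which is why capping the input length looks like the simpler option.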

Boring Blackalicious

I recently took the MIR Workshop at CCRMA Stanford (which I highly recommend) and got a chance to play around with Python signal processing libraries, including librosa. During the week one of the guest presenters used ‘Alphabet Aerobics’ by Blackalicious to demonstrate his source separation algorithm. This was a challenging piece of material because this track famously does not have a constant tempo and speeds up considerably throughout the song.

The thought struck me that it’d be way less interesting if the tempo was constant throughout, so this weekend I put my newly developed Python processing skills to work and created ‘Boring Blackalicious’ – the constant tempo version of Alphabet Aerobics.

It uses librosa’s onset detection, beat tracking and tempo estimation functions to create a tempo map (with the help of some manual tweaking to correct the estimated tempos of the later segments, where the librosa functions had more difficulty keeping up).
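As a rough illustration of what a tempo map with manual corrections can look like, here is a small standard-library sketch. It assumes the beat times (in seconds) have already been extracted, for example with librosa.beat.beat_track followed by librosa.frames_to_time. The segment boundaries, the beat values and the `overrides` mechanism are hypothetical and are not taken from the actual script.

```python
from statistics import median


def segment_tempo(beat_times):
    """Estimate tempo in BPM from the median gap between successive beats."""
    gaps = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if not gaps:
        raise ValueError("need at least two beats to estimate a tempo")
    return 60.0 / median(gaps)


def tempo_map(beat_times, boundaries, overrides=None):
    """Return a list of (start, end, bpm) tuples, one per segment.

    `overrides` maps a segment index to a hand-corrected BPM, standing in
    for the manual tweaking of segments where the estimate is unreliable.
    """
    overrides = overrides or {}
    result = []
    for index, (start, end) in enumerate(zip(boundaries, boundaries[1:])):
        if index in overrides:
            bpm = overrides[index]
        else:
            beats = [t for t in beat_times if start <= t < end]
            bpm = segment_tempo(beats)
        result.append((start, end, bpm))
    return result


# Hypothetical beats: four beats 0.5 s apart, then four beats 0.4 s apart.
beats = [0.0, 0.5, 1.0, 1.5, 2.0, 2.4, 2.8, 3.2]
estimated = tempo_map(beats, boundaries=[0.0, 2.0, 3.6])
corrected = tempo_map(beats, boundaries=[0.0, 2.0, 3.6], overrides={1: 152.0})
```

Using the median gap rather than the mean makes the per-segment estimate a little less sensitive to the odd missed or doubled beat.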

The tempo map was used to calculate the correct rate to slow down each segment using librosa’s phase vocoder function. I used the phase vocoder over librosa.effects.time_stretch so I could tweak the FFT length directly to get a better sounding result.
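The rate calculation itself is simple arithmetic. Both librosa.phase_vocoder and librosa.effects.time_stretch take a `rate` where values above 1 speed the audio up and values below 1 slow it down, so a segment at some tempo needs a rate of target tempo divided by segment tempo. A minimal sketch, with made-up tempos and my own function names:

```python
def stretch_rates(segment_bpms, target_bpm):
    """Rate for each segment: below 1 slows it down, above 1 speeds it up."""
    return [target_bpm / bpm for bpm in segment_bpms]


def stretched_duration(duration_seconds, rate):
    """Length of a segment after stretching it by `rate`."""
    return duration_seconds / rate


# Hypothetical tempo map: the track speeds up from 100 to 150 BPM.
segment_bpms = [100.0, 125.0, 150.0]
rates = stretch_rates(segment_bpms, target_bpm=100.0)

# A 12 second segment at 150 BPM becomes longer once slowed to 100 BPM.
new_length = stretched_duration(12.0, rates[2])
```

Each rate would then be passed as the `rate` argument when stretching that segment’s STFT, with the FFT length chosen when the STFT is computed.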

The Python source is available here and you can listen to the original for comparison below!