IAWOAP Part 3 – Iterations and Issues

Screenshot of IASIAR App

I’ve been slowly adding features to the IASIAR app over the last month, and I’m happy to say that it now has the basic functionality needed to iteratively process a recording, convolving it with an impulse response over and over again. It’s come a long way through lots of little iterations and improvements (43 commits and counting), so iteration is the theme here. The UI screenshot above gives some idea of the current functionality, and I’ll go into detail below.

In my previous post, I had just gotten the file input, convolution node and audio output working. Since then I’ve added the following features and fixes (in order of implementation):

  • Added Slider selection to UI
  • Added recording and export of output to file
  • Added process function for iterated convolutions
  • Added a load iteration function to easily switch between iterations
  • Made process function asynchronous and updated the UI
  • Normalized the IR before processing
  • UI Updates including a processing indicator

Not a bad list of features, and it doesn’t include a number of issues I had to struggle through, which I’ll also detail later.

Here’s some sample output, sounds pretty good to me 🙂

So let’s jump into some code

Added Slider Selection

This was a pretty easy update: adding a slider from the main.storyboard. The problem I usually have is remembering to relink the @IBOutlet / @IBAction tied to a UI element when I update it. If you change the @IBAction function so that it takes a parameter (in this case a ‘sender’ to get the value of the slider), you have to relink it, as that’s essentially a different function than one with the same name that takes no parameters. The slider function takes the slider value and sets the number of iterations to that value. It also updates a text label to display the value.

@IBAction func updateNumIterations(_ sliderValue: UISlider){
        // The slider reports a Float, so convert it to a whole number of iterations
        numberOfIterations = Int(sliderValue.value)
        displayIterations?.text = ("Number of Iterations: \(numberOfIterations)")
}

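I also had to remember the correct way to convert between types: the slider’s Float value is converted with Int(), and the Int is put into the String with string interpolation.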
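As a minimal sketch of that conversion outside of the app (the helper name iterationLabel is made up for illustration, and the Float argument stands in for the slider’s value):

```swift
// Hypothetical helper: builds the label text from a slider-style Float value.
func iterationLabel(forSliderValue value: Float) -> String {
    // Int(_:) truncates toward zero, so 3.7 becomes 3
    let iterations = Int(value)
    // String interpolation turns the Int into text
    return "Number of Iterations: \(iterations)"
}

print(iterationLabel(forSliderValue: 3.7))  // prints "Number of Iterations: 3"
```

Because Int() truncates rather than rounds, a slider sitting at 3.7 would show 3 iterations with this approach.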

Added recording and export of output to file

This feature also included changing the automatic playback so that audio only plays back when the new PLAY button is pressed, and eventually adding a RECORD OUTPUT button. The PLAY button function is pretty simple, just toggling between PLAY/PAUSED states. The player! variable is an AKAudioFile?.player for the source file (in my case this is still the original “I Am Sitting In A Room” audio excerpt), which gets passed to the convolution node.

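@IBAction func playButtonPressed(_ sender: UIButton){
        if player!.isPlaying{
            player!.stop()   // assumed call: halt playback
            sender.setTitle("PLAY", for: .normal)
        } else {
            player!.play()   // assumed call: start playback
            sender.setTitle("PAUSE", for: .normal)
        }
}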


One thing I learned was to make all node variables instance properties rather than locals, since you don’t want nodes to disappear when a function call completes, which causes the AudioKit engine to crash. I learned this through a lot of trial and error on where and when to initialize the nodes and the engine.
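Here is a small sketch of why that matters, using plain Swift classes instead of AudioKit nodes (FakeNode and FakeController are made-up names, and the weak properties exist only to observe when an object goes away):

```swift
// Hypothetical stand-in for an audio node; not an AudioKit type.
final class FakeNode {
    let name: String
    init(name: String) { self.name = name }
}

final class FakeController {
    var keptNode: FakeNode?            // instance property: lives as long as the controller
    weak var watchedLocal: FakeNode?   // weak references only observe lifetime
    weak var watchedKept: FakeNode?

    func setUpNodes() {
        let localNode = FakeNode(name: "local")  // released once this function returns
        watchedLocal = localNode

        let node = FakeNode(name: "kept")
        keptNode = node                           // strong reference outlives the call
        watchedKept = node
    }
}

let controller = FakeController()
controller.setUpNodes()
print("local node still alive:", controller.watchedLocal != nil)
print("kept node still alive:", controller.watchedKept != nil)
```

The node held only by a local variable is gone as soon as setUpNodes() returns, while the one stored in a property survives, which is the behaviour an audio graph needs.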

Recording was a little trickier and also where I ran into a lot of crashes. It turned out to be a bit frustrating as I hit issues with Swift API changes. The AudioKit team is great at keeping up to date with the latest versions, but this can have the side effect of causing backward compatibility issues if you are not running the latest version. In this case my project was created with Swift 3.0.2 and AudioKit 3.4. I couldn’t update to Swift 3.1 as that requires Xcode 8.3, which in turn requires macOS Sierra. Eventually the solution I hit on was to replace the prebuilt AudioKit.framework with the source code and build it myself. This also allowed me to revert the seemingly small change that was crashing the app every time I tried to export the audio: the export wasn’t creating a valid filename because the file:// prefix was missing, due to the way Swift 3.1 handles the ‘url’ parameter.
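To illustrate the difference the file:// prefix makes, here is a general Foundation sketch. It is not the AudioKit code that changed, and the path is made up for the example:

```swift
import Foundation

// Hypothetical output path, for illustration only.
let path = "/tmp/IASIAR_output.caf"

// Built as a file URL: carries the "file" scheme, so it serializes with file://
let fileURL = URL(fileURLWithPath: path)

// Built from a bare string: no scheme at all, so it is not a file URL
let plainURL = URL(string: path)

print(fileURL.absoluteString)
print("fileURL is a file URL:", fileURL.isFileURL)
print("plainURL is a file URL:", plainURL?.isFileURL ?? false)
```

Code that expects a proper file URL can fail when it is handed the second kind, which is the sort of mismatch that is easy to miss in a small API change.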

Another problem I had was that I couldn’t export as .wav and eventually had to settle on always exporting as .caf (Core Audio Format). Definitely an issue I’d like to address in the future, as I’d like to be able to export compressed files.

Another common AudioKit issue with the AKNodeRecorder is that, in order for it to work correctly, it should be attached to a mixer node and not directly to the convolution node (or, more commonly in other projects, a mic input node). It took quite a bit of experimentation to get the graph set up such that the nodes were connected properly. This was where I ran into the most frustrating issue: it only works on headphones. For some reason which I haven’t figured out yet, when the output is set to the speaker I get no audio output, so the recording is empty and I would get errors exporting an empty file!

The record function itself is pretty simple. Again it’s state based. If it’s not recording the output, it will start the recording. If it is currently recording, it will stop the recording and export the output.

@IBAction func recordButtonPressed(){
        if recorder!.isRecording {
            // Already recording: stop, then export what was captured
            recorder?.stop()
            print("Ready to Export")
            tape!.player?.audioFile.exportAsynchronously(name: "IASIAR_output.caf", baseDir: .documents, exportFormat: .caf) {_, error in
                print("Writing the output file")
                if error != nil {
                    print("Export Failed \(error)")
                } else {
                    print("Export succeeded")
                }
            }
        }
        else {
            // Not recording yet: clear the buffer and start a new recording
            do {
                try recorder?.reset()
            } catch { print("Couldn't reset recording buffer")}
            do {
                try recorder?.record()
            } catch { print("Error Recording") }
            print("Recording Started")
        }
}

Most of the AudioKit engine setup is done in the new processing function, which now runs asynchronously and is called when the app first loads. For playback and recording we have:

Source File -> Convolution Node (with selected IR iteration) -> ConvolveMixer -> RecordMixer -> AudioKit.output

The AKNodeRecorder is attached to the convolveMixer.

The beginning of the process function stops the engine running and loads the source file:

            self.player = self.sourceFile?.player
            self.recordMixer = AKMixer(self.player!)
            self.IR = try? AKAudioFile(readFileName: "grange.wav", baseDir: .resources)

and later in that function we set up the graph and restart the engine. The last line attaches the AKNodeRecorder to the convolveMixer so the record output function can do its thing.

            self.convolvedOutput = AKConvolution(self.player!, impulseResponseFileURL: self.IRPlayer[(self.numberOfIterations-1)]!.audioFile.url)
            self.convolveMixer = AKMixer(self.convolvedOutput!)
            self.recordMixer = AKMixer(self.convolveMixer)
            self.tape = try? AKAudioFile(name:"output")
            AudioKit.output = self.recordMixer
            self.recorder = try? AKNodeRecorder(node: self.convolveMixer, file: self.tape!)

Process function, loading iterations and processing asynchronously

Now for the essence of the “I Am Sitting in a Room” functionality: processing iterated convolutions to simulate playing the tape back into the room again and again. Convolution is a pretty expensive computational process. Luckily for us, it has some properties that make processing iterations less intensive: convolution is commutative and associative.

Essentially instead of calculating

SOURCE * IR = OUTPUT-1, OUTPUT-1 * IR = OUTPUT-2, OUTPUT-2 * IR = OUTPUT-3...

a much more efficient way of doing things is to convolve the IR with itself over and over again, since the IR has far fewer samples.

IR * IR = IR-1, IR-1 * IR = IR-2, IR-2 * IR = IR-3... SOURCE * IR-N = OUTPUT-N
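To check that reordering on a toy example, here is a direct (and deliberately slow) convolution in plain Swift. The sample values are made up, and this is only a sketch of the math, not how the app's convolution node works internally:

```swift
// Direct convolution of two integer sequences: output has a.count + b.count - 1 samples.
func convolve(_ a: [Int], _ b: [Int]) -> [Int] {
    guard !a.isEmpty, !b.isEmpty else { return [] }
    var out = [Int](repeating: 0, count: a.count + b.count - 1)
    for (i, x) in a.enumerated() {
        for (j, y) in b.enumerated() {
            out[i + j] += x * y
        }
    }
    return out
}

let source = [1, 2, 3, 4]   // stand-in for the recording
let ir = [1, -1, 2]         // stand-in for the impulse response

// Two passes through the "room", one after the other
let sequential = convolve(convolve(source, ir), ir)

// Pre-convolve the IR with itself, then apply it to the source once
let irTwice = convolve(ir, ir)
let combined = convolve(source, irTwice)

print(sequential == combined)  // prints "true"
```

Both routes give the same samples, so the repeated work can be done on the short IR and the long source only needs to be convolved once.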

This still ended up being pretty heavy computationally, as the output of a convolution is bigger than the source: convolving N samples with M samples gives an output N + M - 1 samples long. I had figured that since I was convolving the output with itself, the length of the output would be 2N - 1. This is fine for the first iteration, but not valid for subsequent iterations, because we’re still convolving with the original IR, not the newly generated IR. Fixing this meant the length to record grows by a fixed amount with each iteration instead of roughly doubling every time, which cut the processing time considerably. Phew!
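A quick worked example of the two length calculations, as a sketch with a made-up IR length of 1000 samples:

```swift
// Length of convolving n samples with m samples.
func convolvedLength(_ n: Int, _ m: Int) -> Int {
    return n + m - 1
}

// Correct growth: each pass convolves the previous output with the ORIGINAL IR.
func iteratedLengths(irLength: Int, iterations: Int) -> [Int] {
    var lengths: [Int] = []
    var current = irLength
    for _ in 0..<iterations {
        current = convolvedLength(current, irLength)
        lengths.append(current)
    }
    return lengths
}

// Mistaken growth: treating every pass as output convolved with itself (2N - 1).
func doubledLengths(irLength: Int, iterations: Int) -> [Int] {
    var lengths: [Int] = []
    var current = irLength
    for _ in 0..<iterations {
        current = convolvedLength(current, current)
        lengths.append(current)
    }
    return lengths
}

print(iteratedLengths(irLength: 1000, iterations: 4))  // [1999, 2998, 3997, 4996]
print(doubledLengths(irLength: 1000, iterations: 4))   // [1999, 3997, 7993, 15985]
```

The first iteration agrees in both cases, which is why the mistake is easy to miss; after that the doubled estimate runs away quickly.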

So the basic process is to set up the convolution node to convolve the input IR with itself, send the output to an AKNodeRecorder and generate a new AKAudioFile. The next iteration of a simple for loop then convolves the input IR with the previously generated output. The for loop runs for the number of iterations selected by the user via the slider.

DispatchQueue.global(qos: .background).async {
    DispatchQueue.main.async {
        print("This is run on the main queue, after the previous code in outer block")
        self.processButton?.setTitle("Processing", for: .normal)
    }
    do {
        try self.normalizedIR = self.IR?.normalized()
        print (self.normalizedIR!.maxLevel)
    } catch { print("Error Normalizing")}
    self.iterateFileIR = try? AKAudioFile(name:"temp_recording")
    for index in 0..<self.numberOfIterations{
        // The first pass uses the normalized IR; later passes use the previous recording
        if (index==0){
            self.IRPlayer[index] = self.normalizedIR?.player
        } else {
            self.IRPlayer[index] = self.iterateRecorder?.audioFile?.player
        }
        self.iteratedIR = AKConvolution(self.IRPlayer[index]!, impulseResponseFileURL: self.urlOfIR! )
        self.iterateMixer = AKMixer(self.iteratedIR!)
        AudioKit.output = self.iterateMixer
        self.iterateRecorder = try? AKNodeRecorder(node: self.iterateMixer)
        do {
            try self.iterateRecorder?.reset()
        } catch { print("Couldn't reset recording buffer")}
        do {
            try self.iterateRecorder?.record()
        } catch { print("Error Recording") }
        print("Recording Started")
        //AudioKit.output = iteratedIR
        //booster = AKBooster(IRPlayer[index]!,gain: 0)
        // Keep recording until the combined duration of both inputs has been captured
        repeat {
            // (loop body not shown in this excerpt)
        }while self.iterateRecorder!.recordedDuration <= (((self.IRPlayer[index]?.audioFile.player!.duration)!)+(self.IRPlayer[0]?.audioFile.player!.duration)!)-1
    }
}

Doing it this way also allowed me to add the load iteration function, simply by having the user select an iteration number and setting the main convolution IR url to that of the iteratedIR[selectedIteration-1] (-1 since the first array index is 0).
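The off-by-one mapping is easy to get wrong, so here is a minimal sketch of it with a bounds check. The helper name and its parameters are made up for illustration and are not part of the app:

```swift
// Hypothetical helper: maps a 1-based iteration number chosen by the user
// to a 0-based array index, returning nil when it is out of range.
func arrayIndex(forIteration iteration: Int, availableIterations: Int) -> Int? {
    guard iteration >= 1, iteration <= availableIterations else { return nil }
    return iteration - 1
}

print(arrayIndex(forIteration: 1, availableIterations: 5) == 0)    // prints "true"
print(arrayIndex(forIteration: 6, availableIterations: 5) == nil)  // prints "true"
```

Returning nil for an iteration that has not been processed yet avoids indexing past the end of the array.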

Finally, since the processing still takes some time (the outputs are generated in real time), it was blocking the UI from updating and gave no indication to the user that it was still processing. To fix this I made that function asynchronous, which is accomplished using DispatchQueue.

In Swift 3 this looks like:

DispatchQueue.global(qos: .background).async {
            print("This is run on the background queue")

            // Insert code to run in background

            DispatchQueue.main.async {
                print("This is run on the main queue, after the previous code in outer block")
                // Insert UI code to run in the main thread
                self.processButton?.setTitle("Processing", for: .normal)
            }
}

Note that the call back to the main thread with the UI code is nested within the background block.
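Here is a self-contained sketch of the same nesting that can run as a command-line script. The function name, the ResultBox class and the "ui-stand-in" queue are all made up for the example; in an app the callback queue would be DispatchQueue.main, and the semaphore is only there so the script waits for the result:

```swift
import Foundation

// Holds the result so the callback can hand it back to the caller.
final class ResultBox {
    var value = 0
}

// Hypothetical helper: does the slow work on a background queue,
// then reports back on whichever queue the caller asks for.
func processInBackground(iterations: Int,
                         callbackQueue: DispatchQueue,
                         completion: @escaping (Int) -> Void) {
    DispatchQueue.global(qos: .background).async {
        var completed = 0
        for _ in 0..<iterations {
            completed += 1          // stand-in for one slow convolution pass
        }
        // Nested call: hop to the callback queue only once the work is done
        callbackQueue.async {
            completion(completed)
        }
    }
}

let uiQueue = DispatchQueue(label: "ui-stand-in")
let finished = DispatchSemaphore(value: 0)
let box = ResultBox()

processInBackground(iterations: 5, callbackQueue: uiQueue) { count in
    box.value = count
    finished.signal()
}

finished.wait()                     // block the script until the callback has run
print("Completed iterations:", box.value)
```

Passing the callback queue in as a parameter is just a convenience for the sketch, since a plain script has no run loop servicing the main queue.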

I also normalized the incoming impulse response. This should have been straightforward, since AudioKit's AKAudioFile has a ready-made function to do this, but what I hadn't realized was that it has issues when it receives a mono file. I fixed this by editing the input IR to make it a stereo track, copying the mono track to both L and R, but hopefully there's a fix that allows both mono and stereo impulse responses to be processed.

So there are now a couple of major issues to be resolved before moving on to the next set of features:
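For reference, peak normalization itself does not care how many channels there are. The sketch below works on plain arrays of samples and is not AudioKit's implementation; the function names and sample values are made up:

```swift
// Hypothetical helper: scales every channel by the same gain so the
// loudest sample across all channels reaches targetPeak.
func peakNormalized(_ channels: [[Float]], targetPeak: Float = 1.0) -> [[Float]] {
    let peak = channels.joined().map { abs($0) }.max() ?? 0
    guard peak > 0 else { return channels }   // silent or empty input: nothing to scale
    let gain = targetPeak / peak
    return channels.map { channel in channel.map { $0 * gain } }
}

// The workaround described above: copy a mono track to both L and R.
func stereo(fromMono mono: [Float]) -> [[Float]] {
    return [mono, mono]
}

let mono: [Float] = [0.25, -0.5, 0.125]
let normalizedMono = peakNormalized([mono])
let normalizedStereo = peakNormalized(stereo(fromMono: mono))

print(normalizedMono)    // [[0.5, -1.0, 0.25]]
print(normalizedStereo)  // [[0.5, -1.0, 0.25], [0.5, -1.0, 0.25]]
```

Using one gain for all channels keeps the stereo balance intact, and a single-channel input simply goes through the same path.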

  1. Getting output working when there are no headphones connected
  2. Normalizing a mono IR file

Next Steps

Other than the issues I listed above, I'd love to get the following features working next.

  • Clean up code to conform to the Model-View-Controller paradigm
  • Record IR from the microphone
  • Select IR from the user's library
  • Record new source file
  • Select source file from user's library
  • UI improvements - Make it look pretty
  • Export options - Cloud sharing type stuff instead of relying on iTunes

Whew! Okay that was a lot (I may break this post up into individual posts to make it all clearer). Would love to hear any questions or comments!

Boring Blackalicious

I recently took the MIR Workshop at CCRMA Stanford (which I highly recommend) and got a chance to play around with Python signal processing libraries including librosa. During the week, one of the guest presenters used ‘Alphabet Aerobics’ by Blackalicious to demonstrate his source separation algorithm. This was a challenging piece of material because the track famously does not have a constant tempo and speeds up considerably throughout the song.

The thought struck me that it’d be way less interesting if the tempo were constant throughout, so this weekend I put my newly developed Python processing skills to work and created ‘Boring Blackalicious’, the constant tempo version of Alphabet Aerobics.

It uses librosa’s onset detection, beat tracking and tempo estimation functions to create a tempo map (with the help of some manual tweaking to correct the estimated tempos of the later segments, where the librosa functions had more difficulty keeping up).

The tempo map was used to calculate the correct rate to slow down each segment using librosa’s phase vocoder function. I used the phase vocoder over librosa.effects.time_stretch so I could tweak the FFT length directly to get a better sounding result.

The Python source is available here and you can listen to the original for comparison below!
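The rate calculation is simple arithmetic. The sketch below is in Swift to match the other listings on this page rather than the Python the project was written in, the tempo values are made up, and it assumes the convention that a rate above 1 speeds audio up and a rate below 1 slows it down:

```swift
// Hypothetical tempo map: estimated BPM for each segment of the track.
let segmentTempos: [Double] = [80, 100, 120, 160]

// Hypothetical constant tempo that every segment should end up at.
let targetTempo = 80.0

// A segment at 160 BPM needs a rate of 0.5 to come out at 80 BPM.
func stretchRate(segmentTempo: Double, targetTempo: Double) -> Double {
    return targetTempo / segmentTempo
}

let rates = segmentTempos.map { stretchRate(segmentTempo: $0, targetTempo: targetTempo) }
print(rates)
```

Each segment would then be stretched by its own rate and the results joined back together.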

Compressed Sensing – An Introduction

This week Emmanuel Candès, professor of Mathematics and Statistics at Stanford University, has been elected to the American Academy of Arts and Sciences. He is one of the authors (the other being Terence Tao) of the paper which brought about the field of compressed sensing. If you’ve ever done any signal processing, then you’ll know


Extracting Audio From a Mix by Singing It – Source Separation via Input Matching

Paris Smaragdis is an assistant professor at the University of Illinois who specializes in research that involves machine listening. This includes source localization (where the sound is coming from), sound recognition (such as a traffic accident at an intersection) and source separation (taking individual voices or instruments out of a mix). Source separation is a
