IAWOAP Part 3 – Iterations and Issues

Screenshot of IASIAR App

I’ve been slowly building on the IASIAR app over the last month to add some important features, and I’m happy to say the app now has the basic functionality needed to iteratively process a recording, convolving it with an impulse response over and over again. It’s come a long way, with lots of little iterations and improvements along the way (43 commits and counting), so iteration is the theme here. The UI screenshot posted above will give you some idea of the functionality at the moment, and I’ll go into detail below.

In my previous post, I had just gotten the file input, convolution node, and audio output working. I was able to add the following features and fixes since then (in order of implementation):

  • Added Slider selection to UI
  • Added recording and export of output to file
  • Added process function for iterated convolutions
  • Added a load iteration function to easily switch between iterations
  • Made process function asynchronous and updated the UI
  • Normalized the IR before processing
  • UI Updates including a processing indicator

Not a bad list of features, and it doesn’t even include the issues I had to struggle through, which I’ll also detail later.

Here’s some sample output, sounds pretty good to me 🙂

So let’s jump into some code

Added Slider Selection

This was a pretty easy update: adding a slider from the main.storyboard. The problem I usually have is remembering to relink the @IBOutlet / @IBAction connections tied to the UI elements when I update them. That is, if you make the @IBAction function tied to a UI element take a parameter (in this case a ‘sender’ to get the value of the slider), you have to relink it, since that’s essentially a different function than one with the same name that takes no parameters. The slider function takes the slider value and sets the number of iterations to that value. It also updates a text label to display the value.

@IBAction func updateNumIterations(_ sliderValue: UISlider) {
    numberOfIterations = Int(sliderValue.value)
    displayIterations?.text = "Number of Iterations: \(numberOfIterations)"
}

I also had to remember the correct way to convert an Int to a String for the label text.
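For reference, here’s the distinction in plain Swift (nothing AudioKit-specific, just the two conversions I keep mixing up):

// Two ways to turn an Int into display text in Swift:
let count = 12
let viaInitializer = String(count)                        // "12"
let viaInterpolation = "Number of Iterations: \(count)"   // what the slider handler above uses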

Added recording and export of output to file

This feature also included changing the automatic playback to only play when the new PLAY button is pressed, and eventually adding a RECORD OUTPUT button. The PLAY button function is pretty simple, just toggling between PLAY/PAUSE states. The player variable is an AKAudioPlayer created from the source AKAudioFile (in my case this is still the original “I Am Sitting In A Room” audio excerpt), which gets passed to the convolution node.

@IBAction func playButtonPressed(_ sender: UIButton) {
    if player!.isPlaying {
        player!.pause()
        sender.setTitle("PLAY", for: .normal)
    } else {
        player!.start()
        sender.setTitle("PAUSE", for: .normal)
    }
}

One thing I learned was to make all node variables instance properties rather than locals, since you don’t want nodes to be deallocated when a function call completes, which causes the AudioKit engine to crash. I learned this through a lot of trial and error on where and when to initialize the nodes and the engine.
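In practice that just means declaring the nodes as properties of the view controller, something like this (a minimal sketch using the property names that show up in the snippets below):

import AudioKit
import UIKit

class ViewController: UIViewController {
    // Keep nodes as instance properties so they outlive any single function call.
    // If these were locals, they'd be deallocated when the function returned and
    // the AudioKit graph would lose nodes mid-playback.
    var player: AKAudioPlayer?
    var convolvedOutput: AKConvolution?
    var convolveMixer: AKMixer?
    var recordMixer: AKMixer?
    var recorder: AKNodeRecorder?
    var tape: AKAudioFile?
}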

Recording was a little trickier and also where I ran into a lot of crashes. It turned out to be a bit frustrating, as I hit issues with Swift API changes. The AudioKit team is great at keeping up to date with the latest versions, but this can have the side effect of causing backward-compatibility issues if you are not running the latest version. In this case my project was created with Swift 3.0.2 and AudioKit 3.4. I couldn’t update to Swift 3.1, as that requires Xcode 8.3, which in turn requires macOS Sierra. Eventually the solution I hit on was to replace the prebuilt AudioKit.framework with the source code and build the framework myself. This also allowed me to revert the seemingly small change that was causing the app to crash every time I tried to export the audio: because of the way Swift 3.1 handles the ‘url’ parameter, the exporter wasn’t creating a valid filename (it was missing the file:// prefix).
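This isn’t AudioKit’s internal code, but the Foundation behaviour behind the crash is easy to demonstrate: a URL built from a plain path string has no scheme, while URL(fileURLWithPath:) adds the file:// prefix the exporter needs.

import Foundation

let path = "/Users/me/Documents/IASIAR_output.caf"   // hypothetical path, just for illustration

let plainURL = URL(string: path)!
print(plainURL)       // /Users/me/Documents/IASIAR_output.caf  (no file:// scheme)

let fileURL = URL(fileURLWithPath: path)
print(fileURL)        // file:///Users/me/Documents/IASIAR_output.caf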

Another problem I had was that I couldn’t export as .wav and eventually had to settle on always exporting as .caf (Core Audio Format). Definitely an issue I’d like to address in the future, as I’d also like to be able to export compressed files.

Another common AudioKit gotcha is that for AKNodeRecorder to work correctly, it should be attached to a mixer node and not directly to the convolution node (or, more commonly in other projects, a mic input node). It took quite a bit of experimentation to get the graph set up so that the nodes were connected properly. This was also where I ran into the most frustrating issue: it only works with headphones. For some reason I haven’t figured out yet, when the output is set to the speaker I get no audio output, so the recording is empty and I get errors trying to export an empty file!

The record function itself is pretty simple. Again, it’s state-based: if it’s not recording the output, it starts the recording; if it is currently recording, it stops the recording and exports the output.

@IBAction func recordButtonPressed() {
    if recorder!.isRecording {
        recorder?.stop()

        print("Ready to Export")
        print((recorder?.audioFile)!.fileName)
        print(tape!.fileName)
        tape!.player?.audioFile.exportAsynchronously(name: "IASIAR_output.caf", baseDir: .documents, exportFormat: .caf) { _, error in
            print("Writing the output file")
            if let error = error {
                print("Export Failed: \(error)")
            } else {
                print("Export succeeded")
            }
        }
    } else {
        do {
            try recorder?.reset()
        } catch { print("Couldn't reset recording buffer") }

        do {
            try recorder?.record()
        } catch { print("Error Recording") }
        print("Recording Started")
    }
}

Most of the AudioKit engine setup is done in the new processing function, which now runs on a background thread and is called when the app first loads. For playback and recording, the signal chain is:

Source File -> Convolution Node (with selected IR iteration) -> ConvolveMixer -> RecordMixer -> AudioKit.output

The AKNodeRecorder is attached to the convolveMixer.

The beginning of the process function stops the engine and loads the source file:

AudioKit.stop()
self.player = self.sourceFile?.player
self.recordMixer = AKMixer(self.player!)
self.IR = try? AKAudioFile(readFileName: "grange.wav", baseDir: .resources)

and later in that function we set up the graph and restart the engine. The last line attaches the AKNodeRecorder to the convolveMixer so the record output function can do its thing:

self.convolvedOutput = AKConvolution(self.player!, impulseResponseFileURL: self.IRPlayer[(self.numberOfIterations-1)]!.audioFile.url)
self.convolveMixer = AKMixer(self.convolvedOutput!)

self.recordMixer = AKMixer(self.convolveMixer)
self.tape = try? AKAudioFile(name: "output")
AudioKit.output = self.recordMixer
AudioKit.start()

self.convolvedOutput!.start()
self.recorder = try? AKNodeRecorder(node: self.convolveMixer, file: self.tape!)

Process function, loading iterations and processing asynchronously

Now for the essence of the “I Am Sitting in a Room” functionality: processing iterated convolutions to simulate playing the tape back into the room again and again. Convolution is a pretty expensive computational process. Luckily for us, it has some properties that make processing the iterations less intensive: the commutative and associative properties of convolution.

Essentially instead of calculating

SOURCE * IR = OUTPUT-1, OUTPUT-1 * IR = OUTPUT-2, OUTPUT-2 * IR = OUTPUT-3 etc

a much more efficient way of doing things is to convolve the IR with itself over and over again, since the IR has far fewer samples:

IR * IR = IR-1, IR-1 * IR = IR-2, IR-2 * IR = IR-3... SOURCE * IR-N = OUTPUT-N

This still ended up being pretty heavy computationally, since the output of a convolution is longer than its inputs: convolving an N-sample signal with an M-sample IR gives an output N + M - 1 samples long. I had figured that since I was convolving the output with itself, each output would be 2N - 1 samples. That’s fine for a single iteration, but not for subsequent ones, because each pass still convolves with the original IR, not the newly generated one. Fixing this brought the computation time down from O(N^2) to O(N). Phew!
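For my own bookkeeping, the two facts at play are the associativity of convolution and the standard length formula (writing M for the length of the original IR in samples):

(SOURCE * IR) * IR = SOURCE * (IR * IR)

length(x * h) = N + M - 1
length(IR-1)  = 2M - 1
length(IR-k)  = length(IR-(k-1)) + M - 1 = (k+1)*M - k

So the iterated IR grows linearly with the iteration count rather than doubling on every pass.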

So the basic process is: set up the convolution node to convolve the input IR with itself, send its output to an AKNodeRecorder, and generate a new AKAudioFile. Each subsequent pass of a simple for loop feeds the previously generated output back into the convolution (still against the original IR). The loop runs for the number of iterations selected by the user via the slider.

DispatchQueue.main.async {
    print("This is run on the main queue, after the previous code in outer block")
    self.processButton?.setTitle("Processing", for: .normal)
    self.processingIndicator?.startAnimating()
}

do {
    self.normalizedIR = try self.IR?.normalized()
    print(self.normalizedIR!.maxLevel)
} catch { print("Error Normalizing") }

self.iterateFileIR = try? AKAudioFile(name: "temp_recording")

for index in 0..<self.numberOfIterations {
    self.IRPlayer.append(nil)

    // First pass: convolve the normalized IR with the original IR file.
    // Subsequent passes: feed the previous iteration's recorded output back in.
    if index == 0 {
        self.IRPlayer[index] = self.normalizedIR?.player
    } else {
        self.IRPlayer[index] = self.iterateRecorder?.audioFile?.player
    }

    self.iteratedIR = AKConvolution(self.IRPlayer[index]!, impulseResponseFileURL: self.urlOfIR!)
    self.iterateMixer = AKMixer(self.iteratedIR!)
    AudioKit.output = self.iterateMixer
    AudioKit.start()
    self.iteratedIR!.start()
    self.IRPlayer[index]!.start()
    self.iterateRecorder = try? AKNodeRecorder(node: self.iterateMixer)

    do {
        try self.iterateRecorder?.reset()
    } catch { print("Couldn't reset recording buffer") }

    do {
        try self.iterateRecorder?.record()
    } catch { print("Error Recording") }
    print("Recording Started")

    // Busy-wait until the recorded output reaches the expected length
    // (sum of the two input durations, minus one).
    repeat {
    } while self.iterateRecorder!.recordedDuration <= ((self.IRPlayer[index]?.audioFile.player!.duration)! + (self.IRPlayer[0]?.audioFile.player!.duration)!) - 1

    AudioKit.stop()
}

Doing it this way also allowed me to add the load iteration function, simply by having the user select an iteration number and setting the main convolution IR url to that of the iteratedIR[selectedIteration-1] (-1 since the first array index is 0).
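As a sketch (the function name is a placeholder, and it reuses the IRPlayer array from the processing loop above; the real version lives in the repo), the load function ends up looking something like:

func loadIteration(_ selectedIteration: Int) {
    // Rebuild the playback graph with the IR generated at the selected iteration.
    AudioKit.stop()
    let iterationURL = IRPlayer[selectedIteration - 1]!.audioFile.url
    convolvedOutput = AKConvolution(player!, impulseResponseFileURL: iterationURL)
    convolveMixer = AKMixer(convolvedOutput!)
    recordMixer = AKMixer(convolveMixer)
    AudioKit.output = recordMixer
    AudioKit.start()
    convolvedOutput!.start()
    recorder = try? AKNodeRecorder(node: convolveMixer, file: tape!)
}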

Finally, since the processing still takes some time (the outputs are generated in real time), it was blocking the UI from updating and gave no indication to the user that processing was still in progress. To fix this I made the process function asynchronous, using DispatchQueue.

In Swift 3 this looks like:

DispatchQueue.global(qos: .background).async {
    print("This is run on the background queue")

    // Insert code to run in the background

    DispatchQueue.main.async {
        print("This is run on the main queue, after the previous code in outer block")
        // Insert UI code to run on the main thread
        self.processButton?.setTitle("Processing", for: .normal)
        self.processingIndicator?.startAnimating()
    }
}

Note that the main-queue call with the UI updates is nested inside the background-queue block.

Finally, I normalized the incoming impulse response. This should have been straightforward, since AudioKit’s AKAudioFile has a ready-made function for it, but what I hadn’t realized was that it has issues when given a mono file. I worked around this by editing the input IR into a stereo track (copying the mono channel to both L and R), but hopefully there’s a fix that allows both mono and stereo impulse responses to be processed.

So there are now a couple of major issues to be resolved before moving on to the next set of features:

  1. Getting output working when there are no headphones connected
  2. Normalizing a mono IR file

Next Steps

Other than the issues I listed above, I'd love to get the following features working next.

  • Clean up code to conform to MODEL-VIEW-CONTROLLER paradigm
  • Record IR from the microphone
  • Select IR from the user's library
  • Record new source file
  • Select source file from user's library
  • UI improvements - Make it look pretty
  • Export options - Cloud sharing type stuff instead of relying on iTunes

Whew! Okay that was a lot (I may break this post up into individual posts to make it all clearer). Would love to hear any questions or comments!

IAWOAP – Project Setup and Processing Audio

Screenshot of iOS Simulator showing app

IASIAR’s Simple UI

I’m up and running with the Alvin Lucier inspired project I mentioned in the previous post. It’s in a pretty basic state at this point, but the good news is I can take in an audio file and process (convolve) it with an impulse response to get a first pass of the room acoustics.

For this project I’m going to follow a quasi-SCRUM methodology – essentially adding features as we go, with each post documenting a somewhat finished state and deciding at the end what the next feature to implement will be. If you’re familiar with SCRUM, you can think of each post as a SPRINT review with a backlog grooming and SPRINT planning session tacked on at the end. At the end of each SPRINT (usually a period of time like 1 or 2 weeks) you’re supposed to deliver a ‘potentially shippable product’ and a SPRINT review is held to show your work and let the product owner accept (or reject) the feature. Backlog grooming is the process of deciding priority of features and SPRINT planning is to allow you to commit to the next features to be implemented in the SPRINT ahead.

All of that is just to say that this project will be documented as a continuous work-in-progress. As a one-person team doing this in my spare time I won’t actually follow these rules, but it’ll be a guiding principle.

So down to the actual code and stuff –

One of the first steps was to set up good old version control. I use Git / GitHub for this, on their free account tier, so you can view all the code I write along the way. I spent a bit of time configuring everything to track just the files I’m modifying and to ignore all the auto-generated stuff as well as the framework files, which are a little larger. This is done via the .gitignore file, which lets you tell Git which types of files to ignore. I used the Swift.gitignore template for this and added the AudioKit.framework files to the ignore list, since you can download those from AudioKit directly instead.
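The relevant entries look something like this (a sketch; the bulk comes straight from GitHub's Swift.gitignore template, and the framework path depends on where you drop AudioKit in your project):

# From the Swift.gitignore template: Xcode build output and per-user state
build/
DerivedData/
xcuserdata/

# Don't track the prebuilt AudioKit binaries - download them from AudioKit directly
AudioKit.framework/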

So as mentioned I’m using AudioKit as the framework for processing the audio in this app.

AudioKit is an audio synthesis, processing, and analysis platform for iOS, macOS, and tvOS.

It’s an open-source project, so you can contribute if you feel like it. It both wraps and simplifies some elements of Core Audio and extends it, letting you create complex processing graphs by chaining nodes together. The current version is based on Swift 3, which requires Xcode 8 to build.

For this project, the processing needs are pretty simple. I just need to be able to get audio input, process it via convolution, and output the result over multiple iterations. I decided that as a first step I could hardcode the audio input files, process a single iteration, and play the resulting output. I dragged and dropped in two files: sitting.wav (an excerpt from the original recording for now, which I’ll replace soon) and an impulse response from a medium-sized church hall, so it’s pretty reverby and makes the effect more noticeable.

Screenshot of Xcode sidebar

Xcode Project Bundle

First we load the files into the project as an AKAudioFile and a file URL respectively, as these are the parameters AKConvolution() looks for.

let sourceFile = try? AKAudioFile(readFileName: "Sitting.wav", baseDir: .resources)
let urlOfIR = Bundle.main.url(forResource: "IR", withExtension: "wav")!

Next I created an AKAudioPlayer from my source file:

let player = sourceFile?.player

I then hooked up the player node to the convolution process, attached the output of the convolution to the engine output (which will eventually play the audio) and started the engine.
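The convolution node itself isn’t shown in this snippet, but as a sketch it’s just the player plus the IR url, matching the AKConvolution call used later for the iterations:

convolvedOutput = AKConvolution(player!, impulseResponseFileURL: urlOfIR)
// convolvedOutput is declared outside this function (see the note about globals below)
// so the toggle button can reach it later.

With that in place, sending it to the output and starting the engine looks like: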

AudioKit.output = convolvedOutput!
AudioKit.start()

Then we simply need to start both the convolution and player to hear the audio output.

convolvedOutput!.start()
player!.start()

The UI for now is as basic as it gets: a single button to toggle the convolution ON/OFF. To achieve this, the turnOffConvolution() function is attached to the UIButton, with a simple if/else to toggle start and stop. I had to declare convolvedOutput as a global (before initializing it later) for this function to access the start/stop calls. I’m sure I’ll change that when I update the app to use the MVC (Model-View-Controller) design, but for now it works well enough.
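As a sketch (assuming a simple Bool to track the toggle state; the real version is in the repo), it looks something like:

var convolutionIsOn = true   // hypothetical flag tracking whether convolution is active

@IBAction func turnOffConvolution() {
    if convolutionIsOn {
        convolvedOutput?.stop()
    } else {
        convolvedOutput?.start()
    }
    convolutionIsOn = !convolutionIsOn
}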

So there it is: a very basic app that hooks our input audio up to the convolution node and outputs the result, with a button to toggle processing on/off. Up next we’ll add the processing iterations, so the room resonances can completely wash out the actual speech and mimic the original recording.

I am working on a project, one different than the one you are working on now…

This is one of those times when I actually try and follow through on random project ideas I post to Twitter. (I have a mixed record in that regard). I’ve been taking a course in Swift for iOS programming, and playing around with AudioKit to build some simple audio apps. I’d like to flex my programming chops a little more so I’ve decided to work on these projects. I’ll probably do the Lucier-inspired app first and follow it up with the Steve Reich ones since they’ll probably be a little more involved (though it’s quite likely I’m underestimating the workload here).

In order to keep myself accountable, and actually build something I’m not ashamed to show people, I’ll be writing up kind of ‘status reports’ on here. Sharing progress when I work on it, explaining how I handled some things and generally explaining how it works as I go. I’ll probably only start seriously working on it in the new year.

So here we go with a basic functional outline for the ‘I Am Sitting in a Room’ inspired project, aka ‘Project: iOS am convolving in a Room’

We are going to try and match the basic process used by Alvin Lucier to record the original piece (replacing the re-recording via analogue tape with convolution).

  • The user starts by either recording a short sample of audio or selecting one from their library. It’d be nice to use the original Lucier audio as a default too, but I’d need to check whether it’s public domain or otherwise get permission to use it
  • The user records a simple impulse response either by clapping or turning the volume up and playing the impulse from the app through the speakers
  • The user selects either a total length or number of loops – Default will match the Lucier original
  • The user can play back the processed audio, trim it, and export the saved track
  • I’ll probably make use of AudioKit to capture the user inputs and use its AKConvolution node to do the processing, so most of the heavy lifting for me will be getting audio in and out of the app itself. Depending on the length of the input sample, I may have to do some memory optimization or, more likely, limit the maximum input sample length.

If you’re interested in following along, you’ll want an iOS device of some sort and a Mac running Xcode 8. You’ll also want to download and install AudioKit into your Xcode project to access its functions.

This should be fun, and sure to keep the @sittinginaroom bot happy.

How VR Audio works

Here’s a great post from Enda Bates of the Trinity 360 project, talking about 360 degree audio, which is the oft-forgotten half of the VR experience. Now that VR is going mainstream with Oculus Rift finally shipping and Samsung, HTC and Sony all releasing their own headsets to go along with cheaper alternatives like Google Cardboard, we are starting to see a shift towards better audio for VR. Companies like Google are focusing on spatial audio as one of the key components of the VR experience.

I’m lucky enough to get to work often with VR audio as part of my job, so it’s exciting to see it getting more of the attention it deserves as the VR market explodes. Enda’s post is a great rundown of how audio can be captured and rendered for VR and well worth checking out. I’m looking forward to catching the upcoming performance in April to see the result of the work that Enda and crew have been working towards.

Boring Blackalicious

I recently took the MIR Workshop at CCRMA Stanford (which I highly recommend) and got a chance to play around with Python signal-processing libraries, including librosa. During the week, one of the guest presenters used ‘Alphabet Aerobics’ by Blackalicious to demonstrate his source-separation algorithm. It’s challenging material because the track famously does not keep a constant tempo and speeds up considerably over the course of the song.

The thought struck me that it’d be way less interesting if the tempo was constant throughout, so this weekend I put my newly developed python processing skills to work and created ‘Boring Blackalicious’ – the constant tempo version of Alphabet Aerobics.

It uses librosa’s onset detection, beat tracking and tempo estimation functions to create a tempo map (with the help of some manual tweaking to correct the estimated tempos of the later segments, where the librosa functions had more difficulty keeping up).

The tempo map was used to calculate the correct rate to slow down each segment using librosa’s phase vocoder function. I used the phase vocoder over librosa.effects.time_stretch so I could tweak the FFT length directly and get a better-sounding result.

The python source is available here and you can listen to the original for comparison below!
