Mar 18, 2024, Mobile

Sound Analysis on Apple Platforms

iteo team
If you have ever explored the accessibility features of your iPhone, you are likely familiar with Sound Recognition and its ability to detect various audio cues, such as smoke alarm beeps. What if I told you that the fundamentals of how it works are quite simple to implement?

Apple Documentation to the Rescue

Apple provides sample code along with the SoundAnalysis framework that should get you up to speed. It basically comes down to:

  1. Creating and configuring an AVAudioEngine

  2. Loading your own (or an Apple-provided) sound analysis model

  3. Submitting an SNClassifySoundRequest

  4. Observing and handling the results using SNResultsObserving (a minimal observer is sketched below)

See? It’s really simple! Apple provides an extensive built-in model that should be enough for recognizing general categories of sound.
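For the last step, the observer is simply an object that conforms to SNResultsObserving and inspects the classifications it receives. A minimal sketch could look like the one below (the ResultsObserver name and the 0.8 confidence threshold are our own choices, not part of the framework):

import SoundAnalysis

// Receives classification results from the analyzer (step 4)
final class ResultsObserver: NSObject, SNResultsObserving {
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult else { return }
        // Only report reasonably confident classifications
        for classification in result.classifications where classification.confidence > 0.8 {
            print("Heard \(classification.identifier) (confidence: \(classification.confidence))")
        }
    }

    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("Sound analysis failed: \(error.localizedDescription)")
    }

    func requestDidComplete(_ request: SNRequest) {
        print("Sound analysis request completed")
    }
}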

But what if we wanted to use our own model, for example to detect humming?

Meet Create ML

Apple’s Create ML is a tool that allows developers to create and train machine learning models tailored to their needs. With Create ML, we can easily build models for sound analysis, image recognition, text classification, and more. This gives us the flexibility to train a model that recognizes specific sounds, rather than relying on the pre-built, generic model provided by Apple.

When we launch Create ML, we will be greeted with a screen asking us to choose a template. Since we want to recognize sounds, we should use Sound Classification:

[Screenshot: the Create ML template chooser]

After selecting a template and naming the project, we will see a fairly empty-looking screen. For now, let’s focus only on the Data section; if we’re unsure about the rest, it’s best to leave the training parameters at their defaults:

[Screenshot: a new Sound Classification project]

Grabbing Samples

To train our own sound analysis model in Create ML, we need samples of the sound we want to detect. That means recording various instances of the sounds we are trying to classify, such as humming, talking or breathing. These recordings serve as the training data that teaches the model to differentiate between sound patterns – and the more diverse the samples, the more robust and accurate the classifications will be.

For our humming example, a reasonable set of sample classes could be:

  • Breathing

  • Coughing

  • Harrumphs

  • Humming

  • Silence

  • Talking

Capture only the actual sound! For example, if we want to detect humming, trim the recording to just the humming part. It’s also best to capture all samples in the same format and at the same sample rate.

When we’re done grabbing samples, we should store them in a structure that Create ML will recognize: one root folder (e.g., Data) with subfolders containing the various sound types (e.g., Breathing, Humming, etc. – whatever we name them becomes the classification label):

[Screenshot: sample folder structure]
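In plain text, the layout could look something like this (the file names are just illustrative):

Data
├── Breathing
│   ├── breathing-01.m4a
│   └── breathing-02.m4a
├── Coughing
├── Harrumphs
├── Humming
│   ├── humming-01.m4a
│   └── humming-02.m4a
├── Silence
└── Talking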

With our samples structured appropriately, we can import them as Training Data in our Create ML project – select the root folder (e.g., Data) and press Open. If we did everything correctly, the Training Data section should get populated with our samples:

[Screenshot: the Training Data section populated with samples]

After importing our samples, we can start training the model – press the Train button in the toolbar and wait for it to finish:

[Screenshot: the model training in Create ML]

To ensure that our model works correctly, go to the Preview tab. Then, either choose Live Preview or add some audio files to test it out:

[Screenshot: the Preview tab]

Everything ready? If so, it’s time to export our model! Switch to the Output tab, press Get and select the output directory:

[Screenshot: the Output tab]

How to Use Custom Models

Now that we have trained our own model, we can finally use it in our app! To do so, add the exported .mlmodel file to the Xcode project and use it when constructing SNClassifySoundRequest:

// Before: using Apple's built-in classifier
let version1 = SNClassifierIdentifier.version1
let request = try SNClassifySoundRequest(classifierIdentifier: version1)

// After: using our exported Create ML model
let defaultConfig = MLModelConfiguration()
let soundClassifier = try MySoundClassifier(configuration: defaultConfig)
let classifySoundRequest = try SNClassifySoundRequest(mlModel: soundClassifier.model)
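Xcode generates the MySoundClassifier class automatically when the exported .mlmodel file is added to a target (the class name follows the model file’s name), so there is no extra glue code to write.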

How does it look when it’s all put together?

import AVFoundation
import CoreML
import SoundAnalysis

// The functions below are assumed to live in a class (e.g. an `AudioRecognizer`)
// that conforms to `SNResultsObserving` and keeps `audioEngine`, `streamAnalyzer`
// and `analysisQueue` as stored properties, so they outlive a single call.

// Make sure to request microphone access
func ensureMicrophoneAccess() async -> Bool {
    #if os(iOS) || os(macOS)
    let authorizationStatus = AVCaptureDevice.authorizationStatus(for: .audio)
    switch authorizationStatus {
    case .notDetermined:
        return await AVCaptureDevice.requestAccess(for: .audio)
    case .denied, .restricted:
        return false
    case .authorized:
        return true
    @unknown default:
        return false
    }
    #elseif os(watchOS)
    return await AVAudioApplication.requestRecordPermission()
    #endif
}

// Feel free to reuse `audioEngine`, `streamAnalyzer` and `soundClassifier` :)
func startRecognizing() throws {
    // 1. Create and configure the audio engine, tapping the default input device
    let audioEngine = AVAudioEngine()
    self.audioEngine = audioEngine
    let inputBus = AVAudioNodeBus(0)    // Default input device
    let inputFormat = audioEngine.inputNode.inputFormat(forBus: inputBus)
    let audioBufferSize: AVAudioFrameCount = 8192
    audioEngine.inputNode.installTap(
        onBus: inputBus,
        bufferSize: audioBufferSize,
        format: inputFormat,
        block: analyzeAudioBuffer
    )
    try audioEngine.start()

    // 2. Load the custom model and 3. submit the classification request to a stream analyzer
    let streamAnalyzer = SNAudioStreamAnalyzer(format: inputFormat)
    self.streamAnalyzer = streamAnalyzer
    let defaultConfig = MLModelConfiguration()
    let soundClassifier = try MySoundClassifier(configuration: defaultConfig)    // the class Xcode generates from our .mlmodel
    let classifySoundRequest = try SNClassifySoundRequest(mlModel: soundClassifier.model)
    try streamAnalyzer.add(classifySoundRequest, withObserver: self)    // `self` being our `SNResultsObserving`
}

// 4. Forward captured buffers to the analyzer off the main thread
func analyzeAudioBuffer(_ buffer: AVAudioBuffer, at time: AVAudioTime) {
    // `analysisQueue` is a serial queue, e.g.:
    // let analysisQueue = DispatchQueue(label: "\(Bundle.main.bundleIdentifier ?? "").AnalysisQueue")
    analysisQueue.async {
        guard let streamAnalyzer = self.streamAnalyzer else { return }
        streamAnalyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
    }
}
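A call site tying the two together could look roughly like this (here `recognizer` stands for whatever object owns the methods above – the name is ours):

Task {
    // Ask for microphone access before spinning up the audio engine
    guard await recognizer.ensureMicrophoneAccess() else { return }
    do {
        try recognizer.startRecognizing()
    } catch {
        print("Could not start sound analysis: \(error.localizedDescription)")
    }
}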

The same code will also run on other platforms, such as macOS or watchOS! To use it in multiple targets, we can split our codebase into three parts:

  • SoundClassification: iOS/iPadOS/macOS target

  • SoundClassification Watch App: watchOS target

  • Shared: everything that will be shared by both of our targets, for example our AudioRecognizer class or the .mlmodel file. Just remember to enable Target Membership for all targets 🙂

[Screenshot: the Xcode project structure with shared files]

Make sure to add NSMicrophoneUsageDescription to the Info.plist of every target that uses the microphone; otherwise, the app will crash :/
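In the Info.plist source, the entry is just a key plus a user-facing description (the wording below is only an example):

<key>NSMicrophoneUsageDescription</key>
<string>The app listens to the microphone to recognize nearby sounds.</string>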

Summing It Up

In this article, we explored how to implement sound analysis on Apple platforms using SoundAnalysis and Create ML. By gathering samples, training a custom model in Create ML, and exporting it to our app, we can detect specific sounds – such as humming – with a robust and accurate classifier. Best of all, the same code runs across Apple platforms, giving developers the tools to create unique and innovative applications.

And here’s how it all looks in action:

[GIF: the finished app in action]