May 27, 2024, Mobile

RealityKit Object Capture – Create Custom 3D Models for visionOS

iteo team
Film and virtual reality are only two of the many media contexts where three-dimensional models are used. But producing 3D assets by hand often takes a lot of time, requires several artists for modeling and texturing, and ends up looking more like an artist's interpretation of reality than the real thing. In some contexts, such as science communication and cultural heritage protection, models that are objectively faithful to reality are especially valuable. This is where photogrammetry comes in.

What is photogrammetry?

The method of creating a 3D model from a collection of images is called photogrammetry. The two main stages of the procedure are image capture and reconstruction. 

A professional capture session needs a high-quality camera, consistent diffuse lighting, minimal environmental disturbance, and photos of the subject taken from many perspectives. Because every subject is unique, it is also crucial to plan the capture in advance so that the shots are spaced evenly and densely around the subject.

In the reconstruction stage, software analyzes the photos and finds points that appear in more than one of them. These shared points act as reference markers: the software uses them to estimate the distances and angles between parts of the images, and from those estimates it builds the 3D model.
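To give a feel for how matched points turn into geometry, here is a minimal sketch of the simplest possible case: two photos taken side by side with parallel cameras (a rectified stereo pair). It is a generic textbook illustration under a pinhole-camera assumption, not a description of what any particular tool does internally. The function name and the numbers are made up for the example.

```swift
import Foundation

/// Depth of a point matched in two rectified views, using the pinhole relation
/// depth = focalLength * baseline / disparity.
/// - focalLengthPixels: focal length expressed in pixels (assumed known).
/// - baselineMeters: distance between the two camera positions.
/// - disparityPixels: horizontal shift of the matched point between the photos.
func triangulatedDepth(focalLengthPixels: Double,
                       baselineMeters: Double,
                       disparityPixels: Double) -> Double? {
    // A zero or negative shift means the point cannot be triangulated.
    guard disparityPixels > 0 else { return nil }
    return focalLengthPixels * baselineMeters / disparityPixels
}

// Hypothetical numbers: 1000 px focal length, cameras 0.25 m apart,
// and a matched point that shifts 50 px between the two photos.
if let depth = triangulatedDepth(focalLengthPixels: 1000,
                                 baselineMeters: 0.25,
                                 disparityPixels: 50) {
    print("Estimated depth: \(depth) m")
}
```

Real reconstructions solve for the camera positions as well and combine many views, but the idea is the same: the more a shared point shifts between photos, the closer it is to the camera.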

I used the Object Capture API in RealityKit, which is available on macOS 12 and later and on iOS 17 and later. All you need to do is give Object Capture a collection of well-lit images captured from various perspectives, and the magic will begin.
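For orientation, this is roughly what driving Object Capture directly from Swift on macOS can look like. Treat it as a minimal sketch based on my reading of the `PhotogrammetrySession` API, not as the code used in this article: `reconstructModel` and `progressLine` are names invented here, the default configuration and the `.medium` detail level are arbitrary choices, and the RealityKit part only compiles on a Mac.

```swift
import Foundation

/// Formats a 0...1 completion fraction for logging (hypothetical helper).
func progressLine(_ fraction: Double) -> String {
    "Progress: \(Int((fraction * 100).rounded()))%"
}

#if os(macOS) && canImport(RealityKit)
import RealityKit

/// Reconstructs a USDZ model from a folder of photos.
@available(macOS 12.0, *)
func reconstructModel(from imagesFolder: URL, to outputFile: URL) async throws {
    // Not every Mac meets the hardware requirements.
    guard PhotogrammetrySession.isSupported else {
        print("Object Capture is not supported on this Mac.")
        return
    }
    let session = try PhotogrammetrySession(
        input: imagesFolder,
        configuration: PhotogrammetrySession.Configuration())

    // Start listening before submitting the request so no messages are missed.
    let listener = Task {
        for try await output in session.outputs {
            switch output {
            case .requestProgress(_, let fractionComplete):
                print(progressLine(fractionComplete))
            case .requestComplete(_, .modelFile(let url)):
                print("Model written to \(url.path)")
            case .requestError(_, let error):
                print("Request failed: \(error)")
            case .processingComplete:
                return
            default:
                break
            }
        }
    }

    // Ask for a single USDZ file at medium detail.
    try session.process(requests: [.modelFile(url: outputFile, detail: .medium)])
    try await listener.value
}
#endif
```

The session reports its progress as an asynchronous stream of messages, which is why the sketch listens in a separate task while the request is being processed.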

Is Object Capture essentially based on the usual photogrammetry building blocks, such as “Scale-invariant feature transform (SIFT)”, “Random sample consensus (RANSAC)”, “Structure from Motion (SfM)”, “Bundle adjustment” and/or “Multi-View Stereo (MVS)”?

Unfortunately, the official Apple documentation does not disclose which techniques Object Capture uses internally for 3D reconstruction. It only states that the general technique of photogrammetry is used.

The Pear Project

First and foremost, we must set up an environment that minimizes outside distractions. For the best results, best practices recommend avoiding accidental shadows and removing from the background as much as possible that is not the object itself. Apple does a fantastic job of eliminating these artifacts, but given how new the technology is, it isn’t perfect yet. I simply got some supplies and built my own lightbox. I realize that it looks awful, but there is room for improvement.

[Image: the pear project 1]

The photogrammetric capture was carried out with an iPhone 15 Pro Max, a comparatively inexpensive device for acquiring images.

Key camera specifications: 

  • Primary: 48MP sensor, 2.44µm quad pixels, 24mm equivalent f/1.78-aperture lens, Dual Pixel AF, OIS

  • Ultra-wide: 12MP sensor, 13mm equivalent, f/2.2-aperture lens, Dual Pixel AF

  • Tele: 12MP sensor, 1.0µm pixels, 77mm equivalent f/2.8-aperture lens

I took 200 images from a distance of approximately 20 cm, all with the same focus setting.

To record all of the object’s features accurately, I ran a single capture session during which the object was rotated in about 200 small steps. The photos were captured in a convergent configuration, with the camera always pointed at the object. To make mask definition easier, a white panel was used as the background. The high contrast between the object and the background let the software’s boundary detection tool work effectively, so the polygon around the area of interest in each image could be created quickly and accurately.
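As a rough planning aid, the spacing between shots can be worked out before the session. The sketch below assumes that the 200 photos cover one full, evenly spaced 360° turn of the object, which is an assumption on top of the description above. `turntableAngles` is a hypothetical helper.

```swift
import Foundation

/// Evenly spaced rotation angles (in degrees) for one full turn of the object.
func turntableAngles(shotCount: Int) -> [Double] {
    guard shotCount > 0 else { return [] }
    let step = 360.0 / Double(shotCount)
    return (0..<shotCount).map { Double($0) * step }
}

let angles = turntableAngles(shotCount: 200)
print("Shots: \(angles.count), step: \(angles[1] - angles[0]) degrees")
```

Under that assumption each step is 1.8°, so neighbouring photos show almost the same view of the object, which gives the reconstruction plenty of shared points to match.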

[Image: the pear project 2]

Now we run the software, which does all the work for us. 

Xcode - preparing the generator


To follow along, you will need:

  • An Apple silicon Mac, or an Intel Mac with an AMD GPU with 4 GB of memory and 16 GB of RAM.

  • Xcode 13 or later.

  • The photogrammetry command-line app: Apple’s HelloPhotogrammetry sample project.

Tip: Remember to open “Signing & Capabilities” and select your Apple developer account certificate.

We can now open the HelloPhotogrammetry project and decide how to configure it. You have the following options to choose from:

  • Terminal

  • Set up the main() function

  • Launch arguments in the Xcode scheme

I’ll demonstrate the second option, setting up the main() function. Open the file main.swift and scroll to the bottom, where these lines of code should be visible:

if #available(macOS 12.0, *) {
    HelloPhotogrammetry.main()
} else {
    fatalError("Requires minimum macOS 12.0!")
}

You pass an array of strings to the HelloPhotogrammetry.main() function, containing the parameters the application needs to run successfully. We will pass the following string arguments:

Input folder: This is the path to the pictures folder. 

Output file: This is the path to the USDZ output file for the 3D model.

Detail: (preview, reduced, medium, full, raw) The detail of the output model in terms of mesh size and texture size. (default: nil) 

Sample ordering: (unordered, sequential) Setting to sequential may speed up computation if images are captured in a spatially sequential pattern.  

Feature sensitivity: (normal, high) Set to high if the scanned object does not contain a lot of discernible structures, edges, or textures. 

Once you are familiar with these configuration options, adjust them to suit your requirements.

While I was working with my subject, I used the following settings: 

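if #available(macOS 12.0, *) {
    HelloPhotogrammetry.main([/* input folder, output file, detail, sample ordering, feature sensitivity */])
} else {
    fatalError("Requires minimum macOS 12.0!")
}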
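To make the shape of that argument array concrete, here is a small sketch that assembles it from the parameters listed above. The paths and settings are placeholders, not the ones used for the pear, and `photogrammetryArguments` is a hypothetical helper. The short flags `-d`, `-o` and `-f` are how I recall the sample's options being named, so treat them as an assumption and check them against the sample's `--help` output.

```swift
import Foundation

/// Builds the string arguments for the command-line sample (hypothetical helper).
func photogrammetryArguments(inputFolder: String,
                             outputFile: String,
                             detail: String? = nil,
                             sampleOrdering: String? = nil,
                             featureSensitivity: String? = nil) -> [String] {
    // The two positional arguments come first: input folder, then output file.
    var arguments = [inputFolder, outputFile]
    // Optional flags are only added when a value is given.
    if let detail = detail { arguments += ["-d", detail] }
    if let sampleOrdering = sampleOrdering { arguments += ["-o", sampleOrdering] }
    if let featureSensitivity = featureSensitivity { arguments += ["-f", featureSensitivity] }
    return arguments
}

// Placeholder paths and settings for illustration only.
let exampleArguments = photogrammetryArguments(inputFolder: "/path/to/Photos",
                                               outputFile: "/path/to/pear.usdz",
                                               detail: "medium",
                                               sampleOrdering: "sequential",
                                               featureSensitivity: "normal")
print(exampleArguments)
// In main.swift, an array like this would be passed to HelloPhotogrammetry.main(...).
```

Leaving an option out means the sample falls back to its own default for that setting.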

Once the application is running, your photos will start to magically transform into a 3D model. Depending on your Mac’s specs, the number of photos you took, and the detail level you chose, this could take a few minutes.

After a successful run, we have a USDZ file containing our 3D model and all of its textures. USDZ (Universal Scene Description, zipped) is a format created by Apple and Pixar together to make it easier to exchange 3D assets across platforms and devices. It packages the scene and its textures into a single, uncompressed zip archive.
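Because a USDZ file is a zip archive under the hood, a quick sanity check on the output is to look at its first bytes. The sketch below is only a plausibility test with hypothetical helper names: it checks the file extension and the zip signature, and does not validate the USD content inside.

```swift
import Foundation

/// Returns true when the data starts with the local-file-header signature
/// "PK\u{03}\u{04}" that opens a non-empty zip archive.
func looksLikeZipArchive(_ data: Data) -> Bool {
    let signature: [UInt8] = [0x50, 0x4B, 0x03, 0x04]
    return data.count >= signature.count
        && Array(data.prefix(signature.count)) == signature
}

/// Checks the extension and the first four bytes of the file at `path`.
func looksLikeUSDZ(atPath path: String) -> Bool {
    guard path.lowercased().hasSuffix(".usdz"),
          let handle = FileHandle(forReadingAtPath: path) else { return false }
    defer { handle.closeFile() }
    // Only the first four bytes are needed, so the model itself is not loaded.
    return looksLikeZipArchive(handle.readData(ofLength: 4))
}

// Placeholder path for illustration only.
print(looksLikeUSDZ(atPath: "/path/to/pear.usdz"))
```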


Spectrum of immersion

With Apple Vision Pro, you can play, explore, and experiment endlessly, and completely reimagine what 3D can be. An app can let you engage with content while remaining aware of your surroundings, or immerse you fully in the virtual world you’ve built. So let’s take a closer look at our subject in Vision Pro.

[Image: the pear in vision pro]

To view and interact with the object in Vision Pro, use the code below.  

import SwiftUI
import RealityKit
import RealityKitContent

struct Local3dModelScreen: View {
    @State var rotation: Angle = Angle(degrees: 0)
    @State var rotationAxis: (x: CGFloat, y: CGFloat, z: CGFloat) = (x: 0, y: 0, z: 0)

    var body: some View {
        VStack {
            Text("Local 3D Model")
                .font(.system(size: 50,
                              design: .rounded))
            // Load the "pear" model from the RealityKitContent package.
            Model3D(
                named: "pear",
                bundle: realityKitContentBundle
            )
            .padding(.bottom, 50)
            .rotation3DEffect(rotation, axis: rotationAxis)
            .gesture(
                DragGesture()
                    .onChanged { value in
                        // The drag length sets the angle; skip zero-length drags to avoid dividing by zero.
                        let angle = sqrt(pow(value.translation.width, 2) + pow(value.translation.height, 2))
                        guard angle > 0 else { return }
                        rotation = Angle(degrees: Double(angle))
                        // Rotate around the axis perpendicular to the drag direction.
                        let axisX = -value.translation.height / CGFloat(angle)
                        let axisY = value.translation.width / CGFloat(angle)
                        rotationAxis = (x: axisX, y: axisY, z: 0)
                    }
            )
        }
    }
}
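The arithmetic in the drag handler can be pulled out into a plain function, which makes it easier to see and to test on its own. This is a sketch with a hypothetical function name: it mirrors the calculation in the view (drag length as the angle in degrees, and a unit axis perpendicular to the drag) and returns nil for a zero-length drag, where the axis would be undefined.

```swift
import Foundation

/// Converts a 2D drag translation into a rotation angle (degrees) and a unit axis.
/// Returns nil for a zero-length drag, where the axis would be undefined.
func rotation(forDragWidth width: Double, height: Double)
    -> (degrees: Double, axis: (x: Double, y: Double, z: Double))? {
    // Drag length in points, used directly as the angle in degrees.
    let length = (width * width + height * height).squareRoot()
    guard length > 0 else { return nil }
    // The axis lies in the screen plane, perpendicular to the drag direction.
    return (degrees: length,
            axis: (x: -height / length, y: width / length, z: 0))
}

// A drag of 3 points to the right and 4 points down.
if let result = rotation(forDragWidth: 3, height: 4) {
    print("Rotate \(result.degrees) degrees around \(result.axis)")
}
```

Dividing by the drag length normalizes the axis, so only the direction of the drag decides which way the model turns and only its length decides how far.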

That’s all for now, and you won’t understand the history of this sweater anyway … 


In the past, 3D data production was a very specialized field that called for specific tools. With the Object Capture API for macOS, which was unveiled at WWDC21, it is now much easier to create 3D data using photogrammetry.