
Photo Stacking in iOS with Vision and Metal



What is photo stacking? Well, imagine this: You're on vacation, somewhere magical. You're traveling around the UK, visiting all of the locations where the Harry Potter movies were filmed!

It's time to see the sights and take the most amazing photos. How else will you rub it in your friends' faces that you were there? There's only one problem: There are so many people. :[

Ugh! Every picture you take is full of them. If only you could cast a simple spell, like Harry, and make all these people disappear. Evanesco! And poof! They'd be gone. It would be great. It would be the best. ;]

Maybe there is something you can do. Photo stacking is an emerging computational photography trend that all the cool kids are talking about. Want to know how to use it?

In this tutorial, you'll learn how to:

  • Align captured images using a VNTranslationalImageRegistrationRequest.
  • Create a custom CIFilter using a Metal kernel.
  • Use this filter to combine multiple images and remove any moving objects.

Exciting, right? Well, what are you waiting for? Read on!

Getting Started

Click the Download Materials button at the top or bottom of this tutorial. Open the starter project and run it on your device.

Note: Since this tutorial uses the camera and Metal, you need to run it on an actual device, not the simulator.

Evanesco startup screen

You should see what looks like a simple camera app. There's a red record button with a white ring around it, and the screen shows the camera's input.

Surely you've noticed that the camera feed seems a bit choppy. That's because it's set to record at five frames per second. To see where this is defined in code, open CameraViewController.swift and find the following two lines in configureCaptureSession():

camera.activeVideoMaxFrameDuration = CMTime(value: 1, timescale: 5)
camera.activeVideoMinFrameDuration = CMTime(value: 1, timescale: 5)

The first line forces the maximum frame rate to be five frames per second. The second line sets the minimum frame rate to the same value. Together, the two lines pin the camera to the desired frame rate.

If you press the record button, you should see the outer white ring fill up like a clock. When it's done, though, nothing happens.

Time to do something about that.

Saving Pictures to the Files App

To help you troubleshoot the app as you go, it would be nice to save the photos you're working with to the Files app. Fortunately, this is much easier than it sounds.

Add the following two keys to Info.plist:

  1. Application supports iTunes file sharing
  2. Supports opening documents in place

Set both values to YES. When you're done, the file should look like this:

 Info.plist example

The first key makes the files in the Documents directory available for sharing. The second allows other apps to open the original documents from a file provider, rather than receiving a copy. With both options enabled, all files stored in the app's Documents directory show up in the Files app. This also means that other apps can access these files.
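For reference, those friendly names in the Info.plist editor correspond to the raw keys UIFileSharingEnabled and LSSupportsOpeningDocumentsInPlace. In source form, the two entries look like this:

  <key>UIFileSharingEnabled</key>
  <true/>
  <key>LSSupportsOpeningDocumentsInPlace</key>
  <true/>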

Now that you've given the Files app permission to access the Documents directory, it's time to save some pictures there.

Included with the starter project is a helper struct called ImageSaver. When instantiated, it generates a universally unique identifier (UUID) and uses it to create a folder inside the Documents directory. This ensures that you don't overwrite previously saved images. You'll use ImageSaver in your app to write your photos to files.
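You don't need to write ImageSaver yourself, since the starter project ships with it. Just to give a rough idea of what such a helper might look like, here's a minimal sketch; the property names and the JPEG output are assumptions, not the starter's actual code:

import CoreImage

// Hypothetical sketch of an ImageSaver-style helper; the starter provides its own.
struct ImageSaver {
  private let directory: URL
  private let context = CIContext()
  private var count = 0

  init() {
    let documents = FileManager.default.urls(
      for: .documentDirectory, in: .userDomainMask)[0]
    // A fresh UUID-named folder per recording session.
    directory = documents.appendingPathComponent(UUID().uuidString)
    try? FileManager.default.createDirectory(
      at: directory, withIntermediateDirectories: true)
  }

  mutating func write(_ image: CIImage) {
    guard
      let colorSpace = CGColorSpace(name: CGColorSpace.sRGB),
      let data = context.jpegRepresentation(of: image, colorSpace: colorSpace)
      else {
        return
    }
    let url = directory.appendingPathComponent("\(count).jpg")
    try? data.write(to: url)
    count += 1
  }
}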

In CameraViewController.swift, define a new property at the top of the class as follows:

var saver: ImageSaver?

Then scroll to recordTapped(_:) and add the following to the end of the method:

saver = ImageSaver()

Here, you create a new ImageSaver every time the record button is tapped, ensuring that each recording session saves its images to a new folder.

Then scroll to captureOutput(_:didOutput:from:) and add the following code after the first if statement:

// 1
guard
  let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer),
  let cgImage = CIImage(cvImageBuffer: imageBuffer).cgImage()
  else {
    return
}
// 2
let image = CIImage(cgImage: cgImage)
// 3
saver?.write(image)

With this code you are:

  1. Extract the CVImageBuffer from the captured sample buffer and convert it to a CGImage.
  2. Convert the CGImage back to a CIImage.
  3. Write the image to the Documents directory.

Note: Why did you need to convert the sample buffer to a CIImage, then to a CGImage and finally back to a CIImage again? This has to do with who owns the data. When you convert the sample buffer to a CIImage, the image keeps a strong reference to the sample buffer. Unfortunately, for video capture, this means that after a few seconds the camera starts dropping frames, because it runs out of memory for new sample buffers. By turning the CIImage into a CGImage using a CIContext, you make a copy of the image data, and the sample buffer can be released and reused.

Now, build and run the app. Press the record button and, when it finishes, switch to the Files app. Under the Evanesco folder, you should see a folder named with a UUID containing 20 items.

UUID named folder

If you look inside this folder, you'll find the 20 pictures you captured during the four seconds of recording.

Capture Frames

Note: If you don't see the folder right away, use the search box at the top of the Files app.

OK, cool. So what can you do with 20 almost identical pictures?

Photo Stacking

In computational photography, photo stacking is a technique in which multiple images are captured, aligned and combined to create various desired effects.

For example, HDR images are produced by taking multiple images at different exposure levels and combining the best parts of each. This is how, on iOS, you can see detail in the shadows and in the bright sky at the same time.

Astrophotography also makes great use of photo stacking. The shorter an image's exposure, the less noise the sensor picks up. So astrophotographers usually take many short-exposure pictures and stack them together to increase the brightness.

In macro photography, it's difficult to get the whole subject in focus at once. With photo stacking, the photographer can take several pictures at different focus distances and combine them to produce an extremely sharp image of a very small object.

To combine the images, you must first align them. How? iOS provides some interesting APIs that can help you with that.

Using Vision to Align Images

The Vision framework has two different APIs for aligning images: VNTranslationalImageRegistrationRequest and VNHomographicImageRegistrationRequest. The former is easier to use and, if you assume the user will keep the iPhone relatively still, it should be good enough.

To keep your code readable, you'll create a new class to handle the image processing.

Create a new, empty Swift file and name it ImageProcessor.swift .

Remove any provided import statements and add the following code:

import CoreImage
import Vision

class ImageProcessor {
  var frameBuffer: [CIImage] = []
  var alignedFrameBuffer: [CIImage] = []
  var completion: ((CIImage) -> Void)?
  var isProcessingFrames = false

  var frameCount: Int {
    return frameBuffer.count
  }
}

Here you import the Vision framework and define the ImageProcessor class along with some necessary properties:

  • frameBuffer stores the original captured images.
  • alignedFrameBuffer will contain the images after they have been aligned.
  • completion is a handler that will be called after the images have been aligned and combined.
  • isProcessingFrames indicates whether images are currently being aligned and combined.
  • frameCount is the number of captured images.

Next, add the following method to the ImageProcessor class:

func add(_ frame: CIImage) {
  if isProcessingFrames {
    return
  }
  frameBuffer.append(frame)
}

This method adds a captured frame to the frame buffer, but only if you are not currently processing the frames in the frame buffer.

Continuing in the same class, add the processing method:

func processFrames(completion: ((CIImage) -> Void)?) {
  // 1
  isProcessingFrames = true
  self.completion = completion
  // 2
  let firstFrame = frameBuffer.removeFirst()
  alignedFrameBuffer.append(firstFrame)
  // 3
  for frame in frameBuffer {
    // 4
    let request = VNTranslationalImageRegistrationRequest(targetedCIImage: frame)

    do {
      // 5
      let sequenceHandler = VNSequenceRequestHandler()
      // 6
      try sequenceHandler.perform([request], on: firstFrame)
    } catch {
      print(error.localizedDescription)
    }
    // 7
    alignImages(request: request, frame: frame)
  }
  // 8
  cleanup()
}

It seems like a lot of steps, but this method is relatively simple. You'll call it after you've added all the captured frames. It processes each frame and aligns them using the Vision framework. Specifically, in this code, you:

  1. Set the isProcessingFrames Boolean to prevent adding any more frames. You also save the completion handler for later.
  2. Remove the first frame from the frame buffer and add it to the frame buffer for aligned images.
  3. Loop through each remaining frame in the frame buffer.
  4. Use the Vision framework to create a new request to determine a simple translational alignment for the frame.
  5. Create the sequence request handler, which will handle your alignment requests.
  6. Perform the Vision request to align the frame to the first frame and catch any errors.
  7. Call alignImages(request:frame:) with the request and current frame. This method doesn't exist yet, and you'll fix that soon.
  8. Clean up. This method still needs to be written, too.

Ready to tackle alignImages(request:frame:)?

Add the following code just below processFrames(completion:):

func alignImages(request: VNRequest, frame: CIImage) {
  // 1
  guard
    let results = request.results as? [VNImageTranslationAlignmentObservation],
    let result = results.first
    else {
      return
  }
  // 2
  let alignedFrame = frame.transformed(by: result.alignmentTransform)
  // 3
  alignedFrameBuffer.append(alignedFrame)
}

Here:

  1. Unwrap the first result of the alignment request you performed in the for loop in processFrames(completion:).
  2. Transform the frame using the affine transformation matrix calculated by the Vision framework.
  3. Append this translated frame to the aligned frame buffer.

These last two methods are the meat of the Vision code your app needs. You perform the requests and then use the results to transform the images. Now all that's left is to clean up after yourself.

Add the following method to the end of the ImageProcessor class:

func cleanup() {
  frameBuffer = []
  alignedFrameBuffer = []
  isProcessingFrames = false
  completion = nil
}

In cleanup(), you simply empty the two frame buffers, reset the flag to indicate that you're no longer processing frames and set the completion handler to nil.

Before you can build and run your app, you need to use ImageProcessor in CameraViewController.

Open CameraViewController.swift. At the top of the class, define the following property:

let imageProcessor = ImageProcessor()

Next, find captureOutput(_:didOutput:from:). You'll make two small changes to this method.

Add the following line just below the let image = ... line:

imageProcessor.add(image)

and, below the call to stopRecording() contained in the if statement, add:

imageProcessor.processFrames(completion: displayCombinedImage)
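By the way, displayCombinedImage(_:) is already implemented in the starter project's CameraViewController. Roughly speaking, all a method like that needs to do is something along these lines; the imageView outlet name here is an assumption, and the starter's real method may differ:

// Hypothetical sketch of a completion handler that shows the result.
func displayCombinedImage(_ image: CIImage) {
  // Convert the CIImage to a UIImage and show it over the camera preview.
  imageView.image = UIImage(ciImage: image)
  imageView.isHidden = false
}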

Build and run your app and ... nothing happens. No worries, Mr. Potter. You still need to combine all of these images into a single masterpiece. To see how to do that, read on!

Note: To see how your aligned images compare to the original captures, you can create an ImageSaver in ImageProcessor. This lets you save the aligned images to the Documents folder and view them in the Files app.
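For example, a small helper along these lines (purely a hypothetical debugging aid, not part of the finished app) could be called at the end of processFrames(completion:), just before cleanup():

extension ImageProcessor {
  // Hypothetical debugging helper: dump every aligned frame to the Files app
  // so you can compare it with the original captures.
  func saveAlignedFrames() {
    var saver = ImageSaver()          // writes into a fresh UUID-named folder
    for frame in alignedFrameBuffer {
      saver.write(frame)
    }
  }
}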

How Photo Stacking Works

There are several different ways to combine, or stack, images. The simplest method is undoubtedly to just average the pixel values at each location across the images.

For example, if you have 20 pictures to stack, you would average the pixels at coordinate (13, 37) across all 20 images to get the pixel value at (13, 37) in the stacked image.

Pixel stacking

If you do this for every pixel coordinate, your final image is the average of all the input images. The more images you have, the closer the average gets to the background pixel values. If something moves in front of the camera, it only appears at a given location in a few pictures, so it doesn't contribute much to the average. That's why moving objects disappear.
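To see why the running, weighted average you'll implement shortly gives the same answer as a plain mean, here's a quick playground-style check with made-up values for one color channel at a single pixel coordinate:

// Hypothetical red-channel samples at one coordinate, say (13, 37).
// The 0.95 is a bright passerby who shows up in only one frame.
let samples = [0.21, 0.20, 0.95, 0.22, 0.19]

// Fold the samples in one at a time, exactly like the kernel will.
var stack = samples[0]
for (i, value) in samples.dropFirst().enumerated() {
  let stackCount = Double(i + 1)               // images already in the stack
  stack = (stack * stackCount + value) / (stackCount + 1)
}

let mean = samples.reduce(0, +) / Double(samples.count)
print(stack, mean)                             // both are about 0.354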

Time to implement the stacking logic.

Stacking Images

Now comes the really fun part! You're going to combine all of these images into a single, stunning image. To do that, you'll create your own Core Image kernel using the Metal Shading Language (MSL).

Your simple kernel will calculate a weighted average of the pixel values of two images. When you average many pictures together this way, moving objects should simply disappear, because the background pixels appear more often and dominate the average pixel value.

Creating a Core Image Kernel

You'll start with the actual kernel, which is written in MSL. MSL is very similar to C++.

Add a new Metal file to your project and name it AverageStacking.metal. Keep the template code and add the following to the end of the file:

#include <CoreImage/CoreImage.h>

extern "C" { namespace coreimage {
  // 1
  float4 avgStacking(sample_t currentStack, sample_t newImage, float stackCount) {
    // 2
    float4 avg = ((currentStack * stackCount) + newImage) / (stackCount + 1.0);
    // 3
    avg = float4(avg.rgb, 1);
    // 4
    return avg;
  }
}}

With this code you are:

  1. Define a new function called avgStacking, which returns a vector of four float values representing the red, green and blue color channels plus an alpha channel of a pixel. The function operates on two images at a time, so you need to keep a running average of all the images seen so far. The currentStack parameter represents this average, while stackCount is a number indicating how many images were used to create currentStack.
  2. Calculate a weighted average of the two images. Since currentStack can already include information from multiple images, multiply it by stackCount to give it the correct weight.
  3. Set the alpha value of the average to make the pixel fully opaque.
  4. Return the average pixel value.

Note: It's very important to understand that this function will be called once for each pair of corresponding pixels in the two images. The sample_t data type represents a single pixel sample from an image.

OK, now that you have a kernel function, you need to create a CIFilter to use it! Add a new Swift file to your project and name it AverageStackingFilter.swift. Remove the import statement and add the following:

import CoreImage

class AverageStackingFilter: CIFilter {
  let kernel: CIBlendKernel
  var inputCurrentStack: CIImage?
  var inputNewImage: CIImage?
  var inputStackCount = 1.0
}

Here you define your new CIFilter class and the properties you'll need for it. Notice how the three input properties match the three parameters of your kernel function. Coincidence? ;]

At this point, Xcode probably complains that this class is missing an initializer. So, time to fix it. Add the following to the class:

override init() {
  // 1
  guard let url = Bundle.main.url(
    forResource: "default",
    withExtension: "metallib") else {
      fatalError("Check your build settings.")
  }
  do {
    // 2
    let data = try Data(contentsOf: url)
    // 3
    kernel = try CIBlendKernel(
      functionName: "avgStacking",
      fromMetalLibraryData: data)
  } catch {
    print(error.localizedDescription)
    fatalError("Make sure the function names match")
  }
  // 4
  super.init()
}

With this initializer, you:

  1. Get the URL of the compiled and linked Metal file.
  2. Read the contents of the file.
  3. Try to create a CIBlendKernel from the avgStacking function in the Metal file and panic if it fails.
  4. Call the superclass's init.

Wait... when did you compile and link your Metal file? Unfortunately, you haven't yet. The good news is that you can get Xcode to do it for you!

Compiling Your Kernel

To compile and link your Metal file, you need to add two flags under Build Settings.

  • Search for Other Metal Compiler Flags and add -fcikernel to it:

    Metal compiler flag

  • Then, click + and select Add User-Defined Setting:

    Add user-defined setting

    Name the setting MTLLINKER_FLAGS and set it to -cikernel:

    Metal linker flag

Now, the next time you build your project, Xcode will compile your Metal files and link them in automatically.

Before you can do this, you still have some work to do on your Core Image filter.

Back in AverageStackingFilter.swift, add the following method:

func outputImage() -> CIImage? {
  guard
    let inputCurrentStack = inputCurrentStack,
    let inputNewImage = inputNewImage
    else {
      return nil
  }
  return kernel.apply(
    extent: inputCurrentStack.extent,
    arguments: [inputCurrentStack, inputNewImage, inputStackCount])
}
    

This method is quite important. Namely, it applies the kernel function to the input images and returns the output image! It would be a fairly useless filter if it didn't.

Ugh, Xcode is still complaining! Fine. Add the following code to the class to appease it:

required init?(coder aDecoder: NSCoder) {
  fatalError("init(coder:) has not been implemented")
}
    

You don't need to be able to initialize this Core Image filter from an unarchiver, so you'll only implement the bare minimum to make Xcode happy.
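With the class now complete, a quick, purely illustrative sanity check of the filter might look like the following playground-style snippet. It isn't part of the app; it just blends two solid-color test images once to show the kernel in action:

import CoreImage
import UIKit

// Illustrative only: blend two solid-color images with the custom filter.
let extent = CGRect(x: 0, y: 0, width: 8, height: 8)
let red = CIImage(color: .red).cropped(to: extent)
let blue = CIImage(color: .blue).cropped(to: extent)

let filter = AverageStackingFilter()
filter.inputCurrentStack = red       // the "stack" so far: just one image
filter.inputNewImage = blue
filter.inputStackCount = 1.0
let mixed = filter.outputImage()     // an even red/blue mix across all 8x8 pixels
print(mixed?.extent as Any)          // (0.0, 0.0, 8.0, 8.0)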

Using Your Filter

Open ImageProcessor.swift and add the following method to ImageProcessor:

func combineFrames() {
  // 1
  var finalImage = alignedFrameBuffer.removeFirst()
  // 2
  let filter = AverageStackingFilter()
  // 3
  for (i, image) in alignedFrameBuffer.enumerated() {
    // 4
    filter.inputCurrentStack = finalImage
    filter.inputNewImage = image
    filter.inputStackCount = Double(i + 1)
    // 5
    finalImage = filter.outputImage()!
  }
  // 6
  cleanup(image: finalImage)
}
    

Here:

  1. Initialize the final image with the first image in the aligned frame buffer, removing it from the buffer in the process.
  2. Initialize your custom Core Image filter.
  3. Loop through each of the remaining images in the aligned frame buffer.
  4. Set up the filter parameters. Note that the final image is set as the current stack image. It's important not to mix up the input images! The stack count is set to the array index plus one, because you removed the first image from the aligned frame buffer at the beginning of the method.
  5. Overwrite the final image with the filter's output image.
  6. Call cleanup(image:) with the final image once all the images have been combined.

You may have noticed that cleanup() doesn't take any parameters. Fix that by replacing cleanup() with the following:

func cleanup(image: CIImage) {
  frameBuffer = []
  alignedFrameBuffer = []
  isProcessingFrames = false
  if let completion = completion {
    DispatchQueue.main.async {
      completion(image)
    }
  }
  completion = nil
}
    

The only changes are the newly added parameter and the if statement that calls the completion handler on the main thread. The rest stays the same.

At the bottom of processFrames(completion:), replace the call to cleanup() with:

combineFrames()
    

This way, the image processor combines all the frames after they've been aligned and then passes the final image to the completion handler.

Phew! Build and run the app and make the people, cars and everything else that moves disappear from your shots!

And poof! The cars disappear!

For more fun, wave a wand and shout Evanesco! while using the app. Other people will definitely not think you're weird. :]

Where to Go From Here?

Congratulations! You've made it through a lot of concepts in this tutorial. You're now ready to work your magic in the real world!

However, if you want to try to improve your app, there are a few ways to do it:

  1. Use a VNHomographicImageRegistrationRequest to calculate the perspective warp matrix for aligning the captured frames. This should produce a better match between two frames; it's just a little more complicated to use.
  2. Calculate the mode pixel value instead of the mean. The mode is the most frequently occurring value. Using it would remove the effects of moving objects from the image entirely, since they're no longer averaged in. This should produce a cleaner-looking image. Hint: Convert RGB to HSL and calculate the mode based on small ranges of the hue (H) value; a rough sketch of this binning idea follows below.
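Here's a rough, self-contained sketch of that binning idea, using made-up hue samples (in degrees) for a single pixel across the stack:

// Hypothetical hue values at one pixel; 35° is a passing object, the rest is sky.
let hues: [Double] = [210, 212, 35, 209, 211, 213, 208]

// Bin the hues into small ranges (10° wide here) and pick the fullest bin.
let binWidth = 10.0
let counts = Dictionary(grouping: hues) { Int($0 / binWidth) }
  .mapValues { $0.count }

if let modeBin = counts.max(by: { $0.value < $1.value })?.key {
  let lower = Double(modeBin) * binWidth
  print("Most common hue range starts at \(lower)°")  // 210.0°, the outlier is ignored
}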

If you're interested in learning more about Metal, check out our Metal Tutorial: Getting Started and our book, Metal by Tutorials.

We hope you enjoyed this tutorial, and if you have any questions or comments, please join the forum discussion below!

