Real-time video processing is a special case of digital signal processing. Technologies like Virtual Reality (VR) and Augmented Reality (AR) rely heavily on real-time video processing to extract semantic information from each video frame and use it for object detection and tracking, face detection, and other techniques. Real-time video processing on a mobile device is a rather complex task, because of the limited resources available on smartphones and tablets, but you can achieve great results with the right techniques.
In this post, I will show you how to process a video in real time using the Metal framework, which leverages the power of the GPU. In one of our previous posts, you can read about how to set up the Metal rendering pipeline and use compute shaders for image processing. Here, we will do something similar, but this time we will process video frames.
Before diving into the implementation of video processing in Metal, let's take a quick look at the AVFoundation framework and the components we need to play a video. In a previous post, I showed you how to use AVFoundation to capture video with your iPhone or iPad. Here, we will use another set of AVFoundation classes to read and play a video file on an iOS or tvOS device.
You can play a video on an iPhone or Apple TV in different ways, but for the purpose of this post, I will use the AVPlayer class.
AVPlayer is a controller object used to manage the playback and timing of a media asset. You can use an AVPlayer to play local and remote file-based media, such as video and audio files. Besides the standard playback controls to play, pause, change the playback rate, and seek to different points in the media timeline, an AVPlayer object provides access to each frame of a video asset through an AVPlayerItemVideoOutput object. This object returns a reference to a Core Video pixel buffer (of type CVPixelBuffer). Once you obtain the pixel buffer, you can convert it into a Metal texture and process it on the GPU.
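As a minimal sketch of that last step, here is one common way to wrap a CVPixelBuffer in an MTLTexture through a CVMetalTextureCache. The class and method names (`PixelBufferConverter`, `makeTexture`) are my own, and the code assumes a BGRA pixel buffer:

```swift
import AVFoundation
import CoreVideo
import Metal

// Sketch: wrap a CVPixelBuffer in a Metal texture via a texture cache.
final class PixelBufferConverter {
    private var textureCache: CVMetalTextureCache?

    init?(device: MTLDevice) {
        guard CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache) == kCVReturnSuccess else {
            return nil
        }
    }

    func makeTexture(from pixelBuffer: CVPixelBuffer) -> MTLTexture? {
        guard let cache = textureCache else { return nil }
        let width = CVPixelBufferGetWidth(pixelBuffer)
        let height = CVPixelBufferGetHeight(pixelBuffer)
        var cvTexture: CVMetalTexture?
        // Create a Metal texture backed by the pixel buffer (assumes 32BGRA).
        let status = CVMetalTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault, cache, pixelBuffer, nil,
            .bgra8Unorm, width, height, 0, &cvTexture)
        guard status == kCVReturnSuccess, let texture = cvTexture else { return nil }
        return CVMetalTextureGetTexture(texture)
    }
}
```

The texture shares the pixel buffer's memory, so no copy is made; you can pass the returned MTLTexture straight to a compute shader.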
Initializing an AVPlayer is very simple. You can use either the URL of a video file or an AVPlayerItem object. So, to initialize an AVPlayer, use one of the following init methods:
init(url: URL)
init(playerItem: AVPlayerItem)
An AVPlayerItem stores a reference to an AVAsset object, which represents the media to be played. An AVAsset is an abstract, immutable class used to model timed audiovisual media such as videos and audio. Since AVAsset is an abstract class, you cannot use it directly. Instead, use one of the two subclasses that come with the framework. You can choose between an AVURLAsset and an AVMutableComposition. An AVURLAsset is a concrete subclass of AVAsset that you can use to create an asset from a local or remote URL. An AVComposition allows you to combine media data from multiple file-based sources in a custom temporal arrangement; you build one through its mutable subclass, AVMutableComposition.
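To make the composition case concrete, here is a short sketch that appends two video files back to back with AVMutableComposition. The function name and the idea of taking two file URLs are my own assumptions for illustration:

```swift
import AVFoundation

// Sketch: build a single playable timeline from two video files.
func makeSequence(first: URL, second: URL) throws -> AVMutableComposition {
    let composition = AVMutableComposition()
    var cursor = CMTime.zero
    for url in [first, second] {
        let asset = AVURLAsset(url: url)
        // Insert the entire time range of each asset at the current cursor.
        try composition.insertTimeRange(
            CMTimeRange(start: .zero, duration: asset.duration),
            of: asset, at: cursor)
        cursor = CMTimeAdd(cursor, asset.duration)
    }
    return composition
}
```

Since AVMutableComposition is itself an AVAsset, the result can be handed directly to an AVPlayerItem for playback.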
In this post, I will use AVURLAsset. The following source code highlights how all of these AVFoundation classes fit together:
// Get the URL of a local or remote file.
let url = ...

// Create an asset using the specified URL.
let asset = AVURLAsset(url: url)

// Create a player item.
let playerItem = AVPlayerItem(asset: asset)

// Create a player using the player item.
let player = AVPlayer(playerItem: playerItem)

// Start playing.
player.play()
To extract frames from the video file while the player is playing, you use an AVPlayerItemVideoOutput object. Once you get a video frame, you can use Metal to process it on the GPU. Let's now build a simple example to demonstrate this.
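Before building the app, here is a sketch of how frame extraction with AVPlayerItemVideoOutput typically works: you attach the output to the player item, then poll it once per display refresh with a CADisplayLink. The class name `FrameReader` and its method names are assumptions of mine:

```swift
import AVFoundation
import UIKit

// Sketch: pull the current video frame on every screen refresh.
final class FrameReader {
    let output = AVPlayerItemVideoOutput(pixelBufferAttributes:
        [kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_32BGRA)])
    private var displayLink: CADisplayLink?

    func start(with item: AVPlayerItem) {
        item.add(output)
        displayLink = CADisplayLink(target: self, selector: #selector(readFrame))
        displayLink?.add(to: .main, forMode: .common)
    }

    @objc private func readFrame() {
        // Ask the output which frame should be on screen right now.
        let itemTime = output.itemTime(forHostTime: CACurrentMediaTime())
        guard output.hasNewPixelBuffer(forItemTime: itemTime),
              let pixelBuffer = output.copyPixelBuffer(forItemTime: itemTime,
                                                       itemTimeForDisplay: nil) else { return }
        // Hand the CVPixelBuffer off to Metal for GPU processing here.
        _ = pixelBuffer
    }
}
```

Polling with `hasNewPixelBuffer(forItemTime:)` avoids copying the same frame twice when the display refreshes faster than the video's frame rate.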
Video Processor App
Create a new Xcode project. Select the iOS Single View Application template and name it VideoProcessor. Open the ViewController.swift file and import AVFoundation.
Since we need an AVPlayer, let's add the following property to the view controller:

let player: AVPlayer = AVPlayer()
As discussed above, the player gives access to each video frame through an AVPlayerItemVideoOutput object. So, let's add an additional property to the view controller:
lazy var playerItemVideoOutput: AVPlayerItemVideoOutput = {
    let attributes = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_32BGRA)]
    return AVPlayerItemVideoOutput(pixelBufferAttributes: attributes)
}()
For this property, we use a lazy initialization, so the video output is created only the first time it is needed.