Video Up-scaling with Machine Learning

Nate Tsegaw
4 min read · May 25, 2021


The technology behind film-making has progressed at a rapid pace since its inception. Time and time again, consumers have seen advances in both convenience and quality. That said, every step forward makes works of the past look more and more antiquated. How can filmmakers make sure, to the best of their ability, that their work will not be brushed aside because of technical limitations some years into the future? And, just as important, how can they do it without a significant investment of time and money? Through machine learning!

Topaz Video Enhance AI

The issue comes in two parts: first for analog film, and second for digital. With analog film, the major bottlenecks are the quality of the film stock and the preservation effort. Film degrades, so keeping it in a cool, dry location is paramount for archival purposes. Furthermore, it is a far more laborious medium to develop and scan.

Digital is more convenient but not as flexible. Once a film is shot digitally, its native resolution is set in stone, unlike film, which can be scanned at extremely high resolutions before hitting its ceiling. On the other hand, archiving a digital film requires only storage space on a computer. Much easier than maintaining special film refrigerators.

Academy Film Archive

So what happens if, for example, a studio LOST the master copy of a film, shot digitally at a low resolution, or just plain does not want to remaster a release? A machine learning program can step in to help fill in the missing pieces.

Let’s first look at an example with several up-scaling methods.

An example of up-scaling from a DIGITAL source

Not all of the methods shown are actually using machine learning. And more importantly, not all of them are doing video up-scaling. Bicubic interpolation is a real-time process, and Photoshop is a MANUAL process. letsenhance.io and Mail.ru Vision are meant specifically for photographs. Only waifu2x uses machine learning to enhance the video, with the difference being that it is trained on hand-drawn content and uses previous frames as a reference for what the image should look like.
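
As a point of reference, the non-ML baseline above is essentially a one-liner. Here is a minimal sketch of bicubic up-scaling with OpenCV (the file names are placeholders):

```python
# Bicubic interpolation with OpenCV -- the non-ML baseline from the
# comparison above. File names are placeholders.
import cv2

frame = cv2.imread("frame_360p.png")   # low-resolution source frame
h, w = frame.shape[:2]

# Bicubic is fast enough to run in real time, but it can only smooth
# between existing pixels -- it cannot invent detail that was lost.
upscaled = cv2.resize(frame, (w * 2, h * 2), interpolation=cv2.INTER_CUBIC)
cv2.imwrite("frame_720p_bicubic.png", upscaled)
```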

How does Waifu2x work?

It uses “deep convolutional neural networks,” a class of model most often used for processing images, and the more data it is fed, the more accurate it becomes. It has a more specialized use case than general up-scalers, as it is trained on 2D animation.
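
To make that concrete, here is a minimal PyTorch sketch of an SRCNN-style network, the kind of architecture waifu2x's original model was built on. This is an illustration of the idea, not waifu2x's actual network or weights:

```python
# A tiny SRCNN-style super-resolution network (Dong et al. 2014), the
# family of deep convolutional networks behind tools like waifu2x.
# Illustrative only -- not waifu2x's actual architecture or training setup.
import torch
import torch.nn as nn

class TinySRCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=9, padding=4),  # patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),            # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x):
        # Input: a bicubically pre-upscaled frame. Output: the same frame
        # with detail "filled in" from what the network learned in training.
        return self.net(x)

model = TinySRCNN()
dummy = torch.randn(1, 3, 720, 1280)   # one RGB frame, N x C x H x W
print(model(dummy).shape)              # torch.Size([1, 3, 720, 1280])
```

The more training pairs of low-resolution input and high-resolution target frames a network like this sees, the more plausible the detail it learns to reconstruct, which is why waifu2x performs best on the 2D animation it was trained on.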

Regarding how to OBJECTIVELY measure the performance of the software, the process below comes from SF Video Technology’s YouTube page.

  1. Choose a 1080p video to serve as the control
  2. Scale the video down to 360p
  3. Scale it back up to 720p and measure how much visual data was lost

In SF Video Technology’s test, the up-scaled output differed from the original file by approximately 2% under a VMAF test. VMAF is a perceptual video quality assessment algorithm created by Netflix.
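
For anyone who wants to reproduce a test like this, here is a rough sketch of the pipeline scripted from Python with ffmpeg. It assumes an ffmpeg build that includes libvmaf; the file names are placeholders, and the bicubic up-scale in step three stands in for whatever up-scaler is actually being tested:

```python
# Sketch of the three-step quality test above, driven with ffmpeg.
# Assumes an ffmpeg build compiled with libvmaf; file names are placeholders.
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

# Steps 1-2: start from a 1080p control clip and scale it down to 360p.
run(["ffmpeg", "-y", "-i", "control_1080p.mp4",
     "-vf", "scale=-2:360:flags=bicubic", "test_360p.mp4"])

# Step 3: scale back up to 720p. Swap this bicubic pass for the
# ML up-scaler under test.
run(["ffmpeg", "-y", "-i", "test_360p.mp4",
     "-vf", "scale=-2:720:flags=bicubic", "test_720p.mp4"])

# Make a 720p reference from the control so VMAF compares like with like.
run(["ffmpeg", "-y", "-i", "control_1080p.mp4",
     "-vf", "scale=-2:720:flags=bicubic", "reference_720p.mp4"])

# VMAF score is printed to the log: distorted clip first, reference second.
run(["ffmpeg", "-i", "test_720p.mp4", "-i", "reference_720p.mp4",
     "-lavfi", "libvmaf", "-f", "null", "-"])
```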

Limitations

While the results are impressive, they come at a steep cost. Computationally, using ML to upscale a video is extremely expensive. Upscaling a 24-minute 480p video at 30fps would take 16 hours…on an AWS EC2 G-series instance with an NVIDIA T4 Tensor Core GPU. The process would cost around $16–$30 per episode depending on the input and output. This could very well still be worth it, however, as the standard remaster process is far more laborious and relies heavily on the expertise of whoever is in charge of the remaster.
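
For a sense of scale, here is the back-of-envelope math implied by those figures (the per-GPU-hour rate is just the quoted cost range divided by the 16 hours):

```python
# Back-of-envelope numbers from the figures above: a 24-minute episode
# at 30 fps, 16 hours of GPU time, and $16-$30 per episode.
minutes, fps = 24, 30
frames = minutes * 60 * fps              # 43,200 frames per episode
gpu_seconds = 16 * 3600                  # 16 hours of T4 time

print(frames)                            # 43200
print(gpu_seconds / frames)              # ~1.33 seconds per frame
print(16 / 16, 30 / 16)                  # implied $1.00-$1.88 per GPU-hour
```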

A great example of the power of ML up-scaling combined with manual retouches:

https://www.youtube.com/c/MJJ4K/videos
