There’s nothing on TV

Imagine this situation: You come home from work in the evening. You’re tired, so all you can manage is to watch a movie. You can’t find anything to watch on Netflix, so you turn on the new AI platform, which creates a movie tailored to your current needs. It chats with you for a while, finds out your mood and preferences, and after a moment, creates a movie just for you.

Do you like that idea? I rather do. However, according to Mark Zilberman, we will have to wait at least another ten years for it to happen.

In a pre-print article, he claims that with current computing capacity, on the order of 100 graphics cards would be needed to generate video at 30 frames per second.

“Modern diffusion-based image models for high-resolution images (e.g., 1024×1024 px, 30–50 diffusion steps) require on the order of 50–100 TFLOPS to generate a single frame.

– For real-time video (30 fps, 3600 seconds) the total number of frames is 108,000.

– Naive estimate without optimizations: 100 TFLOPS × 30 fps ≈ 3,000 TFLOPS (3 PFLOPS) of sustained compute. This is equivalent to hundreds of top-tier GPUs such as the NVIDIA H100.”
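The arithmetic in the quote can be reproduced in a few lines. This is only a rough sketch: it reads the quoted "50–100 TFLOPS per frame" as an amount of work (tera-operations per frame), and the per-card throughput `ASSUMED_GPU_TFLOPS` is a made-up illustrative figure, not a measured value for any real GPU. The function names are mine, not the paper's.

```python
import math


def total_frames(fps: int, seconds: int) -> int:
    """Number of frames in a video of the given length."""
    return fps * seconds


def sustained_tflops(tera_ops_per_frame: float, fps: int) -> float:
    """Compute rate needed to keep up with real-time playback."""
    return tera_ops_per_frame * fps


def gpus_needed(required_tflops: float, per_gpu_tflops: float) -> int:
    """Cards required if each one sustains per_gpu_tflops (rounded up)."""
    return math.ceil(required_tflops / per_gpu_tflops)


# Figures taken from the quote.
FPS = 30
MOVIE_SECONDS = 3600
TERA_OPS_PER_FRAME = 100  # upper end of the quoted 50-100 range

# Assumption for illustration only: effective sustained throughput per card.
ASSUMED_GPU_TFLOPS = 30

frames = total_frames(FPS, MOVIE_SECONDS)
required = sustained_tflops(TERA_OPS_PER_FRAME, FPS)
cards = gpus_needed(required, ASSUMED_GPU_TFLOPS)

print(f"frames in one hour: {frames}")
print(f"sustained compute:  {required} TFLOPS")
print(f"cards needed:       {cards}")
```

The card count depends entirely on the assumed per-card throughput, which is why estimates range from about a hundred to several hundred cards.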

I cannot verify Mark’s calculations myself. When I consulted Perplexity, it found several errors in his argument, mainly: a) overestimation of the processing power of graphics cards, b) oversimplification of the video generation process, and c) omission of possible optimizations of that process. The conclusion, however, stays the same: even according to Perplexity, home video generation (i.e., on a single graphics card) can be expected only in the second half of the 2030s.

If that day were today, I would have the fourth installment of Naked Gun with Leslie Nielsen generated.
