Disney and OpenAI Signal the Arrival of AI Video Streaming

I recently watched the earliest surviving motion picture, Roundhay Garden Scene, which dates back to 1888. Four figures, two men and two women, walk around a yard with quick, jerky steps. It lasts about two seconds.

I also recently watched some videos made in 2016 by researchers at the Massachusetts Institute of Technology and the University of Maryland, which are among the first videos created entirely by artificial intelligence. Each lasts about a second. In one, a blurry figure stands on a golf course, bending at the waist to hit a putt. No one would mistake these videos or Roundhay Garden Scene for the deft realism of modern cinema. And just as skeptics often deride AI videos as wasteful, 19th-century critics dismissed early cinema as a “stupid curiosity.”

However, a recent agreement between Disney and OpenAI offers a glimpse of a different future. Starting in early 2026, the technology company's video generator Sora will be able to create videos featuring more than 200 characters from the Disney, Marvel, Pixar and Star Wars franchises. And Disney+ will stream a selection of user-generated clips.


Disney will also invest $1 billion in OpenAI and will use its tools to create “new experiences for Disney+ subscribers,” according to a joint press release from Disney and OpenAI. In announcing the partnership, Disney CEO Robert Iger said the company will “thoughtfully and responsibly expand our storytelling capabilities through generative artificial intelligence.” On a recent earnings call, he also said he intends for subscribers to create content inside Disney+ itself. If you want to watch Elsa and Cinderella destroy Maleficent, you'll be able to request a scene, although it may last only 20 seconds.

If this is the start of AI TV on demand, I wonder how long it will be before these clips reach 20 minutes or an hour, given the environmental burden and computing costs. Many people believe that it is impossible, but I think that few of those who watched Roundhay Garden Scene foresaw The Great Train Robbery, the 12-minute silent film milestone of 1903, not to mention Gone with the Wind, or streaming.

The difficulty of generating imagery lies in how modern systems work. They are built on diffusion, a technique that starts with “noise” and gradually turns it into an image. Imagine a person standing in fog. Essentially, the AI removes the fog and adds new pixels in repeated passes until a coherent shape emerges. Each pass to refine the generated image adds to the cost.
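The repeated passes can be pictured with a toy loop. This is only a sketch and not how Sora or any real diffusion model is implemented: the five-pixel `target`, the 0.5 step size and the function names are all assumptions made for illustration, and a real model predicts the image with a trained network instead of being handed the answer.

```python
import random


def toy_denoise(target, passes, seed=0):
    """Start from pure noise and nudge each pixel toward `target` on every pass.

    `target` stands in for what a trained network would predict; a real
    diffusion model never has direct access to the finished image.
    """
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # the "fog": pure noise
    for _ in range(passes):
        # Each pass closes half of the remaining gap between noise and image.
        image = [px + 0.5 * (t - px) for px, t in zip(image, target)]
    return image


def mean_abs_error(image, target):
    """Average distance between the current pixels and the target pixels."""
    return sum(abs(a - b) for a, b in zip(image, target)) / len(target)


target = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical five-pixel "image"
for passes in (1, 5, 20):
    err = mean_abs_error(toy_denoise(target, passes), target)
    print(f"{passes:2d} passes -> mean error {err:.6f}")
```

More passes give a cleaner result, but every pass is another full round of computation, which is the trade-off described above.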

Video is even more difficult. The series of images must be coordinated so that facial features do not change and coffee mugs do not disappear. Millions of pixels change in one second of high-definition video. During a keynote speech at a hackathon hosted by AI community center AGI House, Bill Peebles, an OpenAI researcher who helped develop Sora, said: “We discovered how painful it is to work with video data. These videos have a lot of pixels.”

To tame the pixels, OpenAI's system compresses the video into a simplified version that retains the important information. It then treats that version like a loaf of bread, cutting it into frames and then dividing the frames into cubes. This allows the model to coordinate all the cubes with one another, similar to how the models behind ChatGPT relate all the words in a response.

Going from seconds to minutes is so hard because the more frames you add, the more information the model has to take into account. As the videos get longer, the inconsistencies accumulate. True on-demand AI TV will also require cutting between scenes. If every Disney+ user asked for this using current technology, the costs would be staggering.
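A rough count shows why length is so costly when every cube is coordinated with every other cube. The grid sizes below (8 compressed frames per second, a 32 by 32 grid, cubes of 1 by 2 by 2) are invented for illustration and are not Sora's actual settings.

```python
def count_cubes(frames, height, width, cube_t, cube_h, cube_w):
    """Number of cubes when a compressed video is diced into equal blocks."""
    if frames % cube_t or height % cube_h or width % cube_w:
        raise ValueError("video dimensions must divide evenly into cubes")
    return (frames // cube_t) * (height // cube_h) * (width // cube_w)


def pairwise_links(n_cubes):
    """Comparisons needed when every cube is related to every cube."""
    return n_cubes * n_cubes


# Hypothetical compressed video: 8 frames per second on a 32 x 32 grid.
for seconds in (1, 10, 60):
    cubes = count_cubes(8 * seconds, 32, 32, cube_t=1, cube_h=2, cube_w=2)
    print(f"{seconds:2d} s -> {cubes:,} cubes, {pairwise_links(cubes):,} links")
```

Under these assumptions a clip that is 60 times longer has 60 times as many cubes but 3,600 times as many cube-to-cube comparisons, because the comparison count grows with the square of the cube count.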

Researchers have been looking for more efficient approaches. One is for the model to break the work into stages. “Instead of denoising or generating the whole video at once, you generate it frame by frame,” says Tianwei Yin, a researcher at AI image editing startup Reve who co-designed the video creation software CausVid. “At each step, your calculations are limited to a much smaller part rather than the full part, and this allows you to work much longer.”
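The saving Yin describes can be sketched with a simple cost count. The model here is an assumption for illustration, not CausVid's actual design: the video is split into chunks, and each new chunk is compared only with itself and a fixed window of the most recent chunks. The chunk size, window length and function names are all hypothetical.

```python
def all_at_once_cost(n_chunks, cubes_per_chunk):
    """Comparisons when every cube in the video is related to every other."""
    total = n_chunks * cubes_per_chunk
    return total * total


def stepwise_cost(n_chunks, cubes_per_chunk, window):
    """Comparisons when chunks are generated in order.

    Each chunk looks at itself plus at most `window - 1` earlier chunks
    (an assumed simplification for this sketch).
    """
    cost = 0
    for i in range(n_chunks):
        visible_chunks = min(i + 1, window)
        cost += cubes_per_chunk * (visible_chunks * cubes_per_chunk)
    return cost


n_chunks, cubes_per_chunk = 60, 2048  # hypothetical: 60 one-second chunks
full = all_at_once_cost(n_chunks, cubes_per_chunk)
step = stepwise_cost(n_chunks, cubes_per_chunk, window=4)
print(f"all at once: {full:,} comparisons")
print(f"step by step: {step:,} comparisons ({full / step:.1f}x fewer)")
```

In this simplified model the stepwise cost grows roughly in proportion to the length of the video instead of with its square, which is the kind of saving that makes longer clips plausible.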

Yin believes that by next year systems will more efficiently achieve five minutes of generation, and by integrating various existing artificial intelligence technologies, they will be able to reach an hour soon after that. Others echo this optimism. In a recent BBC interview, Google CEO Sundar Pichai spoke about the possibility of high school students creating feature-length films with artificial intelligence in the coming years. Cristobal Valenzuela, CEO of synthetic video company Runway, told Country earlier this month: “Having 60 or 90 minutes with consistent characters and plot is still impossible. But it will happen soon.” He went on to say that the ability to view AI videos as they are generated in real time is also on the horizon.

The path from curated fan clips to feature films will involve some unsexy innovation, not to mention negotiations over how to pay the creatives whose work feeds the models. And while the financial burden of AI videos seems prohibitive, millions of people around the world are involved in creating and training AI models, and the cost of technology generally falls. For example, in 1998, bandwidth was prohibitively expensive: it cost about $1,200 per megabit per second (Mbps) per month for large networks. By 2025 the lowest recorded cost was $0.05 per Mbps per month, a reduction of 99.996 percent. This change made streaming on Disney+ or Netflix possible.
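The bandwidth figure can be checked with a few lines of arithmetic. The function name is only illustrative; the two prices are the ones quoted above.

```python
def percent_reduction(old_cost, new_cost):
    """Percentage drop from old_cost to new_cost."""
    return (old_cost - new_cost) / old_cost * 100


cost_1998 = 1200.00  # dollars per Mbps per month
cost_2025 = 0.05     # dollars per Mbps per month

print(f"{percent_reduction(cost_1998, cost_2025):.3f} percent")  # 99.996 percent
print(f"{cost_1998 / cost_2025:,.0f} times cheaper")             # 24,000 times cheaper
```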

The cultural path of new media is much more difficult to imagine, and the resistance is often very strong. The poet Charles Baudelaire railed against photography in 1859 for a lazy realism that led art away from imagination. In centuries past, “skeptics and proponents have compared photography to painting and moving images to theater,” wrote the contemporary scholar Ruben de Lautour. It seems we are in an even more difficult moment. What seems certain is that, as in the past, technology will develop quickly, allowing millions of creators to test possibilities we can't yet predict.
