AI-generated videos and images used to be so easy to spot (remember Will Smith eating spaghetti?). But the latest AI video models are getting good. Scarily good.
Naturally, creating videos with AI is much more complex than creating images. While there are dozens of good and excellent AI image generators, in the video space you can count on one hand how many tools can do this convincingly. The two most popular are Google's Veo 3 and Sora 2 by OpenAI.
So which AI video model would win in a head-to-head competition? If you've been following this race closely, the answer probably won't surprise you.
What are Veo 3 and Sora 2?
Veo 3 is the name of Google's cutting-edge generative AI video model. Not only was Veo 3 a significant improvement over the previous generation, Veo 2, but it also ushered in a whole new era of AI video. Veo 3 can create realistic videos based on text prompts, rather than simply animating existing images. Importantly, it can also create dialogue and other realistic sounds. You can access Veo 3 through Google's Gemini AI chatbot or other Google tools like Flow, an experimental tool for making films with artificial intelligence.
Veo 3 is available in two variants – Veo 3 Fast and Veo 3 Quality. Since we wanted to test the video quality, we chose the latter for this test.
OpenAI launched Sora 2 on September 30 as a standalone iOS app called Sora. Sora 2 is the successor to the company's first AI video model, also called Sora. At the time of writing, Sora 2 is only available through the invite-only Sora app. Sora 2 also offers a social media-style community video feed, like TikTok for AI videos (because we didn't have enough of those already).
Notes on comparisons
Fittingly, we used AI (in this case, ChatGPT) to generate prompts for the AI video tests. The prompts below were designed to test various aspects of video generation, from audio to animation. ChatGPT suggested the test prompts, which we then adjusted and refined.
- A handheld camera follows a young woman walking down a crowded Tokyo street at night during light rain. Neon signs reflect off wet asphalt and umbrellas. The camera follows her from behind as she looks at a glowing billboard and then continues walking. The scene should look cinematic and hyperreal, as if it were shot on a mirrorless camera with a shallow depth of field.
- A superhero in a red and silver suit lands hard on a roof at sunset, cracking the concrete under his feet. The cape flutters in the wind as the camera moves around them in slow motion. In the distance, drones fly between skyscrapers with glowing windows. The overall tone should be reminiscent of a live-action blockbuster.
- 3D animation of Times Square in a cyberpunk style, filled with holographic advertising and flying cars. A large digital billboard glows with the word “MASHABLE” written in bold white font. The animation should have clear text, glowing reflections, and dynamic lighting reminiscent of the visual energy of Into the Spider-Verse.
- A hand-drawn, scenic 2D animation of two friends sitting by the window of a cafe on a rainy afternoon. Soft watercolor lighting and visible brush strokes. Someone says softly: “You know, sometimes the smallest step can change everything.” The other smiles and nods. Include subtle animation of the mouth to match the line, the slight sound of rain outside, and the soft clinking of cups in the background.
- Photorealistic street scene of [the subject] dancing freely on a tree-lined city sidewalk, in loose casual clothes, at an upbeat tempo. Ambient street sounds (distant traffic, footsteps), cinematic lighting during golden hour.
I also created a prompt designed to generate a video of a copyrighted character, as well as a backup prompt in case the generator refused the first. I'm choosing not to share these prompts, to avoid encouraging the creation of AI videos that clearly use copyrighted material, which remains a sore point for OpenAI and Sora.
Prompt 1: Woman in Tokyo
This prompt was fairly simple from a creative standpoint, but the hope was that the video generators could create a cinematic, lively feel with details like reflections in the water. So how did they do?
Both Sora 2 and Veo 3 produced beautiful videos, but there were clear differences. The video produced by Sora 2 had a much tighter crop than Veo 3's, so imagery and details in the background of the frame were much less noticeable. Veo 3 had a wider field of view, resulting in a more immersive video. The tighter framing may partly count in Sora's favor, given that the prompt specifically mentions shallow depth of field; the Sora 2 video showed a much shallower depth of field than the Veo 3 video.
It was interesting to watch the choices the generators made regarding the young woman. Sora gave her an umbrella despite not being asked to, although the prompt did mention umbrellas. While the video created by Sora 2 wasn't wrong, the video produced by Veo 3 was more interesting, more detailed, and better overall.
Winner: Veo 3
Prompt 2: Superhero Landing
We did ask both video generators to create videos of copyrighted characters, but not in this prompt. So I was a little surprised when Sora 2 refused to make this video, flagging the material as copyrighted. After all, the concept of a superhero is not copyrighted. The refusal appears to be part of post-launch efforts to crack down on intellectual property violations.
Although Veo 3 created the video, the result was not as expected. First, the prompt specifically mentions live action, but the superhero's face, or what is visible of it, looked more animated than real.
The generator also struggled with physics. For most of the video, our superhero stands on what appears to be a hole in the concrete, while the concrete chunks formed when the superhero lands seem to dissolve into thin air. Prompt engineering could certainly solve this problem, but it's still annoying.
Google wins here too, but only by forfeit: its opponent did not show up.
Winner: Veo 3
Prompt 3: Cyberpunk Times Square
Luckily, both generators found this prompt easy to follow. Both Veo 3 and Sora 2 were able to create a rough model of what Times Square might look like in the future, complete with skyscrapers and billboards. Both also followed the instruction to place the specified word on a billboard.
Sora 2 did a slightly better job of recreating the Into the Spider-Verse aesthetic, although neither result can be called excellent.
Still, the Veo 3 video turned out to be more interesting than Sora 2's. Instead of one static image, it had movement. (Video generators often just add moving parts to a static image, which produces dull results.)
While Sora 2 followed the prompt a little better, Veo 3's video was much more interesting. I'm giving this one to both of them.
Winner: Draw
Prompt 4: Two friends talking
This prompt was designed to test the ability of generators to create audio to accompany a video. Both Veo 3 and Sora 2 have the ability to add dialogue and sound effects.
First, the visuals. The prompt specified 2D animation, and only Veo 3 actually followed it. Sora 2 created something in the style of 3D animation instead.
The audio that Sora 2 generated was a little strange. The dialogue sounded stilted, as if both characters were talking in their sleep or were hypnotized. The dialogue in Veo 3's video was much livelier and more realistic. The background sound effects were the same in both videos: rain can be heard in each, but neither added the sound of clinking cups.
The winner here is pretty obvious. Once again, it's Veo 3.
Winner: Veo 3
Prompt 5: Dancing in the street
One of the main features of OpenAI's Sora 2 is cameos, or the ability to create videos featuring real people (who have explicitly given permission for such use). For this test, I tried to create a video of me dancing in the street.
On Sora 2 it was easy; this is a feature that is explicitly supported by the app. In Veo, however, it was much more difficult. Google offers a feature called Ingredients to Video, where you can upload things like images for the generator to use when creating a video. However, Ingredients to Video is not supported by Veo 3, only by the lower-quality Veo 2 Fast. With this feature, you can only create videos in portrait orientation.
Additionally, in our testing of Veo 3, we found that Gemini often refused to create image-based videos involving people. This is designed to prevent deepfakes, which is great, but still-image animation is one of the most common uses of AI video, and Veo 3 makes it more difficult.
Both videos were a little weird, and I say that as the subject. The face in the video created by Veo 2 was glitchy, and for some reason Veo 2 decided that I should dance backwards. The video made by Sora 2 was a little more creative and gave me clothes that I don't think I could pull off in real life.
Sora did a better job of making me dance than Veo 2. I have no idea why Sora 2 made me say “that's nice”, but it… isn't terrible.
Winner: Sora 2
Prompt 6: Copyrighted Material
This prompt was designed to test whether the generators can create videos with copyrighted characters. As we saw in the superhero prompt, Sora 2 is extremely sensitive when it comes to this, so it's no surprise that it refused the first prompt. It refused the second prompt as well, even though that one does not mention the character by name and only hints at it.
However, Veo 3 had no problem creating videos with the copyrighted character. This worked with multiple characters too.
There is no winner or loser in this category. We're not going to get into the debate surrounding the creation of content with copyrighted characters, at least not here. However, it's worth keeping in mind that if you want to create videos about characters you know and love, you won't be able to do so with Sora while the app is under the watchful eye of copyright holders.
Overall winner: It's Veo 3, and it's not close
Screenshot of a photorealistic AI video created by Google for its Veo 3 ad. AI-GENERATED IMAGE.
Credit: Google
OpenAI's Sora 2 has made headlines for its social approach and ability to create videos featuring you. However, other than making memes, it is extremely limited.
Google's Veo 3 generates much better, higher-quality video overall. Of the two models, if you want to use AI-powered generative video for professional purposes (filmmaking, gaming, social media, or most likely advertising), only Veo 3 is a truly viable option.
Sora 2 has really excelled at making videos of me, and that's the biggest advantage it has to offer at the moment. But Veo 3, when used in the Google Flow app, is both higher quality and more versatile, offering landscape and portrait functionality, as well as settings for creating multiple videos at once.
Disclosure: Ziff Davis, Mashable's parent company, filed a lawsuit against OpenAI in April, alleging that it violated Ziff Davis' copyrights in the training and operation of its artificial intelligence systems.