
Momooreoluwa Oshinaike ’25
Contributor

On February 15th, OpenAI revealed its latest product: Sora. The official OpenAI X (formerly Twitter) account posted the following: “Introducing Sora, our text-to-video model. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.” Along with this reveal, OpenAI released numerous videos created by the model.
A few months prior, AI-generated videos were unconvincing and easy to distinguish from real footage. The videos created by Sora, however, contain hyper-realistic and stylized depictions of people, vehicles, and animals. Sora can simulate object permanence, meaning objects temporarily hidden from view by another object passing in front of them do not permanently vanish, but instead reappear on the other side. Sora has also demonstrated the ability to capture multiple shots and angles of a subject, even creating cinematic cuts when prompted.
Currently, Sora is not publicly available, but the videos and the prompts used to create them are. The prompts show that Sora has a significant understanding of language: it can faithfully render specific objects, styles, settings, and even emotions.
OpenAI has noted that the model is far from perfect: “It may struggle with accurately simulating the physics of a complex scene and may not understand specific instances of cause and effect.” Viewers have already spotted oddities in the model’s output, such as a man running backward on a treadmill and a chair materializing out of thin air. OpenAI has also made statements regarding safety, indicating that it will work with experts in “areas like misinformation, hateful content, and bias” to test the model, and that it will take several measures to reject prompts requesting content like “extreme violence, sexual and hateful imagery, celebrity likeness, or the intellectual property of others.”
There are still some ways to tell Sora-generated videos apart from real ones. The model suffers from the same pitfalls as other generative AI models: signs, text, hands, and close-up views of subjects are occasionally inconsistent or outright wrong. Ultimately, if you look at a Sora-generated video long enough, some detail will inevitably give it away.
When Sora is released to the public, it may become impossible to distinguish reality from fiction, especially for those susceptible to media manipulation, such as children and the elderly. This sudden advancement in AI technology raises a myriad of questions and concerns: What would happen if safety measures were breached? How does OpenAI guarantee consent for every video Sora was trained on? And, perhaps most importantly, will the technology be worth all the harm it could cause?