OpenAI expands multimodal capabilities with updated text-to-video model
Tuesday, December 10, 2024, 08:52 AM, from ComputerWorld
OpenAI has released a new version of its text-to-video AI model, Sora, to ChatGPT Plus and Pro users, marking another step in its expansion into multimodal AI technologies.
The original Sora model, introduced earlier this year, was restricted to safety testers during its research preview phase, limiting its availability. The new Sora Turbo version offers significantly faster performance than its predecessor, OpenAI said in a blog post. Sora is currently available in all regions where ChatGPT operates, except the UK, Switzerland, and the European Economic Area, where OpenAI plans to expand access in the coming months. ChatGPT, which gained global prominence in 2022, has been a driving force behind the widespread adoption of generative AI, and Sora reflects OpenAI's ongoing efforts to maintain a competitive edge in the rapidly evolving AI landscape.

Keeping pace with rivals

The move positions OpenAI to compete with similar offerings from rivals such as Meta, Google, and Stability AI. "The true power of GenAI will be in realizing its multimodal capabilities," said Sharath Srinivasamurthy, associate vice president at IDC. "Since OpenAI was lagging behind its competitors in text to video, this move was needed to stay relevant and compete."

However, both Google and Meta outpaced OpenAI in making their models publicly reviewable, even though Sora was first previewed back in February. "OpenAI likely anticipated becoming a target if it launched this service first, so it seems probable that they waited for other companies to release their video generation products while refining Sora for public preview or alpha testing," said Hyoun Park, CEO and chief analyst at Amalgam Insights. "OpenAI is offering longer videos, whereas Google supports six-second videos and Meta supports 16-second videos." Integration remains a work in progress, though OpenAI is expected to eventually provide data integration for Sora comparable to its other models, Park added.

Managing regulatory concerns

Sora-generated videos will include C2PA metadata, enabling users to identify the content's origin and verify its authenticity. This is significant amid global regulatory efforts to ensure AI firms adhere to compliance requirements. "While imperfect, we've added safeguards like visible watermarks by default, and built an internal search tool that uses technical attributes of generations to help verify if content came from Sora," OpenAI said in the post.

Even with such safeguards, the use of data in training AI models continues to spark debate over intellectual property rights. In August, a federal judge in California ruled that visual artists could proceed with certain copyright claims against AI companies such as Stability AI. "As with all of OpenAI's generative tools, Sora faces challenges related to being trained on commercial data, which is often subject to copyright and, in some cases, patents," Park said. "This could create opportunities for vendors like Anthropic and Cohere, which have been more focused on adhering to EU governance guidelines."

Extensive testing is critical for video-based generative AI applications because of concerns such as the rise of deepfakes, which likely contributed to the time it took OpenAI to release the model, according to Srinivasamurthy.
https://www.computerworld.com/article/3620473/openai-expands-multi-modal-capabilities-with-updated-t...