Meta announced Movie Gen, a new potential tool for — or threat to — Hollywood. Movie Gen can produce video clips up to 16 seconds long and audio clips up to 45 seconds long, generating both visuals and sound effects that sync seamlessly with the content.

Movie Gen models and capabilities:

Movie Gen Video: A 30B parameter transformer model that can generate high-quality, high-definition images and videos from a single text prompt.

Movie Gen Audio: A 13B parameter transformer model that takes a video input, along with optional text prompts for controllability, and generates high-fidelity audio synced to the video. It can produce ambient sound, instrumental background music and Foley sound effects, delivering state-of-the-art results in audio quality, video-to-audio alignment and text-to-audio alignment.

Precise video editing: Given a generated or existing video and accompanying text instructions as input, the model can perform localized edits such as adding, removing or replacing elements, as well as global changes such as altering the background or style.

Personalized videos: Given an image of a person and a text prompt, the model can generate a video featuring that person, achieving state-of-the-art results in character preservation and natural movement.

Meta plans to collaborate directly with the entertainment industry and content creators, incorporating the tool into its products sometime next year. The model was built using a mix of licensed and publicly available datasets, Meta said.

Meta outlines more technical details for Movie Gen in a 92-page research paper.