Text-to-video model

From HandWiki

A text-to-video model is a machine learning model that takes a natural-language description as input and produces a video matching that description.[1] One early line of work framed the task as video prediction: a recurrent neural network serves as a sequence-to-sequence model, with a convolutional neural network encoding and decoding each frame, so that objects are rendered realistically against a stable background,[2] generating video with deep learning.[3]
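The encode/recur/decode pipeline described above can be sketched roughly as follows. This is a minimal illustration in PyTorch, not any published model: the layer sizes, the 32×32 frame resolution, and the class name are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class Seq2SeqVideoPredictor(nn.Module):
    """Illustrative CNN-encoder / RNN / CNN-decoder video predictor.
    All sizes are arbitrary choices for the sketch, not from a paper."""
    def __init__(self, hidden=64):
        super().__init__()
        # CNN encoder: compress each 3x32x32 frame to a feature vector
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, hidden),
        )
        # RNN models temporal dynamics over the per-frame features
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        # CNN decoder: reconstruct a frame from a feature vector
        self.decoder = nn.Sequential(
            nn.Linear(hidden, 32 * 8 * 8),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, frames):
        # frames: (batch, time, channels, height, width)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.reshape(b * t, 3, 32, 32)).reshape(b, t, -1)
        out, _ = self.rnn(feats)  # one predicted feature per time step
        return self.decoder(out.reshape(b * t, -1)).reshape(b, t, 3, 32, 32)

model = Seq2SeqVideoPredictor()
video = torch.rand(2, 5, 3, 32, 32)  # 2 clips of 5 frames each
pred = model(video)                  # per-step frame predictions
```

Each frame is encoded independently, the LSTM carries information across time, and the decoder maps each hidden state back to pixel space; training such a model on next-frame reconstruction loss is what the cited video-prediction work describes.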

Methodology

Models

Several models exist, including open-source ones. CogVideo is an early model whose code was released on GitHub.[4] Meta Platforms developed the text-to-video model Make-A-Video.[5][6][7] Google released Imagen Video, its own text-to-video model.[8][9][10][11][12]

Antonia Antonova presented another model.[13]

In March 2023, researchers at Alibaba published a landmark paper applying many of the principles of latent image diffusion models to video generation.[14][15] Services such as Kaiber and Reemix have since adopted similar approaches in their products.

Matthias Niessner (TUM) and Lourdes Agapito (UCL), at the AI company Synthesia, work on 3D neural rendering techniques that synthesize realistic video. Their goal is to improve existing text-to-video models with 2D and 3D neural representations of shape, appearance, and motion, enabling controllable video synthesis of avatars that look and sound like real people.[16]

Although alternative approaches exist,[17] full latent diffusion models are currently regarded as the state of the art for video diffusion.
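The core mechanism these models inherit from latent image diffusion can be illustrated with a minimal NumPy sketch: a clean latent is progressively noised, and generation runs the process in reverse, guided by a noise-predicting network. The noise schedule and latent shapes below are illustrative assumptions, and the true noise stands in for the learned text-conditioned denoiser that a real system would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear noise schedule (values are not from any published model)
T = 50
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def add_noise(x0, t, eps):
    """Forward diffusion q(x_t | x_0), applied to the whole latent video."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def estimate_x0(xt, predicted_eps, t):
    """Invert the forward process given a noise prediction.
    In a real model, predicted_eps would come from a text-conditioned
    denoising network run over all frames jointly."""
    return (xt - np.sqrt(1.0 - alpha_bars[t]) * predicted_eps) / np.sqrt(alpha_bars[t])

# Latent "video": (frames, channels, height, width), as if VAE-encoded
x0 = rng.standard_normal((8, 4, 16, 16))
eps = rng.standard_normal(x0.shape)
xt = add_noise(x0, T - 1, eps)           # nearly pure noise at the last step
x0_hat = estimate_x0(xt, eps, T - 1)     # exact recovery given the true noise
```

The "latent" part is that diffusion runs on compact VAE-encoded tensors rather than pixels, which is what makes the approach tractable for video; extending the denoiser to attend across frames is the main addition over the image case.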

References

  1. Artificial Intelligence Index Report 2023 (Report). Stanford Institute for Human-Centered Artificial Intelligence. p. 98. https://aiindex.stanford.edu/wp-content/uploads/2023/04/HAI_AI-Index-Report_2023.pdf. "Multiple high quality text-to-video models, AI systems that can generate video clips from prompted text, were released in 2022." 
  2. "Leading India". https://www.leadingindia.ai/downloads/projects/VP/vp_16.pdf. 
  3. Narain, Rohit (2021-12-29). "Smart Video Generation from Text Using Deep Neural Networks" (in en-US). https://www.datatobiz.com/blog/smart-video-generation-from-text/. 
  4. CogVideo, THUDM, 2022-10-12, https://github.com/THUDM/CogVideo, retrieved 2022-10-12 
  5. Davies, Teli (2022-09-29). "Make-A-Video: Meta AI's New Model For Text-To-Video Generation" (in en). https://wandb.ai/telidavies/ml-news/reports/Make-A-Video-Meta-AI-s-New-Model-For-Text-To-Video-Generation--VmlldzoyNzE4Nzcx. 
  6. Monge, Jim Clyde (2022-08-03). "This AI Can Create Video From Text Prompt" (in en). https://betterprogramming.pub/this-ai-can-create-video-from-text-prompt-6904439d7aba. 
  7. "Meta's Make-A-Video AI creates videos from text". https://www.fonearena.com/blog/375627/meta-make-a-video-ai-create-videos-from-text.html. 
  8. "google: Google takes on Meta, introduces own video-generating AI - The Economic Times". https://m.economictimes.com/tech/technology/google-takes-on-meta-introduces-own-video-generating-ai/amp_articleshow/94681128.cms?amp_gsa=1&amp_js_v=a9&usqp=mq331AQKKAFQArABIIACAw==#amp_tf=From%20%251$s&aoh=16655942495197&referrer=https://www.google.com&ampshare=https://m.economictimes.com/tech/technology/google-takes-on-meta-introduces-own-video-generating-ai/articleshow/94681128.cms. 
  9. Monge, Jim Clyde (2022-08-03). "This AI Can Create Video From Text Prompt" (in en). https://betterprogramming.pub/this-ai-can-create-video-from-text-prompt-6904439d7aba. 
  10. "Nuh-uh, Meta, we can do text-to-video AI, too, says Google". https://www.theregister.com/AMP/2022/10/06/google_ai_imagen_video/. 
  11. "Papers with Code - See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction" (in en). https://paperswithcode.com/paper/see-plan-predict-language-guided-cognitive. 
  12. "Papers with Code - Text-driven Video Prediction" (in en). https://paperswithcode.com/paper/text-driven-video-prediction. 
  13. "Text to Video Generation" (in en-US). https://antonia.space/text-to-video-generation. 
  14. "Home - DAMO Academy". https://damo.alibaba.com/. 
  15. Luo, Zhengxiong; Chen, Dayou; Zhang, Yingya; Huang, Yan; Wang, Liang; Shen, Yujun; Zhao, Deli; Zhou, Jingren; Tan, Tieniu (2023). "VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation". arXiv:2303.08320 [cs.CV].
  16. "Text to Speech for Videos". https://www.synthesia.io/text-to-speech. 
  17. Text2Video-Zero, Picsart AI Research (PAIR), 2023-08-12, https://github.com/Picsart-AI-Research/Text2Video-Zero, retrieved 2023-08-12