
Practical AI: Video Generation with Realistic Motion
Jan 23, 2025. Paras Jain, CEO of Genmo, leads a company dedicated to creating videos with realistic motion. He discusses the current surge in video generation tools and the challenges models still face, particularly in producing lifelike walking motions. Paras traces the evolution from traditional GANs to advanced diffusion models like Mochi, emphasizing the importance of quality data, and envisions a future where AI makes video creation accessible and empowers originality in storytelling.
AI Snips
From GANs to Diffusion Models
- Early image generation models like GANs struggled with mode collapse, limiting diverse outputs.
- Diffusion models offer better mode coverage and generalization, iteratively denoising random noise into images and videos.
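The iterative-denoising idea can be sketched in a few lines. This is a toy illustration only: the names and the linear blending schedule are hypothetical, and the zero "target" stands in for what a real model predicts with a learned neural network at each step.

```python
import numpy as np

def toy_denoise_step(x, t, total_steps, rng):
    """One reverse-diffusion step: nudge the sample toward the model's
    predicted clean signal and re-inject a little noise (toy schedule)."""
    target = np.zeros_like(x)           # stand-in for a learned prediction
    blend = 1.0 / (total_steps - t)     # blend more aggressively near the end
    x = (1 - blend) * x + blend * target
    if t < total_steps - 1:             # no noise injected on the final step
        x = x + 0.01 * rng.standard_normal(x.shape)
    return x

def toy_diffusion_sample(shape, steps=50, seed=0):
    """Start from pure Gaussian noise and iteratively denoise it."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # begin with random noise
    for t in range(steps):
        x = toy_denoise_step(x, t, steps, rng)
    return x

frame = toy_diffusion_sample((16, 16, 3))  # one tiny RGB "frame"
```

The key contrast with GANs is structural: instead of one generator pass that can collapse onto a few modes, the sample is refined over many small steps, each conditioned on the current noisy state.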
Balancing Model Size and Compute
- Consider model size and compute needs based on desired video quality and available resources.
- Certain capabilities emerge only at larger scales, while open-sourcing allows community experimentation and fine-tuning.
Video Generation Process in Mochi
- Video models generate all pixels at once through multiple denoising passes, unlike language models' token-by-token generation.
- Mochi uses a multi-stage approach, including video compression via a variational autoencoder (VAE), for efficiency.
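The two points above can be sketched together: compress frames into a smaller latent space, refine every latent "pixel" jointly over several passes, then decode back to pixels. Everything here is a hypothetical stand-in. A real VAE encoder/decoder is a learned network, not the average-pooling and nearest-neighbor upsampling used below.

```python
import numpy as np

def toy_vae_encode(video, factor=4):
    """Compress each frame spatially by average-pooling (VAE stand-in)."""
    t, h, w, c = video.shape
    return video.reshape(t, h // factor, factor,
                         w // factor, factor, c).mean(axis=(2, 4))

def toy_vae_decode(latents, factor=4):
    """Upsample latents back to pixel resolution (decoder stand-in)."""
    return latents.repeat(factor, axis=1).repeat(factor, axis=2)

def toy_latent_denoise(latents, steps=10):
    """Every latent value is refined together on each pass, unlike a
    language model's token-by-token generation."""
    for _ in range(steps):
        latents = 0.5 * latents  # stand-in for one learned denoising pass
    return latents

rng = np.random.default_rng(0)
pixels = rng.standard_normal((8, 64, 64, 3))       # 8 noisy 64x64 RGB frames
latents = toy_vae_encode(pixels)                   # 16x fewer spatial values
video = toy_vae_decode(toy_latent_denoise(latents))
```

The efficiency argument is visible in the shapes: denoising runs on 16×16 latents rather than 64×64 frames, so each pass touches 16× fewer spatial values before the decoder restores full resolution.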

