
Practical AI: Video Generation with Realistic Motion
Jan 23, 2025. Paras Jain, CEO of Genmo, leads a company dedicated to creating videos with realistic motion. He discusses the current surge in video generation tools and the challenges models still face, particularly in producing lifelike walking motions. Paras traces the evolution from traditional GANs to advanced diffusion models like Mochi, emphasizing the importance of quality data, and envisions a future where AI makes video creation accessible and empowers originality in storytelling.
AI Snips
From GANs to Diffusion Models
- Early image generation models like GANs struggled with mode collapse, limiting diverse outputs.
- Diffusion models offer better mode coverage and generalization, iteratively denoising random noise into images and videos.
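The iterative-denoising idea can be sketched in a few lines. This is a toy illustration only: the names and the linear blending schedule are hypothetical, and the zero "target" stands in for what a real model predicts with a learned neural network at each step.

```python
import numpy as np

def toy_denoise_step(x, t, total_steps, rng):
    """One reverse-diffusion step: nudge the sample toward the model's
    predicted clean signal and re-inject a little noise (toy schedule)."""
    target = np.zeros_like(x)           # stand-in for a learned prediction
    blend = 1.0 / (total_steps - t)     # blend more aggressively near the end
    x = (1 - blend) * x + blend * target
    if t < total_steps - 1:             # no noise injected on the final step
        x = x + 0.01 * rng.standard_normal(x.shape)
    return x

def toy_diffusion_sample(shape, steps=50, seed=0):
    """Start from pure Gaussian noise and iteratively denoise it."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # begin with random noise
    for t in range(steps):
        x = toy_denoise_step(x, t, steps, rng)
    return x

frame = toy_diffusion_sample((16, 16, 3))  # one tiny RGB "frame"
```

The key contrast with GANs is structural: instead of one generator pass that can collapse onto a few modes, the sample is refined over many small steps, each conditioned on the current noisy state.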
Balancing Model Size and Compute
- Consider model size and compute needs based on desired video quality and available resources.
- Certain capabilities emerge only at larger scales, while open-sourcing allows community experimentation and fine-tuning.
Video Generation Process in Mochi
- Video models generate all pixels at once through multiple denoising passes, unlike language models' token-by-token generation.
- Mochi uses a multi-stage approach, including video compression via a variational autoencoder (VAE), for efficiency.
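The two points above can be sketched together: compress frames into a smaller latent space, refine every latent "pixel" jointly over several passes, then decode back to pixels. Everything here is a hypothetical stand-in. A real VAE encoder/decoder is a learned network, not the average-pooling and nearest-neighbor upsampling used below.

```python
import numpy as np

def toy_vae_encode(video, factor=4):
    """Compress each frame spatially by average-pooling (VAE stand-in)."""
    t, h, w, c = video.shape
    return video.reshape(t, h // factor, factor,
                         w // factor, factor, c).mean(axis=(2, 4))

def toy_vae_decode(latents, factor=4):
    """Upsample latents back to pixel resolution (decoder stand-in)."""
    return latents.repeat(factor, axis=1).repeat(factor, axis=2)

def toy_latent_denoise(latents, steps=10):
    """Every latent value is refined together on each pass, unlike a
    language model's token-by-token generation."""
    for _ in range(steps):
        latents = 0.5 * latents  # stand-in for one learned denoising pass
    return latents

rng = np.random.default_rng(0)
pixels = rng.standard_normal((8, 64, 64, 3))       # 8 noisy 64x64 RGB frames
latents = toy_vae_encode(pixels)                   # 16x fewer spatial values
video = toy_vae_decode(toy_latent_denoise(latents))
```

The efficiency argument is visible in the shapes: denoising runs on 16×16 latents rather than 64×64 frames, so each pass touches 16× fewer spatial values before the decoder restores full resolution.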

