Practical AI

Video generation with realistic motion

Jan 23, 2025
Paras Jain is CEO of Genmo, a company dedicated to generating video with realistic motion. He discusses the current surge in video generation tools and the challenges models face, particularly in achieving lifelike walking motion. Paras traces the evolution from traditional GANs to advanced diffusion models like Mochi, emphasizing the importance of high-quality data. He also envisions a future where AI empowers creativity in storytelling, making video creation more accessible and enhancing originality in content.
INSIGHT

From GANs to Diffusion Models

  • Early image generation models like GANs struggled with mode collapse, limiting diverse outputs.
  • Diffusion models offer better mode coverage and generalization, iteratively denoising to generate images and videos.
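The iterative denoising mentioned above can be sketched as a toy reverse-diffusion loop: start from noise and repeatedly subtract a predicted-noise estimate. The `predict_noise` function here is a hypothetical stand-in for a trained network, and the step sizes are illustrative, not any real model's schedule.

```python
import numpy as np

def toy_denoise(x_t, num_steps=50, seed=0):
    """Illustrative reverse-diffusion loop (not a real sampler):
    refine a noisy array by repeatedly removing predicted noise."""
    rng = np.random.default_rng(seed)

    def predict_noise(x, t):
        # Placeholder: a real diffusion model is a neural network
        # conditioned on the timestep t (hypothetical stand-in).
        return x * (t / num_steps)

    for t in range(num_steps, 0, -1):
        eps = predict_noise(x_t, t)
        x_t = x_t - eps / num_steps  # remove a fraction of predicted noise
        if t > 1:
            # small stochastic term, as in ancestral sampling
            x_t = x_t + 0.01 * rng.standard_normal(x_t.shape)
    return x_t

# Every entry of this tiny 4x4 "frame" is refined together at each step.
frame = np.random.default_rng(0).standard_normal((4, 4))
out = toy_denoise(frame)
```

Unlike a GAN's single forward pass, each sample here is produced by many refinement steps, which is part of why diffusion models cover more modes of the data distribution.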
ADVICE

Balancing Model Size and Compute

  • Consider model size and compute needs based on desired video quality and available resources.
  • Certain capabilities emerge only at larger scales, while open-sourcing allows community experimentation and fine-tuning.
INSIGHT

Video Generation Process in Mochi

  • Video models generate all pixels at once through multiple denoising passes, unlike language models' token-by-token generation.
  • Mochi uses a multi-stage approach, including video compression via a variational autoencoder (VAE), for efficiency.
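The efficiency gain from the VAE stage is easy to see with rough arithmetic: compressing frames into a smaller latent grid shrinks the tensor the diffusion model must denoise. The 6x temporal / 8x spatial / 12-channel figures below are illustrative assumptions, not confirmed Mochi parameters.

```python
# Toy illustration of video compression via a VAE:
# the diffusion model denoises the latent grid, not raw pixels.

def latent_shape(frames, height, width, channels_latent=12,
                 t_factor=6, s_factor=8):
    """Shape of the latent tensor after assumed VAE downsampling."""
    return (frames // t_factor,
            height // s_factor,
            width // s_factor,
            channels_latent)

frames, height, width, rgb = 48, 480, 848, 3  # example clip
lat = latent_shape(frames, height, width)

pixel_count = frames * height * width * rgb
latent_count = lat[0] * lat[1] * lat[2] * lat[3]
ratio = pixel_count / latent_count  # how much smaller the latent tensor is
```

With these assumed factors the latent tensor is roughly two orders of magnitude smaller than the raw pixel tensor, which is what makes the multi-pass denoising stage tractable.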