The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

More Language, Less Labeling with Kate Saenko - #580

Jun 27, 2022
In this discussion, Kate Saenko, an associate professor at Boston University and a consulting professor at the MIT-IBM Watson AI Lab, dives into the world of multimodal learning. Kate highlights the significance of integrating vision and language, pointing to innovations like DALL-E 2 and CLIP. She addresses bias in AI models trained on vast online datasets and shares insights on reducing labeling costs through effective prompting techniques. The conversation also touches on the challenges facing smaller labs in a compute-dominated landscape, alongside strategies for robust model generalization.
INSIGHT

Computer Vision's Transformation

  • Computer vision has improved dramatically over time.
  • Early computer vision worked only for narrow applications like face detection, unlike today's general-purpose systems.
INSIGHT

Multimodal Learning Breakthroughs

  • Multimodal learning, like using audio and visuals for lip reading, has existed for a long time.
  • The current breakthroughs stem from increased data and model sizes, enabling emergent properties in models like DALL-E 2.
INSIGHT

Bias in Internet Data

  • Internet data offers abundant visual information, but it contains inherent biases.
  • Models like DALL-E 2 reflect these biases, generating stereotypical images for prompts like "engineers" or "doctors."