
This Day in AI Podcast EP72: Croc Test with Gemini 1.5 Experimental, Flux Destroys Midjourney & GPT4o Model Updates
Aug 7, 2024
Dive into the world of AI as the hosts tackle Google's Gemini 1.5 model, discussing its analysis of a crocodile video and its performance problems. They compare the image generators Flux and Midjourney and conclude that Flux produces better images. Updates to OpenAI's GPT-4o model bring structured outputs and cost reductions. The conversation wraps up with a look at the current AI development landscape, emphasizing the need for reliable tools in an increasingly competitive market.
AI Snips
Gemini Video Test Results
- Mike tested Gemini 1.5 Pro Experimental with his crocodile video, asking about events and people.
- It identified key takeaways but hallucinated the names of people in the video.
Claude's Performance on Croc Video
- Claude 3.5 Sonnet, given an audio transcript, accurately identified the location and key details.
- It avoided hallucinations and provided more accurate information than Gemini, even without video.
Gemini's Hallucination Problem
- Despite its large context window, Gemini's underlying model is prone to hallucinations.
- Claude, working only from the audio transcript, interpreted the video better, which suggests Gemini prioritizes visuals over audio.
