
Super Data Science: ML & AI Podcast with Jon Krohn, Episode 826: In Case You Missed It in September 2024
Oct 11, 2024
Julia Silge, Engineering Manager at Posit, shares how Posit developed Positron, an IDE built specifically around the coding needs of data scientists. Luca Anichin offers tips for improving machine learning models in PyTorch, stressing the need to balance work on the model with work on the data. Marco Garelli discusses Polars, an open-source DataFrame library that can manipulate data significantly faster than Pandas. Mark Weissman describes the traits he considers essential when hiring data scientists, favoring practical skills over traditional qualifications.
AI Snips
Model Optimization
- Optimize models by considering both model-centric and data-centric approaches.
- Don't focus only on adjusting the model; also evaluate and improve the quality of the training data.
Model Complexity
- Start with the simplest viable model and add complexity only when the simpler model falls short.
- Don't overcomplicate models at the outset; a simple model establishes a baseline that more complex models must beat.
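As an illustration of the baseline-first habit, here is a minimal, dependency-free sketch on made-up toy data. It first scores the simplest possible classifier, one that always predicts the majority class, and then a slightly richer nearest-centroid model. The function names, the data, and the choice of nearest-centroid as the "next step up" are all hypothetical and were not discussed in the episode. In a real PyTorch project, the same comparison would be made on a proper validation set before moving to a deeper network.

```python
from collections import Counter
from statistics import mean


def majority_baseline(train_labels):
    """Simplest possible classifier: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda x: most_common


def nearest_centroid(train_X, train_y):
    """Slightly richer model: predict the class whose mean feature vector is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        # Column-wise mean of the rows that belong to this class
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        # Squared Euclidean distance to each class centroid; pick the nearest
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def accuracy(model, X, y):
    """Fraction of samples the model labels correctly."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


# Hypothetical toy data: two well-separated clusters, imbalanced toward "a"
train_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [1.0, 1.1], [1.2, 0.9]]
train_y = ["a", "a", "a", "a", "b", "b"]
test_X = [[0.1, 0.1], [1.1, 1.0], [0.9, 1.2], [0.2, 0.2]]
test_y = ["a", "b", "b", "a"]

baseline = majority_baseline(train_y)
candidate = nearest_centroid(train_X, train_y)

base_acc = accuracy(baseline, test_X, test_y)
cand_acc = accuracy(candidate, test_X, test_y)
print(f"baseline accuracy:  {base_acc:.2f}")  # prints 0.50
print(f"candidate accuracy: {cand_acc:.2f}")  # prints 1.00
```

Only when a candidate clearly beats the baseline is it worth paying for the next increase in complexity. If it does not, the gap usually points to a problem elsewhere, often in the data.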
Mislabeled Data
- Luca's team achieved high accuracy with a custom vision model, but performance plateaued.
- They then discovered that 10% of their data was mislabeled, suggesting that data quality, rather than the model, was limiting accuracy.
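The summary does not say how Luca's team found their mislabeled examples. One simple, commonly used heuristic is to flag samples whose label disagrees with most of their nearest neighbours in feature space. The sketch below is a stand-in technique under stated assumptions, not the team's method. It assumes each sample is already a numeric feature vector (for images, for example, an embedding from a pretrained network). The function name, `k`, and `threshold` are hypothetical choices. Flagged indices are candidates for human review, not automatic relabeling.

```python
import math


def flag_suspect_labels(X, y, k=3, threshold=0.5):
    """Hypothetical helper: return indices whose label disagrees with
    more than `threshold` of their k nearest neighbours."""
    suspects = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Euclidean distance from sample i to every other sample, nearest first
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbour_labels = [y[j] for _, j in dists[:k]]
        disagreement = sum(label != yi for label in neighbour_labels) / len(neighbour_labels)
        if disagreement > threshold:
            suspects.append(i)
    return suspects


# Toy data: a "cat" cluster near (0, 0) and a "dog" cluster near (1, 1).
# Sample 3 sits inside the cat cluster but is labeled "dog" (a planted error).
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
     [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1]]
y = ["cat", "cat", "cat", "dog", "dog", "dog", "dog", "dog"]

print(flag_suspect_labels(X, y))  # prints [3]
```

This brute-force neighbour search is quadratic in the number of samples, which is fine for a sketch. Larger datasets would need an approximate nearest-neighbour index or a different approach, such as comparing labels against out-of-fold model predictions.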



