
Theories of Everything with Curt Jaimungal
David Hand: How Dark Data Makes AI and LLMs Dangerously Unreliable
Aug 14, 2023

David Hand, a professor of statistics at Imperial College London, dives into the fascinating world of dark data and its implications for analysis. He explains how unseen data can mislead conclusions, especially in critical areas like public health and AI. Hand contrasts data-driven and theory-driven models, emphasizing the risks of relying solely on the former. He also addresses the limitations of large language models, cautioning against overestimating their intelligence. The discussion reveals the intricate balance between data transparency and public trust in research.
AI Snips
Credit Scoring Model Example
- A credit scoring model built on data from people over 70 might fail when applied to people under 30, illustrating the brittleness of data-driven models.
- The differing financial circumstances and risk profiles of these groups demonstrate why data-driven models need diverse, representative datasets; the sketch below simulates this kind of failure.
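To make the brittleness concrete, here is a minimal Python sketch using purely synthetic applicants and invented risk drivers (nothing here comes from the episode): a logistic-regression scorer fit on an older population loses much of its discriminative power on a younger one whose risk structure differs.

```python
# Sketch of distribution shift in a credit model. All data is synthetic and
# the risk drivers are invented: here over-70s default mainly on debt load,
# under-30s mainly on low income.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_group(n, income_coef, debt_coef):
    """Simulate applicants whose true default risk is a logistic function
    of standardized income and debt, with group-specific coefficients."""
    income = rng.normal(size=n)
    debt = rng.normal(size=n)
    p_default = 1 / (1 + np.exp(-(income_coef * income + debt_coef * debt)))
    defaulted = (rng.random(n) < p_default).astype(int)
    return np.column_stack([income, debt]), defaulted

X_old, y_old = make_group(5000, income_coef=-0.3, debt_coef=1.5)      # over-70s
X_young, y_young = make_group(5000, income_coef=-1.5, debt_coef=0.3)  # under-30s

scorer = LogisticRegression().fit(X_old, y_old)  # trained only on over-70s

print("AUC, over-70s: ", roc_auc_score(y_old, scorer.predict_proba(X_old)[:, 1]))
print("AUC, under-30s:", roc_auc_score(y_young, scorer.predict_proba(X_young)[:, 1]))
# The second AUC drops markedly: the fitted weights encode the older group's
# risk structure, not anything universal about creditworthiness.
```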
Missing Crucial Data
- Missing crucial data can lead to flawed models and inaccurate predictions, as seen with early COVID-19 models that didn't consider age.
- Consider variables like demographics or underlying conditions to create comprehensive models; the sketch below shows what omitting one can cost.
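A hypothetical sketch of the same failure mode, with invented numbers rather than real epidemiological data: mortality risk in the simulation rises steeply with age, so a model that drops age remains systematically wrong for every age group even while fitting the aggregate data.

```python
# Sketch of omitted-variable bias. Synthetic population; the age and
# comorbidity effects are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(1)
n = 20_000
age = rng.uniform(20, 90, n)
comorbid = (rng.random(n) < age / 200).astype(int)  # comorbidity rises with age
true_logit = -7 + 0.07 * age + 1.0 * comorbid       # risk climbs steeply with age
died = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

X_full = np.column_stack([age, comorbid])
X_reduced = comorbid.reshape(-1, 1)                 # "forgot" to include age

with_age = LogisticRegression().fit(X_full, died)
without_age = LogisticRegression().fit(X_reduced, died)

print("log loss, with age:   ", log_loss(died, with_age.predict_proba(X_full)[:, 1]))
print("log loss, without age:", log_loss(died, without_age.predict_proba(X_reduced)[:, 1]))

# The reduced model is not just noisier; it misjudges whole subgroups:
young = age < 40
print("actual risk, under-40s:   ", died[young].mean())
print("predicted risk, under-40s:",
      without_age.predict_proba(X_reduced[young])[:, 1].mean())
```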
File Drawer Problem and Funnel Plots
- The 'file drawer problem' highlights how non-significant results often go unpublished, leading to biased conclusions.
- Funnel plots can reveal this missing research by exposing asymmetric gaps in the spread of published results; the simulation below builds one.
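A simulation of that mechanism (the study count, true effect size, and significance cutoff are all invented here): generate many studies around one true effect, "publish" only the significant positive ones, and the funnel plot's missing corner makes the suppression visible.

```python
# Sketch of the file drawer problem and the funnel plot that exposes it.
# All study parameters are invented.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
true_effect, n_studies = 0.2, 400

se = rng.uniform(0.05, 0.5, n_studies)  # small studies -> large standard errors
estimate = rng.normal(true_effect, se)  # each study's noisy effect estimate
published = estimate / se > 1.96        # only significant results escape the drawer

fig, ax = plt.subplots()
ax.scatter(estimate[published], se[published], s=10, label="published")
ax.scatter(estimate[~published], se[~published], s=10, alpha=0.3, label="file drawer")
ax.axvline(true_effect, linestyle="--", color="k", label="true effect")
ax.invert_yaxis()                       # convention: precise studies at the top
ax.set(xlabel="estimated effect", ylabel="standard error")
ax.legend()
plt.show()
# Looking at the published points alone, noisy studies appear only when they
# overshoot the truth: the empty region near the true effect at large SE is
# the gap a funnel plot reveals, and averaging published estimates is biased.
```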



