Theories of Everything with Curt Jaimungal

David Hand: How Dark Data Makes AI and LLMs Dangerously Unreliable

Aug 14, 2023
David Hand, a professor of statistics at Imperial College London, dives into the world of dark data and its implications for analysis. He explains how unseen or missing data can mislead conclusions, especially in critical areas like public health and AI. Hand contrasts data-driven and theory-driven models, emphasizing the risks of relying solely on the former. He also addresses the limitations of large language models, cautioning against overestimating their apparent intelligence. The discussion explores the balance between data transparency and public trust in research.
ANECDOTE

Credit Scoring Model Example

  • A credit scoring model built on data from people over 70 might fail when applied to people under 30, illustrating the brittleness of data-driven models.
  • The differing financial circumstances and risk profiles of these groups demonstrate why data-driven models need diverse, representative datasets.
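The brittleness described above can be sketched with a tiny toy model. All the numbers below are hypothetical, invented for illustration: a rule that perfectly separates defaulters in an older training group can fail badly on a younger group whose income-risk relationship sits on a different scale.

```python
# Hypothetical (income in thousands, defaulted) pairs -- illustrative only.
seniors = [(15, 1), (18, 1), (25, 0), (30, 0), (40, 0), (12, 1)]
young   = [(25, 1), (30, 1), (35, 1), (50, 0), (55, 0), (22, 1)]

def fit_threshold(data):
    """Pick the income cut that best separates defaulters in the training data."""
    best_t, best_acc = None, -1.0
    for t in sorted({inc for inc, _ in data}):
        acc = sum((inc <= t) == bool(y) for inc, y in data) / len(data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(data, t):
    """Fraction of cases where 'income <= t' correctly predicts default."""
    return sum((inc <= t) == bool(y) for inc, y in data) / len(data)

t = fit_threshold(seniors)     # the cut learned from the over-70 group
print(accuracy(seniors, t))    # perfect on the group the model saw
print(accuracy(young, t))      # far worse on the under-30 group
```

The model is not wrong about the data it was trained on; it simply encodes a pattern specific to one population, which is exactly why data-driven models need representative training sets.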
INSIGHT

Missing Crucial Data

  • Missing crucial data can lead to flawed models and inaccurate predictions, as seen with early COVID models that didn't consider age.
  • Consider variables like demographics or underlying conditions to create comprehensive models.
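A minimal sketch of why omitting age matters, using made-up fatality rates and age mixes (not real COVID figures): a single pooled rate fitted on one population transfers poorly to a population with a different age composition, while an age-stratified model does not.

```python
# Hypothetical infection-fatality rates by age band -- illustrative numbers only.
ifr = {"under_50": 0.001, "over_50": 0.05}

# A model fitted early on, when most observed patients were older,
# collapses everything into one pooled rate (age ignored):
train_mix = {"under_50": 0.3, "over_50": 0.7}
pooled_ifr = sum(ifr[a] * train_mix[a] for a in ifr)

# Applied to a much younger population, the pooled rate misleads;
# the stratified calculation gives the right answer:
target_mix = {"under_50": 0.9, "over_50": 0.1}
true_rate = sum(ifr[a] * target_mix[a] for a in ifr)

print(pooled_ifr)  # about 0.035
print(true_rate)   # about 0.006 -- the pooled model overestimates roughly 6x
```

The stratified model survives the shift in population because it keeps age as an explicit variable instead of baking one population's age mix into a single number.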
ANECDOTE

File Drawer Problem and Funnel Plots

  • The 'file drawer problem' highlights how non-significant results often go unpublished, leading to biased conclusions.
  • Funnel plots can reveal missing data in research by showing gaps in published results.
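The file drawer effect can be simulated directly. The sketch below assumes a small true effect and a significance filter standing in for journal publication; the studies that a funnel plot would show as a missing region are exactly the non-significant ones discarded here.

```python
import random
import statistics

random.seed(42)

TRUE_EFFECT = 0.1   # small real effect (hypothetical)
SE = 0.2            # standard error of each study's estimate

# Simulate 2000 independent studies estimating the same effect.
estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(2000)]

# "File drawer": only studies reaching |z| > 1.96 get published.
published = [e for e in estimates if abs(e / SE) > 1.96]

print(statistics.mean(estimates))   # near the true effect, 0.1
print(statistics.mean(published))   # substantially inflated
```

Averaging only the published studies overstates the effect several-fold, which is why meta-analyses use funnel plots to check for the tell-tale gap where the unpublished, non-significant results should be.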