
Theories of Everything with Curt Jaimungal
David Hand: How Dark Data Makes AI and LLMs Dangerously Unreliable
Aug 14, 2023

David Hand, a professor of statistics at Imperial College London, dives into the fascinating world of dark data and its implications for analysis. He explains how unseen data can mislead conclusions, especially in critical areas like public health and AI. Hand contrasts data-driven and theory-driven models, emphasizing the risks of relying solely on the former. He also addresses the limitations of large language models, cautioning against overestimating their intelligence. The discussion reveals the intricate balance between data transparency and public trust in research.
AI Snips
Credit Scoring Model Example
- A credit scoring model built on data from people over 70 might fail when applied to people under 30, illustrating the brittleness of data-driven models.
- The differing financial circumstances and risk profiles of these groups demonstrate why data-driven models need diverse, representative datasets; the sketch below simulates this kind of failure.
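To make the brittleness concrete, here is a minimal Python sketch using purely synthetic applicants and invented risk drivers (nothing here comes from the episode): a logistic-regression scorer fit on an older population loses much of its discriminative power on a younger one whose risk structure differs.

```python
# Sketch of distribution shift in a credit model. All data is synthetic and
# the risk drivers are invented: here over-70s default mainly on debt load,
# under-30s mainly on low income.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_group(n, income_coef, debt_coef):
    """Simulate applicants whose true default risk is a logistic function
    of standardized income and debt, with group-specific coefficients."""
    income = rng.normal(size=n)
    debt = rng.normal(size=n)
    p_default = 1 / (1 + np.exp(-(income_coef * income + debt_coef * debt)))
    defaulted = (rng.random(n) < p_default).astype(int)
    return np.column_stack([income, debt]), defaulted

X_old, y_old = make_group(5000, income_coef=-0.3, debt_coef=1.5)      # over-70s
X_young, y_young = make_group(5000, income_coef=-1.5, debt_coef=0.3)  # under-30s

scorer = LogisticRegression().fit(X_old, y_old)  # trained only on over-70s

print("AUC, over-70s: ", roc_auc_score(y_old, scorer.predict_proba(X_old)[:, 1]))
print("AUC, under-30s:", roc_auc_score(y_young, scorer.predict_proba(X_young)[:, 1]))
# The second AUC drops markedly: the fitted weights encode the older group's
# risk structure, not anything universal about creditworthiness.
```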
Missing Crucial Data
- Missing crucial data can lead to flawed models and inaccurate predictions, as seen with early COVID-19 models that didn't consider age.
- Consider variables like demographics or underlying conditions to create comprehensive models; the sketch below shows what omitting one can cost.
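A hypothetical sketch of the same failure mode, with invented numbers rather than real epidemiological data: mortality risk in the simulation rises steeply with age, so a model that drops age remains systematically wrong for every age group even while fitting the aggregate data.

```python
# Sketch of omitted-variable bias. Synthetic population; the age and
# comorbidity effects are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(1)
n = 20_000
age = rng.uniform(20, 90, n)
comorbid = (rng.random(n) < age / 200).astype(int)  # comorbidity rises with age
true_logit = -7 + 0.07 * age + 1.0 * comorbid       # risk climbs steeply with age
died = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

X_full = np.column_stack([age, comorbid])
X_reduced = comorbid.reshape(-1, 1)                 # "forgot" to include age

with_age = LogisticRegression().fit(X_full, died)
without_age = LogisticRegression().fit(X_reduced, died)

print("log loss, with age:   ", log_loss(died, with_age.predict_proba(X_full)[:, 1]))
print("log loss, without age:", log_loss(died, without_age.predict_proba(X_reduced)[:, 1]))

# The reduced model is not just noisier; it misjudges whole subgroups:
young = age < 40
print("actual risk, under-40s:   ", died[young].mean())
print("predicted risk, under-40s:",
      without_age.predict_proba(X_reduced[young])[:, 1].mean())
```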
File Drawer Problem and Funnel Plots
- The 'file drawer problem' highlights how non-significant results often go unpublished, leading to biased conclusions.
- Funnel plots can reveal this missing research by exposing asymmetric gaps in the spread of published results; the simulation below builds one.
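A simulation of that mechanism (the study count, true effect size, and significance cutoff are all invented here): generate many studies around one true effect, "publish" only the significant positive ones, and the funnel plot's missing corner makes the suppression visible.

```python
# Sketch of the file drawer problem and the funnel plot that exposes it.
# All study parameters are invented.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
true_effect, n_studies = 0.2, 400

se = rng.uniform(0.05, 0.5, n_studies)  # small studies -> large standard errors
estimate = rng.normal(true_effect, se)  # each study's noisy effect estimate
published = estimate / se > 1.96        # only significant results escape the drawer

fig, ax = plt.subplots()
ax.scatter(estimate[published], se[published], s=10, label="published")
ax.scatter(estimate[~published], se[~published], s=10, alpha=0.3, label="file drawer")
ax.axvline(true_effect, linestyle="--", color="k", label="true effect")
ax.invert_yaxis()                       # convention: precise studies at the top
ax.set(xlabel="estimated effect", ylabel="standard error")
ax.legend()
plt.show()
# Looking at the published points alone, noisy studies appear only when they
# overshoot the truth: the empty region near the true effect at large SE is
# the gap a funnel plot reveals, and averaging published estimates is biased.
```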



