MLOps.community

Monitoring Unstructured Data // Aparna Dhinakaran & Jason Lopatecki // Lightning Sessions #2

Sep 27, 2022
INSIGHT

Challenges of Unstructured Data

  • 80% of the data that teams handle in production is unstructured and often unlabeled, which makes it hard to understand models and verify their correctness.
  • Unstructured data seen in production can differ drastically from the training data, causing model performance to drop.
INSIGHT

Embeddings as Model Interfaces

  • Embeddings represent internal model knowledge as mathematical vectors capturing semantic meaning in various data types.
  • They serve as an ideal interface for troubleshooting complex models working with unstructured data.
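
To make the idea of embeddings as vectors concrete, here is a minimal sketch of comparing them with cosine similarity, one common way to measure how semantically close two embeddings are. The three-dimensional vectors and their names are hypothetical and chosen only for illustration; real embeddings typically have hundreds of dimensions and come from a trained model, and the episode does not prescribe this particular metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for illustration only.
red_car = [0.9, 0.1, 0.3]
red_truck = [0.8, 0.2, 0.4]
green_tree = [0.1, 0.9, 0.2]

sim_vehicles = cosine_similarity(red_car, red_truck)
sim_unrelated = cosine_similarity(red_car, green_tree)

# Semantically similar inputs should score higher than unrelated ones.
print(sim_vehicles > sim_unrelated)
```

Because the comparison works on vectors alone, the same function applies whether the embeddings came from images, text, or audio.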
ANECDOTE

Why Embeddings Help Troubleshooting

  • Explainability techniques highlight the image pixels relevant to a prediction, but may not reveal why the model made its decision.
  • Embeddings compress the features a model has learned and expose semantic similarities, such as shared features or colors in images, which aids troubleshooting.
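
One simple way to use embeddings for troubleshooting is to check how far production embeddings have moved from training embeddings. The sketch below measures the Euclidean distance between the two centroids (average vectors). This is an illustrative assumption, not a method stated in the episode; the function names and the two-dimensional sample data are hypothetical.

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def embedding_drift(train_embeddings, prod_embeddings):
    """Distance between the training centroid and the production centroid."""
    return euclidean_distance(centroid(train_embeddings), centroid(prod_embeddings))

# Hypothetical 2-dimensional embeddings, invented for illustration only.
training = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]]
prod_similar = [[0.85, 0.15], [0.95, 0.05]]
prod_shifted = [[0.1, 0.9], [0.2, 0.8]]

drift_similar = embedding_drift(training, prod_similar)
drift_shifted = embedding_drift(training, prod_shifted)

# Production data that resembles training data should show less drift.
print(drift_similar < drift_shifted)
```

A large distance suggests that production inputs look different from what the model saw in training, which is a reasonable place to start investigating a performance drop.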