
Data Engineering Podcast
From Academia to Industry: Bridging Data Engineering Challenges
Aug 26, 2025
Professor Paul Groth of the University of Amsterdam draws on his work in AI systems and intelligent data engineering to trace how data provenance and lineage have evolved and why they matter in today's data workflows. He also explains how large language models are changing knowledge graph construction and data integration. The conversation covers how academic research and industry practice can reinforce each other, the role of human-AI collaboration, and the need for data management solutions tailored to specific contexts.
AI Snips
Semantics Cause Access Control Friction
- Semantic divergence (for example, one team models "customer" while another models "person") causes access-control and governance headaches across teams.
- Future systems must adapt to varied data models rather than enforce a single organization-wide schema.
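To make the access-control point concrete, here is a minimal sketch, assuming a hypothetical setup where two teams label overlapping entities differently ("customer" in sales, "person" in HR). It contrasts a rule keyed on one team's raw label with a rule resolved through a shared mapping layer. `Record`, `CONCEPT_MAP`, the shared concept name `"individual"`, and both policy functions are illustrative inventions, not anything described in the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    team: str
    entity_type: str  # team-local label, e.g. "customer" or "person"
    record_id: str


# Hypothetical mapping from (team, local label) to a shared concept.
# Each team keeps its own model; only this mapping is shared.
CONCEPT_MAP = {
    ("sales", "customer"): "individual",
    ("hr", "person"): "individual",
}

# Policy stated once, against the shared concept.
RESTRICTED_CONCEPTS = {"individual"}


def is_restricted_naive(record: Record, restricted_labels: set) -> bool:
    """Policy keyed on raw labels: it only sees the vocabulary it was written in."""
    return record.entity_type in restricted_labels


def is_restricted_mapped(record: Record) -> bool:
    """Policy keyed on the shared concept, resolved through the team mapping."""
    concept = CONCEPT_MAP.get((record.team, record.entity_type))
    if concept is None:
        # Unmapped label: fail closed so new vocabulary is not silently exposed.
        return True
    return concept in RESTRICTED_CONCEPTS


records = [
    Record("sales", "customer", "c-1"),
    Record("hr", "person", "p-7"),
]

# A rule written by the sales team in its own terms ("customer").
naive = [is_restricted_naive(r, {"customer"}) for r in records]
mapped = [is_restricted_mapped(r) for r in records]
print(naive)   # [True, False]: the HR "person" record slips through
print(mapped)  # [True, True]
```

The fail-closed default for unmapped labels is one design choice among several; it trades some friction for the guarantee that a newly introduced team vocabulary is not exposed before someone maps it.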
Graphs Clarify But Don’t Perfect Identity
- Knowledge graphs make semantics explicit, but they do not automatically solve identity or disambiguation problems.
- Even curated graphs such as Wikidata still contain unresolved identity proliferation: multiple entries that refer to the same real-world entity.
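One way to see why an explicit graph does not settle identity on its own: a common technique for collapsing duplicates is to cluster identifiers connected by same-as style links using union-find, but clustering can only follow links that someone has actually asserted. The sketch below assumes made-up `ex:` identifiers (not real Wikidata items) and hypothetical helper names.

```python
def find(parent: dict, x: str) -> str:
    """Return the cluster representative for x, with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # skip one level to shorten future lookups
        x = parent[x]
    return x


def union(parent: dict, a: str, b: str) -> None:
    """Merge the clusters containing a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def identity_clusters(ids, same_as_links):
    """Group identifiers that are connected by asserted same-as links."""
    parent: dict = {}
    for i in ids:
        find(parent, i)
    for a, b in same_as_links:
        union(parent, a, b)
    clusters: dict = {}
    for i in list(parent):
        clusters.setdefault(find(parent, i), set()).add(i)
    return sorted(sorted(c) for c in clusters.values())


# Made-up identifiers. Assume ex:B1 also denotes the same person as ex:A1,
# but nobody has asserted that link yet.
ids = ["ex:A1", "ex:A2", "ex:A3", "ex:B1"]
links = [("ex:A1", "ex:A2"), ("ex:A2", "ex:A3")]
print(identity_clusters(ids, links))
# [['ex:A1', 'ex:A2', 'ex:A3'], ['ex:B1']]
```

The graph merges everything the links connect, yet `ex:B1` remains a separate identity until the missing link is discovered and asserted, which is the disambiguation work that still has to happen outside the graph structure.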
LLMs Lower Graph Construction Cost
- LLMs dramatically simplify information extraction and mapping for knowledge graph construction.
- You can build partial graphs and link to raw data instead of converting everything into the graph.
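The partial-graph idea can be sketched roughly as follows, assuming a toy in-memory store: the graph holds only a few extracted facts plus provenance edges that point back to raw records, which stay in their original location. The extraction step (for example, an LLM call) is deliberately not shown, and `RawRef`, `RAW_STORE`, the `ex:` predicates, and the sample record are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRef:
    """Pointer to a record that stays in its original store."""
    source: str   # hypothetical source name, e.g. a file or table
    locator: str  # hypothetical row id or offset within that source


# Stand-in for external raw storage; in practice this would be files or a database.
RAW_STORE = {
    ("contracts.csv", "row-42"): "Acme Corp signed a supply contract with Globex.",
}

# The partial graph: a few extracted facts plus provenance links to raw data.
# The extraction step itself (e.g. an LLM call) is not shown.
TRIPLES = [
    ("ex:Acme", "ex:contractWith", "ex:Globex"),
    ("ex:Acme", "ex:extractedFrom", RawRef("contracts.csv", "row-42")),
]


def facts_about(subject: str) -> list:
    """Graph-level facts, excluding provenance pointers."""
    return [(s, p, o) for s, p, o in TRIPLES
            if s == subject and not isinstance(o, RawRef)]


def raw_evidence(subject: str) -> list:
    """Follow provenance pointers back to the untouched raw records."""
    return [RAW_STORE[(o.source, o.locator)] for s, p, o in TRIPLES
            if s == subject and isinstance(o, RawRef)]


print(facts_about("ex:Acme"))   # [('ex:Acme', 'ex:contractWith', 'ex:Globex')]
print(raw_evidence("ex:Acme"))  # ['Acme Corp signed a supply contract with Globex.']
```

Keeping the raw text behind a pointer means the graph stays small and only grows where extraction has been worthwhile, while the original evidence remains one lookup away.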
