#008 Knowledge Graphs for Better RAG, Virtual Entities, Hybrid Data Models

May 20, 2024

Kirk Marple, CEO of Graphlit, discusses using knowledge graphs for enhanced information retrieval, a hybrid data model that creates virtual entities, entity extraction using Azure Cognitive Services, a metadata-first approach to data indexing, and challenges in knowledge graph development.

INSIGHT

Virtual Entities Across Three Stores

Graphlit models each entity as a 'virtual entity' with physical representations across a document store, a graph node, and a search index.
This hybrid storage pattern enables flexible retrieval by text, metadata, and graph queries.
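
A minimal sketch of the virtual-entity idea, assuming three in-memory dictionaries stand in for the document store, the graph, and the search index. The names `VirtualEntityStore`, `put`, `search_text`, and `neighbors` are hypothetical and chosen for illustration; they are not Graphlit's API.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualEntityStore:
    """Toy stand-in: one logical entity, three physical representations."""
    documents: dict = field(default_factory=dict)     # id -> full JSON-like record
    graph: dict = field(default_factory=dict)         # id -> set of related ids
    search_index: dict = field(default_factory=dict)  # token -> set of ids

    def put(self, entity_id, record, related=()):
        # 1) The document store keeps the full record.
        self.documents[entity_id] = record
        # 2) The graph keeps a node and its edges (undirected for simplicity).
        self.graph.setdefault(entity_id, set()).update(related)
        for other in related:
            self.graph.setdefault(other, set()).add(entity_id)
        # 3) The search index keeps an inverted index over the text.
        for token in record.get("text", "").lower().split():
            self.search_index.setdefault(token, set()).add(entity_id)

    def search_text(self, token):
        return sorted(self.search_index.get(token.lower(), set()))

    def neighbors(self, entity_id):
        return sorted(self.graph.get(entity_id, set()))


# Made-up sample content, written once and readable three ways.
store = VirtualEntityStore()
store.put("doc-1", {"text": "Graphlit builds knowledge graphs", "type": "podcast"},
          related=["org-graphlit"])
store.put("doc-2", {"text": "Vector search basics", "type": "article"})
```

The same `entity_id` is the key in all three structures, which is what lets a text hit be joined back to its metadata and its graph neighborhood.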

ANECDOTE

Pivot From Catalog To API Platform

Kirk pivoted Graphlit from a data catalog to an API-focused platform after noticing niche vertical apps for LLMs.
He found that exposing the platform via API let many teams build domain-specific copilots quickly.

ADVICE

Filter RAG Context With Graphs

Use knowledge graphs as a filtering layer on top of keyword and vector search to constrain RAG context.
Auto-detect entities in prompts and filter the domain of data the model uses for retrieval.
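
A rough sketch of this filtering step, assuming the keyword or vector search has already returned a ranked list of content IDs. The entity table, the token-matching detector, and the function names are all hypothetical; a real system would likely use an entity-extraction model and a graph query instead.

```python
# Hypothetical data: which known entities each content item mentions.
CONTENT_ENTITIES = {
    "doc-1": {"graphlit", "azure"},
    "doc-2": {"openai"},
    "doc-3": {"graphlit"},
}
KNOWN_ENTITIES = {"graphlit", "azure", "openai"}


def detect_entities(prompt):
    """Naive detection: match known entity names against prompt tokens."""
    tokens = {t.strip(".,?!").lower() for t in prompt.split()}
    return tokens & KNOWN_ENTITIES


def filter_candidates(prompt, candidates):
    """Keep only search hits linked to an entity mentioned in the prompt.

    `candidates` is the ranked output of a keyword or vector search.
    If no entity is detected, the candidates pass through unchanged.
    """
    entities = detect_entities(prompt)
    if not entities:
        return list(candidates)
    return [c for c in candidates if CONTENT_ENTITIES.get(c, set()) & entities]


ranked_hits = ["doc-2", "doc-1", "doc-3"]  # pretend vector-search ranking
context_ids = filter_candidates("What does Graphlit do?", ranked_hits)
```

The filter preserves the search ranking and only narrows the candidate set, so it can sit on top of whatever retrieval method produced the hits.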

Kirk Marple, CEO and founder of Graphlit, discusses the evolution of his company from a data cataloging tool to a platform designed for ETL (Extract, Transform, Load) and knowledge retrieval for Large Language Models (LLMs). Graphlit empowers users to build custom applications on top of its API that go beyond naive RAG.

Key Points:

Knowledge Graphs: Graphlit utilizes knowledge graphs as a filtering layer on top of keyword metadata and vector search, aiding in information retrieval.
Storage for KGs: A single piece of content in their data model resides across multiple systems: a document store with JSON, a graph node, and a search index. This hybrid approach creates a virtual entity with representations in different databases.
Entity Extraction: Azure Cognitive Services and other models are employed to extract entities from text for improved understanding.
Metadata-first approach: The metadata-first strategy involves extracting comprehensive metadata from various sources, ensuring it is canonicalized and filterable. This approach aids in better indexing and retrieval of data, crucial for effective RAG.
Challenges: Entity resolution and deduplication remain significant challenges in knowledge graph development.
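
To make the entity resolution and deduplication challenge concrete, here is a deliberately naive sketch that groups raw mentions by a canonical key. The suffix list, the function names, and the sample mentions are made up for illustration; the episode does not describe Graphlit's actual resolution method, and production systems typically need fuzzy matching and extra context.

```python
import re
from collections import defaultdict

# Assumed list of company suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation"}


def canonical_key(name):
    """Lowercase, drop punctuation, and drop common company suffixes."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def resolve(mentions):
    """Group raw mentions that share a canonical key."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[canonical_key(mention)].append(mention)
    return dict(groups)


# Made-up mentions, as an entity extractor might emit them.
merged = resolve(["Graphlit", "graphlit, Inc.", "Microsoft Corp", "microsoft"])
```

Exact-key matching like this misses abbreviations and misspellings, and it can wrongly merge distinct entities that share a name, which is part of why resolution remains hard.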

Notable Quotes:

"Knowledge graphs is a filtering [mechanism]...but then I think also the kind of spidering and pulling extra content in is the other place this comes into play."
"Knowledge graphs to me are kind of like index per se...you're providing a new type of index on top of that."
"[For RAG]...you have to find constraints to make it workable."
"Entity resolution, deduping, I think is probably the number one thing."
"I've essentially built a connector infrastructure that would be like a FiveTran or something that Airflow would have..."
"One of the reasons is because we're a platform as a service, the burstability of it is really important. We can spin up to a hundred instances without any problem, and we don't have to think about it."
"Once cost and performance become a no-brainer, we're going to start seeing LLMs be more of a compute tool. I think that would be a game-changer for how applications are built in the future."

Kirk Marple:

Nicolay Gerold:

Chapters

00:00 Graphlit’s Hybrid Approach
02:23 Use Cases and Transition to Graphlit
04:19 Knowledge Graphs as a Filtering Mechanism
13:23 Using Gremlin for Querying the Graph
32:36 XML in Prompts for Better Segmentation
35:04 The Future of LLMs and Graphlit
36:25 Getting Started with Graphlit

Graphlit, knowledge graphs, AI, document store, graph database, search index, co-pilot, entity extraction, Azure Cognitive Services, XML, event-driven architecture, serverless architecture, graph RAG, developer portal