
Paco Nathan: Graph Thinking to Better Understand Graph RAG – Episode 10
Knowledge Graph Insights
Navigating Complex Data with Knowledge Graphs
This chapter explores the challenges of linking entities across diverse datasets, emphasizing the vital role of knowledge graphs in tracking illicit activities. It discusses the evolution from traditional data handling to graph-oriented approaches, showcasing how 'graph thinking' enhances our understanding of complex relationships. The conversation also highlights the need for leadership skills to navigate the intricacies of modern data management and decision-making.
Episode notes
Paco Nathan
Graph RAG is all the rage right now in the AI world. Paco Nathan is uniquely positioned to help the industry understand and contextualize this new technology.
Paco currently leads a knowledge graph practice at an AI startup, and he has been immersed in the AI community for more than 40 years.
His broad and deep understanding of the tech and business landscape, along with his "graph thinking" approach, gives executives and other decision makers a clear view of terrain that less experienced and less knowledgeable advisors often obscure.
We talked about:
his work building out the knowledge graph practice at Senzing, and their focus on entity resolution
the importance of entity resolution in knowledge graph use cases like fraud detection
the high percentage of knowledge graph projects that we never hear about because of their sensitive or proprietary nature
his take on the concept of "graph thinking" and how he and colleagues illustrate it with a simple graph model of a medieval village
how graphs add structure and context to our understanding of the world
the importance of embracing complexity and the Cynefin framework in which he grounds various types of business challenges: simple, complicated, complex, and chaotic
how to apply insights discerned from a Cynefin framing in management
how knowledge graphs can help organizations understand the complex environments in which they operate
the wide range of industries and government entities that are applying knowledge graphs to concerns like supply chains, ESG, etc.
his overview of RAG (retrieval-augmented generation) and graph RAG
the wide variety of uses of the term "graph" in the current technology landscape
Microsoft's graph RAG, which uses NetworkX inside its graph RAG library rather than a graph database
Neo4j's approach, which creates a "lexical graph" based on an NLP analysis of text
"embedding graphs"
ontology-based graphs
Google's approach to RAG, using graph neural networks
graphs that do reasoning over fact assertions created by LLMs
"graph of thought" graphs based on chain-of-prompt thinking
"causal graphs" that permit causal reasoning
"graph analytics" graphs that re-rank possible answers
the evolution of graph RAG libraries and the variety of design patterns they employ
the shift in discovery dominance from search to recommender systems, most of which use knowledge graphs
examples of graph RAG from LlamaIndex and LangChain, in addition to Microsoft's graph RAG
his prediction that we'll see more reinforcement learning, graph tech, and advanced math capabilities like causality in addition to LLMs in AI systems
his reflection on his efforts to advance graph thinking over the past 4 years and the current state of LLMs, graphs, graph RAG, and the open-source software community
the need for a shift in thinking in the industry, in particular the need for cross-pollination across tech proficiencies and enterprise teams
the "10:1 ratio for the number of graph RAG experts versus the number of people who've actually worked with a library"
Paco's bio
Paco Nathan leads DevRel for the Entity Resolved Knowledge Graph practice area at Senzing.com and is a computer scientist with more than 40 years of tech industry experience and core expertise in data science, natural language, graph technologies, and cloud computing. He's the author of numerous books, videos, and tutorials about these topics.
Paco advises Kurve.ai, EmergentMethods.ai, KungFu.ai, DataSpartan, and Argilla.io (acq. Hugging Face), and is lead committer for the pytextrank and kglab open source projects. Formerly: Director of Learning Group at O'Reilly Media; and Director of Community Evangelism at Databricks.
Connect with Paco online
LinkedIn
Sessionize
Derwen.ai
Senzing.com
Resources mentioned in this interview
Connected Data London conference
Knowledge Graph Conference
GraphGeeks community
REALM: Retrieval-Augmented Language Model Pre-Training, Guu, et al.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis, et al.
NebulaGraph Launches Industry-First Graph RAG: Retrieval-Augmented Generation with LLM Based on Knowledge Graphs
Graph Retrieval-Augmented Generation: A Survey, Peng, et al.
Video
Here’s the video version of our conversation:
https://youtu.be/4pmV6BUSKmY
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 10. As enterprises and tech companies have looked to ground the answers their LLMs deliver in factual knowledge, graph RAG architectures and products have sprung to the fore. With his deep background in Silicon Valley culture, the open-source software community, artificial intelligence practice, and knowledge graphs and semantic technology, Paco Nathan is one of the best-positioned people in the industry to help us understand the current state of graph RAG.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 10 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Paco Nathan. Paco is the principal DevRel engineer for knowledge graphs at Senzing, the big company that does entity resolution for large-scale, mission-critical applications, really fancy high-end graph stuff.
So, welcome, Paco. Tell the folks a little bit more about what you're up to these days.
Paco:
Thank you very kindly, Larry. I appreciate it. Yeah, I'm over at Senzing. Actually, I was presenting a master class about Senzing integrations at the Knowledge Graph Conference in Manhattan, the last time we saw each other, and then joined the company shortly thereafter.
Paco:
I'm building out the knowledge graph practice area. I'm with a team that has been doing work in entity resolution for many years. Most people have probably never heard of it, but most people have probably used it. So, the idea is, say you have a bunch of different tables or data sets and you want to find the consistent entities across these tables.
Paco:
So, you might have Bob R. Smith and Bob Smith Jr., and they're both at 101 Main, but one of them is spelled 101 Main Street and the other is spelled or abbreviated a different way. Now think about that kind of problem, but spanning billions of records in a lot of different data sources. How can you pull out the consistent entities?
Paco:
And it sounds like a trivial data science problem. We could just use string distance, such as Levenshtein distance, which is a typical approach. But what if you've got Bob R. Smith at 101 Main, Bob R. Smith Jr., and then Bob R. Smith Sr. at 101 Main, and they've both got voter registrations? Is that the same person?
Paco:
Because your Levenshtein distance will tell you it is. If you set a threshold on string distance, it'll tell you they're the same person. So, when they try to register to vote, one of them will be denied voting rights. And this problem becomes much more complicated when you're working in a world where there are companies that have offshore subsidiaries and maybe you don't know the actual owners.
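To make the threshold problem concrete, here is a minimal Python sketch. It is not Senzing's method; it uses a textbook dynamic-programming Levenshtein distance and a hypothetical threshold of 2 edits chosen only for illustration. It shows both failure modes at once: "Bob R. Smith Jr." and "Bob R. Smith Sr." differ by a single character and get merged, while two spellings of the same street address differ by four edits and stay apart.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: insertions, deletions, substitutions each cost 1."""
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


# Hypothetical threshold, for illustration only.
THRESHOLD = 2


def naive_same_entity(a: str, b: str, threshold: int = THRESHOLD) -> bool:
    """Naive matcher: treat two strings as one entity if their edit distance is small."""
    return levenshtein(a, b) <= threshold


if __name__ == "__main__":
    jr, sr = "Bob R. Smith Jr.", "Bob R. Smith Sr."
    print(levenshtein(jr, sr), naive_same_entity(jr, sr))    # 1 True: two people merged
    st1, st2 = "101 Main Street", "101 Main St."
    print(levenshtein(st1, st2), naive_same_entity(st1, st2))  # 4 False: one address split
```

Tightening or loosening the threshold only trades one error for the other, which is why the discussion that follows moves beyond a single string score.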
Paco:
You might know some of the directors, and you get a very tangled web of some very bad people who are moving a lot of money around offshore to do very bad things: illegal fishing, illegal logging, overthrowing democracies in Asia or, for that matter, in North America. Basically, the problem is understanding who's who and what's what across a lot of different people, companies, or ships that might have a registry somewhere, when you don't know exactly who they are in a given business context. How can you triangulate on them?
Paco:
And so, it's typically not a matter of just string distance. It's a matter of, well, I have enough elements of their address in common, even though there are five different ways to represent this address in Singapore. I can tell the difference between one company at an address and a hundred companies in the same shopping mall, which in Singapore is actually a really hard problem.
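The idea of matching on shared address elements, rather than on one raw string score, can be sketched in a few lines of Python. This is a hypothetical rule for illustration, not how Senzing or any particular product resolves entities: the abbreviation table, the `same_entity` rule, the 0.8 overlap cutoff, and the sample records are all assumptions. Addresses are normalized into sets of tokens, so "Rd" and "Road" or different punctuation no longer matter, and the unit number is kept as its own element, so two businesses in the same mall do not collapse into one.

```python
import re

# Hypothetical abbreviation table; real systems use much richer,
# locale-aware address parsing.
ABBREVIATIONS = {"rd": "road", "st": "street", "ave": "avenue", "blk": "block"}


def tokens(text: str) -> frozenset[str]:
    """Lowercase, split on anything that is not a letter or digit, expand abbreviations."""
    return frozenset(ABBREVIATIONS.get(t, t) for t in re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Fraction of address elements the two sets share (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def same_entity(r1: dict, r2: dict, min_address_overlap: float = 0.8) -> bool:
    """Hypothetical rule: names agree after normalization AND most address
    elements, including the unit number, are shared."""
    return (tokens(r1["name"]) == tokens(r2["name"])
            and jaccard(tokens(r1["address"]), tokens(r2["address"])) >= min_address_overlap)


# Made-up records: the same company written two ways, and a same-named
# record at a different unit in the same building.
rec_a = {"name": "Acme Trading Pte Ltd", "address": "100 Example Rd #05-01"}
rec_b = {"name": "ACME Trading Pte. Ltd.", "address": "100 Example Road, #05-01"}
rec_c = {"name": "Acme Trading Pte Ltd", "address": "100 Example Road #12-07"}

if __name__ == "__main__":
    print(same_entity(rec_a, rec_b))  # True: formatting differences normalize away
    print(same_entity(rec_a, rec_c))  # False: only 3 of 7 address elements shared
```

A production resolver would weigh many more features (registration numbers, directors, dates) and handle ambiguity explicitly; this sketch only shows why comparing normalized elements behaves differently from a single edit-distance threshold.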
Paco:
And the same goes for tax records or passport control. There's an area called UBO, ultimate beneficial owner, which has a lot to do with sanctions compliance: catching oligarchs, understanding who is trying to launder money through an offshore tax haven, who is funneling billions of dollars out of Kremlin assets to try to influence a campaign somewhere. These are the kinds of problems we work with.
Paco:
And so, the long and short of it is... If you look at any episode of Homeland or The Wire or NCIS, any crime drama, inevitably the protagonist goes up to a wall covered in pins, clippings, photos, and notes, and takes yarn to draw a graph between them. And the thing is, if you're in the US, the people who do that real work are in three-letter agencies. If you're in the UK, you're talking about four-letter agencies.
Paco:
But the people who really do that work 24/7 actually use knowledge graphs. They use collaborative knowledge graph tools like Aptitude Global, SiReN, GraphAware, Linkurious, Esri ArcGIS Knowledge, and Kineviz. There's a bunch of different tools that allow people to collaborate on building knowledge graphs to catch bad guys.
Paco:
In finance, we have acronyms like AML, anti-money laundering, or UBO, ultimate beneficial owner, or PEP, politically exposed persons. All of these have to do with the fact that somebody has committed very large-scale crimes and governments have reacted by saying, "Okay, from a regulatory standpoint, we will not allow this to happen again." So, you end up having data sets like LEI, the Legal Entity Identifier, which was a multi-government response to the problems of the 2009 global financial crisis.