
Deep Papers
LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection
Apr 18, 2025

Episode notes
Types of Hallucinations
- Synthetic hallucinations are produced by explicitly instructing a model to hallucinate; non-synthetic ones arise naturally during ordinary generation (see the sketch after this list).
- In the collected data, models' hallucinations skew toward relation errors and incompleteness rather than entity errors.
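
A minimal sketch of how a synthetic hallucination might be produced: the generator is explicitly instructed to introduce an unsupported claim, so the example is hallucinated by construction and needs no separate labeling pass. `generate` is a hypothetical stand-in for a real LLM call, stubbed here so the sketch runs, and the prompt wording is illustrative rather than LibreEval's actual prompt.

```python
# Sketch: generating a synthetic (instructed) hallucination example.
SYNTHETIC_PROMPT = (
    "You are helping build a hallucination benchmark. Given the context and "
    "question below, write an answer that sounds plausible but contains at "
    "least one claim NOT supported by the context.\n\n"
    "Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer:"
)

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's client."""
    return "The Eiffel Tower was completed in 1901."  # canned stub reply

def make_synthetic_example(context: str, question: str) -> dict:
    answer = generate(SYNTHETIC_PROMPT.format(context=context, question=question))
    return {
        "context": context,
        "question": question,
        "answer": answer,
        "hallucinated": True,  # label is known by construction
    }

print(make_synthetic_example(
    context="The Eiffel Tower opened in 1889 and is 330 m tall.",
    question="When did the Eiffel Tower open?",
))
```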
LLM Judges Surpass Humans
- On the samples tested, an LLM judge labeled hallucinations more accurately than human annotators.
- Showing human labelers the output of an LLM judge council improved their accuracy by aligning their judgments (a sketch of such a council follows below).
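
A minimal sketch of an LLM judge "council": several judge models vote on whether an answer is faithful to its retrieved context, and the majority verdict plus the vote split is what a human labeler would be shown. `ask_judge` and the model names are hypothetical stand-ins, stubbed so the sketch runs; the episode does not specify the council's exact prompt or voting rule.

```python
# Sketch: majority vote over several LLM judges for hallucination labeling.
from collections import Counter

JUDGE_PROMPT = (
    "Context:\n{context}\n\nAnswer:\n{answer}\n\n"
    "Does the answer make any claim not supported by the context? "
    "Reply with exactly one word: 'hallucinated' or 'faithful'."
)

def ask_judge(model: str, prompt: str) -> str:
    """Hypothetical wrapper around a chat-completion call to `model`.
    Stubbed with a canned reply; wire this to your LLM provider."""
    return "hallucinated"

def council_verdict(context: str, answer: str, judges: list[str]) -> dict:
    prompt = JUDGE_PROMPT.format(context=context, answer=answer)
    votes = [ask_judge(m, prompt) for m in judges]
    label, count = Counter(votes).most_common(1)[0]
    return {"label": label, "agreement": count / len(votes), "votes": votes}

verdict = council_verdict(
    context="The Eiffel Tower is 330 m tall.",
    answer="The tower is 500 m tall.",
    judges=["judge-a", "judge-b", "judge-c"],  # hypothetical model names
)
print(verdict)  # e.g. {'label': 'hallucinated', 'agreement': 1.0, ...}
```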
Fine-Tuning Boosts Smaller Models
- Fine-tuning small models can raise their hallucination-detection performance to near, or beyond, that of large LLMs (a sketch of one such setup follows below).
- Models detect synthetic hallucinations more reliably than naturally occurring ones.
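
A minimal sketch of fine-tuning a small model as a binary hallucination detector over (context, answer, label) triples, using Hugging Face transformers with a small encoder as a stand-in. The model choice, toy data, and hyperparameters are illustrative assumptions; the episode does not specify LibreEval's actual fine-tuning recipe.

```python
# Sketch: binary-classification fine-tune for hallucination detection.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "distilbert-base-uncased"  # illustrative small model
tokenizer = AutoTokenizer.from_pretrained(MODEL)

class HallucinationDataset(Dataset):
    """Encodes each (context, answer) pair with a 0/1 hallucination label."""
    def __init__(self, examples):
        self.examples = examples
    def __len__(self):
        return len(self.examples)
    def __getitem__(self, idx):
        context, answer, label = self.examples[idx]
        enc = tokenizer(context, answer, truncation=True,
                        padding="max_length", max_length=256)
        item = {k: torch.tensor(v) for k, v in enc.items()}
        item["labels"] = torch.tensor(label)
        return item

train_data = HallucinationDataset([
    # (retrieved context, model answer, 1 = hallucinated / 0 = faithful)
    ("The Eiffel Tower is 330 m tall.", "The tower is 330 m tall.", 0),
    ("The Eiffel Tower is 330 m tall.", "The tower is 500 m tall.", 1),
])

model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="halu-detector", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_data,
)
trainer.train()
```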
