
Sustainable AI with Darshna Shah
RunAs Radio
Evaluating LLMs: metrics and challenges
Darshna and Richard examine LLM evaluation, the Azure Evaluation SDK, F1 and ROUGE metrics, and the limitations of benchmarks.
Transcript
