
airhacks.fm podcast with Adam Bien: "TornadoVM: The Need for GPU Speed"
Jul 6, 2025
Michalis Papadimitriou, an expert in GPU acceleration and compiler optimization for AI and Java, shares insights from his journey in tech. He discusses how he achieved up to 20x speedups in Java applications by leveraging OpenCL and TornadoVM. Hear about his work at Huawei and how he is optimizing inference for AI models like Llama 3, emphasizing the importance of standardizing ML model formats. With a focus on enhancing GPU processing in Java, he highlights kernel fusion techniques and the potential of GraalVM in the modern developer landscape.
AI Snips
AI Startup and Compiler Optimization
- Michalis worked at the AI startup OctoAI, optimizing AI compilers for TensorFlow and PyTorch models, including operator handling and kernel fusion for performance.
- The startup was later acquired by NVIDIA, and he returned to TornadoVM with new AI knowledge.
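Kernel fusion, mentioned above, can be illustrated in plain Java: instead of executing two element-wise operators as separate kernels with an intermediate buffer, a fusing compiler emits one kernel that does both in a single memory pass. This is a minimal conceptual sketch, not OctoAI's or TornadoVM's actual compiler output; all names here are illustrative.

```java
// Illustrative sketch of kernel fusion: two element-wise operators
// (scale, then bias-add) are collapsed into one loop, removing the
// intermediate array and one full pass over memory.
public class KernelFusion {

    // Unfused: two kernels, two passes, one temporary buffer.
    static float[] scaleThenBias(float[] x, float s, float b) {
        float[] tmp = new float[x.length];
        for (int i = 0; i < x.length; i++) tmp[i] = x[i] * s;   // kernel 1
        float[] out = new float[x.length];
        for (int i = 0; i < x.length; i++) out[i] = tmp[i] + b; // kernel 2
        return out;
    }

    // Fused: one kernel, one pass, no temporary -- the transformation
    // AI compilers apply across operator graphs for performance.
    static float[] fusedScaleBias(float[] x, float s, float b) {
        float[] out = new float[x.length];
        for (int i = 0; i < x.length; i++) out[i] = x[i] * s + b;
        return out;
    }
}
```

The payoff on a GPU is reduced memory traffic and kernel-launch overhead, since both operators now read the input once and write the output once.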
Java Powers GPU LLaMA Inference
- TornadoVM now runs GPU-accelerated LLaMA 3 inference in pure Java, achieving a 3-6x speedup over CPU implementations on NVIDIA GPUs.
- This proves Java can express efficient GPU computation, not just via external libraries but with core Java APIs.
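The kind of kernel this snip refers to can be expressed in ordinary Java: a matrix-vector multiply, the dominant operation in transformer inference. In TornadoVM the outer loop would carry an `@Parallel` annotation and the method would be registered in a task graph for GPU compilation; it is shown here as pure Java so it runs anywhere, and the method name is illustrative rather than part of any API.

```java
// Sketch of a GPU-friendly kernel written in plain Java: y = M * x,
// with M stored row-major in a flat array (rows x cols). TornadoVM
// compiles loops like the outer one below into hardware-specific
// kernels when annotated with @Parallel.
public class MatVec {
    static void matVec(float[] m, float[] x, float[] y, int rows, int cols) {
        for (int i = 0; i < rows; i++) { // @Parallel in TornadoVM
            float acc = 0f;
            for (int j = 0; j < cols; j++) {
                acc += m[i * cols + j] * x[j];
            }
            y[i] = acc;
        }
    }
}
```

Because each output row is independent, the outer loop maps naturally onto GPU threads, which is what lets core Java code express efficient GPU computation.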
Models to Hardware Kernels
- LLM operators such as matrix multiplication are represented as graph operators that TornadoVM compiles into hardware-specific kernels.
- This model-to-kernel mapping enables flexible, efficient GPU execution by TornadoVM.
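The model-to-kernel mapping above can be sketched conceptually: a model is a graph of operators, and lowering walks that graph and selects a backend-specific kernel for each node. This is a toy illustration of the idea, not TornadoVM's internal IR; the operator names and kernel-id scheme are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of model-to-kernel mapping: each operator in a
// model graph is lowered to a kernel identifier for a chosen backend.
public class OperatorGraph {
    enum Op { MATMUL, ADD, SOFTMAX }

    // Lower each graph operator to a hypothetical backend kernel id,
    // e.g. MATMUL on the "opencl" backend becomes "opencl::matmul".
    static List<String> lower(List<Op> graph, String backend) {
        List<String> kernels = new ArrayList<>();
        for (Op op : graph) {
            kernels.add(backend + "::" + op.name().toLowerCase());
        }
        return kernels;
    }
}
```

Keeping the graph backend-neutral and deferring kernel selection to lowering is what makes the execution flexible: the same model graph can target OpenCL, CUDA/PTX, or SPIR-V backends.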

