
airhacks.fm podcast with Adam Bien: "TornadoVM: The Need for GPU Speed"
Jul 6, 2025
Michalis Papadimitriou, an expert in GPU acceleration and compiler optimization for AI and Java, shares insights from his journey in tech. He discusses how he achieved up to 20x speedups in Java applications by leveraging OpenCL and TornadoVM. Hear about his work at Huawei and how he is optimizing inference for AI models like Llama 3, emphasizing the importance of standardizing ML model formats. With a focus on enhancing GPU processing in Java, he highlights kernel fusion techniques and the potential of GraalVM in the modern developer landscape.
AI Snips
AI Startup and Compiler Optimization
- Michalis worked at the AI startup OctoAI, optimizing AI compilers for TensorFlow and PyTorch models, including operator handling and kernel fusion for performance.
- The startup was later acquired by NVIDIA, and he returned to TornadoVM with new AI knowledge.
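Kernel fusion, mentioned above, can be illustrated in plain Java: instead of executing two element-wise operators as separate kernels with an intermediate buffer, a fusing compiler emits one kernel that does both in a single memory pass. This is a minimal conceptual sketch, not OctoAI's or TornadoVM's actual compiler output; all names here are illustrative.

```java
// Illustrative sketch of kernel fusion: two element-wise operators
// (scale, then bias-add) are collapsed into one loop, removing the
// intermediate array and one full pass over memory.
public class KernelFusion {

    // Unfused: two kernels, two passes, one temporary buffer.
    static float[] scaleThenBias(float[] x, float s, float b) {
        float[] tmp = new float[x.length];
        for (int i = 0; i < x.length; i++) tmp[i] = x[i] * s;   // kernel 1
        float[] out = new float[x.length];
        for (int i = 0; i < x.length; i++) out[i] = tmp[i] + b; // kernel 2
        return out;
    }

    // Fused: one kernel, one pass, no temporary -- the transformation
    // AI compilers apply across operator graphs for performance.
    static float[] fusedScaleBias(float[] x, float s, float b) {
        float[] out = new float[x.length];
        for (int i = 0; i < x.length; i++) out[i] = x[i] * s + b;
        return out;
    }
}
```

The payoff on a GPU is reduced memory traffic and kernel-launch overhead, since both operators now read the input once and write the output once.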
Java Powers GPU LLaMA Inference
- TornadoVM now runs GPU-accelerated LLaMA 3 inference in pure Java, achieving a 3-6x speedup over CPU implementations on NVIDIA GPUs.
- This proves Java can express efficient GPU computation, not just via external libraries but with core Java APIs.
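The kind of kernel this snip refers to can be expressed in ordinary Java: a matrix-vector multiply, the dominant operation in transformer inference. In TornadoVM the outer loop would carry an `@Parallel` annotation and the method would be registered in a task graph for GPU compilation; it is shown here as pure Java so it runs anywhere, and the method name is illustrative rather than part of any API.

```java
// Sketch of a GPU-friendly kernel written in plain Java: y = M * x,
// with M stored row-major in a flat array (rows x cols). TornadoVM
// compiles loops like the outer one below into hardware-specific
// kernels when annotated with @Parallel.
public class MatVec {
    static void matVec(float[] m, float[] x, float[] y, int rows, int cols) {
        for (int i = 0; i < rows; i++) { // @Parallel in TornadoVM
            float acc = 0f;
            for (int j = 0; j < cols; j++) {
                acc += m[i * cols + j] * x[j];
            }
            y[i] = acc;
        }
    }
}
```

Because each output row is independent, the outer loop maps naturally onto GPU threads, which is what lets core Java code express efficient GPU computation.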
Models to Hardware Kernels
- LLM operators such as matrix multiplication are represented as graph operators that TornadoVM compiles into hardware-specific kernels.
- This model-to-kernel mapping enables flexible, efficient GPU execution by TornadoVM.
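The model-to-kernel mapping above can be sketched conceptually: a model is a graph of operators, and lowering walks that graph and selects a backend-specific kernel for each node. This is a toy illustration of the idea, not TornadoVM's internal IR; the operator names and kernel-id scheme are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of model-to-kernel mapping: each operator in a
// model graph is lowered to a kernel identifier for a chosen backend.
public class OperatorGraph {
    enum Op { MATMUL, ADD, SOFTMAX }

    // Lower each graph operator to a hypothetical backend kernel id,
    // e.g. MATMUL on the "opencl" backend becomes "opencl::matmul".
    static List<String> lower(List<Op> graph, String backend) {
        List<String> kernels = new ArrayList<>();
        for (Op op : graph) {
            kernels.add(backend + "::" + op.name().toLowerCase());
        }
        return kernels;
    }
}
```

Keeping the graph backend-neutral and deferring kernel selection to lowering is what makes the execution flexible: the same model graph can target OpenCL, CUDA/PTX, or SPIR-V backends.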

