
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) Multimodal AI Models on Apple Silicon with MLX with Prince Canuma - #744
Aug 26, 2025

Prince Canuma, an ML engineer and open-source developer known for his contributions to Apple's MLX ecosystem, discusses his journey optimizing AI for Apple Silicon. He shares insights on porting models, the trade-offs between the GPU and the Neural Engine, and techniques like pruning and quantization for better performance. Prince introduces 'Fusion,' an approach to modifying model behavior without retraining, and presents Marvis, a real-time voice agent. His vision for future AI centers on multimodal models that adapt seamlessly across media.
AI Snips
Fusion: Merge Behaviors Via Weights
- Weight-space 'Fusion' finds strong behaviors in checkpoints and transfers them without fine-tuning.
- Fusion can add new capabilities (e.g., function-calling) and often outperforms naive merging.
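The naive merging that Fusion is compared against can be sketched as plain linear interpolation in weight space. This is an illustrative baseline, not the Fusion method itself; weights are shown as flat dicts of Python lists, where real checkpoints would hold MLX arrays.

```python
# Naive weight-space merging: linear interpolation between two checkpoints.
# The episode's 'Fusion' locates strong behaviors in checkpoints and transfers
# them selectively; this uniform blend is the baseline it often outperforms.

def merge_weights(base, donor, alpha=0.5):
    """Return (1 - alpha) * base + alpha * donor, parameter by parameter."""
    assert base.keys() == donor.keys(), "checkpoints must share an architecture"
    return {
        name: [(1 - alpha) * b + alpha * d
               for b, d in zip(base[name], donor[name])]
        for name in base
    }

# Toy two-parameter "checkpoints" (illustrative names and values).
base  = {"mlp.w": [0.0, 2.0], "attn.w": [1.0, 1.0]}
donor = {"mlp.w": [4.0, 0.0], "attn.w": [1.0, 3.0]}
merged = merge_weights(base, donor, alpha=0.5)
print(merged["mlp.w"])  # [2.0, 1.0]
```

With `alpha=0.5` every parameter is a plain average; behavior-aware merging instead varies how much of each donor region is transferred.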
Practical Steps To Port A Model
- To port a model to MLX, inspect the Hugging Face config to identify the model type and reuse an existing MLX model file.
- Convert transformer components to MLX syntax and test with MLX's generate CLI before releasing.
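The first porting step above can be sketched as reading `model_type` from the Hugging Face `config.json` and reusing the closest existing MLX model file. The mapping table and file paths below are hypothetical, not mlx-lm's actual registry.

```python
import json

# Illustrative map from Hugging Face model_type to an existing MLX model
# file that can serve as the porting template (paths are assumptions).
KNOWN_MLX_MODELS = {"llama": "models/llama.py", "qwen2": "models/qwen2.py"}

def pick_mlx_template(config_text):
    """Read model_type from a config.json payload and pick a template."""
    config = json.loads(config_text)
    model_type = config["model_type"]
    template = KNOWN_MLX_MODELS.get(model_type)
    if template is None:
        raise ValueError(f"no MLX port yet for model_type={model_type!r}")
    return template

sample = '{"model_type": "llama", "hidden_size": 4096, "num_hidden_layers": 32}'
print(pick_mlx_template(sample))  # models/llama.py
```

After converting the transformer components to MLX syntax, the port is smoke-tested with MLX's generate CLI before release.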
Publish Multiple Quantized Builds
- Produce multiple quantized variants (3–8 bit and BFloat16) so users can pick by RAM and speed.
- Prioritize 4-bit and 3-bit checks: 4-bit generally works; 3-bit may break small or vision-sensitive models.
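The RAM trade-off behind publishing multiple variants can be estimated from parameter count and bit width. A minimal sketch, assuming uniform quantization of all weights and ignoring the group-scale overhead real schemes add:

```python
# Rough weight-memory estimate per quantized variant, so users can pick
# a build by available RAM. Ignores scales/zero-points and activations.

def approx_weight_gb(n_params, bits):
    """Approximate weight storage in GiB for n_params at a given bit width."""
    return n_params * bits / 8 / 1024**3

n = 8_000_000_000  # e.g. an 8B-parameter model (illustrative size)
for bits in (16, 8, 4, 3):  # BFloat16 down to the riskier 3-bit build
    print(f"{bits:>2}-bit: ~{approx_weight_gb(n, bits):.1f} GiB")
```

The roughly 4x gap between BFloat16 and 4-bit is why 4-bit is the default check, while 3-bit saves only a little more and is where small or vision-sensitive models tend to break.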
