
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) Multimodal AI Models on Apple Silicon with MLX with Prince Canuma - #744
Aug 26, 2025

Prince Canuma, an ML engineer and open-source developer known for his contributions to Apple's MLX ecosystem, discusses his journey optimizing AI for Apple Silicon. He shares insights on porting models, the trade-offs between the GPU and the Neural Engine, and techniques like pruning and quantization for better performance. Prince introduces 'Fusion,' an approach to modifying model behavior without retraining, and presents Marvis, a real-time voice agent. His vision for future AI centers on multimodal models that adapt seamlessly across media.
AI Snips
Fusion: Merge Behaviors Via Weights
- Weight-space 'Fusion' finds strong behaviors in checkpoints and transfers them without fine-tuning.
- Fusion can add new capabilities (e.g., function-calling) and often outperforms naive merging.
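The naive merging that Fusion is compared against can be sketched as plain linear interpolation in weight space. This is an illustrative baseline, not the Fusion method itself; weights are shown as flat dicts of Python lists, where real checkpoints would hold MLX arrays.

```python
# Naive weight-space merging: linear interpolation between two checkpoints.
# The episode's 'Fusion' locates strong behaviors in checkpoints and transfers
# them selectively; this uniform blend is the baseline it often outperforms.

def merge_weights(base, donor, alpha=0.5):
    """Return (1 - alpha) * base + alpha * donor, parameter by parameter."""
    assert base.keys() == donor.keys(), "checkpoints must share an architecture"
    return {
        name: [(1 - alpha) * b + alpha * d
               for b, d in zip(base[name], donor[name])]
        for name in base
    }

# Toy two-parameter "checkpoints" (illustrative names and values).
base  = {"mlp.w": [0.0, 2.0], "attn.w": [1.0, 1.0]}
donor = {"mlp.w": [4.0, 0.0], "attn.w": [1.0, 3.0]}
merged = merge_weights(base, donor, alpha=0.5)
print(merged["mlp.w"])  # [2.0, 1.0]
```

With `alpha=0.5` every parameter is a plain average; behavior-aware merging instead varies how much of each donor region is transferred.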
Practical Steps To Port A Model
- To port a model to MLX, inspect the Hugging Face config to identify the model type and reuse an existing MLX model file.
- Convert transformer components to MLX syntax and test with MLX's generate CLI before releasing.
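The first porting step above can be sketched as reading `model_type` from the Hugging Face `config.json` and reusing the closest existing MLX model file. The mapping table and file paths below are hypothetical, not mlx-lm's actual registry.

```python
import json

# Illustrative map from Hugging Face model_type to an existing MLX model
# file that can serve as the porting template (paths are assumptions).
KNOWN_MLX_MODELS = {"llama": "models/llama.py", "qwen2": "models/qwen2.py"}

def pick_mlx_template(config_text):
    """Read model_type from a config.json payload and pick a template."""
    config = json.loads(config_text)
    model_type = config["model_type"]
    template = KNOWN_MLX_MODELS.get(model_type)
    if template is None:
        raise ValueError(f"no MLX port yet for model_type={model_type!r}")
    return template

sample = '{"model_type": "llama", "hidden_size": 4096, "num_hidden_layers": 32}'
print(pick_mlx_template(sample))  # models/llama.py
```

After converting the transformer components to MLX syntax, the port is smoke-tested with MLX's generate CLI before release.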
Publish Multiple Quantized Builds
- Produce multiple quantized variants (3–8 bit and BFloat16) so users can pick by RAM and speed.
- Prioritize 4-bit and 3-bit checks: 4-bit generally works; 3-bit may break small or vision-sensitive models.
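The RAM trade-off behind publishing multiple variants can be estimated from parameter count and bit width. A minimal sketch, assuming uniform quantization of all weights and ignoring the group-scale overhead real schemes add:

```python
# Rough weight-memory estimate per quantized variant, so users can pick
# a build by available RAM. Ignores scales/zero-points and activations.

def approx_weight_gb(n_params, bits):
    """Approximate weight storage in GiB for n_params at a given bit width."""
    return n_params * bits / 8 / 1024**3

n = 8_000_000_000  # e.g. an 8B-parameter model (illustrative size)
for bits in (16, 8, 4, 3):  # BFloat16 down to the riskier 3-bit build
    print(f"{bits:>2}-bit: ~{approx_weight_gb(n, bits):.1f} GiB")
```

The roughly 4x gap between BFloat16 and 4-bit is why 4-bit is the default check, while 3-bit saves only a little more and is where small or vision-sensitive models tend to break.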
