Interconnects

Interviewing OLMo 2 leads: Open secrets of training language models

Jan 22, 2025
Luca Soldaini, the data lead for the OLMo project at AI2, joins the discussion to unpack the intricacies of training language models. He shares stories of overcoming challenges in pretraining efficiency and the quest for training stability, especially after a failed 70B model run. The conversation dives into the strategic decisions behind building effective language modeling teams, the trade-offs between deep and wide network architectures, and the importance of community-driven advances in AI.
AI Snips
ANECDOTE

Learning from Failure

  • The team's 70B model failure, while initially disheartening, proved invaluable.
  • The subsequent investigation led to significant improvements in OLMo 2.
INSIGHT

OLMo's Role in Research

  • OLMo aims to facilitate research on overlooked aspects of language models.
  • It enables exploration of fundamental capabilities that other models lack, such as the pointing capability in Molmo.
ANECDOTE

Experimenting with µP

  • The team explored µP (Maximal Update Parametrization) for setting hyperparameters but encountered limitations (see the sketch below).
  • While effective for transferring hyperparameters across model widths, it did not transfer across training runs of different lengths.
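For context, µP transfers hyperparameters tuned on a small proxy model to larger widths by rescaling per-parameter-group learning rates. Below is a minimal Python sketch of that transfer rule, assuming Adam-style updates; the widths, base learning rate, and names are illustrative assumptions, not OLMo's actual configuration.

BASE_WIDTH = 256  # width of the small proxy model where hyperparameters were tuned

def mup_lr(base_lr: float, width: int, param_kind: str) -> float:
    """Width-adjusted learning rate for one parameter group.

    Under µP with Adam, learning rates for hidden weight matrices shrink
    as 1/width, while vector-like parameters (embeddings, biases, norm
    gains) keep the base rate. This is what lets a rate tuned at
    BASE_WIDTH transfer to wider models.
    """
    if param_kind == "hidden_matrix":
        return base_lr * BASE_WIDTH / width
    return base_lr

# Reuse the learning rate tuned on the 256-wide proxy at 4x the width.
base_lr = 3e-4
for width in (256, 1024):
    print(width, mup_lr(base_lr, width, "hidden_matrix"))

# Note: µP addresses transfer across width; as the episode notes, it does
# not tell you how optimal hyperparameters shift with longer training runs.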