Machine Learning Street Talk (MLST)

Facebook Research - Unsupervised Translation of Programming Languages

Jun 24, 2020
Marie-Anne Lachaux, Baptiste Roziere, and Guillaume Lample are talented researchers at Facebook AI Research in Paris, specializing in the unsupervised translation of programming languages. They discuss their groundbreaking method that leverages shared embeddings and tokenization to improve programming language interoperability. The conversation highlights the balance between human insight and machine learning in coding, the challenges of structural differences in languages, and the collaborative culture that fuels innovation at FAIR.
INSIGHT

Shared Vocabulary

  • Shared vocabularies and word piece tokenization help align different languages in unsupervised translation.
  • Special language tokens guide the decoder to generate the correct target language.
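
As a rough illustration of the two points above, the sketch below builds one vocabulary over Java and Python snippets and prepends a language token to the decoder input. It is a toy under stated assumptions, not the researchers' implementation: the regex lexer stands in for learned subword tokenization, and the names `tokenize`, `build_shared_vocab`, `decoder_prefix`, and the `<java>`/`<python>`/`<cpp>` token spellings are all hypothetical.

```python
import re

# Hypothetical language tokens; the real system's token names are not given in the episode notes.
LANG_TOKENS = ["<java>", "<python>", "<cpp>"]


def tokenize(code):
    # Crude lexer (an assumption): identifiers/keywords, numbers, or single symbols.
    # A real system would use learned subword units instead.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def build_shared_vocab(corpora):
    # One id space for every language: the same token string always maps to the same id.
    vocab = {tok: i for i, tok in enumerate(LANG_TOKENS)}
    for code in corpora:
        for tok in tokenize(code):
            vocab.setdefault(tok, len(vocab))
    return vocab


def decoder_prefix(target_lang, vocab):
    # The decoder input starts with the target-language token,
    # which tells the model which language to generate.
    return [vocab[f"<{target_lang}>"]]


java_src = "int add(int a, int b) { return a + b; }"
python_src = "def add(a, b): return a + b"

vocab = build_shared_vocab([java_src, python_src])
java_ids = [vocab[t] for t in tokenize(java_src)]
python_ids = [vocab[t] for t in tokenize(python_src)]
```

Because both snippets are encoded against the same vocabulary, tokens such as `return`, `add`, and `+` receive identical ids in Java and Python, while `decoder_prefix("python", vocab)` selects the output language.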
ANECDOTE

Unsupervised Translator

  • The researchers trained an unsupervised translator for programming languages like Java, Python, and C++.
  • Previous methods were mostly rule-based, requiring extensive expertise and lacking generalizability.
INSIGHT

Anchor Points

  • Unsupervised translation of code relies on common tokens (anchor points) like keywords and variable names.
  • These anchor points are crucial for aligning cross-lingual representations.
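
One way to picture anchor points is as the tokens two programs have in common. The sketch below computes that overlap for a C++ loop and a Python loop. It is a simplified illustration: the regex lexer and the helper names `tokenize` and `anchor_points` are assumptions, and the actual alignment happens inside the model's learned representations rather than through an explicit set intersection.

```python
import re


def tokenize(code):
    # Crude lexer (an assumption): identifiers/keywords, numbers, or single symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def anchor_points(code_a, code_b):
    # Tokens that appear in both programs: shared keywords, names, and operators.
    return set(tokenize(code_a)) & set(tokenize(code_b))


cpp_src = "for (int i = 0; i < n; i++) { total += values[i]; }"
python_src = "for i in range(n): total += values[i]"

anchors = anchor_points(cpp_src, python_src)
```

Here the overlap includes the keyword `for` and the variable names `i`, `n`, `total`, and `values`, while language-specific tokens such as `int` and `range` fall outside it.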