
Machine Learning Street Talk (MLST) #55 Self-Supervised Vision Models (Dr. Ishan Misra - FAIR).
Jun 21, 2021

Dr. Ishan Misra, a Research Scientist at Facebook AI Research, discusses self-supervised vision models. He covers papers such as DINO and Barlow Twins, explaining how these approaches reduce the need for human supervision in visual learning. Ishan also explores the limits of current neural networks, the challenges of object recognition, and the philosophical question of whether AI systems acquire common-sense knowledge. He closes by contrasting self-supervised with semi-supervised techniques, highlighting how each harnesses human knowledge for machine learning.
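For context on one of the papers mentioned: Barlow Twins trains a network so that the cross-correlation matrix between embeddings of two augmented views of the same image approaches the identity (diagonal entries near 1 enforce invariance; off-diagonal entries near 0 reduce redundancy). A minimal NumPy sketch of that objective follows; the `lam` weight and array shapes here are illustrative, not the paper's actual training setup.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins objective: push the cross-correlation matrix of two
    embedding views toward the identity matrix."""
    n, d = z_a.shape
    # Normalize each embedding dimension across the batch (zero mean, unit std)
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    c = (z_a.T @ z_b) / n                                # d x d cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()            # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy-reduction term
    return on_diag + lam * off_diag
```

Identical views yield a near-zero loss, while embeddings of unrelated inputs are penalized, which is what drives the representation to be stable under augmentation.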
AI Snips
Semantics of Similarity
- Self-supervised models learn semantics of similarity, not language-induced semantics.
- Attaching language to these similarity representations is needed for communication.
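To make the point above concrete: a self-supervised embedding space directly supports "find me similar things" via nearest-neighbor lookup, but the retrieved neighbors carry no names until language labels are attached afterward. A toy retrieval sketch (the embeddings and shapes are made up for illustration):

```python
import numpy as np

def nearest_neighbors(query, bank, k=3):
    """Rank a bank of embeddings by cosine similarity to a query embedding.
    This is the 'semantics of similarity' a self-supervised encoder gives you;
    the neighbors are nameless until labels are attached separately."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q                     # cosine similarity of each bank row to query
    return np.argsort(-sims)[:k]    # indices of the k most similar embeddings
```

In practice this kind of nearest-neighbor evaluation is a standard way to probe self-supervised representations without any language supervision.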
Dataset Bias
- Datasets like ImageNet are biased towards object-centric images.
- Random images would likely break current self-supervised systems.
Richer Labeling
- Explore relationships between objects by asking how images are related or unrelated.
- This richer labeling encodes human knowledge and object properties.