A Survey on Self-supervised Visual Representation Learning with Vision Transformer

PhD Qualifying Examination


Title: "A Survey on Self-supervised Visual Representation Learning with 
Vision Transformer"

by

Mr. Kai CHEN


Abstract:

Self-supervised visual representation learning aims to pre-train a 
representation backbone network on pseudo labels automatically generated 
from unlabeled images, without relying on human annotations such as 
semantic class labels and image captions. Methods in the CNN era were 
dominated by instance discrimination, whereas with the development of 
Vision Transformer, novel pretext tasks, represented by masked image 
modeling, have demonstrated the potential for superior transfer 
performance. In this survey, we provide a comprehensive review of 
the self-supervised visual representation learning methods with Vision 
Transformer. Specifically, we first formulate self-supervised learning 
with a unified objective for both instance discrimination and masked image 
modeling and provide a brief introduction to Vision Transformer. After 
that, we conduct a thorough review of the two mainstream self-supervised 
pretext tasks, with an in-depth analysis of their challenges and differences. 
Finally, we conclude by discussing several potential research directions.


Date:                Tuesday, 9 August 2022

Time:                2:00pm - 4:00pm

Zoom Meeting: 
https://hkust.zoom.us/j/93921451190?pwd=SW1zYWlseFBXWW5VRHBqbGFFRHJmdz09

Committee Members:   Prof. Dit-Yan Yeung (Supervisor)
                     Prof. Raymond Wong (Chairperson)
                     Dr. Dan Xu
                     Dr. Zhiqiang Shen


**** ALL are Welcome ****