Computer Science and Engineering, Department of

First Advisor

Mohammad Rashedul Hasan

Date of this Version

Spring 5-2021

Citation

@article{ssl-representations,author={Atharva Tendle},title={"THE REVOLUTION WILL NOT BE SUPERVISED": AN INVESTIGATION OF THE EFFICACY AND REASONING PROCESS OF SELF-SUPERVISED REPRESENTATIONS}, year = {2021}}

Comments

A THESIS Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Master of Science, Major: Computer Science, Under the Supervision of Professor Mohammad Rashedul Hasan. Lincoln, Nebraska: 2021

Copyright 2021 Atharva Tendle

Abstract

Transfer learning enables Deep Learning (DL) models to be trained in a data-efficient way for computer vision tasks. It involves pretraining a DL model to learn representations from a large, general-purpose source dataset, then fine-tuning the model on a task-specific target dataset. The dominant supervised learning (SL) approach to pretraining representations has limitations, including expensive labeling and poor generalizability. Recent advances in self-supervised learning (SSL) have made it possible to learn effective representations from unlabeled data. DL models fine-tuned from pretrained SSL representations perform on par with state-of-the-art models fine-tuned from pretrained SL representations. However, no study has examined how well SSL representations generalize across various target domains, or explained why they are effective. In this thesis, we conduct a multi-dimensional investigation of the SSL approach to pretraining representations. We identify where SSL excels by evaluating various SSL techniques on two types of target datasets: a target dataset that is similar to the source dataset used to create the representations, and a target dataset that differs significantly from the source dataset. For the latter type, we use a camera trap dataset that compiles varied information on wildlife populations. In addition, we explain the effectiveness of SSL representations using two techniques: group symmetry-based analysis (e.g., invariance to various transformations) and feature visualization-based analysis. We design and conduct an extensive study for this investigation. The main contributions of this thesis are three-fold: (i) We achieve a new state-of-the-art result on a large camera trap dataset using the SSL-based approach. (ii) We analyze the effectiveness and efficiency of the main SSL techniques against the dominant SL technique across diverse domains. (iii) We provide an interpretability study framework for SSL representations. Using this framework, we offer insights into the generalizability of SSL representations and into how SSL models reason about the semantic identity of the target data used in a classification task.

Adviser: Mohammad Rashedul Hasan
