Electrical & Computer Engineering, Department of
First Advisor
Eric Psota
Second Advisor
Lance Pérez
Date of this Version
Fall 12-5-2019
Document Type
Article
Abstract
This thesis extends the representational output of semantic instance segmentation by explicitly including both visible and occluded parts. A fully convolutional network is trained to produce consistent pixel-level embeddings across two layers such that, when clustered, the results convey the full spatial extent and depth ordering of each instance. Results demonstrate that the network can accurately estimate complete masks in the presence of occlusion and outperform leading top-down bounding-box approaches.
The model is further extended to produce consistent pixel-level embeddings across two consecutive video frames, so that amodal instance segmentation and multi-object tracking are performed simultaneously. Neither a post-processing tracker nor the Hungarian algorithm is needed to perform multi-object tracking. The advantages and disadvantages of such a bounding-box-free approach are studied thoroughly. Experiments show that the proposed method outperforms the state-of-the-art bounding-box-based approach on tracking animated moving objects.
Advisors: Eric T. Psota and Lance C. Pérez
Comments
A THESIS Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfilment of Requirements For the Degree of Master of Science, Major: Electrical Engineering, Under the Supervision of Professors Eric T. Psota and Lance C. Pérez. Lincoln, Nebraska: December, 2019
Copyright 2019 Yanfeng Liu