Electrical & Computer Engineering, Department of


First Advisor

Eric Psota

Second Advisor

Lance Pérez

Date of this Version

Fall 12-5-2019


A THESIS Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfilment of Requirements For the Degree of Master of Science, Major: Electrical Engineering, Under the Supervision of Professors Eric T. Psota and Lance C. Pérez. Lincoln, Nebraska: December, 2019

Copyright 2019 Yanfeng Liu


This thesis extends the representational output of semantic instance segmentation by explicitly including both the visible and occluded parts of each instance. A fully convolutional network is trained to produce consistent pixel-level embeddings across two layers such that, when clustered, the results convey the full spatial extent and depth ordering of each instance. Results demonstrate that the network can accurately estimate complete masks in the presence of occlusion and outperform leading top-down bounding-box approaches.
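The clustering idea in the paragraph above can be illustrated with a toy sketch. This is not the thesis implementation. The abstract does not name a clustering algorithm, so the greedy radius-based routine `cluster_embeddings`, the hand-written 2-D embeddings, the four-pixel "image", and the reading of the two layers as a front layer and a layer behind it are all assumptions made for illustration.

```python
from math import dist

def cluster_embeddings(points, radius=0.5):
    """Greedy stand-in clustering (an assumption, not the thesis method):
    a point joins the first cluster centre within `radius`, otherwise it
    starts a new cluster. Returns one integer label per point."""
    centers, labels = [], []
    for p in points:
        for idx, c in enumerate(centers):
            if dist(p, c) <= radius:
                labels.append(idx)
                break
        else:
            centers.append(p)
            labels.append(len(centers) - 1)
    return labels

# Hypothetical embeddings for two objects (A, B) and the background (BG).
A, B, BG = (0.0, 0.0), (3.0, 3.0), (9.0, 9.0)

# Toy 1-D image with 4 pixels and two embedding layers per pixel.
layer0 = [A, A, B, B]      # assumed front layer: what is visible at each pixel
layer1 = [BG, B, BG, BG]   # assumed second layer: B continues behind A at pixel 1

# Cluster the embeddings of both layers together so labels are shared.
labels = cluster_embeddings(layer0 + layer1)
n = len(layer0)
front, back = labels[:n], labels[n:]

def amodal_mask(instance):
    """A pixel belongs to the full (amodal) mask if the instance appears
    there in either layer."""
    return [int(front[i] == instance or back[i] == instance) for i in range(n)]

print(amodal_mask(0))  # object A
print(amodal_mask(1))  # object B, including its occluded pixel
```

In this toy setting, a pixel where one label appears in the front layer and a different label appears in the second layer also indicates which instance lies in front of which.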

The model is further extended to produce consistent pixel-level embeddings across two consecutive image frames from a video, so that it simultaneously performs amodal instance segmentation and multi-object tracking. Neither a post-processing tracker nor the Hungarian algorithm is needed to perform multi-object tracking. The advantages and disadvantages of such a bounding-box-free approach are studied thoroughly. Experiments show that the proposed method outperforms the state-of-the-art bounding-box-based approach on tracking animated moving objects.
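A minimal sketch of why embeddings that are consistent across frames remove the need for a separate matching step follows. It assumes hypothetical hand-written embeddings and a simple greedy clustering routine, `joint_cluster`, standing in for whatever clustering the thesis actually uses. Pixels from both frames are clustered together, so an object keeps the same cluster label in each frame.

```python
from math import dist

def joint_cluster(embeddings, radius=0.5):
    """Greedy stand-in clustering (an assumption, not the thesis method):
    reuse the first cluster centre within `radius`, else open a new one."""
    centers, ids = [], []
    for e in embeddings:
        match = next((k for k, c in enumerate(centers) if dist(e, c) <= radius), None)
        if match is None:
            centers.append(e)
            match = len(centers) - 1
        ids.append(match)
    return ids

# Hypothetical embeddings for two objects (P, Q) and the background (BG).
P, Q, BG = (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)

# Toy 1-D frames with 4 pixels each; both objects move one pixel to the right.
frame_t  = [P, BG, Q, BG]
frame_t1 = [BG, P, BG, Q]

# Cluster the pixels of both frames together: labels are shared across frames.
ids = joint_cluster(frame_t + frame_t1)
n = len(frame_t)
ids_t, ids_t1 = ids[:n], ids[n:]

def track(instance):
    """Pixel position of `instance` in frame t and in frame t+1."""
    return ids_t.index(instance), ids_t1.index(instance)

print(track(0))  # object P
print(track(2))  # object Q
```

Because identity comes directly from the shared cluster label, this sketch contains no assignment step between per-frame detections.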

Advisors: Eric T. Psota and Lance C. Pérez