Date of this Version
Landing an unmanned aerial vehicle (UAV) on a moving platform is a challenging task that often requires exact models of the UAV dynamics, platform characteristics, and environmental conditions. In this thesis, we present and investigate three machine learning approaches that use varying levels of domain knowledge: dynamics randomization, universal policy with system identification, and reinforcement learning with no parameter variation. We first train the policies in simulation. We then evaluate them in simulation, varying the system dynamics through wind and the friction coefficient, and on a real robot system with wind variation. We initially expected that providing more information about environmental characteristics through system identification would improve the outcomes. However, we found that the policy trained in simulation with dynamics randomization achieves the best results both in simulation and when transferred to the real robot system, although in simulation the universal policy with system identification is faster in some cases. Overall, this thesis compares multiple deep reinforcement learning approaches trained in simulation and transferred to robot experiments in the presence of external disturbances. We were able to create a UAV control policy trained entirely in simulation and transfer it to a real system in the presence of external disturbances. In doing so, we evaluate the performance of dynamics randomization and universal policy with system identification.
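As a rough illustration of how the three approaches differ, the sketch below shows one plausible way to set up per-episode dynamics and policy inputs. It is not taken from the thesis: the names `EpisodeDynamics`, `sample_dynamics`, and `build_observation`, the parameter ranges, and the nominal values are all hypothetical. It assumes that dynamics randomization resamples wind and friction each episode, that the baseline keeps them fixed, and that a universal policy receives the dynamics parameters as extra inputs (in practice these would come from a system identification module rather than the simulator).

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeDynamics:
    wind_speed: float  # m/s (hypothetical unit and range)
    friction: float    # platform friction coefficient (hypothetical range)


def sample_dynamics(rng, randomize=True):
    """Draw the dynamics parameters used for one training episode."""
    if not randomize:
        # Baseline with no parameter variation: fixed nominal values (assumed).
        return EpisodeDynamics(wind_speed=0.0, friction=0.5)
    # Dynamics randomization: resample the parameters every episode.
    return EpisodeDynamics(
        wind_speed=rng.uniform(0.0, 5.0),
        friction=rng.uniform(0.2, 0.8),
    )


def build_observation(state, dynamics, approach):
    """Assemble the policy input for a given approach."""
    if approach == "universal_policy":
        # Universal policy: condition on the dynamics parameters as well.
        # Here the true values stand in for system identification estimates.
        return list(state) + [dynamics.wind_speed, dynamics.friction]
    # Dynamics randomization and the baseline see only the state.
    return list(state)
```

Under these assumptions, the dynamics randomization policy must be robust to parameters it never observes, while the universal policy depends on how accurately those parameters are estimated.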
Adviser: Carrick Detweiler