Computer Science and Engineering, Department of


Date of this Version

Fall 12-1-2012


A THESIS Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Master of Science, Major: Computer Science, Under the Supervision of Professor Lisong Xu and Professor Dong Wang. Lincoln, Nebraska: December, 2012

Copyright (c) 2012 Juan Shao


Recently, many new TCP algorithms, such as BIC, CUBIC, and CTCP, have been deployed in the Internet. Knowing how widely each of these algorithms is deployed is important for studying the performance and stability of the Internet. An existing tool, Congestion Avoidance Algorithm Identification (CAAI), identifies the TCP algorithm of a web server and can therefore be used to investigate TCP deployment statistics. However, CAAI relies on a simple k-nearest neighbors (k-NN) algorithm, which cannot achieve high identification accuracy. In this thesis, we comprehensively study the identification accuracy of five popular machine learning models. We find that the random forest model achieves the highest identification accuracy among these five models, and that its accuracy is much higher than that of CAAI.

Adviser: Lisong Xu