Libraries at University of Nebraska-Lincoln





A large volume of text data is being generated on the web in the form of scholarly articles, doctoral theses, social media posts, library databases, and data archives. These data are easy to access but complicated to process for research, which is why text mining is required; topic modeling is one of the most important text-mining techniques. In this paper, an attempt has been made to discover topics in the titles of theses uploaded in the field of Library and Information Science (LIS). For this work, the text data (n=2132) were obtained from Shodhganga, and topic modeling through Latent Dirichlet Allocation (LDA) was then applied. A preliminary investigation shows that State universities of India contributed the highest share of theses (78.06%), that the most theses (106) belong to Karnatak University, and that 60.83% of the theses fall in the period 2011-2020. The main results of this paper are: (a) the keyword “library” (0.204) has the highest score across the 10 topics, and “Library use” can be inferred as the major topic; (b) the keywords “information”, “technology”, “communication”, “survey”, “comparative”, “plant”, “scientist”, “city”, “support”, and “small” were discussed across 266 titles; and (c) “study”, “university libraries”, and “information-seeking behaviour” are the most frequent n-grams in the titles. This work can serve as a basis for future research aimed at further improvements and new applications.
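The abstract names two analyses: counting frequent n-grams in thesis titles and fitting an LDA topic model. The sketch below illustrates both in pure Python under stated assumptions. The paper does not say which toolkit, preprocessing, inference method, or hyperparameters it used, so the stop-word list, the priors `alpha=0.1` and `beta=0.01`, the use of collapsed Gibbs sampling (one of several LDA inference methods), and the four sample titles are all hypothetical and are not data from Shodhganga. Reading a keyword "score" such as 0.204 as a topic-word probability `phi[k][v]` is likewise an assumption.

```python
import random
import re
from collections import Counter

# Hypothetical stop-word list: the paper does not say which one (if any) it used.
STOP_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(title):
    """Lower-case a title, keep words (including hyphenated ones), drop stop words."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", title.lower())
    return [w for w in words if w not in STOP_WORDS]


def top_ngrams(titles, n, k):
    """Return the k most frequent n-grams (after stop-word removal) across all titles."""
    counts = Counter()
    for title in titles:
        toks = tokenize(title)
        counts.update(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return counts.most_common(k)


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns the vocabulary and phi, where phi[k][v] is the estimated
    probability of word vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: v for v, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(num_topics)]  # word counts per topic
    nk = [0] * num_topics                       # total tokens per topic
    z = []                                      # current topic of every token

    def assign(d, v, k, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            assign(d, index[w], k, +1)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = index[w]
                assign(d, v, z[d][i], -1)  # remove the token's current assignment
                # Full conditional p(z = k | all other assignments), up to a constant.
                weights = [(ndk[d][k] + alpha) * (nkw[k][v] + beta) / (nk[k] + V * beta)
                           for k in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                assign(d, v, z[d][i], +1)

    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in topics]
    return vocab, phi


# Hypothetical example titles, for illustration only.
TITLES = [
    "Use of university libraries by research scholars: a study",
    "Information-seeking behaviour of engineering students: a study",
    "Information-seeking behaviour of scientists in university libraries",
    "Use of information and communication technology in university libraries",
]

print(top_ngrams(TITLES, 2, 2))  # [('university libraries', 3), ('information-seeking behaviour', 2)]

vocab, phi = lda_gibbs([tokenize(t) for t in TITLES], num_topics=2)
for k, row in enumerate(phi):
    best = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(k, [(vocab[v], round(row[v], 3)) for v in best])
```

Removing stop words before forming n-grams can join words that were not adjacent in the original title (for example "use university"); counting n-grams on the raw token sequence is an equally reasonable choice. With only four titles the LDA topics are not meaningful; a corpus the size of the one described (n=2132) and a larger `num_topics`, such as the 10 topics the abstract reports, would be needed for interpretable results.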