Date of this Version
The recent years have witnessed an upsurge in the number of published documents. Organizations are showing an increased interest in text classification for effective use of the information. Manual procedures for text classification can be fruitful for a handful of documents, but the same lack in credibility when the number of documents increases besides being laborious and time-consuming. Text mining techniques facilitate assigning text strings to categories rendering the process of classification fast, accurate, and hence reliable. This paper classifies chemistry documents using machine learning and statistical methods. The procedure of text classification has been described in chronological order like data preparation followed by processing, transformation, and application of classification techniques culminating in the validation of the results.