Libraries at University of Nebraska-Lincoln


Date of this Version

Fall 10-8-2020

Document Type



Heterogeneous and voluminous unstructured data is produced by various sources such as emails, social media posts, reviews, videos, audio, images, PDFs, and scanned documents. Organizations need to store this wide range of unstructured data in larger volumes and for longer periods so that they can examine it more deeply, make better decisions, and extract useful insights. Manual processing of such unstructured data is a challenging, time-consuming, and expensive task for any organization. Automating unstructured document processing with Optical Character Recognition (OCR) and Robotic Process Automation (RPA) has limitations, because those techniques are driven by rules or templates. A template or rule set must be defined for every new kind of input, which limits the use of rule-based or template-based techniques for unstructured document processing. These limitations call for a tool that can process unstructured documents using Artificial Intelligence techniques. This bibliometric survey on Cognitive Document Processing highlights these challenges in unstructured data processing. The survey is performed on scientific documents from the Scopus database. Tools including Microsoft Excel, Sciencescape, VOSviewer, Leximancer, and Gephi are used to draw the network analysis diagrams. The study reveals that most publications on Cognitive Document Processing have appeared very recently. It is also observed that universities and institutions in India lead the research on this topic.