Libraries at University of Nebraska-Lincoln


Date of this Version

3-2015

Citation

Lorang, Elizabeth, Leen-Kiat Soh, Maanas Varma Datla, and Spencer Kulwicki. "Developing an Image-Based Classifier for Detecting Poetic Content in Historic Newspaper Collections." Preprint, submitted March 19, 2015.

Comments

This is the final authors' copy of this manuscript prior to publication in D-Lib Magazine (July/August 2015), http://www.dlib.org/dlib/july15/lorang/07lorang.html

Abstract

"Developing an Image-Based Classifier for Detecting Poetic Content in Historic Newspaper Collections" details and analyzes the first stage of work of the Image Analysis for Archival Discovery project team. Our team is investigating the use of image analysis to identify poetic content in historic newspapers. The project seeks both to augment the study of literary history, by drawing attention to the magnitude of poetry published in newspapers and by making the poetry more readily available for study, and to advance work on the use of digital images in facilitating discovery in digital libraries and other digitized collections. We have recently completed the process of training our classifier for identifying poetic content, and as we prepare to move into the deployment stage, we are making available our methods for classification and testing in order to promote further research and discussion. The precision and recall values achieved during the training (90.58%; 79.4%) and testing (74.92%; 61.84%) stages are encouraging. In addition to discussing why such an approach is needed and relevant and situating our project alongside related work, this paper analyzes preliminary results, which support the feasibility and viability of our approach to detecting poetic content in historic newspaper collections.
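
For readers less familiar with the two measures reported above, the following is a minimal sketch of how precision and recall are conventionally computed from classification counts. It is not the project's own code. The function names and the example counts are hypothetical, and the F1 score (the harmonic mean of precision and recall) is not reported in the abstract; it is shown only as one common way to summarize a precision/recall pair, assuming the paired values above are listed in the order precision; recall.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of items flagged as poetic content that really are."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of actual poetic content that the classifier found."""
    return true_pos / (true_pos + false_neg)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (not reported in the abstract)."""
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not the project's data).
    tp, fp, fn = 90, 10, 30
    print(f"precision = {precision(tp, fp):.2%}")
    print(f"recall    = {recall(tp, fn):.2%}")

    # Values as stated in the abstract, assumed to be (precision, recall).
    reported = {"training": (0.9058, 0.794), "testing": (0.7492, 0.6184)}
    for stage, (p, r) in reported.items():
        print(f"{stage}: F1 = {f1_score(p, r):.4f}")
```

High precision with lower recall, as in the reported figures, means that most items flagged as poetic content are correct, while a larger share of actual poetic content goes undetected.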