Center for Digital Research in the Humanities



Lorang, Elizabeth, and Leen-Kiat Soh. "Interim Report, HD-51897-14" (report to National Endowment for the Humanities, University of Nebraska-Lincoln, June 2015).


In the second six months of work on "Image Analysis for Archival Discovery," the project team has continued to make progress toward our goal of analyzing more than 7 million newspaper pages in Chronicling America for poetic content. We have encountered several challenges in our research and development work, and our work plan has shifted in some ways from the one originally set out in our application. We have implemented these changes with the fundamental goal of performing the major research outlined in our proposal: exploring image analysis as a methodology for discovery in digitized collections of historic materials, via a case study of identifying poetic content in historic newspapers.

Activities undertaken from December 2014 to May 2015:

  • Development of an article describing the creation of our classifier for recognizing poetic content in historic newspapers; accepted and forthcoming in the July/August 2015 issue of D-Lib Magazine (completed)
  • Development of a Python program for parsing Chronicling America JSON files and batch retrieving JPEG 2000 image files (completed)
  • Operationalization of the entire process, from image retrieval to image processing, including the move to a server environment (in progress)
  • Development of project documentation (completed to current stage)
  • Processing and classifying all Chronicling America images from the period 1836–1840 as a test case (in progress)
  • Communication of results with relevant audiences, such as at the American Literature Association conference and via project website (completed)
  • Pursuit of external partnership and additional source of funding: submission of Google Faculty Research Award application (completed; decision pending)
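
The JSON parsing and batch retrieval step listed above can be illustrated with a minimal Python sketch. This is not the project's program. It assumes that an issue-level JSON record lists its pages under a `pages` key, each with a `url`, and that a page-level record gives the image location under a `jp2` key; those field names, the function names, and the file-naming scheme are all assumptions made for illustration.

```python
import json
from typing import List, Optional
from urllib.parse import urlparse


def page_urls_from_issue(issue_json: str) -> List[str]:
    """Return the page-level JSON URLs listed in one issue record.

    Assumed layout (not confirmed by the report):
    {"pages": [{"url": "...", "sequence": 1}, ...]}
    """
    issue = json.loads(issue_json)
    return [page["url"] for page in issue.get("pages", []) if "url" in page]


def jp2_url_from_page(page_json: str) -> Optional[str]:
    """Return the JPEG 2000 URL from one page record, or None if absent.

    Assumed layout (not confirmed by the report): {"jp2": "..."}
    """
    page = json.loads(page_json)
    return page.get("jp2")


def local_name(jp2_url: str) -> str:
    """Build a flat local file name from the URL path (hypothetical scheme)."""
    parts = urlparse(jp2_url).path.strip("/").split("/")
    return "_".join(parts)


# Small offline example using made-up records on a placeholder host.
sample_issue = json.dumps({
    "pages": [
        {"url": "https://example.org/lccn/sn00000000/1836-01-01/ed-1/seq-1.json",
         "sequence": 1},
        {"url": "https://example.org/lccn/sn00000000/1836-01-01/ed-1/seq-2.json",
         "sequence": 2},
    ]
})
sample_page = json.dumps({
    "jp2": "https://example.org/lccn/sn00000000/1836-01-01/ed-1/seq-1.jp2",
    "sequence": 1,
})

page_urls = page_urls_from_issue(sample_issue)
image_url = jp2_url_from_page(sample_page)
file_name = local_name(image_url)
```

In a batch run, each URL returned by `page_urls_from_issue` would be fetched and passed to `jp2_url_from_page`, and the resulting image saved under the name from `local_name`. Network access is left out here so the sketch runs offline.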