Lorang, Elizabeth, and Leen-Kiat Soh. "White Paper, HD-51897-14" (white paper for National Endowment for the Humanities, University of Nebraska-Lincoln, October 2016).
With its Office of Digital Humanities Start-up Grant, the Image Analysis for Archival Discovery (Aida) team set out to further develop image analysis as a methodology for identifying and retrieving relevant items within digitized collections of historic materials.1 Specifically, we sought to identify poetic content within historic newspapers, using Chronicling America's newspapers (http://chroniclingamerica.loc.gov/) as our test case. The project activities we undertook, both those completed and those still in progress, support this goal and align well with the activities proposed in our original funding application and approved by NEH. To achieve our goal of creating an image-processing-based system to identify poetic content in historic newspaper collections, however, we also made strategic decisions along the way that shifted some of our efforts away from those we had planned when we drafted our funding proposal three years ago.
During the grant period, the Aida team developed, trained, and tested a machine learning classifier that can identify poetic content in pages of digitized historic newspapers based only on visual signals. We published early results of this work in D-Lib Magazine in summer 2015. We have since undertaken a detailed case study that applies our classifier and methodology to a set of more than 22,000 newspaper page images from the period 1836–1840. Significantly, we shifted our emphasis from processing all pages in Chronicling America to conducting this thorough, critical analysis and case study. This shift reflects our desire to explore image analysis as a methodology for connecting users of digital archives with relevant materials.