Center for Digital Research in the Humanities



Lorang, Elizabeth, Leen-Kiat Soh, and John O'Brien. "Interim Performance Report, LG‐71‐16‐0152‐16, Extending Intelligent Computational Image Analysis for Archival Discovery, March 2019" (report to Institute of Museum and Library Services, March 2019).


The primary goal of "Extending Intelligent Computational Image Analysis for Archival Discovery" is to investigate the use of image analysis as a methodology for content identification, description, and information retrieval in digital libraries and other digitized collections. Building on work started under a National Endowment for the Humanities Office of Digital Humanities Start-up Grant, our IMLS project seeks to 1) analyze and verify our previously developed image analysis approach and extend it so that it is newspaper agnostic, type agnostic, and language agnostic; 2) scale and revise the intelligent image analysis approach and determine the ideal balance between precision and recall for this work; 3) distribute metadata and develop a new digital collection using the extracted content; and 4) disseminate results, including adding to the scholarly literature on these topics and providing training for members of library and archive communities. In the second year of the project, the Aida team made considerable headway toward the goals of our grant. While we have continued to focus exclusively on poetic content to this point, year two was an important year for assessing the efficacy of the approach and extending it so that it might be newspaper- and language-agnostic. In addition, we assembled a large set of data and evidence to help us weigh the balance of precision and recall and to consider revisions to the overall approach given what we’re learning in this area. We also have a functional metadata model and have made major steps toward developing a new digital collection from the poetic content observed during the project and toward distributing metadata about that content. Finally, team members presented the work at four major conferences, to audiences of digital library professionals and specialists and literary scholars.
Team members also prepared three publications, which are currently out for review, and a detailed report analyzing the extension of the approach to a new corpus, and they have generated notes toward additional articles and other writing for year three.