Computer Science and Engineering, Department of





G. Nagy and S. Seth, "Twenty questions for document classification," Document Layout Interpretation and its Applications Workshop, Seattle, 2001. Invited lead presentation to the workshop.


Documents – manuscripts, books, magazines, newspapers, sheet music, circuit diagrams, checks, web pages, email attachments, music CDs, videos, and cuneiform – mirror the culture of their time and serve as the primary source of historical record. Although it seems natural to classify documents according to "format" before examining their content, form and function are often intertwined. The design of a document interpretation system must take both into consideration.

What are the essential parameters of a document interpretation system? What needs to be known before undertaking the design or purchase of such a system? What is the interrelationship of the client, the document, and the desired information? In other words, what is the range of issues of possible interest to our research community? To highlight the tacit assumptions in the document analysis literature, we will start with a tabula rasa and invite the workshop participants to join us in a game of Twenty Questions.