In 1998 the Albert R. Mann Library created the Cornell University Geospatial Information Repository (CUGIR), a web-based repository providing free access to geospatial data and metadata for New York State. Since its inception, CUGIR has undergone a series of changes and upgrades in response to emerging standards and technologies in the field of geospatial information systems (GIS) and digital library research. Its continuous adoption of new library and GIS standards and developments has made CUGIR increasingly accessible to users within Cornell University and beyond.
The Cornell University Geospatial Information Repository has a number of characteristics that pose unique challenges for digital library developers. First, most GIS repositories manually distribute data and metadata via CD-ROM, whereas CUGIR freely distributes data and metadata via the World Wide Web, making it a true digital library. Second, it is rare to have a geospatial repository whose invention, support, and subsequent development occur within an academic research library. Academic GIS repositories or units are typically under the jurisdiction of urban planning, architecture, or geography departments. Because CUGIR is positioned in a library environment, it embraces standards and practices associated with the preservation, retrieval, acquisition, and organization of information. The library community has always been concerned with the archiving and version control of information, and believes that consistent application of standards will increase interoperability. The library community also believes that metadata, though costly and difficult to create and manage, adds value to whatever it describes. The GIS community is most concerned with creating data efficiently, easing the burden of metadata, and distributing data according to user requests. Generally speaking, GIS data are qualitatively different from, and more problematic than, most digital library objects, including moving images. More importantly, perpetual updating, versioning, and "editioning" of data at the owner's request make GIS data management and metadata management difficult. CUGIR occupies a position in two communities, library and GIS, requiring the CUGIR team to embrace the standards of both.
The sum of CUGIR's unique characteristics led the team to ask the following questions: if one were to create a perfect and heterogeneous metadata management system for a digital library like CUGIR, what characteristics would it possess? How would it behave? What problems would it solve? The CUGIR team set out to create a system characterized by automatic metadata updating and digital object permanence. The system would be designed to behave in a predictable fashion, reduce work and costs, and increase access. The CUGIR metadata model is not a perfect metadata management system, but it is efficient. This is largely because it is a hybrid system embracing the standards, research, and practices of the library community while adopting the GIS community's most attractive feature, its software.
In striving for metadata management perfection, the CUGIR team became keenly aware of the shortcomings in the way GIS software manages digital objects and metadata, primarily the lack of version control for objects and preservation for metadata. Subsequently, these shortcomings were examined under the lens of the Functional Requirements for Bibliographic Records (FRBR) conceptual model. This set of requirements was sponsored by the cataloging section of the International Federation of Library Associations and Institutions (IFLA) to address the changes in cataloging processes. FRBR addresses three groups of entities, but for CUGIR's purposes the first group, which outlines the primary relationships between works, expressions, manifestations, and items, is most critical. In particular, FRBR's use of the concept of a work was examined in the context of CUGIR, and it was through this lens that the team began to view the differences among metadata surrogates or entities within CUGIR.
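The first FRBR group described above can be read as a simple containment hierarchy: a work is realized through expressions, an expression is embodied in manifestations, and a manifestation is exemplified by items. The following is a minimal sketch of that hierarchy only. The class names follow FRBR's entity names, but the fields, the example dataset, and the `count_items` helper are hypothetical and are not taken from CUGIR's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single exemplar of a manifestation, e.g. one stored file."""
    identifier: str  # hypothetical field


@dataclass
class Manifestation:
    """A physical embodiment of an expression, e.g. one file format."""
    file_format: str  # hypothetical field
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of a work, e.g. one edition of a dataset."""
    label: str  # hypothetical field
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract intellectual creation that all expressions share."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count every item reachable from a work through the hierarchy."""
    return sum(
        len(manifestation.items)
        for expression in work.expressions
        for manifestation in expression.manifestations
    )


# Hypothetical example: one work, one edition, two file formats.
example = Work(
    title="Example county road network",
    expressions=[
        Expression(
            label="first edition",
            manifestations=[
                Manifestation("format-a", [Item("file-001")]),
                Manifestation("format-b", [Item("file-002")]),
            ],
        )
    ],
)
print(count_items(example))
```

Modeling the levels separately makes it possible to attach a metadata record at the level it actually describes, rather than repeating it for every file.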
Similarly, the weaknesses of the typical digital library metadata model, particularly its disregard for automation, were addressed in two ways. First, the storage of surrogate records for multiple manifestations of the same expression was eliminated. Second, the automatic metadata-creation tools unique to GIS software applications were exploited to increase efficiency. These changes proved to be a step in the right direction toward improved management of heterogeneous metadata.
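One way to picture the first change is to keep a single master record and derive other views from it on demand, so that an update to the master never has to be copied into separately stored surrogates. The sketch below illustrates that general idea only; it is not CUGIR's actual mechanism. The field names, the `CROSSWALK` mapping, and the `derive_dc` function are illustrative assumptions and do not reproduce any official crosswalk between metadata schemas.

```python
from typing import Dict

# Hypothetical crosswalk: master-record field -> Dublin Core-style element.
# These names are placeholders, not an official schema mapping.
CROSSWALK: Dict[str, str] = {
    "title": "title",
    "originator": "creator",
    "publication_date": "date",
    "abstract": "description",
}


def derive_dc(master: Dict[str, str]) -> Dict[str, str]:
    """Build a Dublin Core-style view from the master record on demand.

    Fields absent from the master record, or not listed in the
    crosswalk, are simply left out of the derived view.
    """
    return {
        dc_element: master[source_field]
        for source_field, dc_element in CROSSWALK.items()
        if source_field in master
    }


# Hypothetical master record for one dataset.
master_record = {
    "title": "Example county road network",
    "originator": "Example agency",
    "abstract": "Road centerlines for an example county.",
}

# Because the view is derived rather than stored, a change to the
# master record shows up in the next derived view with no extra step.
master_record["title"] = "Example county road network, revised"
print(derive_dc(master_record)["title"])
```

Deriving rather than storing the secondary record trades a small amount of computation for the removal of a synchronization step, which is the kind of saving the paragraph above describes.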
The purpose of this chapter is to introduce the CUGIR metadata management model, whose primary goal is access. This model specifically attempts to address the following problems that can hinder access:
1. Management of multiple metadata schemas, i.e., FGDC, MARC, and DC, that occur in multiple manifestations and expressions in CUGIR
2. The lack of fixity and persistence (permanence) of geospatial digital objects
3. The creation and maintenance of metadata that is typically difficult, costly, and time-consuming
4. The lack of tools to automate the creation and management of metadata, in particular, metadata synchronization
It was the goal of the CUGIR team to take the best of both worlds (digital libraries and GIS applications) and merge them to make a powerful system from which both communities could benefit. Although this model was chiefly designed for geospatial data and metadata, it can be applied to other types of digital libraries.