Libraries at University of Nebraska-Lincoln



Kent M. Eskridge

Date of this Version

2010

Abstract

Tokenized word terms were collected from three sources, all dealing with water quality issues: controlled vocabulary headings, user keyword searches, and HTML documents. Distances between word pairs were calculated using the Jaccard formula. Distances from the three sources were compared using Spearman rank correlations. Clusters were computed with the SAS pseudo-centroid method on distances that had been transformed to correct for non-normality. Word-pair distances from controlled vocabularies correlated more closely with users' keyword searches than document distances did. The mean word-pair distance for controlled vocabularies was also closer to that of users' keywords. Clusters produced from the three sources were most similar for word pairs with small distances.
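The distance and correlation steps described in the abstract can be illustrated with a minimal Python sketch. It assumes that each source is a list of tokenized records (one record per heading, search, or document). It also assumes that the Jaccard distance between two words is computed from the sets of records in which each word occurs. The abstract does not state the exact co-occurrence unit, so that choice is an assumption. The function names and toy data are hypothetical, and the non-normality transformation and SAS clustering step are not reproduced.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance 1 - |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def word_occurrences(records):
    """Map each word to the set of record indices it appears in (assumed unit)."""
    occ = {}
    for i, tokens in enumerate(records):
        for word in set(tokens):
            occ.setdefault(word, set()).add(i)
    return occ


def pair_distances(records, vocabulary):
    """Jaccard distance for every unordered word pair in a shared vocabulary."""
    occ = word_occurrences(records)
    return {
        (u, v): jaccard_distance(occ.get(u, set()), occ.get(v, set()))
        for u, v in combinations(sorted(vocabulary), 2)
    }


def average_ranks(values):
    """1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks.
    Undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical toy data standing in for two of the three sources.
vocab = {"nitrate", "groundwater", "runoff", "wells"}
headings = [["nitrate", "groundwater"], ["nitrate", "runoff"], ["groundwater", "wells"]]
searches = [["nitrate", "groundwater"], ["runoff", "nitrate"],
            ["wells", "groundwater"], ["runoff"]]

d_headings = pair_distances(headings, vocab)
d_searches = pair_distances(searches, vocab)
pairs = sorted(d_headings)  # same pair order for both sources
rho = spearman([d_headings[p] for p in pairs], [d_searches[p] for p in pairs])
print(round(rho, 4))  # 0.95
```

Ranking the pairwise distances before correlating them makes the comparison depend only on the order of distances, not on their raw scale. This is one reason a rank correlation suits skewed distance data.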
