Libraries at University of Nebraska-Lincoln

 

Date of this Version

2010

Abstract

Tokenized word terms were collected from three sources, all dealing with issues in water quality: controlled vocabulary headings, user keyword searches, and HTML documents. Distances between word pairs were calculated using the Jaccard formula. Distances from the three sources were compared using Spearman rank correlations. Clusters were calculated with the SAS pseudo-centroid method on distances that had been transformed to correct for non-normality. Word pair distances from controlled vocabularies were more closely correlated with users' keyword searches than document distances were. The mean distance for controlled vocabularies was also closer to that for users' keywords. Clusters produced from the three sources were most similar for word pairs with small distances.
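As a rough illustration of the two measures named in the abstract, the sketch below computes a Jaccard distance for each word pair and a Spearman rank correlation between two sets of distances. It is a minimal sketch, not the authors' procedure: the abstract does not say how word occurrences were counted, so the code assumes each word is represented by the set of records (headings, searches, or documents) in which it appears. The function names and the sample data are hypothetical, and the transformation and clustering steps are not shown.

```python
from itertools import combinations
from math import sqrt


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


def word_pair_distances(records):
    """Map each word pair to the Jaccard distance between the sets of
    record indices in which the two words occur (an assumed representation)."""
    occurrences = {}
    for idx, tokens in enumerate(records):
        for tok in set(tokens):
            occurrences.setdefault(tok, set()).add(idx)
    return {
        (w1, w2): jaccard_distance(occurrences[w1], occurrences[w2])
        for w1, w2 in combinations(sorted(occurrences), 2)
    }


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Raises ZeroDivisionError if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical token lists standing in for two of the three sources.
    headings = [["water", "quality"], ["water", "nitrate"], ["nitrate", "runoff"]]
    searches = [["water", "quality", "nitrate"], ["runoff", "water"], ["quality"]]
    d_head = word_pair_distances(headings)
    d_search = word_pair_distances(searches)
    shared = sorted(set(d_head) & set(d_search))
    rho = spearman([d_head[p] for p in shared], [d_search[p] for p in shared])
    print(f"Spearman correlation over {len(shared)} shared word pairs: {rho:.3f}")
```

Comparing only the word pairs that occur in both sources keeps the two distance lists aligned, which the rank correlation requires.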
