Statistics, Department of





A DISSERTATION Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Doctor of Philosophy, Major: Statistics. Under the Supervision of Professor David B. Marx.
Lincoln, Nebraska: August 2009
Copyright (c) 2009 April Kerby


Researchers have used clustering algorithms for many years to group similar observations based on a set of recorded characteristics. Most of these algorithms maximize the similarity of observations within a cluster while maximizing their dissimilarity from observations in other clusters. However, nearly all current clustering algorithms ignore the geographic location of the observations during the clustering process. This dissertation consists of three papers that propose a method, known as spatial clustering, for incorporating the geographic location of an observation into the clustering algorithm.

The first paper examines spatial clustering when only one numeric response has been recorded for each observation. The geographic (spatial) location is incorporated into the likelihood of the multivariate normal distribution through the variance-covariance matrix. This matrix can be computed using any appropriate spatial covariance function; the spherical covariance function was used in this research. The second paper extends the clustering algorithm to the multivariate case, i.e., when more than one response has been recorded for each observation. Again, the spatial location is incorporated through the variance-covariance matrix of the multivariate normal distribution, but the construction of this matrix must now account for the cross-covariance between the variates. Oliver’s (2003) approach for modeling the cross-covariance is incorporated into the clustering algorithm.
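As a rough illustration of the univariate idea in the first paper, the sketch below builds a variance-covariance matrix from observation coordinates with the standard spherical covariance function and evaluates a multivariate normal log-likelihood for one candidate cluster. It is not the dissertation's implementation: the function names, the parameters `sill` and `range_`, the constant cluster mean, and the absence of a nugget effect are all assumptions made for the example, and the abstract does not say how the likelihood is used to assign observations to clusters.

```python
import numpy as np


def spherical_covariance(coords, sill, range_):
    """Spherical covariance matrix for a set of locations (hypothetical helper).

    C(h) = sill * (1 - 1.5*(h/a) + 0.5*(h/a)**3) for h < a, and 0 otherwise,
    where h is the Euclidean distance between two locations and a = range_.
    No nugget effect is included (an assumption of this sketch).
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all locations.
    diff = coords[:, None, :] - coords[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))
    r = h / range_
    cov = sill * (1.0 - 1.5 * r + 0.5 * r ** 3)
    # Locations farther apart than the range are uncorrelated.
    return np.where(h < range_, cov, 0.0)


def cluster_log_likelihood(y, coords, mean, sill, range_):
    """Multivariate normal log-likelihood of one cluster's responses.

    The spatial locations enter only through the covariance matrix.
    A constant mean per cluster is assumed for illustration.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    sigma = spherical_covariance(coords, sill, range_)
    resid = y - mean
    # slogdet and solve avoid forming an explicit inverse or determinant.
    _, logdet = np.linalg.slogdet(sigma)
    quad = resid @ np.linalg.solve(sigma, resid)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
```

For example, `cluster_log_likelihood(y, coords, mean=y.mean(), sill=y.var(), range_=2.0)` scores how well a group of observations fits a single spatially correlated cluster; candidate groupings with higher total log-likelihood would be preferred under this kind of criterion.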

Because not all recorded variables of interest are numeric, the third paper investigates incorporating categorical (non-numeric) responses into the spatial clustering algorithm. It first considers the case where only categorical responses are recorded for each observation, and then addresses spatially clustering observations that contain both numeric and categorical responses. The algorithm must account for the spatial pattern of the data, the numeric responses, and the categorical responses, and an appropriate weighting of the spatial component is determined. The final clustering algorithm clusters both numeric and categorical data while incorporating the geographic location of the observations.