Date of this Version
Clustering, the process of grouping together similar objects, is a fundamental task in data mining to help perform knowledge discovery in large datasets. With the growing number of sensor networks, geospatial satellites, global positioning devices, and human networks tremendous amounts of spatio-temporal data that measure the state of the planet Earth are being collected every day. This large amount of spatio-temporal data has increased the need for efficient spatial data mining techniques. Furthermore, most of the anthropogenic objects in space are represented using polygons, for example – counties, census tracts, and watersheds. Therefore, it is important to develop data mining techniques specifically addressed to mining polygonal data. In this research we focus on clustering geospatial polygons with fixed space and time coordinates.
Polygonal datasets are more complex than point datasets because polygons have topological and directional properties that are not relevant to points, thus rendering most state-of-the-art point-based clustering techniques not readily applicable. We have addressed four important sub-problems in polygonal clustering. (1) We have developed a dissimilarity function that integrates both non-spatial attributes and spatial structure and context of the polygons. (2) We have extended DBSCAN, the state-of-the-art density based clustering algorithm for point datasets, to polygonal datasets and further extended it to handle polygonal obstacles. (3) We have designed a suite of algorithms that incorporate user-defined constraints in the clustering process. (4) We have developed a spatio-temporal polygonal clustering algorithm that uniquely treats both space and time as first-class citizens, and developed an algorithm to analyze the movement patterns in the spatio-temporal polygonal clusters. In order to evaluate our algorithms we applied our algorithms on real-life datasets from several diverse domains to solve practical problems such as congressional redistricting, spatial epidemiology, crime mapping, and drought analysis. The results show that our algorithms are effective in finding spatially compact and conceptually coherent clusters.