Computer Science and Engineering, Department of





University of Nebraska–Lincoln, Computer Science and Engineering
Technical Report TR-UNL-CSE-2008-0012
Issued Nov. 9, 2008


Existing data storage systems based on a hierarchical directory tree do not meet the scalability and functionality requirements of exponentially growing datasets and increasingly complex metadata queries in large-scale file systems with billions of files and exabytes of data. This paper proposes a novel decentralized, semantic-aware metadata organization, called SmartStore, which exploits the metadata semantics of files to judiciously aggregate correlated files into semantic-aware groups by using information retrieval tools. The decentralized design of SmartStore can improve system scalability and reduce query latency both for complex queries (including range and top-k queries), support for which is helpful for constructing semantic-aware caching, and for conventional filename-based point queries. The key idea of SmartStore is to limit the search scope of a complex metadata query to a minimal number of semantically related groups and to avoid or alleviate brute-force search over the entire system. Extensive experiments based on real-world traces show that SmartStore significantly improves system scalability and answers queries more than one thousand times faster than current database approaches. To the best of our knowledge, this is the first paper addressing complex queries in large-scale file systems.
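The key idea stated in the abstract, restricting a complex query to the few groups that can contain matches, can be illustrated with a minimal sketch. This is not SmartStore's implementation: the Group class, the range_query function, the two metadata attributes (size in KB, age in days), and the hand-assigned grouping are all assumptions made for illustration, whereas the report forms its groups with information retrieval tools. Each group here keeps a per-attribute bounding box of its members, and a range query skips every group whose box cannot overlap the query.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical group: files plus the bounding box of their metadata."""
    files: list = field(default_factory=list)  # entries are (name, attrs)
    lo: list = None  # per-attribute minimum over members
    hi: list = None  # per-attribute maximum over members

    def add(self, name, attrs):
        self.files.append((name, attrs))
        if self.lo is None:
            self.lo, self.hi = list(attrs), list(attrs)
        else:
            self.lo = [min(a, b) for a, b in zip(self.lo, attrs)]
            self.hi = [max(a, b) for a, b in zip(self.hi, attrs)]

    def overlaps(self, q_lo, q_hi):
        # The box intersects the query range on every attribute.
        return all(l <= qh and h >= ql
                   for l, h, ql, qh in zip(self.lo, self.hi, q_lo, q_hi))


def range_query(groups, q_lo, q_hi):
    """Return (matching file names, number of groups actually scanned)."""
    hits, scanned = [], 0
    for g in groups:
        if not g.overlaps(q_lo, q_hi):
            continue  # pruned without looking at any member file
        scanned += 1
        for name, attrs in g.files:
            if all(ql <= a <= qh for a, ql, qh in zip(attrs, q_lo, q_hi)):
                hits.append(name)
    return hits, scanned


# Toy metadata: (size in KB, age in days); the grouping is assigned by hand.
small, large, medium = Group(), Group(), Group()
small.add("a.txt", (4, 1))
small.add("b.txt", (8, 2))
large.add("c.iso", (700000, 400))
large.add("d.iso", (650000, 380))
medium.add("e.jpg", (2000, 30))
medium.add("f.jpg", (2500, 35))
groups = [small, large, medium]

# Files between 1000 and 3000 KB that are at most 40 days old.
hits, scanned = range_query(groups, (1000, 0), (3000, 40))
print(hits, scanned)  # ['e.jpg', 'f.jpg'] 1
```

In this toy setting only one of the three groups is scanned; a brute-force search would examine every file. How well pruning works in practice depends on how tightly correlated files are grouped, which is the part the report addresses and this sketch does not.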