Department of Computer Science and Engineering

Date of this Version: Spring 3-29-2011


The exponentially increasing amount of data in file systems has made it increasingly important for users, administrators, and applications to be able to retrieve files quickly through file-search services, instead of relying on the standard file-system API to traverse hierarchical namespaces. The quality of a file-search service depends heavily on three factors: the file-indexing overhead, the file-search performance, and the accuracy of the search results. Unfortunately, existing file-search solutions either scale so poorly that their performance degrades unacceptably as systems grow, or incur such long crawling delays that they return unacceptably inaccurate results. We believe the time is ripe to redesign the file system as a searchable file system capable of accurate and scalable system-level file search.

The main challenge in designing and implementing such a searchable file system is how to update file indices in \emph{real time}, and in a scalable way, so that file-search results stay accurate. Updating file indices in hierarchical file systems or in existing file-search solutions usually creates a performance bottleneck and limits scalability. We therefore propose \emph{Propeller}, a lightweight and scalable metadata organization for future searchable file systems. Propeller partitions the namespace according to file-access patterns, which exposes massive parallelism for emerging manycore architectures, and it provides versatile system-level file-search functionality. Extensive evaluation of our \emph{Propeller} prototype shows that it achieves significantly better file-indexing and file-search performance than a database-based solution (MySQL) and incurs only negligible overhead on normal file I/O operations in a state-of-the-art file system (Ext4).
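The abstract does not say how Propeller derives partitions from access patterns or how its indices are laid out. The following is therefore only an illustrative sketch of the general idea, not the authors' design. It assumes a hypothetical access trace of `(pid, path)` pairs and groups files touched by the same process into one partition with a union-find structure. Each partition then keeps its own small inverted index that is updated synchronously on every write, so a search sees a change immediately. All names here (`partition_by_access`, `PartitionIndex`, `SearchableNamespace`) are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class UnionFind:
    """Disjoint-set structure used to merge files that are accessed together."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # register unseen items as their own root
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def partition_by_access(trace):
    """Hypothetical policy: paths accessed by the same pid share a partition."""
    uf = UnionFind()
    first_path_of_pid = {}
    for pid, path in trace:
        uf.find(path)
        if pid in first_path_of_pid:
            uf.union(first_path_of_pid[pid], path)
        else:
            first_path_of_pid[pid] = path
    groups = defaultdict(set)
    for path in list(uf.parent):
        groups[uf.find(path)].add(path)
    return list(groups.values())


class PartitionIndex:
    """Per-partition inverted index from (attribute, value) to paths."""

    def __init__(self):
        self.index = defaultdict(set)
        self.attrs = {}

    def update(self, path, attrs):
        # Real-time update: remove the file's stale entries, then add new ones.
        for key in self.attrs.get(path, {}).items():
            self.index[key].discard(path)
        self.attrs[path] = dict(attrs)
        for key in attrs.items():
            self.index[key].add(path)

    def search(self, attr, value):
        return set(self.index.get((attr, value), set()))


class SearchableNamespace:
    """Routes each write to its partition's index; fans searches out to all."""

    def __init__(self, partitions):
        self.owner = {}
        self.indices = []
        for i, group in enumerate(partitions):
            self.indices.append(PartitionIndex())
            for path in group:
                self.owner[path] = i

    def on_write(self, path, attrs):
        # Files missing from the trace get a fresh partition of their own.
        if path not in self.owner:
            self.owner[path] = len(self.indices)
            self.indices.append(PartitionIndex())
        self.indices[self.owner[path]].update(path, attrs)

    def search(self, attr, value):
        # Query every partition through a thread pool and merge the results.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda idx: idx.search(attr, value), self.indices))
        return set().union(*results)


if __name__ == "__main__":
    trace = [(1, "/a/x.c"), (1, "/a/y.h"), (2, "/b/z.txt")]
    ns = SearchableNamespace(partition_by_access(trace))
    ns.on_write("/a/x.c", {"ext": "c", "owner": "alice"})
    ns.on_write("/b/z.txt", {"ext": "txt", "owner": "alice"})
    print(sorted(ns.search("owner", "alice")))  # ['/a/x.c', '/b/z.txt']
    ns.on_write("/a/x.c", {"ext": "c", "owner": "bob"})
    print(sorted(ns.search("owner", "alice")))  # ['/b/z.txt']
```

Each update touches only the index of the file's own partition, so writes to different partitions never contend for the same structure. This is the intuition behind exploiting partition-level parallelism. In CPython the thread pool shows the shape of the fan-out rather than giving a true multicore speedup.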