Department of Computer Science & Engineering, University of Nebraska-Lincoln, Technical Report, 2013.
Big-data and HPC analytics applications urgently need file-search services that drastically reduce the scale of their input data and thereby accelerate analytics. Unfortunately, existing solutions either scale poorly on large systems or lack a well-integrated interface that applications can easily use. We propose a distributed searchable file system, VSFS, which provides a novel and flexible POSIX-compatible searchable file system namespace that can be seamlessly integrated with any legacy code without modification. Additionally, to provide real-time indexing and searching performance, VSFS uses a DRAM-based distributed consistent hashing ring to manage all file indices. The results of our evaluation show that VSFS is scalable in HPC environments. It achieves significantly better file-indexing and file-searching performance than popular SQL/NoSQL solutions, while introducing only negligible overhead to raw I/O performance. Finally, we integrate VSFS into a scientific analytics application to show its benefits in terms of performance and convenience.
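The abstract does not describe how the consistent hashing ring is built, so the following is only a minimal sketch of the general technique, not the VSFS implementation. It illustrates how index names could be assigned to in-memory index servers so that adding or removing a server relocates only a small share of the indices. The class name `HashRing`, the use of MD5 for ring positions, the virtual-node count, and the node and index names are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hashing ring (illustrative; not the VSFS code)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes   # virtual points per node (assumed value)
        self._points = []       # sorted ring positions
        self._owner = {}        # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Map a string to a 64-bit ring position. MD5 is an assumption,
        # chosen only because it is stable across runs and machines.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        # Place several virtual points per node to even out the load.
        for i in range(self._vnodes):
            point = self._hash(f"{node}#{i}")
            if point not in self._owner:
                bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        # Drop every virtual point that belongs to the node.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # The owner is the first point clockwise from the key's position.
        if not self._points:
            raise LookupError("ring has no nodes")
        i = bisect.bisect_right(self._points, self._hash(key))
        return self._owner[self._points[i % len(self._points)]]


# Hypothetical index servers and file-index names.
ring = HashRing(["index-server-0", "index-server-1", "index-server-2"])
keys = [f"/data/run{i}/energy.idx" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add_node("index-server-3")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, every key in `moved` is now owned by the newly added server, and all other keys keep their previous owner. That limited relocation is the property that makes a ring attractive for spreading in-memory indices across a cluster that grows or shrinks.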