Computer Science and Engineering, Department of


Date of this Version

Spring 1-31-2014


Copyright (c) 2014 Lei Xu, Ziling Huang, Hong Jiang, Lei Tian, and David Swanson.


The sheer volume of big data calls for effective data-filtering techniques to accelerate the analytics process. We propose a Versatile Searchable File System (VSFS), which provides a transparent, flexible, and near-real-time file-level data-filtering service by searching files directly through the file system. Big data analytics applications can therefore use this filtering service transparently, without modification. A versatile indexing scheme is designed to adapt to the exploratory and ad hoc nature of big data analytics activities. Moreover, VSFS uses a RAM-based distributed architecture to perform file indexing. Evaluations driven by three real-world analytics applications demonstrate VSFS’ high scalability and data-filtering capability.