Computing, School of

School of Computing: Technical Reports

FARMER: A Novel Approach to File Access Correlation Mining And Evaluation Reference Model for Optimizing Peta-Scale File System Performance

Peng Xia, Wuhan National Laboratory for Optoelectronics, China
Dan Feng, Wuhan National Laboratory for Optoelectronics, China
Hong Jiang, University of Nebraska-Lincoln
Lei Tian, University of Nebraska-Lincoln
Fang Wang, Wuhan National Laboratory for Optoelectronics, China

Date of this Version

1-22-2008

Document Type

Article

Comments

University of Nebraska–Lincoln, Computer Science and Engineering
Technical Report TR-UNL-CSE-2008-0001
Issued Jan. 22, 2008

Abstract

File correlations have become an increasingly important consideration for performance enhancement in peta-scale storage systems. Previous studies on file correlations mainly concern two aspects of files: file access sequences and semantic attributes. Based on mining these two aspects of file systems, various strategies have been proposed to optimize overall system performance. Unfortunately, all of these studies consider either file access sequences or semantic attribute information in isolation, and are thus unable to mine file correlations accurately and effectively, especially in large-scale distributed storage systems.
This paper introduces a novel File Access coRrelation Mining and Evaluation Reference model (FARMER) for optimizing peta-scale file system performance. FARMER judiciously considers both file access sequences and semantic attributes simultaneously to evaluate the degree of file correlations, leveraging the Vector Space Model (VSM) technique adopted from the Information Retrieval field. We extract file correlation knowledge from some typical file system traces using FARMER, and incorporate FARMER into a real large-scale object-based storage system as a case study to dynamically infer file correlations and to evaluate the benefits and costs of a FARMER-enabled prefetching algorithm for the metadata servers under real file system workloads. Experimental results show that FARMER can mine and evaluate file correlations more accurately and effectively. More significantly, the FARMER-enabled prefetching algorithm is shown to reduce the metadata server latency by approximately 24-35% when compared to a state-of-the-art metadata prefetching algorithm and a commonly used replacement policy.
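
As a rough illustration of the idea in the abstract, the sketch below scores a pair of files by combining a VSM-style cosine similarity over semantic attributes with a simple successor frequency taken from an access trace. It is not the report's actual formulation: the attribute names, the helper functions, and the linear weighting with `alpha` are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute_vector(attrs):
    """Map semantic attributes to a term vector (hypothetical encoding)."""
    return Counter(f"{key}={value}" for key, value in attrs.items())


def successor_frequency(trace, a, b):
    """Fraction of accesses to `a` that are immediately followed by `b`."""
    follows = sum(1 for x, y in zip(trace, trace[1:]) if x == a and y == b)
    total = trace[:-1].count(a)  # accesses to `a` that have a successor
    return follows / total if total else 0.0


def correlation_degree(trace, attrs, a, b, alpha=0.5):
    """Blend semantic and sequential evidence.

    The linear blend and the weight `alpha` are assumptions for
    illustration, not taken from the report.
    """
    semantic = cosine_similarity(attribute_vector(attrs[a]),
                                 attribute_vector(attrs[b]))
    sequential = successor_frequency(trace, a, b)
    return alpha * semantic + (1.0 - alpha) * sequential


# Hypothetical trace and attributes, invented for this example.
trace = ["a", "b", "a", "b", "a", "c"]
attrs = {
    "a": {"user": "u1", "dir": "/src"},
    "b": {"user": "u1", "dir": "/src"},
    "c": {"user": "u2", "dir": "/tmp"},
}
score_ab = correlation_degree(trace, attrs, "a", "b")
score_ac = correlation_degree(trace, attrs, "a", "c")
```

In this toy input, `a` and `b` share all attributes and `b` often follows `a`, so `score_ab` comes out higher than `score_ac`; a prefetcher could use such a ranking to decide which metadata to load ahead of time.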

Included in

Computer Sciences Commons
