File System Benchmarking Tools and Techniques
Benchmarking is critical when evaluating performance, but is
especially difficult for file and storage systems. Complex
interactions between I/O devices, caches, kernel daemons, and other OS
components result in behavior that is rather difficult to analyze.
Moreover, systems have different features and optimizations, so no
single benchmark is always suitable. The large variety of workloads
that these systems experience in the real world also adds to this
difficulty.
We have found that some of the most commonly used benchmarks are flawed, and
many research papers do not provide a clear enough picture of file system
performance. We believe that a good performance evaluation should use
micro-benchmarks to highlight both the good and bad qualities of a file
system, as well as general-purpose benchmarks or traces to give an idea
of how it would perform under expected and realistic workloads.
Nevertheless, care should be taken to ensure that general-purpose benchmarks
accurately reflect real-world workloads. In addition, benchmarks
should scale well, and results should be reproducible and comparable across
papers.
In this project, we survey file system benchmarks used in many recent
research papers. We found that no single benchmark adequately measures file
system performance. We show how some commonly accepted and widely used
benchmarks and benchmarking techniques can easily conceal overheads,
unfairly over-emphasize overheads, or in general emphasize or
de-emphasize many of the file system's properties. We offer suggestions on
how to create and conduct benchmarks so that they provide a fairer and
more accurate picture of file system performance.
In this project, we primarily describe our views on the future of file
system benchmarking. To that end, we have been developing several
technologies: fine-grained file system
tracing, efficient file system replaying, automated file system benchmarking
tools, and low-overhead detailed file system behavior visualization
tools.
Current Students:
Past Students:
| # | Name | Program | Period | Current Location |
|---|------|---------|--------|------------------|
| 1 | Nikolai Joukov | PhD | Jan 2004 - Dec 2006 | Research Staff Member, Storage and Data Services Research group, IBM T. J. Watson Research Center (Hawthorne, NY) |
| 2 | Avishay Traeger | PhD | Sep 2003 - Aug 2008 | Research Staff Member, Storage Systems and Performance Management group, IBM Tel Aviv Research Lab (Tel-Aviv, Israel) |
| 3 | Charles P. Wright | PhD | May 2003 - May 2006 | Research Staff Member, Network Server Systems Software group, IBM T. J. Watson Research Center (Hawthorne, NY) |
| 4 | Akshat Aranya | MS | May 2003 - Aug 2004 | Associate Research Staff Member, NEC Labs America (Princeton, New Jersey) |
| 5 | Tim Wong | BS | Dec 2004 - Jun 2005 | Research/Portfolio Management Analyst, Global Stock Selection (GSS) group, Applied Quantitative Research (Greenwich, CT) |
Sponsors:
(Last updated: Tue Dec 9 20:15:20 EST 2008)