Paper ID: 2307.16288

Towards Learned Predictability of Storage Systems

Chenyuan Wu

With the rapid development of cloud computing and big data technologies, storage systems have become a fundamental building block of datacenters, incorporating hardware innovations such as flash solid state drives and non-volatile memories, as well as software infrastructures such as RAID and distributed file systems. Despite the growing popularity and interests in storage, designing and implementing reliable storage systems remains challenging, due to their performance instability and prevailing hardware failures. Proactive prediction greatly strengthens the reliability of storage systems. There are two dimensions of prediction: performance and failure. Ideally, through detecting in advance the slow IO requests, and predicting device failures before they really happen, we can build storage systems with especially low tail latency and high availability. While its importance is well recognized, such proactive prediction in storage systems, on the other hand, is particularly difficult. To move towards predictability of storage systems, various mechanisms and field studies have been proposed in the past few years. In this report, we present a survey of these mechanisms and field studies, focusing on machine learning based black-box approaches. Based on three representative research works, we discuss where and how machine learning should be applied in this field. The strengths and limitations of each research work are also evaluated in detail.

Submitted: Jul 30, 2023