Hard Drive Remaining Useful Life Estimation in the Data Center: Comparing Ensemble Learning with…

Source: Deep Learning on Medium


Predicting system and component failure is of interest in many industries, including aerospace, agriculture, energy, manufacturing, and technology. The structure of the analysis may vary depending on analytical or business goals and the availability of data. One option is to frame the analysis as a regression problem and estimate the component’s Remaining Useful Life (RUL) from time series data. This technique assumes that each incremental time step carries enough information to generate an RUL estimate, and that the component degrades somewhat smoothly rather than failing suddenly.

The rise of the Internet of Things (IoT) and the availability of machine sensor data have made RUL estimation increasingly applicable in real-world settings. In this light, predicting hard-disk drive (HDD) RUL stands to benefit many organizations that rely on data storage services: HDDs are among the most frequently failing components in data centers today (something I became intimately familiar with while processing large-scale datasets in an enterprise data center). Most modern enterprise-class HDDs include a standard self-monitoring system, aptly named Self-Monitoring, Analysis and Reporting Technology (SMART), that records real-time sensor data for the purpose of detecting malfunctions and anticipating system failures. The current study compares Random Forest (RF), an Ensemble Learning approach, with a Deep Convolutional Neural Network (DCNN) for predicting the RUL of HDDs using real-world operational data from the Backblaze data center.


The Backblaze hard drive dataset consists of daily SMART readings from over 100,000 active hard drives, including HDD and Solid State Drive (SSD) devices of various brands and models. Seagate (ST) and Hitachi Global Storage Technologies (HGST) HDDs are the most frequently occurring devices in the dataset, and together comprise over 97% of the total active hard drives in the data center as recently as 1Q19. The current study focuses on the Seagate model ST4000DM000, since it is consistently among the most failure-prone, oldest, and most common models in the dataset.

The data is preprocessed prior to building statistical learning models. Feature columns with more than 25% missing values are deemed incomplete and are excluded from the dataset. In addition, a few pairs of features are found to contain redundant information (i.e., they are perfectly correlated with one another); in these cases, one feature of the pair is excluded and the other is retained for model building. Remaining missing values are filled using linear interpolation. In cases where seven missing time series values occur in a row, the data is not interpolated and the HDD is dropped from the dataset.
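The column filter and gap-limited interpolation described above can be sketched in plain Python. This is a minimal illustration rather than the study's actual code: the names `keep_column`, `longest_gap`, and `interpolate_series` are hypothetical, missing readings are represented as `None`, and filling gaps at the very start or end of a series with the nearest known value is an assumption the text does not specify.

```python
from typing import List, Optional

MAX_MISSING_FRACTION = 0.25  # columns with more missing values than this are dropped
MAX_GAP = 7                  # a run of this many missing days disqualifies the drive


def keep_column(values: List[Optional[float]]) -> bool:
    """Return True if at most 25% of the column's values are missing."""
    missing = sum(v is None for v in values)
    return missing / len(values) <= MAX_MISSING_FRACTION


def longest_gap(values: List[Optional[float]]) -> int:
    """Length of the longest run of consecutive missing (None) values."""
    longest = current = 0
    for v in values:
        current = current + 1 if v is None else 0
        longest = max(longest, current)
    return longest


def interpolate_series(values: List[Optional[float]]) -> Optional[List[float]]:
    """Linearly interpolate missing readings for one drive and one feature.

    Returns None when the drive should be dropped (a gap of MAX_GAP or more
    days, or no known values at all).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known or longest_gap(values) >= MAX_GAP:
        return None
    filled = list(values)
    # Assumption: edge gaps cannot be interpolated, so copy the nearest known value.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: straight line between the two surrounding known readings.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled
```

For example, `interpolate_series([1.0, None, None, 4.0])` fills the two missing days with 2.0 and 3.0, while a series containing seven consecutive `None` values returns `None` so the caller can drop that drive.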

The dataset is found to be heavily imbalanced in terms of HDDs that have failed prematurely versus those that are still active in the data center. For example, within a time window of 18 months (i.e., 4Q17 to 1Q19 inclusive), only 847 out of 31,504 (about 2.7%) ST4000DM000 HDDs were labeled as failed. To overcome this imbalance, the active (not failed) HDDs are randomly downsampled until their number equals the number of failed HDDs within the 18-month time window.
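A minimal sketch of this kind of random downsampling follows, assuming drives are identified by serial number and that sampling happens at the drive level. The function name `downsample_active` and the `seed` parameter are illustrative and not taken from the study.

```python
import random
from typing import List, Sequence


def downsample_active(failed_ids: Sequence[str],
                      active_ids: Sequence[str],
                      seed: int = 0) -> List[str]:
    """Keep every failed drive plus an equally sized random subset of active drives."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    n_keep = min(len(failed_ids), len(active_ids))
    kept_active = rng.sample(list(active_ids), k=n_keep)
    return list(failed_ids) + kept_active
```

With the counts quoted above (847 failed and 30,657 active drives), the balanced set would contain 1,694 drives.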

An intrinsic problem in RUL regression analysis is determining the desired output for each input observation. Previous studies of RUL estimation make the reasonable point that a linearly degrading RUL target function may embed unrealistic assumptions into the predictive model; i.e., that degradation occurs linearly with time beginning on the first day of device installation. In fact, device degradation may not be detectable or even significant for several time steps after device installation; therefore, a piecewise RUL target function has been proposed (e.g., Heimes, 2008; Babu et al., 2016; Zheng et al., 2017). The current study assesses the utility of a piecewise RUL target function with a maximum RUL capped at 1095 days (i.e., 3 years), ending at 0 after a full 2190-day (i.e., 6-year) life-cycle (Figure 1). The HDD manufacturer’s warranty is five years; we add one additional year to the life-cycle to be conservative. The RUL cap of 1095 days was chosen as the midpoint of the life-cycle; other RUL caps were tested with mixed results.

Figure 1: The linear and piecewise Remaining Useful Life (RUL) target functions are displayed. The piecewise target function is shown in red, with the assumption that no HDD degradation is significant for 1095 days after device installation.
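The two target functions in Figure 1 can be written down directly from the numbers given above. The sketch below expresses each target as a function of drive age in days; the function names are illustrative, and treating age since installation as the only input is an assumption based on the figure description.

```python
LIFE_CYCLE_DAYS = 2190  # 6 years: 5-year warranty plus 1 conservative year
RUL_CAP_DAYS = 1095     # 3 years: midpoint of the life-cycle


def linear_rul(age_days: int) -> int:
    """Linear target: RUL falls by one day per day from installation onward."""
    return max(LIFE_CYCLE_DAYS - age_days, 0)


def piecewise_rul(age_days: int) -> int:
    """Piecewise target: flat at the cap, then linear down to zero."""
    return min(RUL_CAP_DAYS, linear_rul(age_days))
```

Under this formulation the two targets agree from day 1095 onward and differ only during the early period, where the piecewise target stays flat at 1095 days.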

The current study assumes that the initial RUL (presumably the RUL at the time of device installation) will be provided as an input feature for prediction, and it is therefore included in the model building process. Each feature is scaled by its maximum value, so that every feature value lies between 0 and 1. The importance of each feature is measured using univariate linear regression tests; features with zero significance are removed, and all others are kept for modeling.
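As a rough illustration of these two steps, the sketch below scales a feature column by its maximum and scores it against the target with the F-statistic of a univariate linear regression, computed from the squared Pearson correlation. The function names are hypothetical, the scaling assumes non-negative feature values, and scoring a constant column as 0 is an assumption; the study does not say how "zero significance" was determined.

```python
import math
from typing import List, Sequence


def scale_by_max(column: Sequence[float]) -> List[float]:
    """Divide a non-negative feature column by its maximum so values lie in [0, 1]."""
    peak = max(column)
    return [v / peak for v in column] if peak else list(column)


def f_statistic(feature: Sequence[float], target: Sequence[float]) -> float:
    """F-statistic of a one-feature linear regression: r^2 / (1 - r^2) * (n - 2)."""
    n = len(feature)
    mean_x = sum(feature) / n
    mean_y = sum(target) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, target))
    sxx = sum((x - mean_x) ** 2 for x in feature)
    syy = sum((y - mean_y) ** 2 for y in target)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant column carries no signal
    r_squared = (sxy * sxy) / (sxx * syy)
    if r_squared >= 1.0:
        return math.inf  # perfect linear relationship
    return r_squared / (1.0 - r_squared) * (n - 2)
```

A feature whose score is 0 under this test would be a candidate for removal, while any feature with a positive score would be kept.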

Experimental Results

After preprocessing, the dataset is split into train and test sets in order to provide the models one set of HDDs to train on, and another set of unseen HDDs to test on. The RF and DCNN models are trained to predict the change in a given HDD’s RUL from one day to the next based on the current day’s SMART attribute readings and yesterday’s (t − 1) predicted RUL value output by the model. The RF is optimized using grid search cross validation over a variety of parameters that control model complexity. The DCNN is designed with three one-dimensional convolutional layers, two pooling layers, and one flattening layer for the regression output. The DCNN is trained over 10 epochs using the Adam optimizer with an initial learning rate of 0.0008 and a learning rate decay of 0.0005 per epoch (Figure 2).

Figure 2: The DCNN is trained over 10 epochs, using a validation split of the training dataset.
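The day-by-day formulation described above, in which each prediction feeds the next, can be sketched as a simple loop. The `DeltaModel` type and the `roll_out_rul` function are hypothetical names: any trained regressor that maps the current day's SMART features and the previous day's RUL to a daily change could stand in for `model`.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: (today's SMART features, yesterday's RUL) -> change in RUL
DeltaModel = Callable[[Sequence[float], float], float]


def roll_out_rul(model: DeltaModel,
                 daily_smart: Sequence[Sequence[float]],
                 initial_rul: float) -> List[float]:
    """Predict an RUL trajectory, seeding day 0 with the known initial RUL."""
    ruls = [initial_rul]
    for features in daily_smart[1:]:
        delta = model(features, ruls[-1])  # predicted change from yesterday to today
        ruls.append(ruls[-1] + delta)
    return ruls
```

Because each step consumes the previous prediction rather than the true RUL, errors can accumulate over the trajectory, which is one reason to evaluate on whole drives rather than on individual days.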

In order to evaluate and compare the models’ predictive performance, the models are tested on the test set of HDDs set aside prior to training. The models are seeded with the first day’s RUL (presumably at device installation), and then queried to predict the rest of the HDD’s RUL over its entire remaining life-cycle. The two models’ predictive performance is compared (Table 1) on the test split using root mean square error (RMSE) and the coefficient of determination (R²) as regression metrics. According to these metrics, the DCNN outperformed the RF in most cases (Figures 3 to 5).

Figure 3: The daily change in RUL (i.e., delta) is predicted for HDD S30115YQ. The actual RUL is shown in grey, the RF predictions are shown in red, and the DCNN predictions are shown in blue.
Figure 4: The daily change in RUL (i.e., delta) is predicted for HDD S30113WT. The actual RUL is shown in grey, the RF predictions are shown in red, and the DCNN predictions are shown in blue.
Figure 5: The daily change in RUL (i.e., delta) is predicted for HDD S30117FX. The actual RUL is shown in grey, the RF predictions are shown in red, and the DCNN predictions are shown in blue.
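For reference, the two regression metrics used in the comparison can be computed as follows. This is a plain-Python sketch with illustrative function names, not the study's evaluation code; `r_squared` assumes the actual values are not all identical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted RUL values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

A lower RMSE and an R² closer to 1 both indicate a better fit; a model that always predicts the mean of the actual values scores an R² of 0.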


System and component failure is a ubiquitous problem in numerous industries and may be addressed in several ways. The current study addresses the problem of predicting HDD failure by regressing the RUL on a multivariate array of SMART attribute readings. Ensemble Learning and DCNN approaches are compared using operational data from the Backblaze data center. Experimentation shows that the DCNN outperformed the RF under the current parameterization. Future studies may attempt alternative approaches to predicting HDD failure, such as anomaly detection techniques or survival analysis.


  • Backblaze, “Hard Drive Data and Stats” (2019), https://www.backblaze.com/b2/hard-drive-test-data.html
  • G. S. Babu, P. Zhao, and X.-L. Li (2016) “Deep convolutional neural network based regression approach for estimation of remaining useful life,” in International Conference on Database Systems for Advanced Applications. Springer, pp. 214–228.
  • F. O. Heimes (2008) “Recurrent neural networks for remaining useful life estimation,” in International Conference on Prognostics and Health Management (PHM 2008). IEEE, pp. 1–6.
  • S. Zheng, K. Ristovski, A. Farahat, and C. Gupta (2017) “Long Short-Term Memory Network for Remaining Useful Life estimation,” Proceedings of the IEEE ICPHM Conference, pp. 88–95.

The code for data processing and figure generation may be found here.