Authors
Qingwei Lin, Ken Hsieh, Yingnong Dang, Hongyu Zhang, Kaixin Sui, Yong Xu, Jian-Guang Lou, Chenggang Li, Youjiang Wu, Randolph Yao, Murali Chintalapati, Dongmei Zhang
Publication date
2018/10/26
Book
Proceedings of the 2018 26th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering
Pages
480-490
Description
In recent years, many traditional software systems have migrated to cloud computing platforms and are provided as online services. The service quality matters because system failures could seriously affect business and user experience. A cloud service system typically contains a large number of computing nodes. In reality, nodes may fail and affect service availability. In this paper, we propose a failure prediction technique, which can predict the failure-proneness of a node in a cloud service system based on historical data, before node failure actually happens. The ability to predict faulty nodes enables the allocation and migration of virtual machines to the healthy nodes, therefore improving service availability. Predicting node failure in cloud service systems is challenging, because a node failure could be caused by a variety of reasons and reflected by many temporal and spatial signals. Furthermore, the failure …
Total citations
[Bar chart: citations per year, 2019-2024]
Scholar articles
Q Lin, K Hsieh, Y Dang, H Zhang, K Sui, Y Xu, JG Lou… - Proceedings of the 2018 26th ACM joint meeting on …, 2018