Authors
Jian-Guang Lou, Qingwei Lin, Rui Ding, Qiang Fu, Dongmei Zhang, Tao Xie
Publication date
2013/11/11
Conference
2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE)
Pages
475-485
Publisher
IEEE
Description
As online services become more and more popular, incident management has become a critical task that aims to minimize service downtime and to ensure high quality of the provided services. In practice, incident management is conducted through analyzing a huge amount of monitoring data collected at runtime of a service. Such data-driven incident management faces several significant challenges such as the large data scale, complex problem space, and incomplete knowledge. To address these challenges, we carried out two years of software-analytics research in which we designed a set of novel data-driven techniques and developed an industrial system called the Service Analysis Studio (SAS) targeting real scenarios in a large-scale online service of Microsoft. SAS has been deployed to worldwide product datacenters and is widely used by on-call engineers for incident management. This paper shares our …
Total citations
[Chart: citations per year, 2013-2024]
Scholar articles
JG Lou, Q Lin, R Ding, Q Fu, D Zhang, T Xie - 2013 28th IEEE/ACM International Conference on …, 2013