Paper ID: 2203.08067
Practical data monitoring in the internet-services domain
Nikhil Galagali
Large-scale monitoring, anomaly detection, and root cause analysis of metrics are essential requirements of the internet-services industry. To meet the need to continuously monitor millions of metrics, large internet-based companies use many anomaly detection approaches daily. However, despite significant progress in detecting anomalies in metrics accurately and efficiently, the sheer number of metrics means that many false alarms still need to be investigated. This paper presents a framework for reliable large-scale anomaly detection. The framework is significantly more accurate than existing approaches and allows for easy interpretation of models, thus enabling practical data monitoring in the internet-services domain.
Submitted: Mar 15, 2022