You have set up a threshold alarm on the CPU Utilization metric that fires for any value greater than 80 percent. You receive a notification email about this alarm.
Which of the following actions will help you respond to this notification?
A typical at-risk threshold for the CpuUtilization metric is any value greater than 80 percent. A Compute instance breaching this threshold is at risk of becoming inoperable. Often the cause of this behavior is one or more applications consuming a high percentage of the CPU.
In this example, you decide to notify the operations team immediately, setting the severity of the alarm to "Critical" because repair is required to bring the instances back to optimal operational levels. You configure alarm notifications to the responsible team by both PagerDuty and email, requesting an investigation and appropriate fixes before the instances become inoperable. You set repeat notifications to every minute. When someone responds to the alarm notifications, you temporarily stop notifications by following the best practice of suppressing the alarm. Once the metrics return to optimal values, you remove the suppression.
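The alarm described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not the Oracle Cloud Infrastructure SDK: the class `AlarmSketch` and its field names are hypothetical, chosen to mirror the values in the explanation (threshold of 80 percent, Critical severity, PagerDuty and email destinations, repeat every minute).

```python
from dataclasses import dataclass


@dataclass
class AlarmSketch:
    """Hypothetical stand-in for an alarm definition (not an OCI type)."""
    metric: str = "CpuUtilization"
    threshold_percent: float = 80.0
    severity: str = "CRITICAL"
    repeat_seconds: int = 60  # repeat notifications every minute
    destinations: tuple = ("pagerduty", "email")

    def is_firing(self, value: float) -> bool:
        # The alarm breaches only for values strictly greater than the threshold.
        return value > self.threshold_percent

    def notification_offsets(self, firing_seconds: int) -> list:
        # Offsets (in seconds) at which a notification would be sent
        # while the alarm keeps firing without suppression.
        return list(range(0, firing_seconds + 1, self.repeat_seconds))


alarm = AlarmSketch()
print(alarm.is_firing(92.5))            # breaching value
print(alarm.notification_offsets(180))  # three minutes of continuous firing
```

With a one-minute repeat interval, an unattended alarm sends a new notification every 60 seconds for as long as the metric stays above the threshold.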
Suppress Alarms During Investigations
Once a team member responds to an alarm, suppress notifications during the effort to investigate or mitigate the issue. Temporarily stopping notifications helps to avoid distractions during the investigation and mitigation. Remove the suppression when the issue has been resolved.
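The effect of suppression can be sketched as a simple time-window check. The function `should_notify` and its parameters are hypothetical names used for illustration; they are not part of any Oracle API. The sketch assumes suppression is expressed as a start and end time, and that removing the suppression means passing no window at all.

```python
from datetime import datetime, timedelta, timezone


def should_notify(is_firing, now, suppress_from=None, suppress_until=None):
    """Return True if a notification should be sent at time `now`."""
    if not is_firing:
        return False
    # A suppression window only applies when both bounds are set.
    suppressed = (
        suppress_from is not None
        and suppress_until is not None
        and suppress_from <= now < suppress_until
    )
    return not suppressed


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
end = start + timedelta(hours=1)
during = start + timedelta(minutes=10)

print(should_notify(True, during, start, end))  # suppressed while investigating
print(should_notify(True, during))              # suppression removed
```

While the window is active the alarm can still be firing, but nobody is paged; once the window ends or is removed, notifications resume if the metric is still above the threshold.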
This topic describes best practices for working with alarms.
https://docs.cloud.oracle.com/en-us/iaas/Content/Monitoring/Concepts/alarmsbestpractices.htm