Enterprise Monitoring Specific Guiding Principles
Monitoring Principle 1
Know before they know: pre-empt service degradation and potential outages
Description: Identify system degradation and potential outages before impacted users notice them, by sending proactive notifications whenever a pre-defined threshold indicative of a service problem is breached.
Rationale: Degraded application performance and outages are disruptive to the University, costly if they go unnoticed, and frustrating for users and stakeholders alike. Proactive monitoring of pre-defined metrics, evaluated against baseline performance thresholds at each layer of application infrastructure, enables the monitoring system to notify service owner(s) automatically, so that problems are identified before they develop into significant degradation or an outage. The benefit of this principle is that service teams receive sufficient warning to remediate issues before end users become aware of them or service degrades significantly.
Implications: Service owners establish metrics and correlations, at each layer of application infrastructure, that are indicative of service degradation. Threshold alerts are automated to recognize these event patterns and notify the service team.
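Illustration: The implication above can be sketched as a simple threshold check. This is a minimal sketch for illustration only, not a prescribed implementation. The `Threshold` class, the layer and metric names, the three-standard-deviation limit, and the `notify` callback are all hypothetical assumptions; an actual deployment would rely on the rule and notification features of the chosen monitoring platform.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Threshold:
    """Hypothetical alert rule for one metric at one infrastructure layer."""
    layer: str           # assumed layer name, e.g. "database"
    metric: str          # assumed metric name, e.g. "query_latency_ms"
    sigmas: float = 3.0  # assumed distance above baseline that counts as a breach


def breaches(rule, baseline, current):
    """Return True when `current` exceeds the baseline mean by more than
    `rule.sigmas` sample standard deviations."""
    limit = mean(baseline) + rule.sigmas * stdev(baseline)
    return current > limit


def evaluate(rules, baselines, readings, notify):
    """Check each rule against its latest reading and call `notify`
    once per breach. Returns the list of alert messages sent."""
    alerts = []
    for rule in rules:
        key = (rule.layer, rule.metric)
        if key in readings and breaches(rule, baselines[key], readings[key]):
            message = f"{rule.layer}/{rule.metric} = {readings[key]} exceeds baseline"
            notify(message)  # e.g. page or email the service team
            alerts.append(message)
    return alerts
```

A service owner would define one such rule per metric and layer, and the check would run on a schedule so that the service team is notified as soon as a reading drifts beyond its baseline.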