VM Resources Alert & Dashboard System

Designed and developed a centralized VM resource monitoring and alerting system to improve infrastructure visibility, incident response, and operational efficiency. The system uses Prometheus for metrics collection and Grafana for visualization, with n8n as the automation engine that processes alerts, manages workflows, and handles escalation logic.

Implemented customized alerting pipelines that deliver alerts to Telegram via webhooks. From Telegram, the engineer acting as person in charge (PIC) can acknowledge an incident and add handling notes, and all incident data is stored automatically in a MongoDB database. Alerts are deduplicated and tracked by unique alert identifiers, which preserves incident history and prevents handler information from being lost when an alert recurs.

Developed a web-based alert dashboard backed by MongoDB that provides alert status classification (handled/unhandled, firing/resolved), PIC performance scoring, per-VM-instance alert statistics, date-based filtering, search, and interactive UI enhancements such as loading animations. Added monthly report generation with print-ready formatting for management and operational reviews.

The system was deployed on on-premises servers and partially integrated with cloud environments (AWS EC2). Further enhancements included storage- and resource-specific alerts, role-based access control (admin/editor/viewer), alert escalation flows, and ongoing optimization for scalability and maintainability.
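The deduplication and acknowledgement flow described above can be illustrated with a minimal Python sketch. It assumes incoming alerts follow the Prometheus Alertmanager webhook shape (each entry in `alerts` carries `status` and `labels`) and uses a hypothetical identifier scheme that hashes the sorted label set; the actual identifier used in this system, and the n8n and Telegram wiring, are not shown. An in-memory dict stands in for the MongoDB collection, and `IncidentStore`, `ingest`, `acknowledge`, and `status_counts` are illustrative names only.

```python
import hashlib
import json
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def alert_id(labels: Dict[str, str]) -> str:
    """Hypothetical ID scheme: hash of the alert's label set, independent of key order."""
    canonical = json.dumps(labels, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass
class Incident:
    alert_id: str
    labels: Dict[str, str]
    status: str                       # "firing" or "resolved"
    occurrences: int = 1              # times the alert has (re)started firing
    handled_by: Optional[str] = None  # PIC who acknowledged it
    notes: List[str] = field(default_factory=list)


class IncidentStore:
    """In-memory stand-in (assumption) for the MongoDB incidents collection."""

    def __init__(self) -> None:
        self._incidents: Dict[str, Incident] = {}

    def ingest(self, alert: dict) -> Incident:
        """Insert a new incident, or update the existing one for a recurring alert."""
        aid = alert_id(alert["labels"])
        existing = self._incidents.get(aid)
        if existing is None:
            incident = Incident(aid, dict(alert["labels"]), alert["status"])
            self._incidents[aid] = incident
            return incident
        # Count a new occurrence only on a resolved -> firing transition, so that
        # repeated notifications for a still-firing alert are not double-counted.
        if existing.status == "resolved" and alert["status"] == "firing":
            existing.occurrences += 1
        existing.status = alert["status"]
        # handled_by and notes are deliberately left untouched on recurrence.
        return existing

    def acknowledge(self, aid: str, pic: str, note: Optional[str] = None) -> Incident:
        """Record the PIC and an optional handling note (e.g. sent from Telegram)."""
        incident = self._incidents[aid]
        incident.handled_by = pic
        if note:
            incident.notes.append(note)
        return incident

    def status_counts(self) -> Dict[str, int]:
        """Count incidents per handled/unhandled x firing/resolved bucket."""
        counts: Counter = Counter()
        for incident in self._incidents.values():
            handled = "handled" if incident.handled_by is not None else "unhandled"
            counts[f"{handled}/{incident.status}"] += 1
        return dict(counts)


if __name__ == "__main__":
    store = IncidentStore()
    labels = {"alertname": "HighCPU", "instance": "vm-01"}
    first = store.ingest({"status": "firing", "labels": labels})
    store.acknowledge(first.alert_id, "alice", "Restarted runaway process")
    store.ingest({"status": "resolved", "labels": labels})
    again = store.ingest({"status": "firing", "labels": labels})
    print(again is first)                                    # True
    print(again.occurrences, again.handled_by, again.notes)  # 2 alice ['Restarted runaway process']
    print(store.status_counts())                             # {'handled/firing': 1}
```

Keying on the label set means the recurring `HighCPU` alert on `vm-01` lands on the same record, so the PIC and notes from the first acknowledgement survive. With MongoDB, a comparable effect could be achieved with an upsert (`update_one(..., upsert=True)` using `$setOnInsert` for first-seen fields), which leaves handler fields untouched on later updates.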