Azure Monitoring Dashboards
Overview
Azure Monitoring Dashboards are customizable, unified views within the Azure portal that consolidate metrics, logs, alerts, and health data from across your Azure resources. They give operations teams, developers, and stakeholders a single pane of glass to observe the health, performance, and availability of cloud workloads in near real time.
Types of Dashboards
| Type | Description | Best For |
| --- | --- | --- |
| Azure Portal Dashboard | Drag-and-drop tiles pinned from resources | Quick operational overviews |
| Azure Workbooks | Rich, interactive reports with KQL queries | Deep-dive analysis and sharing |
| Grafana (Azure Managed) | Open-source dashboards backed by Azure data sources | Advanced visualization, multi-cloud |
| Power BI | Business-oriented dashboards connected via Azure Monitor | Executive reporting |
Key Metrics to Monitor
Virtual Machines
| Metric | Threshold to Watch |
| --- | --- |
| Percentage CPU | > 80% sustained |
| Available Memory Bytes | < 10% of total RAM |
| Disk Read/Write Bytes/sec | Spikes indicating I/O bottlenecks |
| Network In/Out | Unexpected traffic surges |
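A "sustained" threshold is different from a single spike. The sketch below shows one possible reading of "> 80% sustained", assuming CPU samples arrive at a fixed interval and that "sustained" means a run of consecutive samples above the limit. The function name and the default of five samples are illustrative assumptions, not Azure Monitor settings.

```python
from typing import Iterable


def is_sustained_breach(
    samples: Iterable[float],
    limit: float = 80.0,
    min_consecutive: int = 5,  # assumed window; tune to your sampling interval
) -> bool:
    """Return True if at least `min_consecutive` samples in a row exceed `limit`."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > limit else 0
        if run >= min_consecutive:
            return True
    return False
```

With one-minute samples, the default would flag five straight minutes above 80% while ignoring a brief spike that drops back under the limit.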
Azure Kubernetes Service (AKS)
| Metric | Notes |
| --- | --- |
| Node CPU utilization | Per node and cluster-wide |
| Pod restart count | Indicates crashes or OOM kills |
| Pending pod count | Scheduling failures |
| kube-apiserver latency | Control plane health |
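Kubernetes reports container restarts as a running total, so a dashboard tile is usually more useful when it shows how much the count grew over a window than when it shows the raw number. A minimal sketch of that idea, assuming two snapshots that map a pod name to its restart count; the function name and the input shape are hypothetical and do not come from any Azure API.

```python
from typing import Dict


def restart_deltas(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Return the pods whose restart count grew between two snapshots."""
    deltas = {}
    for pod, count in current.items():
        # Pods missing from the earlier snapshot are treated as starting at 0.
        increase = count - previous.get(pod, 0)
        if increase > 0:
            deltas[pod] = increase
    return deltas
```

A pod whose count went down (for example, because it was recreated) is ignored here rather than reported as a negative change.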
Azure SQL / Cosmos DB
| Metric | Notes |
| --- | --- |
| DTU/vCore utilization | Database compute saturation |
| Deadlocks | Query contention |
| Request Units consumed | Cosmos DB throughput |
| Connection failures | Connectivity issues |
App Services / Functions
| Metric | Notes |
| --- | --- |
| HTTP 5xx errors | Server-side failures |
| Average response time | Performance degradation |
| Requests per second | Load trends |
| Function execution count | Invocation volume |
Writing KQL Queries for Dashboards
KQL (Kusto Query Language) queries the data in a Log Analytics workspace. The result of a query, usually a chart, can then be pinned to a dashboard as a tile.
Example — Top 10 Errors in the Last 24 Hours
AppExceptions
| where TimeGenerated > ago(24h)
| summarize ErrorCount = count() by OuterType, AppRoleName
| top 10 by ErrorCount desc
| render barchart
Example — VM CPU Average by Hour
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, 1h), Computer
| render timechart
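
To make the `summarize ... by bin(TimeGenerated, 1h), Computer` step concrete, here is a rough Python equivalent. It is only a sketch: it assumes each sample is a `(timestamp, computer, value)` tuple, and the function name is hypothetical. It groups samples into one-hour buckets per computer and averages each bucket, which is what the query does before rendering.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[datetime, str, float]  # (TimeGenerated, Computer, CounterValue)


def hourly_average(samples: Iterable[Sample]) -> Dict[Tuple[datetime, str], float]:
    """Average values per (hour bucket, computer), like bin(TimeGenerated, 1h)."""
    buckets: Dict[Tuple[datetime, str], List[float]] = defaultdict(list)
    for timestamp, computer, value in samples:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[(hour, computer)].append(value)
    return {key: mean(values) for key, values in buckets.items()}
```

Each key in the result corresponds to one point on the time chart: one hour for one computer.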
Example — Alert Trend Over 7 Days
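// AlertsManagementResources is an Azure Resource Graph table, so run this
// against Resource Graph (for example, from a Workbook), not a Log Analytics workspace.
AlertsManagementResources
| where type == "microsoft.alertsmanagement/alerts"
// properties is dynamic, so cast the start time before comparing or binning it.
| extend StartTime = todatetime(properties.essentials.startDateTime)
| where StartTime > ago(7d)
| summarize AlertCount = count() by Severity = tostring(properties.essentials.severity), bin(StartTime, 1d)
| render columnchart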
Best Practices
- Organize by audience — Create separate dashboards for Ops, Dev, and Executive views
- Use resource tags — Tag resources consistently (for example, by environment or owner) so that dashboard queries and filters can scope views by tag
- Version control dashboards — Export JSON and store in Git for auditability
- Set alert tiles prominently — Always include an alert summary tile at the top
- Limit tile count — Keep dashboards focused; use Workbooks for detailed drill-downs
- Apply RBAC — Grant least-privilege access; use Reader role for view-only stakeholders
- Schedule review cadence — Revisit dashboard relevance monthly as architectures evolve
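
For the version-control practice above, exported dashboard JSON is easier to review in Git when it is serialized the same way every time. The following is a minimal sketch, assuming the export is plain JSON text; the function name is hypothetical, and it relies only on Python's standard `json` module rather than any Azure tooling.

```python
import json


def normalize_dashboard_json(raw: str) -> str:
    """Re-serialize exported dashboard JSON with sorted keys and fixed indentation."""
    data = json.loads(raw)
    # Sorted keys and a fixed indent keep diffs limited to real changes.
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"
```

Running each export through a step like this before committing means that two exports of an unchanged dashboard produce identical files, even if the key order differs between exports.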