Get notified about service health incidents that have occurred in Azure

Wouldn't it be nice to receive notifications about service health incidents that occur in Azure? Automated notifications about platform issues can save you troubleshooting time.

The status of Azure services is publicly available here, but that site offers only a traffic-light view without detailed information about an issue. Detailed subscription-level service health notifications can be viewed in, or subscribed to from, Azure Service Health. These notifications are a sub-class of activity log events and can also be found in the activity log. Service health notifications can be informational or actionable, depending on the class.

Notifications fall into the following classes:

Action required: Azure might notice something unusual happen on your account, and work with you to remedy this. Azure sends you a notification, either detailing the actions you need to take or how to contact Azure engineering or support.
Incident: An event that impacts service is currently affecting one or more of the resources in your subscription.
Maintenance: A planned maintenance activity that might impact one or more of the resources under your subscription.
Information: Potential optimizations that might help improve your resource use.
Security: Urgent security-related information regarding your solutions that run on Azure.

Source

This blog post covers how to subscribe to service health alerts and route them to a Microsoft Teams channel.

How to subscribe to Azure Service Health notifications?

You can find Service Health in the Azure portal by searching for it. Service Health gives you an overview of service issues and planned maintenance. To configure how notifications are delivered, click "Add service health alert". On the Add service health alert page, you choose the event types you're interested in (service issue, planned maintenance, health advisories, security advisory) and the delivery method you want to use (email/SMS/webhook). This sample uses a webhook endpoint provided by a Logic App.

Note! You can also configure a health alert at the resource level by selecting Resource Health under Support + troubleshooting.

How to create a webhook endpoint provided by Logic App?

The Logic App's role is to receive a service health alert and send it to the MS Teams channel. This sample uses Adaptive Cards to visualize notification messages in Teams. Azure also sends a new notification event when the issue is resolved. We don't want to flood the Teams notification channel with a new message every time only the status changes, so this sample shows how to post a status update as a reply in the original Teams message thread.

The final result in the Teams channel looks like this:

What are Adaptive cards in Teams?

Adaptive Cards are actionable snippets of content that you can add to a conversation through a bot or messaging extension. Using text, graphics, and buttons, these cards provide rich communication to your audience. Source.

AdaptiveCards.io provides good documentation on the topic. The Adaptive Card schema used in this sample was built with the AdaptiveCards.io designer tool.

Logic App implementation steps

The Logic App flow looks like this. Each step is described in more detail below.

Step 1: When an HTTP request is received

This action creates an HTTP POST endpoint that receives the service health notification.

To generate the schema, use an example notification payload message, such as the one in the sample data at the end of this post.
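As a rough illustration of which payload fields matter in the later steps, the following Python sketch pulls them out of a notification. It is not part of the Logic App itself. It assumes the flat payload shape of the example notification at the end of this post; the function name `summarize_notification` and the trimmed `sample` payload are made up for illustration.

```python
import json


def summarize_notification(payload: dict) -> dict:
    """Pick out the fields that the later steps rely on."""
    props = payload.get("properties") or {}
    return {
        "tracking_id": props.get("trackingId"),
        "title": props.get("title"),
        "service": props.get("service"),
        "region": props.get("region"),
        "communication": props.get("communication"),
        "level": payload.get("level"),
        # "status" is an object with "value" and "localizedValue" keys.
        "status": (payload.get("status") or {}).get("value"),
    }


# Trimmed copy of the example notification (illustrative subset of fields).
sample = json.loads("""
{
  "level": "Warning",
  "status": {"value": "Resolved", "localizedValue": "Resolved"},
  "properties": {
    "title": "Network Infrastructure - UK South",
    "service": "Service Fabric",
    "region": "UK South",
    "communication": "Engineers are investigating underlying Network Infrastructure issues in this region.",
    "trackingId": "1NA0F-BJGY2"
  }
}
""")

summary = summarize_notification(sample)
print(summary["tracking_id"], summary["status"])
```

Missing fields come back as `None` rather than raising, which roughly matches how the `?` operator behaves in Logic App expressions.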

Step 2: Get Team messages

This action retrieves all messages from a specific Teams channel. Message information is used later to identify existing messages where the status update will be added as a reply message.

Step 3: Filter messages with attachments

Adaptive card content contains all information about the service health notification (ID, Service, Region, Communication, etc.). This content is included in the attachment object in the Teams message model. This action filters out messages that don't have attachments.

Expression used:

@greater(length(item()?['attachments']), 0)
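For readers less familiar with Logic App expressions, the same filter can be sketched in Python. This is only an equivalent illustration, not something the Logic App runs. The helper name `has_attachments` and the sample messages are hypothetical, and treating a missing or null `attachments` field as an empty list is an assumption.

```python
def has_attachments(message: dict) -> bool:
    # Mirrors @greater(length(item()?['attachments']), 0).
    # Assumption: a missing or null "attachments" field counts as empty.
    return len(message.get("attachments") or []) > 0


# Hypothetical, heavily trimmed Teams messages.
messages = [
    {"id": "1", "attachments": [{"content": "card"}]},
    {"id": "2", "attachments": []},
    {"id": "3"},
]

with_attachments = [m for m in messages if has_attachments(m)]
print([m["id"] for m in with_attachments])
```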

Step 4: Filter messages with trackingId

This action filters the Teams messages by the tracking ID provided in the notification. The tracking ID is unique to the incident.

Expression used:

item()?['attachments'][0]['content']
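The expression above only selects the card content of the first attachment. Assuming the filter then checks whether that content contains the notification's tracking ID, the logic might look like this in Python. The helper name `matches_tracking_id`, the sample messages, and the second ID `OTHER-ID` are hypothetical.

```python
def matches_tracking_id(message: dict, tracking_id: str) -> bool:
    """Check whether the first attachment's card content mentions the tracking ID."""
    attachments = message.get("attachments") or []
    if not attachments:
        return False
    # The card content is a string in the Teams message model.
    content = attachments[0].get("content") or ""
    return tracking_id in content


# Hypothetical messages that already passed the attachment filter.
messages = [
    {"id": "10", "attachments": [{"content": "Id 1NA0F-BJGY2, Region UK South"}]},
    {"id": "11", "attachments": [{"content": "Id OTHER-ID, Region UK South"}]},
]

matches = [m for m in messages if matches_tracking_id(m, "1NA0F-BJGY2")]
print([m["id"] for m in matches])
```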

Step 5: Condition

The Condition action decides whether to create a new Teams message or to add a status update as a reply to the existing message.

Expression used:

length(body('Filter_messages_with_trackingId'))
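Assuming the condition compares this length to zero, the branching can be sketched in Python as follows. The function name `plan_delivery` and the returned labels are illustrative only. Replying to the first match mirrors what the True-path does with its `[0]` index.

```python
from typing import Optional, Tuple


def plan_delivery(matches: list) -> Tuple[str, Optional[str]]:
    """Return ("reply", message_id) if a matching message exists, else ("new", None)."""
    if len(matches) > 0:
        # An earlier message for this incident exists: reply in its thread.
        return "reply", matches[0]["id"]
    # No earlier message for this incident: post a new one.
    return "new", None


print(plan_delivery([{"id": "123456789"}]))
print(plan_delivery([]))
```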

Step 6: Condition - True-path

The True-path handles the case where a message for the incident already exists, so the status update is added to the original Teams message thread.

Expression used:

body('Filter_messages_with_trackingId')[0]

A Parse JSON action is used to enable strong typing of the Teams message.

A second Parse JSON action enables strong typing for the status information as well.

The status update is then posted as a reply to the existing thread, identified by its message ID.

Step 6: Condition - False-path

The False-path adds a new notification message to the Teams channel. The Adaptive Card schema JSON is pasted here, and its placeholder values are replaced with values from the notification.
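To show how notification values end up in the card, here is a minimal Python sketch that builds a card with the same two-column layout as the example Adaptive Card at the end of this post. In the Logic App this mapping is done with dynamic content in the designer, not with code. The names `text_block` and `build_card` are hypothetical, and the sketch omits some of the spacing attributes of the full sample.

```python
import json


def text_block(text: str, **extra) -> dict:
    """Build one Adaptive Card TextBlock element."""
    return {"type": "TextBlock", "text": text, "wrap": True, **extra}


def build_card(props: dict, level: str) -> dict:
    """Fill a two-column card with values from a notification's properties."""
    rows = [
        ("Id", props["trackingId"]),
        ("Title", props["title"]),
        ("Service", props["service"]),
        ("Region", props["region"]),
        ("Communication", props["communication"]),
    ]
    header = {
        "type": "Container",
        "style": "emphasis",
        "bleed": True,
        "items": [{
            "type": "ColumnSet",
            "columns": [
                {"type": "Column", "width": "stretch", "items": [
                    text_block("**Service Health Alert**",
                               size="large", weight="bolder")]},
                {"type": "Column", "width": "stretch", "items": [
                    text_block(level, size="large", weight="bolder",
                               color="attention", horizontalAlignment="right")]},
            ],
        }],
    }
    details = {
        "type": "ColumnSet",
        "spacing": "medium",
        "separator": True,
        "columns": [
            # Left column: bold labels. Right column: values from the notification.
            {"type": "Column", "width": "110px",
             "items": [text_block(label, weight="bolder") for label, _ in rows]},
            {"type": "Column", "width": "auto",
             "items": [text_block(value) for _, value in rows]},
        ],
    }
    return {"type": "AdaptiveCard", "version": "1.2", "body": [header, details]}


# Values taken from the example notification in this post.
props = {
    "trackingId": "1NA0F-BJGY2",
    "title": "Network Infrastructure - UK South",
    "service": "Service Fabric",
    "region": "UK South",
    "communication": "Engineers are investigating underlying Network Infrastructure issues in this region.",
}
card = build_card(props, "Warning")
print(json.dumps(card, indent=2))
```

Because the tracking ID is written into the card body, the Step 4 filter can find this message again when a status update for the same incident arrives.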

Now the solution is ready, and your team is notified when something happens on the Azure platform!

Sample data schemas used in the sample

Example service health notification

This example notification payload message is from Microsoft documentation.

{
  "channels": "Admin",
  "correlationId": "c550176b-8f52-4380-bdc5-36c1b59d3a44",
  "description": "Active: Network Infrastructure - UK South",
  "eventDataId": "c5bc4514-6642-2be3-453e-c6a67841b073",
  "eventName": {
      "value": null
  },
  "category": {
      "value": "ServiceHealth",
      "localizedValue": "Service Health"
  },
  "eventTimestamp": "2017-07-20T23:30:14.8022297Z",
  "id": "/subscriptions/<subscription ID>/events/c5bc4514-6642-2be3-453e-c6a67841b073/ticks/636361902148022297",
  "level": "Warning",
  "operationName": {
      "value": "Microsoft.ServiceHealth/incident/action",
      "localizedValue": "Microsoft.ServiceHealth/incident/action"
  },
  "resourceProviderName": {
      "value": null
  },
  "resourceType": {
      "value": null,
      "localizedValue": ""
  },
  "resourceId": "/subscriptions/<subscription ID>",
  "status": {
      "value": "Resolved",
      "localizedValue": "Resolved"
  },
  "subStatus": {
      "value": null
  },
  "submissionTimestamp": "2017-07-20T23:30:34.7431946Z",
  "subscriptionId": "<subscription ID>",
  "properties": {
    "title": "Network Infrastructure - UK South",
    "service": "Service Fabric",
    "region": "UK South",
    "communication": "Starting at approximately 21:41 UTC on 20 Jul 2017, a subset of customers in UK South may experience degraded performance, connectivity drops or timeouts when accessing their Azure resources hosted in this region. Engineers are investigating underlying Network Infrastructure issues in this region. Impacted services may include, but are not limited to App Services, Automation, Service Bus, Log Analytics, Key Vault, SQL Database, Service Fabric, Event Hubs, Stream Analytics, Azure Data Movement, API Management, and Azure Cognitive Search. Multiple engineering teams are engaged in multiple workflows to mitigate the impact. The next update will be provided in 60 minutes, or as events warrant.",
    "incidentType": "Incident",
    "trackingId": "1NA0F-BJGY2",
    "impactStartTime": "2017-07-20T21:41:00.0000000Z",
    "impactedServices": "[{\"ImpactedRegions\":[{\"RegionName\":\"UK South\"}],\"ServiceName\":\"Service Fabric\"}]",
    "defaultLanguageTitle": "Network Infrastructure - UK South",
    "defaultLanguageContent": "Starting at approximately 21:41 UTC on 20 Jul 2017, a subset of customers in UK South may experience degraded performance, connectivity drops or timeouts when accessing their Azure resources hosted in this region. Engineers are investigating underlying Network Infrastructure issues in this region. Impacted services may include, but are not limited to App Services, Automation, Service Bus, Log Analytics, Key Vault, SQL Database, Service Fabric, Event Hubs, Stream Analytics, Azure Data Movement, API Management, and Azure Cognitive Search. Multiple engineering teams are engaged in multiple workflows to mitigate the impact. The next update will be provided in 60 minutes, or as events warrant.",
    "stage": "Active",
    "communicationId": "636361902146035247",
    "version": "0.1.1"
  }
}

Example Teams message

This is an example Teams message retrieved by the Logic App using the Teams connector. Note! The Adaptive Card content is included in the content field of the attachments.

[
  {
    "id": "123456789",
    "replyToId": null,
    "etag": "123456789",
    "messageType": "message",
    "createdDateTime": "2021-10-18T08:04:11.541Z",
    "lastModifiedDateTime": "2021-10-18T08:04:11.541Z",
    "lastEditedDateTime": null,
    "deletedDateTime": null,
    "subject": null,
    "summary": null,
    "chatId": null,
    "importance": "normal",
    "locale": "en-us",
    "webUrl": "",
    "policyViolation": null,
    "eventDetail": null,
    "from": {
      "device": null,
      "user": null,
      "application": {
        "id": "00000000-0000-0000-0000-000000000000",
        "displayName": "Flow",
        "applicationIdentityType": "bot"
      }
    },
    "body": {
      "contentType": "html",
      "content": "<attachment id=\"123456789\"></attachment>"
    },
    "channelIdentity": {
      "teamId": "00000000-0000-0000-0000-000000000000",
      "channelId": ""
    },
    "attachments": [
      {
        "id": "123456789",
        "contentType": "application/vnd.microsoft.card.adaptive",
        "contentUrl": null,
        "content": "[ADAPTIVE CARD CONTENT]",
        "name": null,
        "thumbnailUrl": null
      }
    ],
    "mentions": [],
    "reactions": []
  }
]

Example Adaptive Card content

{
    "type": "AdaptiveCard",
    "body": [
        {
            "items": [
                {
                    "columns": [
                        {
                            "width": "stretch",
                            "items": [
                                {
                                    "size": "large",
                                    "text": "**Service Health Alert**",
                                    "weight": "bolder",
                                    "type": "TextBlock"
                                }
                            ],
                            "type": "Column"
                        },
                        {
                            "width": "stretch",
                            "items": [
                                {
                                    "color": "attention",
                                    "horizontalAlignment": "right",
                                    "size": "large",
                                    "text": "Warning",
                                    "weight": "bolder",
                                    "wrap": true,
                                    "spacing": "none",
                                    "type": "TextBlock"
                                }
                            ],
                            "type": "Column"
                        }
                    ],
                    "type": "ColumnSet"
                }
            ],
            "style": "emphasis",
            "bleed": true,
            "type": "Container"
        },
        {
            "columns": [
                {
                    "width": "110px",
                    "items": [
                        {
                            "text": "Id",
                            "weight": "bolder",
                            "wrap": true,
                            "type": "TextBlock"
                        },
                        {
                            "text": "Title",
                            "weight": "bolder",
                            "wrap": true,
                            "type": "TextBlock"
                        },
                        {
                            "text": "Service",
                            "weight": "bolder",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        },
                        {
                            "text": "Region",
                            "weight": "bolder",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        },
                        {
                            "text": "Communication",
                            "weight": "bolder",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        }
                    ],
                    "type": "Column"
                },
                {
                    "width": "auto",
                    "items": [
                        {
                            "text": "1NA0F-BJGY2",
                            "wrap": true,
                            "type": "TextBlock"
                        },
                        {
                            "text": "Network Infrastructure - UK South",
                            "wrap": true,
                            "type": "TextBlock"
                        },
                        {
                            "text": "Service Fabric",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        },
                        {
                            "text": "UK South",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        },
                        {
                            "text": "Starting at approximately 21:41 UTC on 20 Jul 2017, a subset of customers in UK South may experience degraded performance, connectivity drops or timeouts when accessing their Azure resources hosted in this region. Engineers are investigating underlying Network Infrastructure issues in this region. Impacted services may include, but are not limited to App Services, Automation, Service Bus, Log Analytics, Key Vault, SQL Database, Service Fabric, Event Hubs, Stream Analytics, Azure Data Movement, API Management, and Azure Cognitive Search. Multiple engineering teams are engaged in multiple workflows to mitigate the impact. The next update will be provided in 60 minutes, or as events warrant.",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        }
                    ],
                    "type": "Column"
                }
            ],
            "spacing": "medium",
            "separator": true,
            "type": "ColumnSet"
        }
    ],
    "version": "1.2"
}
