Configure Alerts Using AlertManager
To keep your clusters and applications healthy, you need to stay informed of the events occurring in your cluster. DKP keeps you informed of these events through the alertmanager of the kube-prometheus-stack.
Kommander is configured with pre-defined alerts that cover four categories of events. You receive alerts related to:
State of your nodes
System services managing the Kubernetes cluster
Resource events from specific system services
Prometheus expressions exceeding some pre-defined thresholds
Some examples of the alerts currently available are:
CPUThrottlingHigh
TargetDown
KubeletNotReady
KubeAPIDown
CoreDNSDown
KubeVersionMismatch
A complete list with all the pre-defined alerts can be found on GitHub.
Prerequisites
Determine the name of the workspace where you wish to perform the actions. You can use the dkp get workspaces command to see the list of workspace names and their corresponding namespaces.
Set the WORKSPACE_NAMESPACE environment variable to the name of the workspace's namespace where the cluster is attached:

```shell
export WORKSPACE_NAMESPACE=<workspace_namespace>
```
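The commands later on this page pass ${WORKSPACE_NAMESPACE} to kubectl and dkp unquoted, so an empty variable would make the -n flag consume the next argument instead. The following is a minimal sketch of a guard you could run first; the function name require_workspace_namespace is hypothetical and not part of DKP.

```shell
# Hypothetical helper: stop early if WORKSPACE_NAMESPACE is unset or empty.
require_workspace_namespace() {
  if [ -z "${WORKSPACE_NAMESPACE:-}" ]; then
    echo "WORKSPACE_NAMESPACE is not set; run: export WORKSPACE_NAMESPACE=<workspace_namespace>" >&2
    return 1
  fi
  # Echo the value so you can confirm it matches the intended workspace.
  echo "Using workspace namespace: ${WORKSPACE_NAMESPACE}"
}
```

Call require_workspace_namespace before the kubectl and dkp commands in the procedures that follow.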
Configure Alert Rules
Use override ConfigMaps to configure alert rules.
You can enable or disable the default alert rules by providing the desired configuration in an overrides ConfigMap. For example, if you want to disable the default node alert rules, follow these steps to define an overrides ConfigMap:
Create a file named kube-prometheus-stack-overrides.yaml and paste the following YAML code into it to create the overrides ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-prometheus-stack-overrides
  namespace: ${WORKSPACE_NAMESPACE}
data:
  values.yaml: |
    ---
    defaultRules:
      rules:
        node: false
```
Use the following command to apply the YAML file:

```shell
kubectl apply -f kube-prometheus-stack-overrides.yaml
```
Edit the kube-prometheus-stack AppDeployment to replace the spec.configOverrides.name value with kube-prometheus-stack-overrides. (You can use the steps in the procedure, Deploy an application with a custom configuration, as a guide.)

```shell
dkp edit appdeployment -n ${WORKSPACE_NAMESPACE} kube-prometheus-stack
```
After your editing is complete, the AppDeployment resembles this example:

```yaml
apiVersion: apps.kommander.d2iq.io/v1alpha2
kind: AppDeployment
metadata:
  name: kube-prometheus-stack
  namespace: ${WORKSPACE_NAMESPACE}
spec:
  appRef:
    name: kube-prometheus-stack-34.9.3
    kind: ClusterApp
  configOverrides:
    name: kube-prometheus-stack-overrides
```
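The YAML in the first step contains ${WORKSPACE_NAMESPACE}, and kubectl apply -f reads a file as written without expanding shell variables. One way to get the real namespace into the file is to write it from the shell with a heredoc, as in the sketch below. The fallback value my-workspace-namespace is a placeholder for illustration only; the sketch assumes you have already exported WORKSPACE_NAMESPACE as described in the prerequisites.

```shell
# Assumption: WORKSPACE_NAMESPACE is already exported (see Prerequisites).
# The fallback below is a placeholder so the sketch runs on its own.
: "${WORKSPACE_NAMESPACE:=my-workspace-namespace}"

# An unquoted heredoc delimiter (EOF) lets the shell expand ${WORKSPACE_NAMESPACE}.
cat > kube-prometheus-stack-overrides.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-prometheus-stack-overrides
  namespace: ${WORKSPACE_NAMESPACE}
data:
  values.yaml: |
    ---
    defaultRules:
      rules:
        node: false
EOF

# Show the namespace line so you can confirm the substitution before applying.
grep 'namespace:' kube-prometheus-stack-overrides.yaml
```

The resulting file can then be applied with the kubectl apply command from the second step.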
To disable all rules, create an overrides ConfigMap with this YAML code:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-prometheus-stack-overrides
  namespace: ${WORKSPACE_NAMESPACE}
data:
  values.yaml: |
    ---
    defaultRules:
      create: false
```
Alert rules for the Velero platform service are turned off by default. You can enable them with the following overrides ConfigMap. Enable them only if the velero platform service is enabled. If the platform service is disabled, disable its alert rules as well to avoid alert misfires.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-prometheus-stack-overrides
  namespace: ${WORKSPACE_NAMESPACE}
data:
  values.yaml: |
    ---
    mesosphereResources:
      rules:
        velero: true
```
To create a custom alert rule named my-rule-name, create the overrides ConfigMap with this YAML code:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-prometheus-stack-overrides
  namespace: ${WORKSPACE_NAMESPACE}
data:
  values.yaml: |
    ---
    additionalPrometheusRulesMap:
      my-rule-name:
        groups:
        - name: my_group
          rules:
          - record: my_record
            expr: 100 * my_record
```
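The rule in the example above uses the record key, which in Prometheus defines a recording rule (a precomputed series) rather than a rule that fires an alert. An alerting rule uses the alert key instead, optionally with for, labels, and annotations. The sketch below writes such a rule into the same overrides file. The names my-alert-rules, my_alert_group, and MyTargetDown, the 10m duration, and the fallback namespace are hypothetical values chosen for illustration; up is the standard Prometheus metric that is 0 when a scrape target is unreachable.

```shell
# Placeholder fallback so the sketch runs on its own; export the real value first.
: "${WORKSPACE_NAMESPACE:=my-workspace-namespace}"

cat > kube-prometheus-stack-overrides.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-prometheus-stack-overrides
  namespace: ${WORKSPACE_NAMESPACE}
data:
  values.yaml: |
    ---
    additionalPrometheusRulesMap:
      my-alert-rules:            # hypothetical rule-file name
        groups:
        - name: my_alert_group   # hypothetical group name
          rules:
          - alert: MyTargetDown  # hypothetical alert name
            expr: up == 0        # "up" is 0 when a scrape target is unreachable
            for: 10m             # illustrative wait before the alert fires
            labels:
              severity: warning
            annotations:
              summary: A scrape target has been unreachable for 10 minutes.
EOF
```

Every example in this section uses the same ConfigMap name, so applying one replaces the values.yaml written by another. If you want several of these settings at once, combine them in a single values.yaml.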
After you set up your alerts, you can manage each alert using the Prometheus web console to mute or unmute firing alerts, and perform other operations. For more information about configuring alertmanager, see the Prometheus website.
To access the Prometheus Alertmanager UI, browse to the landing page and then search for the Prometheus Alertmanager dashboard, for example https://<CLUSTER_URL>/dkp/alertmanager.
Notify Prometheus Alerts in Slack
To connect the Prometheus alertmanager notification system to Slack, you need to overwrite the existing configuration.
The following file, named alertmanager.yaml, configures alertmanager to use the Incoming Webhooks feature of Slack (slack_api_url: https://hooks.slack.com/services/<HOOK_ID>) to send alerts to a specific channel, #MY-SLACK-CHANNEL-NAME. The Watchdog alert is the one exception: it is routed to the "null" receiver.

```yaml
global:
  resolve_timeout: 5m
  slack_api_url: https://hooks.slack.com/services/<HOOK_ID>

route:
  group_by: ['alertname']
  group_wait: 2m
  group_interval: 5m
  repeat_interval: 1h

  # If an alert isn't caught by a route, send it to slack.
  receiver: slack_general
  routes:
    - match:
        alertname: Watchdog
      receiver: "null"

receivers:
  - name: "null"
  - name: slack_general
    slack_configs:
      - channel: '#MY-SLACK-CHANNEL-NAME'
        icon_url: https://avatars3.githubusercontent.com/u/3380462
        send_resolved: true
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '{{ template "slack.default.title" . }}'
        title_link: '{{ template "slack.default.titlelink" . }}'
        pretext: '{{ template "slack.default.pretext" . }}'
        text: '{{ template "slack.default.text" . }}'
        fallback: '{{ template "slack.default.fallback" . }}'
        icon_emoji: '{{ template "slack.default.iconemoji" . }}'

templates:
  - '*.tmpl'
```
The following file, named notification.tmpl, is a template that defines a readable format for the fired notifications:

```
{{ define "__titlelink" }}
{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}
{{ end }}

{{ define "__title" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }}
{{ end }}

{{ define "__text" }}
{{ range .Alerts }}
{{ range .Labels.SortedPairs }}*{{ .Name }}*: `{{ .Value }}`
{{ end }}
{{ range .Annotations.SortedPairs }}*{{ .Name }}*: {{ .Value }}
{{ end }}
*source*: {{ .GeneratorURL }}
{{ end }}
{{ end }}

{{ define "slack.default.title" }}{{ template "__title" . }}{{ end }}
{{ define "slack.default.username" }}{{ template "__alertmanager" . }}{{ end }}
{{ define "slack.default.fallback" }}{{ template "slack.default.title" . }} | {{ template "slack.default.titlelink" . }}{{ end }}
{{ define "slack.default.pretext" }}{{ end }}
{{ define "slack.default.titlelink" }}{{ template "__titlelink" . }}{{ end }}
{{ define "slack.default.iconemoji" }}{{ end }}
{{ define "slack.default.iconurl" }}{{ end }}
{{ define "slack.default.text" }}{{ template "__text" . }}{{ end }}
```
Finally, apply these changes to alertmanager as follows. Set ${WORKSPACE_NAMESPACE} to the workspace namespace that kube-prometheus-stack is deployed in:

```shell
kubectl create secret generic -n ${WORKSPACE_NAMESPACE} \
  alertmanager-kube-prometheus-stack-alertmanager \
  --from-file=alertmanager.yaml \
  --from-file=notification.tmpl \
  --dry-run=client --save-config -o yaml | kubectl apply -f -
```
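The sample alertmanager.yaml contains the placeholders <HOOK_ID> and #MY-SLACK-CHANNEL-NAME, and a secret created from an unedited file would carry them into the live configuration. The following is a minimal sketch of a pre-flight check you could run before creating the secret; the function name check_placeholders is hypothetical.

```shell
# Hypothetical helper: fail if a file is missing or still contains
# the sample placeholders from this page.
check_placeholders() {
  if [ ! -f "$1" ]; then
    echo "$1: file not found" >&2
    return 1
  fi
  # grep -n prints each offending line with its line number.
  if grep -nE '<HOOK_ID>|#MY-SLACK-CHANNEL-NAME' "$1"; then
    echo "$1 still contains placeholder values" >&2
    return 1
  fi
  echo "$1: no placeholders found"
}
```

For example, run check_placeholders alertmanager.yaml and continue with the kubectl create secret command only if it succeeds.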
Notify Prometheus Alerts in E-Mail
To configure the Prometheus alertmanager notification system to send an email for alerts, you need to overwrite the existing configuration. The steps below configure alertmanager to send all configured alerts to a Gmail account named test@gmail.com.
Create a file named alertmanager.yaml with the following contents:

```yaml
global:
  resolve_timeout: 5m
inhibit_rules: []
receivers:
  - name: "null"
  - name: test_gmail
    email_configs:
      - to: test@gmail.com
        from: test@gmail.com
        auth_username: test@gmail.com
        auth_password: password
        send_resolved: true
        require_tls: true
        smarthost: smtp.gmail.com:587
route:
  receiver: test_gmail
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  repeat_interval: 12h
  routes:
    - matchers:
        - alertname =~ "InfoInhibitor|Watchdog"
      receiver: "null"
templates:
  - /etc/alertmanager/config/*.tmpl
```
Apply these changes to alertmanager as follows. Set ${WORKSPACE_NAMESPACE} to the workspace namespace that kube-prometheus-stack is deployed in (typically the kommander namespace):

```shell
kubectl create secret generic -n ${WORKSPACE_NAMESPACE} \
  alertmanager-kube-prometheus-stack-alertmanager \
  --from-file=alertmanager.yaml \
  --dry-run=client --save-config -o yaml | kubectl apply -f -
```
Allow some time for the configuration to take effect. You can then use the following command to verify that the new configuration was loaded:

```shell
kubectl exec -it alertmanager-kube-prometheus-stack-alertmanager-0 -n ${WORKSPACE_NAMESPACE} -- cat /etc/alertmanager/config_out/alertmanager.env.yaml
```
For more information on configuring email alerting, refer to the Alertmanager documentation.