Generate a Support Bundle

Follow these instructions to generate a support bundle containing data collected over the last 48 hours of the cluster's life.

Prerequisites

Before generating a support bundle, verify that you have:

  • An AMD64-based Linux or macOS machine with a supported version of the operating system.
  • A running Kubernetes cluster.
  • The dkp-diagnose command for macOS or Linux, which collects the support bundle.

Download dkp-diagnose

  1. Download and extract the dkp-diagnose binary for macOS or Linux:

    For Linux:

    mkdir dkp-diagnose && curl -sL https://downloads.mesosphere.io/dkp/dkp-diagnose_v0.3.2_linux_amd64.tar.gz | tar -xz -C ./dkp-diagnose/
    

    For macOS:

    mkdir dkp-diagnose && curl -sL https://downloads.mesosphere.io/dkp/dkp-diagnose_v0.3.2_darwin_amd64.tar.gz | tar -xz -C ./dkp-diagnose/
    
  2. Add the binary to your PATH:

    export PATH="$(pwd)/dkp-diagnose:$PATH"
    
  3. Verify the binary works:

    dkp-diagnose version
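
The two download commands in step 1 differ only in the operating system part of the archive name. As a minimal sketch, assuming the v0.3.2 URLs shown above, the archive can be chosen from the output of uname -s. The helper name dkp_diagnose_url is hypothetical and not part of dkp-diagnose:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the dkp-diagnose v0.3.2 download URL for an OS name.
# The argument defaults to the output of `uname -s` (Linux or Darwin).
dkp_diagnose_url() {
  local os="${1:-$(uname -s)}"
  local base="https://downloads.mesosphere.io/dkp"
  case "$os" in
    Linux)  echo "${base}/dkp-diagnose_v0.3.2_linux_amd64.tar.gz" ;;
    Darwin) echo "${base}/dkp-diagnose_v0.3.2_darwin_amd64.tar.gz" ;;
    *)      echo "unsupported operating system: $os" >&2; return 1 ;;
  esac
}

# Example use, equivalent to the commands in step 1:
#   mkdir -p dkp-diagnose && curl -sL "$(dkp_diagnose_url)" | tar -xz -C ./dkp-diagnose/
```

The helper only prints a URL; the download itself still uses the curl and tar commands from step 1.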
    

Create a diagnostic bundle

dkp-diagnose was developed by D2iQ and builds on the open source troubleshoot.sh project.

NOTE: dkp-diagnose is based on version 0.13.16 of troubleshoot.sh with custom modifications. The D2iQ fork is open source and available from here.

dkp-diagnose supports multiple support bundle collectors and can be configured as a SupportBundle Kubernetes resource in a YAML file.

The following list is the minimum set of resources required to debug a cluster; it can be customized further.

The bundle uses the following collectors:

  • clusterInfo collects basic information about the cluster
  • clusterResources collects a subset of available resources in the cluster
  • configMap collects the values of Kubernetes ConfigMaps
  • secrets collects the values of Kubernetes Secrets
  • execCopyFromHost runs a container on each node in the cluster and copies the created data
  • allLogs collects logs from all containers in the cluster

Generate a Support Bundle

NOTE: dkp-diagnose uses the same Kubernetes configuration as kubectl. dkp-diagnose can also be pointed at a specific configuration by using the --kubeconfig parameter.

To generate the support bundle, perform the following steps:

  1. Run the dkp-diagnose command with the default collectors configuration:

    dkp-diagnose
    
    Collecting support bundle ...
    
    support-bundle-2021-08-13T14_44_23.tar.gz
    
  2. To view the bundle contents, extract the bundle (replacing support-bundle-2021-08-13T14_44_23.tar.gz with the location from the previous step):

    tar -xzvf support-bundle-2021-08-13T14_44_23.tar.gz
    
  3. A new directory named support-bundle-<date-created> is created. This directory contains the collected files:

    ls support-bundle-2021-08-13T14_44_23
    
    cluster-info  cluster-resources  configmaps  node-diagnostics  pod-logs  secrets  version.yaml
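
To check what a bundle contains without extracting it, the archive listing can be summarized instead. This is a minimal sketch that assumes every path in the archive starts with the bundle's root directory, as the ls output above suggests. The helper name bundle_entries is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct entries directly under a bundle's
# root directory without extracting the archive.
bundle_entries() {
  # tar -tzf prints one member path per line, for example
  #   support-bundle-<date-created>/cluster-info/...
  # Field 2 of each path (split on "/") is the entry directly under the root.
  tar -tzf "$1" | awk -F/ 'NF >= 2 && $2 != "" { print $2 }' | sort -u
}

# Example use:
#   bundle_entries support-bundle-2021-08-13T14_44_23.tar.gz
```

If the archive stores its paths with a different prefix, the field number in the awk expression needs to be adjusted.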
    

Collect information from a bootstrap cluster

If your bootstrap cluster has not yet pivoted to your Konvoy cluster, you can also collect log information from the bootstrap cluster. A preconfigured set of relevant collectors is provided. To activate bootstrap cluster diagnostics, specify an additional bootstrap cluster kubeconfig using the --bootstrap-kubeconfig parameter. You will receive an additional support bundle named bootstrap-support-bundle-<date created>.

Note that the bootstrap cluster diagnostics are independent of the configuration of the “main” or Konvoy cluster diagnostics. We run a static collector set that collects the following bootstrap cluster information:

  • ClusterInfo
  • ClusterResources
  • AllLogs
  • ConfigMaps
  • Secrets

  1. Run the dkp-diagnose command with bootstrap bundle configuration.

    dkp-diagnose bundle.yaml
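
The --kubeconfig and --bootstrap-kubeconfig parameters named in this document can be combined on one command line. The following is only a sketch: the --flag=value spelling and the file names are assumptions, and the hypothetical helper diagnose_command prints the command instead of running it, so the result can be reviewed first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a dkp-diagnose command line for a pair of
# kubeconfig files. It only prints the command; it does not run it.
# Assumption: both flags accept the --flag=value spelling.
diagnose_command() {
  local main_kubeconfig="$1"       # kubeconfig of the main cluster, may be empty
  local bootstrap_kubeconfig="$2"  # kubeconfig of the bootstrap cluster, may be empty
  set -- dkp-diagnose
  if [ -n "$main_kubeconfig" ]; then
    set -- "$@" "--kubeconfig=$main_kubeconfig"
  fi
  if [ -n "$bootstrap_kubeconfig" ]; then
    set -- "$@" "--bootstrap-kubeconfig=$bootstrap_kubeconfig"
  fi
  echo "$*"
}

# Example use (placeholder file names):
#   diagnose_command main.conf bootstrap.conf
```

An empty argument leaves the corresponding flag out, in which case dkp-diagnose uses the same Kubernetes configuration as kubectl.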

Customizations

To print the default collectors configuration, run the following command:

dkp-diagnose default-config > bundle.yaml

Edit the file to make appropriate modifications.

NOTE: dkp-diagnose by default does not require that you supply a configuration. The default bundle configuration can be printed by running dkp-diagnose default-config.

SSH fallback

In some cases the Kubernetes API is not available for the cluster. In those cases you can collect node-level information using SSH access to the diagnosed nodes. Be aware that not all clusters have SSH access configured. If SSH access is not configured, the SSH fallback is not possible.

To get node level information from your cluster using SSH access, perform the following steps:

  1. Enter the following command:

    dkp-diagnose ssh <path/to/ansible-inventory.yaml>

The ansible-inventory.yaml file specifies the nodes to access for data collection.

NOTE: This collector does not use the full Ansible inventory.yaml format, only a limited subset that describes the infrastructure.

Only the following attributes of the ansible-inventory.yaml file are supported. All other group definitions are ignored.

  • Support for all shared variables.

  • Support for hosts key in all groups.

  • Supported behavioral inventory parameters are limited to:

    • ansible_host
    • ansible_port
    • ansible_user
    • ansible_ssh_private_key_file

    The following is an example inventory.yaml file:

all:
  vars:
    ansible_user: centos
  hosts:
    host-1:
      ansible_host: 192.168.10.1
    host-2:
      ansible_host: 192.168.10.22
      ansible_port: 2222

More information on these Ansible parameters can be found here.
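
The example inventory above does not use ansible_ssh_private_key_file. The following sketch writes an inventory that uses all four supported parameters. It assumes the key file can be set as a shared variable in the same way as ansible_user, and the host names, addresses, user, and key path are placeholders:

```shell
#!/usr/bin/env bash
# Write a sample inventory that uses all four supported connection parameters.
# Assumption: ansible_ssh_private_key_file is accepted under "vars" like ansible_user.
# Host names, addresses, the user, and the key path are placeholders.
cat > inventory.yaml <<'EOF'
all:
  vars:
    ansible_user: centos
    ansible_ssh_private_key_file: /home/centos/.ssh/id_rsa
  hosts:
    host-1:
      ansible_host: 192.168.10.1
    host-2:
      ansible_host: 192.168.10.22
      ansible_port: 2222
EOF

# Then collect node-level data over SSH:
#   dkp-diagnose ssh ./inventory.yaml
```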

The fallback collector runs a bash script over SSH and copies the collected data. The format of the created bundle matches that of bundles generated by the dkp-diagnose collectors.

    node-diagnostics/<HOSTNAME_PORT>/data/
        - dmesg
        - ....
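
Given the layout above, a quick way to confirm that every node was reached is to count the files under each node's data directory. This is a sketch only: the helper name list_node_data is hypothetical, and it assumes the bundle has already been extracted into a directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one line per node found in an extracted bundle,
# with the number of files collected under that node's data/ directory.
list_node_data() {
  local bundle_dir="$1"
  local node_dir count
  for node_dir in "$bundle_dir"/node-diagnostics/*/; do
    # Skip anything without a data/ directory (including an unmatched glob).
    [ -d "${node_dir}data" ] || continue
    count="$(find "${node_dir}data" -type f | wc -l | tr -d ' ')"
    echo "$(basename "$node_dir") $count"
  done
}

# Example use (placeholder directory name):
#   list_node_data ./support-bundle-<date-created>
```

A node that is missing from the output, or that reports 0 files, is a candidate for a connection or timeout problem.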

Redactors are supported and use the same format as the main dkp-diagnose command. Per-node collection timeouts are supported using the --timeout parameter.