DKP® version 2.3.1 was released on October 13, 2022.

You must be a registered user and logged on to the support portal to download this product. New customers must contact their sales representative or sales@d2iq.com before attempting to download or install DKP.

Release Summary

Welcome to D2iQ Kubernetes Platform (DKP) 2.3.1! This release provides fixes to reported issues, integrates changes from previous releases, and maintains compatibility and support for other packages used in DKP.

Fixes and Updates

The following updates and fixes are included in this release.

Kommander Install fails on Gatekeeper

Incident D2IQ-92981

When attempting to install Kommander on a cluster in FIPS mode, the install fails because the gatekeeper-update-namespace-label pod continuously crash loops. The pod logs return the following error message:

Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": failed to call webhook: Post "https://gatekeeper-webhook-service.kommander.svc:443/v1/admitlabel?timeout=3s": remote error: tls: protocol version not supported

This error occurred because gatekeeper attempted to use a TLS 1.3 connection, which is not supported in FIPS 140-2 mode. The gatekeeper deployment now always uses TLS 1.2, which corrects the issue.
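As a rough diagnostic for this incident, the following sketch scans log text for the TLS error string quoted above. The function name `has_tls_version_error` and the pod and namespace placeholders are illustrative assumptions, not part of DKP.

```shell
#!/usr/bin/env bash

# Hypothetical helper: exits 0 when the text on stdin contains the
# TLS protocol-version error quoted in this incident, non-zero otherwise.
has_tls_version_error() {
  grep -q 'tls: protocol version not supported'
}

# Example use against a live cluster (placeholders must be filled in):
#   kubectl logs -n <namespace> <gatekeeper-update-namespace-label-pod> \
#     | has_tls_version_error && echo "TLS version mismatch found in pod logs"
```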

KIB 1.19.9 AWS + Airgapped + FIPS fails to Install rpm Packages

Incident D2IQ-92900

Attempting to create a FIPS-compliant machine image for RHEL 8.x fails with the following error:

rhel-8.4: FAILED - RETRYING: install kubectl rpm package (1 retries left).
rhel-8.4: fatal: [default]: FAILED! => {"attempts": 3, "changed": false, "failures": [], "msg": "Unknown Error occured: Transaction test error:\n  package kubectl-1.23.12-0.x86_64 does not verify: no digest\n", "rc": 1, "results": []}
rhel-8.4:
rhel-8.4: PLAY RECAP *********************************************************************
rhel-8.4: default: ok=39   changed=22   unreachable=0    failed=1    skipped=41   rescued=0  ignored=1
rhel-8.4:

This error occurred because the air-gapped FIPS RPM package bundles were not signed with FIPS-compatible package signatures. This issue is now resolved.
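To spot this failure in a long build log, a small filter along the following lines can list the packages that failed digest verification. This is a sketch only: the function name `failed_digest_packages` and the log file name are hypothetical, and the pattern is taken from the error text shown above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads build output on stdin and prints each package
# name reported as "does not verify: no digest", one per line, deduplicated.
failed_digest_packages() {
  grep -o 'package [^ ]* does not verify: no digest' | awk '{print $2}' | sort -u
}

# Example use with a saved build log (the file name is a placeholder):
#   failed_digest_packages < kib-build.log
```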

Download Signature Files

You can use the FIPS validation tool to verify that specific components and services are FIPS-compliant by checking the signatures of the files against a signed signature file, and by checking that services are using the certified algorithms. Before you run FIPS validation, download the appropriate signed signature file. Verify which version of DKP you are running so that you download the manifest that matches the DKP release on your system. Select the links in the Manifest URL column of the following table to obtain a valid file:

DKP version 2.3.1

| Operating System version | Kubernetes version | containerd version | Manifest URL |
|---|---|---|---|
| CentOS 7.9 | v1.23.12 | 1.4.13 | v1.23.12 CentOS 7.9 Manifest |
| Oracle 7.9 | v1.23.12 | 1.4.13 | v1.23.12 OL 7.9 Manifest |
| RHEL 7.9 | v1.23.12 | 1.4.13 | v1.23.12 RHEL 7.9 Manifest |
| RHEL 8.2 | v1.23.12 | 1.4.13 | v1.23.12 RHEL 8.2 Manifest |
| RHEL 8.4 | v1.23.12 | 1.4.13 | v1.23.12 RHEL 8.4 Manifest |

Supported Versions

Any DKP cluster you attach using DKP 2.3.1 must be running a Kubernetes version in the following ranges:

| Kubernetes Support | Version |
|---|---|
| DKP Minimum | 1.22.0 |
| DKP Maximum | 1.23.x |
| DKP Default | 1.23.12 |
| EKS Default | 1.22.x |
| AKS Default | 1.23.x |
| GKE Default | 1.22.x-1.23.x |
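The DKP Minimum and DKP Maximum rows above can be turned into a quick pre-attach check. The sketch below is an illustration under assumptions: the function name `is_attachable_k8s_version` is hypothetical, and it only compares the major and minor numbers, because the listed minimum (1.22.0) is the first 1.22 patch release.

```shell
#!/usr/bin/env bash

# Hypothetical helper: exits 0 when the given Kubernetes version
# (with or without a leading "v") falls in the 1.22.0 to 1.23.x range.
is_attachable_k8s_version() {
  local version="${1:-}"
  version="${version#v}"            # drop an optional leading "v"
  local major="${version%%.*}"      # text before the first dot
  local rest="${version#*.}"        # text after the first dot
  local minor="${rest%%.*}"         # text before the next dot
  case "${major}.${minor}" in
    1.22|1.23) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use (requires jq and access to the cluster being attached):
#   server=$(kubectl version -o json | jq -r '.serverVersion.gitVersion')
#   is_attachable_k8s_version "$server" || echo "unsupported: $server"
```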

DKP 2.3 comes with support for Kubernetes 1.23, enabling you to benefit from the latest features and security fixes in upstream Kubernetes. Kubernetes 1.23 comes with approximately 47 enhancements. To read more about major features in that release, visit https://kubernetes.io/blog/2021/12/07/kubernetes-1-23-release-announcement/.

2.3.1 components and applications

The following are component and application versions for DKP 2.3.1.

Components

| Component Name | Version |
|---|---|
| Cluster API Core (CAPI) | 1.1.3-d2iq.5 |
| Cluster API AWS Infrastructure Provider (CAPA) | 1.4.1 |
| Cluster API Google Cloud Infrastructure Provider (CAPG) | 1.1.0 |
| Cluster API Pre-provisioned Infrastructure Provider (CAPPP) | 0.9.4 |
| Cluster API vSphere Infrastructure Provider (CAPV) | 1.2.0 |
| Cluster API Azure Infrastructure Provider (CAPZ) | 1.3.2 |
| Konvoy Image Builder | 1.19.11 |
| containerd | 1.4.13 |
| etcd | 3.4.13 |

Applications

| Common Application Name | APP ID | Version | Component Versions |
|---|---|---|---|
| Centralized Grafana | centralized-grafana | 34.9.3 | chart: 34.9.3; prometheus-operator: 0.55.0 |
| Centralized Kubecost | centralized-kubecost | 0.26.0 | chart: 0.26.0; kubecost: 1.95.0 |
| Cert Manager | cert-manager | 1.7.1 | chart: 1.7.1; cert-manager: 1.7.1 |
| Chartmuseum | chartmuseum | 3.9.0 | chart: 3.9.0; chartmuseum: 3.9.0 |
| Dex | dex | 2.9.18 | chart: 2.9.18; dex: 2.31.0 |
| Dex K8s Authenticator | dex-k8s-authenticator | 1.2.13 | chart: 1.2.13; dex-k8s-authenticator: 1.2.4 |
| DKP Insights Management | dkp-insights-management | 0.2.2 | chart: 0.2.2; dkp-insights-management: 0.2.2 |
| External DNS | external-dns | 6.5.5 | chart: 6.5.5; external-dns: 0.12.0 |
| Fluent Bit | fluent-bit | 0.19.21 | chart: 0.19.20; fluent-bit: 1.9.3 |
| Gatekeeper | gatekeeper | 3.8.2 | chart: 3.8.1; gatekeeper: 3.8.1 |
| Gitea | gitea | 5.0.9 | chart: 5.0.9; gitea: 1.16.8 |
| Grafana Logging | grafana-logging | 6.28.0 | chart: 6.28.0; grafana: 8.5.0 |
| Grafana Loki | grafana-loki | 0.48.4 | chart: 0.48.4; loki: 2.5.0 |
| Istio | istio | 1.14.1 | chart: 1.14.1; istio: 1.14.1 |
| Jaeger | jaeger | 2.32.2 | chart: 2.32.2; jaeger: 1.34.1 |
| Karma | karma | 2.0.1 | chart: 2.0.1; karma: 0.70 |
| Kiali | kiali | 1.52.0 | chart: 1.52.0; kiali: 1.52.0 |
| Knative | knative | 0.4.0 | chart: 0.4.0; knative: 0.22.3 |
| Kube OIDC Proxy | kube-oidc-proxy | 0.3.1 | chart: 0.3.1; kube-oidc-proxy: 0.3.0 |
| Kube Prometheus Stack | kube-prometheus-stack | 34.9.3 | chart: 34.9.3; prometheus-operator: 0.55.0; prometheus: 2.34.0; prometheus-alertmanager: 0.24.0; grafana: 8.5.0 |
| Kubecost | kubecost | 0.26.0 | chart: 0.26.0; kubecost: 1.95.0 |
| Kubefed | kubefed | 0.9.2 | chart: 0.9.2; kubefed: 0.9.2 |
| Kubernetes Dashboard | kubernetes-dashboard | 5.1.1 | chart: 5.1.1; kubernetes-dashboard: 2.4.0 |
| Kubetunnel | kubetunnel | 0.0.13 | chart: 0.0.13; kubetunnel: 0.0.13 |
| Logging Operator | logging-operator | 3.17.7 | chart: 3.17.7; logging-operator: 3.17.7 |
| MinIO Operator | minio-operator | 4.4.25 | chart: 4.4.25; minio-operator: 4.4.25 |
| NFS Server Provisioner | nfs-server-provisioner | 0.6.0 | chart: 0.6.0; nfs-server-provisioner: 2.3.0 |
| Nvidia | nvidia | 0.4.4 | chart: 0.4.4; nvidia-device-plugin: 0.1.4 |
| Grafana (project) | project-grafana-logging | 6.28.0 | chart: 6.28.0; grafana: 8.5.0 |
| Grafana Loki (project) | project-grafana-loki | 0.48.4 | chart: 0.48.4; loki: 2.5.0 |
| Prometheus Adapter | prometheus-adapter | 2.17.1 | chart: 2.17.1; prometheus-adapter: 0.9.1 |
| Reloader | reloader | 0.0.110 | chart: 0.0.110; reloader: 0.0.110 |
| Thanos | thanos | 0.4.6 | chart: 0.4.6; thanos: 0.17.1 |
| Traefik | traefik | 10.9.1 | chart: 10.9.1; traefik: 2.5.6 |
| Traefik ForwardAuth | traefik-forward-auth | 0.3.8 | chart: 0.3.8; traefik-forward-auth: 3.1.0 |
| Velero | velero | 3.2.3 | chart: 3.2.3; velero: 1.5.2 |

Known issues and limitations

The following items are known issues with this release.

Use static credentials to provision an Azure cluster

Only static credentials can be used when provisioning an Azure cluster.

When attaching GKE clusters, create a ResourceQuota to enable log collection

After you attach the GKE cluster, you can choose to deploy a stack of applications for workspace or project log collection. After you enable this stack, create a ResourceQuota, which the logging stack requires to function correctly. You must do this manually because some DKP versions do not handle this by default.
Create the following resource to enable log collection:

  1. Execute the following command to get the namespace of your workspace on the management cluster:

    kubectl get workspaces

    Copy the value under the WORKSPACE NAMESPACE column for your workspace. This may NOT be identical to the Display Name of the Workspace.

  2. Set the WORKSPACE_NAMESPACE environment variable to the name of the workspace’s namespace:

    export WORKSPACE_NAMESPACE=<gkeattached-cluster-namespace>
  3. Run the following command on your attached GKE cluster to create the resource:

    cat << EOF | kubectl apply -f -
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: fluent-bit-critical-pods
      namespace: ${WORKSPACE_NAMESPACE}
    spec:
      hard:
        pods: "1G"
      scopeSelector:
        matchExpressions:
        - operator: In
          scopeName: PriorityClass
          values:
          - system-node-critical
    EOF

After a few minutes, log collection is available in your GKE cluster.

This workflow only creates a ResourceQuota in the targeted workspace. Repeat these steps if you want to deploy the logging stack to additional workspaces with GKE clusters.
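If WORKSPACE_NAMESPACE is unset when step 3 runs, the manifest is rendered with an empty namespace field. One way to guard against that is to render the manifest through a function that refuses an empty namespace and lets you review the output before applying it. The following is a minimal sketch; the function name `render_fluent_bit_quota` is hypothetical, and the manifest body is copied from step 3.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the ResourceQuota manifest from step 3 for the
# given workspace namespace, and fails if no namespace is supplied.
render_fluent_bit_quota() {
  local namespace="${1:-}"
  if [ -z "$namespace" ]; then
    echo "usage: render_fluent_bit_quota <workspace-namespace>" >&2
    return 1
  fi
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: fluent-bit-critical-pods
  namespace: ${namespace}
spec:
  hard:
    pods: "1G"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - system-node-critical
EOF
}

# Review the rendered manifest, then apply it on the attached GKE cluster:
#   render_fluent_bit_quota "$WORKSPACE_NAMESPACE"
#   render_fluent_bit_quota "$WORKSPACE_NAMESPACE" | kubectl apply -f -
```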

Resolve issues with failed HelmReleases

There is an existing issue with the Flux helm-controller that can cause HelmReleases to get "stuck" with an error message such as Helm upgrade failed: another operation (install/upgrade/rollback) is in progress. This can happen when the helm-controller is restarted while a HelmRelease operation, such as an install or upgrade, is in progress.

Workaround

To confirm that the HelmRelease error was caused by the helm-controller restarting, first try to suspend and resume the HelmRelease:

kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": true}]'
kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": false}]'

This might resolve the issue. You should see the HelmRelease attempt to reconcile, after which it either succeeds (with status: 'Release reconciliation succeeded') or fails with the same error as before.

If the HelmRelease is still in the failed state, the failure is likely related to the helm-controller restarting. The following steps use the 'reloader' HelmRelease as the example of a stuck release.

To resolve the issue, follow these steps:

  1. List secrets containing the affected HelmRelease name:

    kubectl get secrets -n <namespace> | grep reloader

    kommander-reloader-reloader-token-9qd8b                        kubernetes.io/service-account-token   3      171m
    sh.helm.release.v1.kommander-reloader.v1                       helm.sh/release.v1                    1      171m
    sh.helm.release.v1.kommander-reloader.v2                       helm.sh/release.v1                    1      117m           

    In this example, sh.helm.release.v1.kommander-reloader.v2 is the most recent revision.

  2. Find and delete the most recent revision secret, which is named in the form sh.helm.release.v1.*.<revision>:

    kubectl delete secret -n <namespace> <most recent helm revision secret name>
  3. Suspend and resume the HelmRelease to trigger a reconciliation:

    kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": true}]'
    kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": false}]'

You should see the HelmRelease reconcile, and eventually the upgrade or install succeeds.
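Picking the most recent revision secret by eye becomes error-prone once revisions reach double digits, because v10 sorts before v9 as plain text. The sketch below selects the highest revision numerically from `kubectl get secrets` output. It is an illustration only: the function name `latest_helm_revision_secret` is hypothetical, and it assumes the sh.helm.release.v1.<release>.v<N> naming shown in the example output above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `kubectl get secrets` output on stdin and prints
# the name of the highest-numbered Helm revision secret for the given release.
# Exits non-zero if no matching secret is found.
latest_helm_revision_secret() {
  local release="${1:-}"
  awk -v rel="$release" '
    {
      name = $1
      prefix = "sh.helm.release.v1." rel ".v"
      # Only consider names that start with the release-specific prefix.
      if (index(name, prefix) == 1) {
        rev = substr(name, length(prefix) + 1)
        if (rev ~ /^[0-9]+$/ && rev + 0 > max) { max = rev + 0; best = name }
      }
    }
    END { if (best != "") print best; else exit 1 }
  '
}

# Example use for the stuck "reloader" release (namespace is a placeholder):
#   kubectl get secrets -n <namespace> --no-headers \
#     | latest_helm_revision_secret kommander-reloader
```

Review the printed name before passing it to the delete command in step 2.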

Fluent Bit disabled by default for DKP 2.3

Fluent Bit is disabled by default in DKP 2.3 due to memory constraints. The volume of admin logs ingested into Loki requires additional disk space to be configured on the grafana-loki-minio MinIO Tenant. Enabling admin logs may use around 2GB/day per node. See Configure the Grafana Loki MinIO Tenant for details on how to configure the MinIO Tenant.

If Fluent Bit is enabled on the management cluster and you want it to remain deployed after the upgrade, you must pass the --disable-appdeployments {} flag to the dkp upgrade kommander command. Otherwise, Fluent Bit is automatically disabled upon upgrade.
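To get a feel for the disk space that admin logs may need, the figure of around 2GB/day per node can be multiplied out over a node count and a retention period. The sketch below does only that arithmetic. The function name `estimate_admin_log_gb` is hypothetical, the retention period is an assumption you supply, and the result is a rough raw-ingest estimate rather than a sizing guarantee.

```shell
#!/usr/bin/env bash

# Hypothetical helper: estimates raw admin-log volume in GB.
#   $1 = number of nodes
#   $2 = number of days of logs to retain (an assumption you choose)
#   $3 = GB per node per day (defaults to the approximate figure of 2)
estimate_admin_log_gb() {
  local nodes="${1:?node count required}"
  local days="${2:?retention days required}"
  local gb_per_node_day="${3:-2}"
  echo $(( nodes * days * gb_per_node_day ))
}

# Example: estimate_admin_log_gb 5 7
```

For example, 5 nodes with 7 days of retained logs come to roughly 70GB by this estimate.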

Configure the Grafana Loki MinIO Tenant

Additional steps are required to change the default configuration of the MinIO Tenant that is deployed with Grafana Loki, grafana-loki-minio. Using config overrides is not supported.

By default, the grafana-loki-minio MinIO Tenant is configured with 2 pools with 4 servers each, 1 volume per server, for a total of 80GB.

The MinIO usable storage capacity is always less than the actual storage amount.

Use the MinIO erasure code calculator to establish the appropriate configuration for your log storage requirement.
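The raw capacity of a server pool is simply servers x volumes per server x volume size. The default of 2 pools with 4 servers each, 1 volume per server, and 80GB in total implies 10GB per volume if all volumes are the same size (an inference, not a figure stated here). The sketch below computes raw capacity only; the function name `pool_raw_capacity` is hypothetical, and usable capacity after erasure coding is lower, which is what the calculator is for.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the raw capacity of one MinIO server pool.
#   $1 = servers in the pool
#   $2 = volumes per server
#   $3 = size of each volume (the result is in the same unit)
pool_raw_capacity() {
  local servers="${1:?servers required}"
  local volumes_per_server="${2:?volumes per server required}"
  local volume_size="${3:?volume size required}"
  echo $(( servers * volumes_per_server * volume_size ))
}

# Default layout, assuming 10GB volumes: two pools of 4 servers x 1 volume.
#   pool_raw_capacity 4 1 10    # one default pool
```

By the same arithmetic, a new pool of 4 servers with one 50Gi volume each adds 200Gi of raw capacity.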

  • You are only able to expand MinIO storage by adding more MinIO server pools with the correct configuration. Modifying existing server pools does not work as MinIO does not support reducing storage capacity. See this MinIO Operator documentation for details.

  • This impacts all your AppDeployment objects that reference the grafana-loki Kommander application definition.

  • The changes introduced by the following procedure are wiped out upon Kommander install and upgrade.

In this example, we modify the grafana-loki-minio MinIO Tenant object in kommander-workspace (namespace: kommander).

  1. Use this script to clone the management git repository from the Management cluster:

    # KUBECONFIG must point at the management cluster
    export KUBECONFIG=$KUBECONFIG
    
    # Read the Gitea admin password from the admin-git-credentials secret
    PASS=$(kubectl get secrets -n kommander admin-git-credentials -o go-template="{{.data.password | base64decode }}")
    URL=https://gitea_admin:$PASS@$(kubectl -n kommander get ingress gitea -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'):443/dkp/kommander/git/kommander/kommander
    
    git clone -c http.sslVerify=false "$URL" repo
  2. Modify repo/services/grafana-loki/0.48.4/minio.yaml by appending a new server pool to the .spec.pools field, for example:

    # the following adds a new server pool with 4 servers
    # each server is attached to 1 PersistentVolume of 50Gi
    - servers: 4
      volumesPerServer: 1
      volumeClaimTemplate:
        metadata:
          name: grafana-loki-minio
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi
      resources:
        limits:
          cpu: 750m
          memory: 1Gi
        requests:
          cpu: 250m
          memory: 768Mi
      securityContext:
        runAsUser: 0
        runAsGroup: 0
        runAsNonRoot: false
        fsGroup: 0
  3. Commit the changes to the local clone of the management git repository when you are done editing:

    git add services/grafana-loki/0.48.4/minio.yaml
    git commit # finish the commit message editing in editor
  4. Ensure that it is safe to apply the change, and then push the change to the management git repository:

    git push origin main
  5. Set your WORKSPACE_NAMESPACE environment variable:

    # this is an example for kommander-workspace
    export WORKSPACE_NAMESPACE=kommander
  6. After the grafana-loki kustomizations reconcile, verify that the Tenant is modified as expected:

    # this prints the .status field of the tenant
    kubectl get tenants -n $WORKSPACE_NAMESPACE grafana-loki-minio -o jsonpath='{ .status }' | jq
  7. Verify that the new StatefulSet is READY:

    kubectl get sts -n $WORKSPACE_NAMESPACE -l v1.min.io/tenant=grafana-loki-minio
    
    NAME                      READY   AGE
    grafana-loki-minio-ss-0   4/4     144m
    grafana-loki-minio-ss-1   4/4     144m
    grafana-loki-minio-ss-2   4/4     15m
  8. Restart all the StatefulSets that back this Tenant:

    kubectl -n $WORKSPACE_NAMESPACE rollout restart sts grafana-loki-minio-ss-0
    statefulset.apps/grafana-loki-minio-ss-0 restarted
    kubectl -n $WORKSPACE_NAMESPACE rollout restart sts grafana-loki-minio-ss-1
    statefulset.apps/grafana-loki-minio-ss-1 restarted
    kubectl -n $WORKSPACE_NAMESPACE rollout restart sts grafana-loki-minio-ss-2
    statefulset.apps/grafana-loki-minio-ss-2 restarted
  9. Verify that the MinIO Pods that back this Tenant are all online:

    kubectl logs -n $WORKSPACE_NAMESPACE -l v1.min.io/tenant=grafana-loki-minio
    ...
    Verifying if 1 bucket is consistent across drives...
    Automatically configured API requests per node based on available memory on the system: 424
    All MinIO sub-systems initialized successfully
    Waiting for all MinIO IAM sub-system to be initialized.. lock acquired
    Status:         12 Online, 0 Offline. 
    API: http://minio.kommander.svc.cluster.local 
    
    Console: http://192.168.202.223:9090 http://127.0.0.1:9090   
    
    Documentation: https://docs.min.io
    ...
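The check in step 9 can be scripted by looking for the "Status: N Online, M Offline." line in the MinIO logs. The following is a minimal sketch under assumptions: the function name `minio_all_online` is hypothetical, and it relies on the status line having the format shown in the sample output above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads MinIO log text on stdin and exits 0 only if at
# least one "Status:" line is present and every such line reports 0 Offline.
minio_all_online() {
  awk '
    /Status:/ && /Online/ && /Offline/ {
      seen = 1
      offline = 0
      # The count precedes the word "Offline", for example "0 Offline."
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^Offline/) { offline = $(i - 1) + 0 }
      }
      if (offline > 0) bad = 1
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Example use:
#   kubectl logs -n "$WORKSPACE_NAMESPACE" -l v1.min.io/tenant=grafana-loki-minio \
#     | minio_all_online && echo "all MinIO drives online"
```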

FIPS upgrade from 2.2.x to 2.3.0

When you upgrade a FIPS cluster, a bug prevents the kube-proxy DaemonSet from being upgraded automatically. After completing the cluster upgrade, run the following command to finish upgrading the kube-proxy DaemonSet:

kubectl set image -n kube-system daemonset.v1.apps/kube-proxy kube-proxy=docker.io/mesosphere/kube-proxy:v1.23.12_fips.0
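To confirm whether the DaemonSet still needs this manual step, you can compare its current image with the expected FIPS tag. The sketch below checks only the image string. The function name `is_fips_kube_proxy_image` is hypothetical, and the `<version>_fips.<n>` tag pattern is inferred from the image in the command above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: exits 0 when the image reference is a kube-proxy image
# tagged <expected-version>_fips.<n>, for example v1.23.12_fips.0.
#   $1 = image reference
#   $2 = expected Kubernetes version, including the leading "v"
is_fips_kube_proxy_image() {
  local image="${1:-}" expected_version="${2:-}"
  case "$image" in
    */kube-proxy:"${expected_version}"_fips.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use after the cluster upgrade:
#   image=$(kubectl get daemonset kube-proxy -n kube-system \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')
#   is_fips_kube_proxy_image "$image" v1.23.12 || echo "kube-proxy not upgraded: $image"
```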

Additional resources