Rook Ceph Configuration
This page contains information about configuring Rook Ceph in your DKP environment.
The Ceph instance installed by DKP is intended only for use by the logging stack and velero platform applications.
If you have an instance of Ceph that is managed outside of the DKP lifecycle, see Bring Your Own Storage to DKP Clusters.
Components of a Rook Ceph Cluster
Ceph supports creating clusters in different modes, as listed in CephCluster CRD - Rook Ceph Documentation. DKP specifically ships with a PVC cluster, as documented in PVC Storage Cluster - Rook Ceph Documentation. The PVC mode is recommended because it keeps deployment and upgrades simple and independent of the technicalities of node draining.
Ceph cannot be your CSI provisioner when installing in PVC mode, because Ceph relies on an existing CSI provisioner to bind the PVCs it creates. It is possible to use Ceph as your CSI provisioner, but that is outside the scope of this document. If you have an instance of Ceph that acts as the CSI provisioner, you can reuse it for your DKP storage needs. See BYOS (Bring Your Own Storage) to DKP Clusters for information on reusing an existing Ceph instance.
Creating the rook-ceph-cluster platform application results in the deployment of various components, as shown in the following diagram:
Items highlighted in green are user-facing and configurable.
Please refer to Rook Ceph Storage Architecture and Ceph Architecture for an in-depth explanation of the inner workings of the components outlined in the above diagram.
For additional details about the data model, refer to the Rook Ceph Data Model page.
The following is a non-exhaustive list of the resource requirements for the long-running components of Ceph:
CPU:
100m x # of mgr instances (default 2)
250m x # of mon instances (default 3)
250m x # of osd instances (default 4)
100m x # of crashcollector instances (DaemonSet, i.e., # of nodes)
250m x # of rados gateway replicas (default 2)
Memory:
512Mi x # of mgr instances (default 2)
512Mi x # of mon instances (default 3)
1Gi x # of osd instances (default 4)
500Mi x # of rados gateway replicas (default 2)
Storage:
4 x 40Gi PVCs
3 x 10Gi PVCs
The StorageClass should support the creation of PersistentVolumes that satisfy the PersistentVolumeClaims created by Ceph.
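As a rough sizing aid, the CPU and storage figures listed above can be summed for the default instance counts. The sketch below is illustrative only: the function names and the `node_count` parameter are assumptions made for this example and are not part of DKP or Rook, and the per-component values are copied from the list above.

```python
# Illustrative sizing helper; all names here are hypothetical.

# CPU request in millicores per instance and default instance count, from the list above.
CPU_MILLICORES = {
    "mgr": (100, 2),
    "mon": (250, 3),
    "osd": (250, 4),
    "rados_gateway": (250, 2),
}
CRASHCOLLECTOR_MILLICORES = 100  # DaemonSet: one instance per node

# (size in Gi, count) for the PVCs in the list above.
PVCS_GI = [(40, 4), (10, 3)]


def total_cpu_millicores(node_count: int) -> int:
    """Sum CPU requests for the default instance counts on node_count nodes."""
    fixed = sum(per_instance * count for per_instance, count in CPU_MILLICORES.values())
    return fixed + CRASHCOLLECTOR_MILLICORES * node_count


def total_pvc_gi() -> int:
    """Sum the requested PVC capacity in Gi."""
    return sum(size * count for size, count in PVCS_GI)


if __name__ == "__main__":
    print(total_cpu_millicores(node_count=4))
    print(total_pvc_gi())
```

Because the crashcollector runs on every node, the CPU total grows with cluster size, while the other components stay fixed unless their instance counts are changed.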
Ceph Storage Configuration
Ceph is highly configurable and can support Replication or Erasure Coding to ensure data durability. DKP is configured to use Erasure Coding for maximum efficiency.
Replication and Erasure Coding are the two primary methods for storing data in a durable fashion in any distributed system.
For a replication factor of N, data has N copies (including the original copy).
The smallest possible replication factor is 2 (usually this means 2 storage nodes).
With a replication factor of 2, data has 2 copies, which tolerates the loss of one copy.
Storage efficiency: (1/N) * 100 percent. For example:
If N=2, then efficiency is 50%.
If N=3, then efficiency is 33%.
Fault tolerance: N-1 nodes can be lost without loss of data. For example:
If N=2, then at most 1 node can be lost without data loss.
If N=3, then at most 2 nodes can be lost without data loss, and so on.
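The two replication formulas above can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical and chosen only for this example.

```python
def replication_efficiency(n: int) -> float:
    """Storage efficiency in percent for replication factor n: (1/n) * 100."""
    if n < 2:
        raise ValueError("replication factor must be at least 2")
    return 100.0 / n


def replication_fault_tolerance(n: int) -> int:
    """Number of nodes that can be lost without data loss: n - 1."""
    if n < 2:
        raise ValueError("replication factor must be at least 2")
    return n - 1


if __name__ == "__main__":
    for n in (2, 3):
        print(
            f"N={n}: efficiency={replication_efficiency(n):.1f}%, "
            f"tolerates {replication_fault_tolerance(n)} lost node(s)"
        )
```

Raising N improves fault tolerance by one node per extra copy, but efficiency falls quickly, which is the trade-off that erasure coding is meant to improve.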
Erasure coding slices an object into k data fragments and computes m parity fragments. The erasure coding scheme guarantees that data can be recreated using any k fragments out of the k + m fragments.
The k + m = n fragments are spread across (>= n) storage nodes to offer durability.
Because any k of the n fragments (parity or data) are sufficient to recreate the data, at most m fragments can be lost without loss of data.
The smallest possible count is k = 2, m = 1, i.e., n = k + m = 3. This works only if there are at least n = 3 storage nodes.
Storage efficiency: k/(k+m) * 100 percent. For example:
If k=2, m=1, then efficiency is 67%.
If k=3, m=1, then efficiency is 75%, and so on.
Fault tolerance: m nodes can be lost without loss of data. For example:
If k=3, m=1, then at most 1 out of 4 nodes can be lost without data loss.
If k=4, m=2, then at most 2 out of 6 nodes can be lost without data loss, and so on.
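To make the fragment idea concrete, the sketch below uses a single XOR parity fragment (m = 1), which is the simplest possible erasure code. It is an illustration of the principle only and is not how Ceph implements erasure coding; the function names are hypothetical, and the zero-byte padding is a simplification made for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size data fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # pad so every fragment has the same length
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) by XOR-ing the survivors."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment tolerates only one lost fragment")
    restored = list(fragments)
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        restored[missing[0]] = reduce(xor_bytes, survivors)
    return restored


def efficiency(k: int, m: int) -> float:
    """Storage efficiency in percent: k / (k + m) * 100."""
    return 100.0 * k / (k + m)


if __name__ == "__main__":
    data = b"hello ceph!!"
    stored = encode(data, k=3)          # 3 data fragments + 1 parity fragment
    damaged: List[Optional[bytes]] = list(stored)
    damaged[1] = None                   # simulate losing one storage node
    restored = recover(damaged)
    print(b"".join(restored[:3]) == data)
    print(efficiency(3, 1))
```

Tolerating more than one lost fragment (m > 1) requires a more general code than plain XOR, so this sketch covers only the m = 1 case.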
The default configuration creates a CephCluster that creates 4 PersistentVolumeClaims of 40G each, resulting in 160G of raw storage. Erasure coding ensures durability with k=3 data fragments and m=1 parity fragment. This gives a storage efficiency of 75% (refer to the primer above for the calculation), which means 120G of disk space is available for consumption by services such as the logging stack and velero.
It is possible to override the replication strategy for the logging stack and velero backups. Refer to the default configmap for the CephObjectStore at services/rook-ceph-cluster/1.10.3/defaults/cm.yaml#L126-L175 and override the replication strategy according to your needs by referring to the CephObjectStore CRD documentation.
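When weighing an override, it can help to estimate how much usable space each strategy leaves from the same raw capacity. The sketch below compares the default erasure-coded layout with a replicated alternative. It is a back-of-the-envelope estimate that ignores metadata and other overhead, and the function names are hypothetical.

```python
def usable_erasure_coded(raw: float, k: int, m: int) -> float:
    """Usable capacity when data is stored as k data + m parity fragments."""
    return raw * k / (k + m)


def usable_replicated(raw: float, n: int) -> float:
    """Usable capacity when every object is stored as n full copies."""
    return raw / n


if __name__ == "__main__":
    raw = 4 * 40  # default: 4 volumes of 40G each = 160G raw
    print(usable_erasure_coded(raw, k=3, m=1))   # default erasure-coded layout
    print(round(usable_replicated(raw, n=3), 1))  # hypothetical 3-way replication
```

With the default 160G of raw storage, k=3 and m=1 leave 120G usable, whereas 3-way replication would leave roughly 53G in exchange for tolerating the loss of two nodes instead of one.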
Rook Ceph in DKP - Prerequisites