The DC/OS Couchbase service provides a robust API (accessible by HTTP or DC/OS CLI) for managing, repairing, and monitoring the service. Here, only the CLI version is presented for conciseness, but see the API Reference for HTTP instructions.
An instance of the DC/OS Couchbase service is ultimately a set of plans (how to run the service) and a set of pods (what to run). The service is driven primarily by two plans, deploy and recovery. The deploy plan is used during initial installation, as well as during configuration and service updates. The recovery plan is an always-running plan that ensures that the pods and tasks that should be running stay running. Other plans ("sidecars") are used for secondary operations.
Every plan is made up of a set of phases, each of which consists of one or more steps.
A list of plans can be retrieved using the CLI command:
dcos couchbase --name=couchbase plan list
Inspecting a plan
View the current status of a plan using the CLI command:
dcos couchbase --name=couchbase plan status <plan-name>
For example, to view the status of the service's deploy plan:
dcos couchbase --name=couchbase plan status deploy
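When scripting installs or updates, it can help to block until a plan finishes. The following is a minimal sketch, not part of the service's CLI: the helper names plan_state and wait_for_plan and the SERVICE_NAME variable are hypothetical, and it assumes that the first line of the plan status output ends with the plan's overall state in parentheses (for example "deploy (serial strategy) (COMPLETE)"). Check that assumption against the output of your CLI version before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: poll "plan status" until a plan reports COMPLETE.
# ASSUMPTION: the first output line ends with the overall state in
# parentheses, e.g. "deploy (serial strategy) (COMPLETE)".
SERVICE_NAME="${SERVICE_NAME:-couchbase}"

# plan_state <plan>: print the overall state parsed from the first line.
plan_state() {
  dcos couchbase --name="$SERVICE_NAME" plan status "$1" \
    | head -n 1 \
    | sed -n 's/.*(\([A-Z_]*\))[[:space:]]*$/\1/p'
}

# wait_for_plan <plan> [timeout-seconds] [interval-seconds]
# Returns 0 once the plan is COMPLETE, 1 if the timeout expires first.
wait_for_plan() {
  local plan="$1" timeout="${2:-600}" interval="${3:-10}" waited=0 state
  while [ "$waited" -lt "$timeout" ]; do
    state="$(plan_state "$plan")"
    echo "plan $plan: ${state:-UNKNOWN}"
    if [ "$state" = "COMPLETE" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for plan $plan" >&2
  return 1
}

# Example: wait up to 15 minutes for the deploy plan.
#   wait_for_plan deploy 900 15
```

Polling the human-readable status keeps the sketch independent of any machine-readable output option, at the cost of depending on the output format noted above.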
Operating on a plan
Start a plan with the CLI command:
dcos couchbase --name=couchbase plan start <plan-name>
Pause a plan, or a specific phase in that plan identified by its name (or UUID), with the CLI command:
dcos couchbase --name=couchbase plan pause <plan (required)> <phase (optional)>
Stop a running plan, identified by its name, with the CLI command:
dcos couchbase --name=couchbase plan stop <plan>
Plan Stop differs from Plan Pause in the following ways:
- Pause can be issued for a specific phase or for all phases within a plan. Stop can only be issued for a plan.
- Pause updates the underlying phase/step state. Stop both ceases execution of the plan and resets the plan to its initial pending state.
Resume a plan, or a specific phase in that plan identified by its name (or UUID), with the CLI command:
dcos couchbase --name=couchbase plan resume <plan (required)> <phase (optional)>
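Pause and resume are often used as a pair to hold a phase while manual work is done. The following minimal sketch pauses a phase, runs an arbitrary command, and resumes the phase even if that command fails. The function name with_phase_paused, the SERVICE_NAME variable, and the example phase and script names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pause a phase, run a command, then resume the phase.
SERVICE_NAME="${SERVICE_NAME:-couchbase}"

# with_phase_paused <plan> <phase> <command> [args...]
# Returns the exit status of <command>; the phase is resumed either way.
with_phase_paused() {
  local plan="$1" phase="$2" rc
  shift 2
  # Pause the phase; give up without running the command if this fails.
  dcos couchbase --name="$SERVICE_NAME" plan pause "$plan" "$phase" || return 1
  # Run the caller's maintenance command while the phase is paused.
  "$@"
  rc=$?
  # Resume the phase regardless of whether the command succeeded.
  dcos couchbase --name="$SERVICE_NAME" plan resume "$plan" "$phase"
  return "$rc"
}

# Example (phase name and script are illustrative):
#   with_phase_paused deploy <phase-name> ./my-maintenance.sh
```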
Restart the specified step, or the whole phase if no step is specified, or the whole plan if no phase is specified:
dcos couchbase --name=couchbase plan force-restart <plan (required)> <phase (optional)> <step (optional)>
Force-complete a specific step in the provided phase of the plan. From the CLI, only individual steps can be force-completed; the HTTP API also supports force-completing entire phases and plans.
dcos couchbase --name=couchbase plan force-complete <plan (required)> <phase (required)> <step (required)>
A deployed instance of the DC/OS Couchbase service is made up of a set of running pods. Using the pod API, it is possible to manage the lifecycle of these pods as well as investigate failures of the pods and their tasks.
To list all the pods of the service, run the CLI command:
dcos couchbase --name=couchbase pod list
To view the status of all pods, or optionally of a single pod, run the CLI command:
dcos couchbase --name=couchbase pod status <pod-name (optional)>
This will show any status overrides for pods and their tasks.
To restart a pod in place, use the CLI command:
dcos couchbase --name=couchbase pod restart <pod-name>
This kills the tasks of the pod and relaunches them in place. You can monitor the progress of the restart by watching the recovery plan of the service.
Use replace only when the current instance of the pod should be completely destroyed: all persistent data (that is, the volumes) of the pod will be destroyed. Typical reasons are that a DC/OS agent is being removed or is permanently down, or that the pod's placement constraints need to be updated.
Issue a replace by running the CLI command:
dcos couchbase --name=couchbase pod replace <pod-name>
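Because replace destroys the pod's volumes, scripts that call it may want an explicit confirmation step. The following is a minimal sketch of such a guard; the function name replace_pod and the SERVICE_NAME variable are hypothetical, and it simply prints the recovery plan status afterwards so you can follow the relaunch.

```shell
#!/usr/bin/env bash
# Hypothetical guard around "pod replace", which destroys the pod's volumes.
SERVICE_NAME="${SERVICE_NAME:-couchbase}"

# replace_pod <pod-name> <confirm>
# Only runs when <confirm> is exactly "yes"; returns 2 otherwise.
replace_pod() {
  local pod="$1" confirm="$2"
  if [ -z "$pod" ] || [ "$confirm" != "yes" ]; then
    echo "refusing to replace '$pod': pass 'yes' to confirm data loss" >&2
    return 2
  fi
  dcos couchbase --name="$SERVICE_NAME" pod replace "$pod" || return 1
  # Show the recovery plan, which tracks the relaunch of the pod.
  dcos couchbase --name="$SERVICE_NAME" plan status recovery
}

# Example:
#   replace_pod data-1 yes
```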
Pausing a pod relaunches it in an idle command state. This allows you to debug the contents of the pod, and possibly make changes to fix problems, while still having access to all of the pod's context (such as volumes). Combining pause with dcos task exec is a very powerful debugging technique. To pause a pod, use the CLI command:
dcos couchbase --name=couchbase debug pod pause <pod-name>
To pause a specific task of a pod, append the -t <task-name> flag:
dcos couchbase --name=couchbase debug pod pause <pod-name> -t <task-name>
Use the pod status command to check which tasks and pods are in an overridden state.
To resume a paused pod or task, use the CLI command:
dcos couchbase --name=couchbase debug pod resume <pod-name> [-t <task-name>]
Use the pod status command to verify that the pod (or task) has properly resumed.
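The pause, exec, and resume steps above can be wrapped in small helpers. This is a minimal sketch under stated assumptions: the function names and SERVICE_NAME are hypothetical, the task name "server" in the example is illustrative, and the task ID passed to dcos task exec is assumed to follow a <pod>-<task> pattern such as data-0-server (confirm the real ID with the dcos task command). The default shell, bash, is also an assumption about the task's environment.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the pause / exec / resume debugging workflow.
SERVICE_NAME="${SERVICE_NAME:-couchbase}"

# debug_pause <pod-name> <task-name>: relaunch one task in an idle state.
debug_pause() {
  dcos couchbase --name="$SERVICE_NAME" debug pod pause "$1" -t "$2" || return 1
  # Show the pod status so the override on the task is visible.
  dcos couchbase --name="$SERVICE_NAME" pod status "$1"
}

# debug_shell <task-id> [command...]: run a command interactively in the task.
# Defaults to bash (an assumption about the task's environment).
debug_shell() {
  local task_id="$1"
  shift
  if [ "$#" -eq 0 ]; then
    set -- bash
  fi
  dcos task exec --interactive --tty "$task_id" "$@"
}

# debug_resume <pod-name> <task-name>: relaunch the task normally.
debug_resume() {
  dcos couchbase --name="$SERVICE_NAME" debug pod resume "$1" -t "$2" || return 1
  dcos couchbase --name="$SERVICE_NAME" pod status "$1"
}

# Example session (names are illustrative):
#   debug_pause data-0 server
#   debug_shell data-0-server
#   debug_resume data-0 server
```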
The DC/OS Couchbase service pushes metrics into the DC/OS metrics system. Details on consuming them can be found in the DC/OS metrics system documentation.
DC/OS has three ways to access service scheduler and service task logs.
- Via the DC/OS GUI
- Via the Mesos GUI
- Via the DC/OS CLI
A service’s logs are accessed by selecting the service in the Services tab. Both the service scheduler and the service tasks are displayed side by side in this view. To view a task's sandbox (files) as well as its stderr, click on the task.
The Mesos GUI provides access similar to that of the DC/OS UI, but the service scheduler is shown separately from the service tasks. The service tasks can all be found in the Frameworks tab, under the framework with the same name as the service. The service scheduler can be found in the Marathon framework as a task with the same name as the service.
Access both the files and logs of a service by clicking on its Sandbox link.
Figure 1 - Scheduler sandbox
The dcos task log subcommand allows you to pull logs from multiple tasks at the same time. The dcos task log <task-pattern> command fetches logs for all tasks matching the prefix pattern. See the subcommand's help (dcos task log --help) for full details.
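To attach logs to a support ticket or compare them over time, it can be handy to save them to files. The following minimal sketch is hypothetical (the function name and file layout are illustrative) and assumes that dcos task log accepts an optional file name, stdout or stderr, after the task pattern; check dcos task log --help for your CLI version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save stdout and stderr of matching tasks to files.
# ASSUMPTION: "dcos task log <pattern> <file>" accepts stdout/stderr as <file>.

# save_task_logs <task-pattern> [output-dir]
# Prints the common path prefix of the two files it wrote.
save_task_logs() {
  local pattern="$1" dir="${2:-.}" stamp prefix
  stamp="$(date +%Y%m%dT%H%M%S)"
  prefix="$dir/$pattern-$stamp"
  mkdir -p "$dir" || return 1
  dcos task log "$pattern" stdout > "$prefix.stdout.log" || return 1
  dcos task log "$pattern" stderr > "$prefix.stderr.log" || return 1
  echo "$prefix"
}

# Example: save the logs of all tasks whose names start with "data".
#   save_task_logs data ./couchbase-logs
```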
Mesos Agent logs
Occasionally, it can also be useful to examine what a given Mesos agent is doing. The Mesos agent handles deployment of Mesos tasks to a given physical system in the cluster; one Mesos agent runs on each system. Its logs can be useful for determining whether a problem at the system level is causing alerts across multiple services on that system.
Mesos agent logs can be accessed via the Mesos GUI or directly on the agent. The GUI method is described here.
Navigate to the agent you want to view, either directly from a task by clicking the “Agent” item in the breadcrumb when viewing the task (this goes directly to the agent hosting the task), or through the “Agents” menu item at the top of the screen (you will need to select the desired agent from the list).
In the Agent view, you will see a list of frameworks with a presence on that Agent. In the left pane you will see a plain link named “LOG”. Click that link to view the agent logs.
Figure 2 - List of frameworks
More on Pod Replace
If a Couchbase Server node becomes unresponsive, you must fail over to the remaining healthy nodes. Failover can be initiated via the Couchbase dashboard or the CLI. You can also configure automatic failover. More information can be found in the Couchbase documentation on failover.
After failover, a rebalance must be initiated. Rebalance can be initiated via the Couchbase dashboard or the CLI. The rebalance will remove the unresponsive node from the cluster and distribute the data evenly across the remaining nodes.
Then permanently replace the unresponsive node with a new one using the pod replace command. You can find more on the pod replace command in Pod Operations and in Troubleshooting.
dcos couchbase pod replace <pod-name>
Replace creates the new node and adds it to the Couchbase cluster, except for node data-0: it is created, but it must be added to the Couchbase cluster via the Couchbase dashboard or the CLI. If multiple nodes have to be replaced and data-0 is among them, always start with data-0.
After a node is replaced, a rebalance is necessary again.
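The replacement procedure described above (failover, rebalance, pod replace, and, for data-0, adding the node before the final rebalance) can be sketched as two functions. This is a hypothetical sketch, not an official script: the function names, the CB_* variables and their defaults are placeholders, CB_CLUSTER should point at a healthy node, and the couchbase-cli flags shown (a hard failover, since the node is unresponsive) should be checked against the couchbase-cli documentation for your Couchbase Server version.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the node replacement procedure.
# Placeholders: point CB_CLUSTER at a HEALTHY node and set real credentials.
SERVICE_NAME="${SERVICE_NAME:-couchbase}"
CB_CLI_BIN="${CB_CLI_BIN:-couchbase-cli}"
CB_CLUSTER="${CB_CLUSTER:-localhost:8091}"
CB_USER="${CB_USER:-Administrator}"
CB_PASS="${CB_PASS:-password}"

# cb_cli <subcommand> [args...]: run couchbase-cli against the cluster.
cb_cli() {
  local sub="$1"
  shift
  "$CB_CLI_BIN" "$sub" -c "$CB_CLUSTER" -u "$CB_USER" -p "$CB_PASS" "$@"
}

# start_node_replacement <pod-name> <unresponsive-node host:port>
# Hard-fails over the node, rebalances it out, then replaces the pod.
start_node_replacement() {
  local pod="$1" node="$2"
  cb_cli failover --server-failover "$node" --hard || return 1
  cb_cli rebalance || return 1
  dcos couchbase --name="$SERVICE_NAME" pod replace "$pod"
}

# finish_node_replacement [new-node host:port]
# Run once the new pod is up. Pass an address only for data-0, which must
# be added to the cluster manually; then rebalance again.
finish_node_replacement() {
  if [ -n "$1" ]; then
    cb_cli server-add --server-add "$1" \
      --server-add-username "$CB_USER" --server-add-password "$CB_PASS" || return 1
  fi
  cb_cli rebalance
}
```

Splitting the procedure in two reflects that the final rebalance can only happen after the replaced pod is running, which the sketch does not try to detect.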
Backup and Restore
For backup and restore, we leverage the cbbackupmgr tool that comes with Couchbase Enterprise. A backupmgr service node must be launched; see the respective section in the DC/OS Couchbase configuration. It provides the volume that stores the incremental snapshots of the database, and provides tasks for the various cbbackupmgr commands, described below. The backupmgr node is set up with a connection to an AWS S3-compatible store (the default is minio). The tasks use the aws s3 sync command to keep the incremental snapshots on the backupmgr node and in the connected S3 bucket in sync.
The backupmgr-backup plan creates an incremental snapshot and syncs it with the AWS S3 bucket.
dcos couchbase plan start backupmgr-backup
The backupmgr-restore plan syncs with the AWS S3 bucket, then restores the backup. Empty Couchbase buckets must be created before you attempt a restore.
dcos couchbase plan start backupmgr-restore
The backupmgr-list plan lists the snapshots. You will find the snapshot list, along with the snapshot timestamps, in the stdout log of the task. You will need the timestamps for the merge command.
dcos couchbase plan start backupmgr-list
The backupmgr-merge plan merges snapshots together. Each snapshot has a timestamp, which you can get using the list command. After the merge completes, the result is also synced with the AWS S3 bucket. After that, both the backupmgr node and the S3 bucket contain only the merged snapshot.
dcos couchbase plan start backupmgr-merge -p MERGE_START=<start-time-stamp> -p MERGE_END=<end-time-stamp>
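The four backupmgr plans can be wrapped so that scripts cannot start a misspelled plan or a merge without both timestamps. This is a minimal, hypothetical sketch: the function names and SERVICE_NAME are illustrative, and unlike the commands above it passes --name explicitly, which is equivalent when the service is named couchbase.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the backupmgr-* plans.
SERVICE_NAME="${SERVICE_NAME:-couchbase}"

# backupmgr <backup|restore|list>: start the matching sidecar plan.
backupmgr() {
  case "$1" in
    backup|restore|list)
      dcos couchbase --name="$SERVICE_NAME" plan start "backupmgr-$1"
      ;;
    *)
      echo "usage: backupmgr backup|restore|list" >&2
      return 2
      ;;
  esac
}

# backupmgr_merge <start-timestamp> <end-timestamp>
# Timestamps come from the stdout log of the backupmgr-list task.
backupmgr_merge() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: backupmgr_merge <start-timestamp> <end-timestamp>" >&2
    return 2
  fi
  dcos couchbase --name="$SERVICE_NAME" plan start backupmgr-merge \
    -p "MERGE_START=$1" -p "MERGE_END=$2"
}

# Example:
#   backupmgr backup
#   backupmgr_merge <start-time-stamp> <end-time-stamp>
```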