DC/OS Version 1.11.9 was released on January 31, 2019.
DC/OS 1.11.9 includes the following components:
- Apache Mesos 1.5.2 change log.
- Marathon 1.6.567 change log.
- Metronome 0.4.5 change log.
Release Summary
DC/OS is a distributed operating system that enables you to manage resources, application deployment, data services, networking, and security in an on-premise, cloud, or hybrid cluster environment.
Issues Fixed in DC/OS 1.11.9
The issues that have been fixed in DC/OS 1.11.9 are grouped by feature, functional area, or component. Most change descriptions include one or more issue tracking identifiers for reference.
GUI
- DCOS-45803 - The multi-container JSON editor deletes properties that are not supported by the form. When you run a Pod, the UI deletes the `backoff`, `upgrade`, `killSelection`, and `unreachableStrategy` fields from the JSON app definition before sending the request to the server.
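The class of bug described above can be illustrated with a minimal sketch. The key names in `FORM_SUPPORTED_KEYS`, the flat dictionary layout, and both helper functions are hypothetical and are not taken from the DC/OS UI source or the Marathon pod schema; the sketch only contrasts rebuilding a definition from form-supported keys with overlaying form edits on the original definition.

```python
# Hypothetical set of keys that a form-based editor knows how to render.
FORM_SUPPORTED_KEYS = {"id", "containers", "networks"}


def lossy_round_trip(definition):
    """Rebuild the definition from form-supported keys only.

    Every key the form does not know about is silently dropped.
    """
    return {k: v for k, v in definition.items() if k in FORM_SUPPORTED_KEYS}


def preserving_round_trip(definition, form_values):
    """Overlay form edits on a copy of the original definition.

    Keys the form does not know about survive unchanged.
    """
    merged = dict(definition)
    merged.update(form_values)
    return merged


# Simplified, illustrative definition (values are placeholders).
pod = {
    "id": "/example-pod",
    "containers": [],
    "backoff": {"backoff": 1, "maxLaunchDelay": 3600},
    "upgrade": {"minimumHealthCapacity": 1},
    "killSelection": "YOUNGEST_FIRST",
    "unreachableStrategy": {"inactiveAfterSeconds": 0},
}

print(sorted(lossy_round_trip(pod)))
print(sorted(preserving_round_trip(pod, {"id": "/renamed"})))
```

The second approach keeps fields such as `killSelection` in the request body even though the form never displays them.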
Installing
- DCOS_OSS-4040 - The `dcos-diagnostics` bundle files are stored in `/var/lib/dcos/dcos-diagnostics/diag-bundles`, and `/var/lib/dcos` is not a separate partition. It belongs to the root partition, which on the masters also holds other critical data, so a diagnostics bundle that takes up most of the space could seriously damage a cluster. The workaround is a configuration option that specifies where to store diagnostics bundles on the masters.
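To see whether a bundle directory competes for space with other data, one option is to compare device identifiers and check free space. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the paths in the usage example are the ones named above, which exist only on a DC/OS master.

```python
import os
import shutil


def shares_filesystem(path_a, path_b):
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def free_fraction(path):
    """Return the fraction of free space on the file system holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    bundle_parent = "/var/lib/dcos"  # path from the release note above
    if os.path.exists(bundle_parent):
        print("shares root file system:", shares_filesystem(bundle_parent, "/"))
        print("free fraction:", round(free_fraction(bundle_parent), 3))
```

If `shares_filesystem` returns `True` for `/var/lib/dcos` and `/`, large bundles consume the same space as the rest of the root file system.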
Mesos
- COPS-4320, DCOS-46753 - This issue is caused by a race between the `.discard()` triggered by a check container `TIMEOUT` and `IOSB` extracting the `ContainerIO` object. An overloaded or slow agent process can expose the race when check containers are launched frequently on a heavily loaded agent. This release fixes the issue by discarding `launch`, so the container I/O is cleaned up and all file descriptors (FDs) are closed.
- DCOS-29474 - Occasional flakes in `test_srv_records` were traced to frameworks that were not correctly unregistered. This release validates that the framework is correctly unregistered, and throws an exception (triggering log collection) if the check fails after an uninstall.
- DCOS-46388 - If there is a validation error for `LAUNCH_GROUP`, or if there are multiple authorization errors for some of the tasks in a `LAUNCH_GROUP`, the master skips processing the remaining authorization results.
- DCOS-46814 - When an agent host reboots, all of its containers fail, but the agent tries to recover from its checkpointed state after the reboot. The agent soon discovers that all of the `cgroup` hierarchies have failed and assumes that the containers are destroyed. However, when terminating the executor, the agent first waits for the `exit` status of its container by calling `waitpid` on the checkpointed child process `pid`. If a new process with the same `pid` is spawned after the reboot, the parent waits for the wrong child process, which blocks the executor termination and future task status updates.
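The PID reuse problem in the last item can be illustrated with a small sketch. This is not the Mesos fix; it only shows one common guard, which is to trust a checkpointed `pid` only when it was recorded during the current boot. The `CheckpointedExecutor` type, the function names, and the returned action strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointedExecutor:
    pid: int
    boot_id: str  # boot identifier recorded when the pid was checkpointed


def pid_is_trustworthy(checkpoint, current_boot_id):
    """A checkpointed pid identifies the same process only within one boot."""
    return checkpoint.boot_id == current_boot_id


def plan_termination(checkpoint, current_boot_id):
    """Decide whether waiting on the checkpointed pid is safe."""
    if pid_is_trustworthy(checkpoint, current_boot_id):
        return "wait-for-exit-status"
    # After a reboot the pid may belong to an unrelated process, so waiting
    # on it could block forever; treat the original process as gone instead.
    return "treat-as-already-exited"


checkpoint = CheckpointedExecutor(pid=4242, boot_id="boot-a")
print(plan_termination(checkpoint, "boot-a"))
print(plan_termination(checkpoint, "boot-b"))
```

On Linux, a boot identifier can be read from `/proc/sys/kernel/random/boot_id`; the sketch takes it as a plain string so that it runs anywhere.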
Metrics
- DCOS_OSS-3863 - This release fixes a bug in `dcos-metrics` that caused the Prometheus exporter to omit some metrics data on the agent nodes.
Networking
- COPS-3743, COPS-4323, DCOS_OSS-4620 - Erlang uses a cookie to allow or disallow a node from connecting to the other nodes in a cluster. Only nodes with the same cookie string are allowed to connect. Previously, this cookie was a fixed string and not configurable in DC/OS, so if nodes from different DC/OS clusters could reach each other, “cross talk” between the two clusters was possible. This fix makes the cookie configurable.
- DCOS-40539, DCOS-46506 - Previously, the configuration option `enable_ipv6` was not passed to the frontend, so the UI was not aware when the cluster did not support the `dcos6` network type. This release passes the configuration option to the frontend configuration file and uses it to conditionally show the `dcos6` option for the network types. This fix also corrects the validation of overlay backends by marking the `dcos6` overlay network as disabled if `enable_ipv6` is set to false.
- DCOS-46915 - In OTP 20 and earlier, all Erlang distribution protocol connections over TLS are initialized by `ssl_tls_dist_proxy` one at a time. This approach causes a bottleneck and is resolved in the newest OTP versions. If the node connects to a non-existent node, it takes up to 30 seconds to get an error. Lashup tries to connect to such nodes every `n` seconds, which causes a message storm in `ssl_tls_dist_proxy`. The workaround is to restart the entire virtual machine if there are more than `m` messages in the queue. Killing only `ssl_tls_dist_proxy` is not recommended because it breaks all new and existing distribution protocol connections.
- DCOS_OSS-4667 - Mesos recently introduced a flag that controls whether the CNI root directory persists across reboots. You can now set this Mesos flag through the DC/OS `config.yaml` file.
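The cookie behavior described in the first networking item can be modeled as a shared-secret check. The sketch below is a simplified model, not the Erlang distribution wire protocol: the `digest` and `accepts` helpers, the cookie strings, and the challenge value are all hypothetical. It shows why two clusters that share one fixed cookie accept each other's nodes, and why distinct per-cluster cookies prevent that.

```python
import hashlib
import hmac


def digest(cookie, challenge):
    """Hash the shared cookie together with a per-connection challenge."""
    return hashlib.md5((cookie + str(challenge)).encode("ascii")).digest()


def accepts(local_cookie, challenge, peer_digest):
    """Accept the peer only if it proves knowledge of the same cookie."""
    expected = digest(local_cookie, challenge)
    return hmac.compare_digest(expected, peer_digest)


challenge = 123456789  # illustrative value

# Two clusters using the same fixed cookie: cross talk is possible.
print(accepts("fixed-cookie", challenge, digest("fixed-cookie", challenge)))

# Per-cluster cookies: a node from the other cluster is rejected.
print(accepts("cluster-a-cookie", challenge, digest("cluster-b-cookie", challenge)))
```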
Package Management
- DCOS_OSS-4418 - The requests package before 2.20.0 for Python sends an `HTTP` Authorization header to an `HTTP URI` upon receiving a same-hostname `https-to-http` redirect, which makes it easier for remote attackers to discover credentials by sniffing the network. This release addresses the vulnerability by upgrading the requests library to version 2.20.1 and urllib3 to 1.24.1.
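The redirect rule behind this vulnerability can be sketched with the standard library alone. This is a simplified illustration and not the code used by the requests library; the function name `headers_for_redirect` and the example URLs are hypothetical. The idea is to drop the Authorization header whenever a redirect changes the host or downgrades from `https` to `http`.

```python
from urllib.parse import urlparse


def headers_for_redirect(headers, old_url, new_url):
    """Return the headers that are safe to send to the redirect target."""
    old, new = urlparse(old_url), urlparse(new_url)
    host_changed = old.hostname != new.hostname
    downgraded = old.scheme == "https" and new.scheme != "https"
    if host_changed or downgraded:
        # Do not forward credentials to a different host or over plain HTTP.
        return {k: v for k, v in headers.items() if k.lower() != "authorization"}
    return dict(headers)


original = {"Authorization": "Basic dXNlcjpwYXNz", "Accept": "*/*"}
print(headers_for_redirect(original, "https://example.com/a", "http://example.com/a"))
print(headers_for_redirect(original, "https://example.com/a", "https://example.com/b"))
```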
Known Issues and Limitations
This section covers any known issues or limitations that don’t necessarily affect all customers, but might require changes to your environment to address specific scenarios. The issues are grouped by feature, functional area, or component. Where applicable, issue descriptions include one or more issue tracking identifiers.
Mesos
- DCOS-44935, DCOS_OSS_3877, DCOS_OSS-4658 - The diagnostics bundle is the standard way of collecting debugging information from a cluster to debug and fix critical issues on customer sites. Much of the most useful debugging information is lost because the diagnostics bundle is not created as the root user and network information is not collected as part of the bundle. To resolve this issue, create the `dcos-diagnostics` bundle as the root user so that it collects the best possible diagnostic information for debugging issues.
About DC/OS 1.11
DC/OS 1.11 includes many new capabilities with a focus on:
- Managing clusters across multiple clouds. Enterprise
- Production Kubernetes-as-a-service.
- Enhanced data security. Enterprise
- Updated data services.
Provide feedback on the new features and services at support.mesosphere.com.
New Features and Capabilities in DC/OS 1.11
Platform
- Multi-region management - Enables a DC/OS cluster to span multiple datacenters, clouds, and remote branches while providing a unified management and control cluster. View the documentation. Enterprise
- Linked clusters - A cluster link is a unidirectional relationship between one cluster and another. You can add and remove links from one cluster to another cluster using the DC/OS CLI. Once a link is set up, you can easily switch between clusters using the CLI or UI. View the documentation. Enterprise
- Fault domain awareness - Use fault domain awareness to make your services highly available and to allow for increased capacity when needed. View the documentation. Enterprise
- Decommission nodes - Support for permanently decommissioning nodes makes it easier to manage `spot` cloud instances, allowing for immediate task rescheduling. View the documentation.
- UCR
  - Support for Docker image garbage collection. View the documentation.
  - Support for Docker image pull secrets. View the documentation. An example for Docker credentials is here. Enterprise
Networking
- Edge-LB 1.0. View the documentation. Enterprise
- IPv6 is now supported for Docker containers.
- Performance improvements to the DC/OS network stack - All networking components (minuteman, navstar, spartan) are aggregated into a single systemd unit called `dcos-net`. Read this note to learn more about the refactoring of the network stack.
- The configuration parameter `dns_forward_zones` now takes a list of objects instead of nested lists (DCOS_OSS-1733). View the documentation to understand its usage.
Enterprise
Security
- Secrets Management Service
  - Secrets can now be binary files in addition to environment variables.
  - Hierarchical access control is now supported.
Monitoring
- The DC/OS metrics component now produces metrics in Prometheus format. View the documentation.
- Unified logging API provides simple access to container (task) and system component logs. View the documentation.
Storage
- DC/OS Storage Service 0.1 (beta) - DSS users can dynamically create volumes based on profiles or policies to fine-tune their applications' storage requirements. This feature leverages the industry-standard Container Storage Interface (CSI) to streamline the development of storage features in DC/OS by Mesosphere and our community and partner ecosystems. View the documentation. Enterprise
- Pods now support persistent volumes. View the documentation. Beta
Updated DC/OS Data Services
- TLS encryption for DC/OS Kafka, DC/OS Cassandra, DC/OS Elastic, and DC/OS HDFS is now supported. Enterprise
- Fault domain awareness for DC/OS Kafka, DC/OS Cassandra, DC/OS Elastic and DC/OS HDFS. Use fault domain awareness to make your services highly available and to allow for increased capacity when needed. Enterprise
- New API endpoint to pause a node for DC/OS Kafka, DC/OS Cassandra, DC/OS Elastic, and DC/OS HDFS. Use this endpoint to relaunch a node in an idle command state for debugging purposes.
- New DC/OS Kafka ZooKeeper service. View the documentation.
- You can now select a DC/OS data service version from a dropdown menu in the DC/OS UI.
- Improved scalability for all DC/OS data services.