Given Docker’s propensity for creating easy-to-use tools, it shouldn’t come as a surprise that Docker Swarm is one of the easier “Docker Clustering” options to understand and run. I recently built some Terraform configs for deploying a Highly Available Docker Swarm cluster on OpenStack and learned a fair bit about Swarm in the process.
This guide is meant to be a platform-agnostic how-to on installing and running a Highly Available Docker Swarm. It shows the ideas and concepts that may not be as easy to pick up from just reading some config management code.
The reason for using CoreOS here is that both running Swarm in High Availability mode and supporting Docker networking between hosts require service discovery. We could use zookeeper here, but CoreOS comes with etcd, which makes it an excellent choice for running Docker Swarm.
You will need three servers capable of running CoreOS. See the “Try Out CoreOS” section of their website for various installation methods for different infrastructure. For this guide I will use the official CoreOS Vagrant Example.
Note: skip the rest of this section if you installed CoreOS on a different platform.
Clone down the Vagrant example:
    $ git clone https://github.com/coreos/coreos-vagrant.git vagrant-docker-swarm
    Cloning into 'vagrant-docker-swarm'...
    remote: Counting objects: 411, done.
    remote: Total 411 (delta 0), reused 0 (delta 0), pack-reused 411
    Receiving objects: 100% (411/411), 100.33 KiB | 0 bytes/s, done.
    Resolving deltas: 100% (181/181), done.
    Checking connectivity... done.
    $ cd vagrant-docker-swarm
Edit the Vagrantfile to set $num_instances = 3:
On Unix-like systems you can do this easily with sed:
sed -i 's/\$num_instances = 1/\$num_instances = 3/' Vagrantfile
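One caveat: `sed -i` with no suffix is GNU syntax, and BSD sed (as shipped on macOS) expects an argument after `-i`. A portable sketch, assuming the Vagrantfile still has a line of the form `$num_instances = <number>` (the helper name `set_num_instances` is made up for this example):

```shell
# Hypothetical helper: rewrite "$num_instances = <n>" in a Vagrantfile
# without relying on sed -i, which differs between GNU and BSD sed.
set_num_instances() {
  file="$1"
  count="$2"
  tmp="${file}.tmp"
  # Match any existing integer so the helper can be re-run safely.
  sed "s/^\\\$num_instances *= *[0-9][0-9]*/\$num_instances = ${count}/" "$file" > "$tmp" \
    && mv "$tmp" "$file"
}
```

Run it from the repo root as `set_num_instances Vagrantfile 3`.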
Get a new etcd discovery-url:
Note: if you are on a Windows box and don’t have curl, you can paste the URL into a web browser to get the discovery-url.
    $ curl https://discovery.etcd.io/new\?size\=3
    https://discovery.etcd.io/6a9c62105f04dac40a29b90fbed322ef
Create a cloud-init file called user-data in the base of the repo, using the discovery-url from above:
    #cloud-config

    coreos:
      etcd2:
        discovery: https://discovery.etcd.io/6a9c62105f04dac40a29b90fbed322ef
        advertise-client-urls: http://$public_ipv4:2379
        initial-advertise-peer-urls: http://$public_ipv4:2380
        listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
        listen-peer-urls: http://$public_ipv4:2380,http://$public_ipv4:7001
      units:
        - name: etcd2.service
          command: start
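Since every new cluster needs a fresh discovery URL, it can help to generate user-data rather than edit it by hand. A minimal sketch, assuming the same etcd2 settings as above; `write_user_data` is a hypothetical helper name, not part of the coreos-vagrant repo:

```shell
# Hypothetical helper: write a cloud-config file for a given discovery URL.
# Usage: write_user_data <discovery-url> [output-file]
write_user_data() {
  discovery_url="$1"
  out="${2:-user-data}"
  # \$public_ipv4 is escaped so it reaches the file literally;
  # CoreOS substitutes it at boot time, not the shell.
  cat > "$out" <<EOF
#cloud-config

coreos:
  etcd2:
    discovery: ${discovery_url}
    advertise-client-urls: http://\$public_ipv4:2379
    initial-advertise-peer-urls: http://\$public_ipv4:2380
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://\$public_ipv4:2380,http://\$public_ipv4:7001
  units:
    - name: etcd2.service
      command: start
EOF
}
```

With curl available, something like `write_user_data "$(curl -s 'https://discovery.etcd.io/new?size=3')"` would fetch a new URL and write the file in one step.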
Start up the CoreOS VMs and log into the first one to check everything worked ok:
    $ vagrant up
    Bringing machine 'core-01' up with 'virtualbox' provider...
    Bringing machine 'core-02' up with 'virtualbox' provider...
    Bringing machine 'core-03' up with 'virtualbox' provider...
    ...
    $ vagrant ssh core-01
    $ etcdctl member list
    3c5901a3db54efa3: name=f1bae7bba7714ed7b4585c6b1256ddb2 peerURLs=http://172.17.8.101:2380 clientURLs=http://172.17.8.101:2379
    9eeb141350af8439: name=5c8e57890d114d7d9d7aef662033a6e0 peerURLs=http://172.17.8.103:2380 clientURLs=http://172.17.8.103:2379
    ebcc652087dfe6e8: name=de426249d3b34e23a5706d99b4900665 peerURLs=http://172.17.8.102:2380 clientURLs=http://172.17.8.102:2379
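If you script the setup, you may want to check the member count rather than eyeball it. A small sketch, assuming the `member list` output format shown above (one line per member, each containing `peerURLs=`); the helper name is made up:

```shell
# Hypothetical helper: count members in `etcdctl member list` output on stdin.
count_etcd_members() {
  grep -c 'peerURLs='
}

# Usage on a CoreOS node:
#   [ "$(etcdctl member list | count_etcd_members)" -eq 3 ] && echo "etcd cluster complete"
```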
Now that we have several CoreOS servers with a working etcd cluster we can move on to setting up Docker Swarm.
We need to modify Docker to listen on TCP port 2376 and to register itself with service discovery (which will allow us to set up overlay networking later on). We do this by creating a systemd drop-in file under /etc/systemd/system/docker.service.d/ on each server.
Note: if you are not using Vagrant, change eth1 to match the primary interface of your server.
    [Service]
    Environment="DOCKER_OPTS=-H=0.0.0.0:2376 -H unix:///var/run/docker.sock --cluster-advertise eth1:2376 --cluster-store etcd://127.0.0.1:2379"
We then need to reload systemd and restart Docker for these changes to take effect.
    sudo systemctl daemon-reload
    sudo systemctl restart docker
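The drop-in has to exist on every server, so a small helper saves some typing. This is only a sketch: the file name `10-swarm.conf` and the function name are assumptions of mine (systemd reads any `*.conf` file in the drop-in directory), and the interface is a parameter so it works outside Vagrant too.

```shell
# Hypothetical helper: write the Docker drop-in for a given network interface.
# Usage: write_docker_dropin <interface> [drop-in-directory]
write_docker_dropin() {
  iface="$1"
  dir="${2:-/etc/systemd/system/docker.service.d}"
  mkdir -p "$dir" || return 1
  # 10-swarm.conf is an assumed name; pick any name ending in .conf.
  cat > "${dir}/10-swarm.conf" <<EOF
[Service]
Environment="DOCKER_OPTS=-H=0.0.0.0:2376 -H unix:///var/run/docker.sock --cluster-advertise ${iface}:2376 --cluster-store etcd://127.0.0.1:2379"
EOF
}
```

As root on each node you would then run `write_docker_dropin eth1`, followed by the reload and restart commands above.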
Check that you can access docker via tcp on one of your hosts:
    $ docker -H tcp://172.17.8.101:2376 info
    Containers: 0
    Images: 0
    Engine Version: 1.9.1
    Storage Driver: overlay
     Backing Filesystem: extfs
    Execution Driver: native-0.2
    Logging Driver: json-file
    Kernel Version: 4.3.3-coreos
    Operating System: CoreOS 899.1.0
    CPUs: 1
    Total Memory: 997.4 MiB
    Name: core-01
    ID: BK64:WF3J:5JU6:VYLI:YJSO:CAQH:HPYM:MPTG:FMTA:VLE3:HSMP:F4VQ
    Cluster store: etcd://127.0.0.1:2379/docker
We’re now ready to run Docker Swarm itself. There are two extra components to running Docker Swarm: a Swarm Agent and a Swarm Manager.
The Swarm Agent watches the local Docker service via its TCP port and registers it into service discovery (etcd in our case). We will run this on each server like so:
Note: set the --addr= argument to match the primary IP of each node.
    $ docker run -d --name swarm-agent \
        --net=host swarm:latest \
        join --addr=172.17.8.101:2376 \
        etcd://127.0.0.1:2379
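Because each Docker daemon now listens on TCP, the agents can also be started from a single machine instead of logging into each node. The sketch below only prints the commands so you can review them first; `swarm_agent_cmd` is a hypothetical helper, and the IPs are the Vagrant defaults used in this guide.

```shell
# Hypothetical helper: print the command that starts a Swarm Agent on one node.
# It only prints; pipe the output to sh to actually run it.
swarm_agent_cmd() {
  ip="$1"
  printf 'docker -H tcp://%s:2376 run -d --name swarm-agent --net=host swarm:latest join --addr=%s:2376 etcd://127.0.0.1:2379\n' "$ip" "$ip"
}

# One command per node, each with its own --addr.
for ip in 172.17.8.101 172.17.8.102 172.17.8.103; do
  swarm_agent_cmd "$ip"
done
```

The etcd address stays at 127.0.0.1 because the container runs with --net=host on the target node, where etcd is listening locally.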
The Swarm Manager watches service discovery and exposes a TCP port (2375). When a Docker client connects to that port, the manager performs the requested actions and schedules containers across the Swarm cluster.
To ensure High Availability of our cluster we’ll run a Swarm Manager on each server:
    $ docker run -d --name swarm-manager --net=host swarm:latest manage \
        etcd://127.0.0.1:2379
Assuming everything went smoothly, we can now access the Swarm cluster via the Swarm Manager’s TCP port on any of the servers:
    $ docker -H tcp://172.17.8.101:2375 info
    Containers: 6
    Images: 5
    Role: primary
    Strategy: spread
    Filters: health, port, dependency, affinity, constraint
    Nodes: 3
     core-01: 172.17.8.101:2376
      └ Status: Healthy
      └ Containers: 2
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 1.023 GiB
      └ Labels: executiondriver=native-0.2, kernelversion=4.3.3-coreos, operatingsystem=CoreOS 899.1.0, storagedriver=overlay
     core-02: 172.17.8.102:2376
      └ Status: Healthy
      └ Containers: 2
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 1.023 GiB
      └ Labels: executiondriver=native-0.2, kernelversion=4.3.3-coreos, operatingsystem=CoreOS 899.1.0, storagedriver=overlay
     core-03: 172.17.8.103:2376
      └ Status: Healthy
      └ Containers: 2
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 1.023 GiB
      └ Labels: executiondriver=native-0.2, kernelversion=4.3.3-coreos, operatingsystem=CoreOS 899.1.0, storagedriver=overlay
    CPUs: 3
    Total Memory: 3.068 GiB
    Name: core-01
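For a scripted check, the same output can be reduced to a pass or fail. This is a rough sketch that assumes the `Nodes:` and `Status: Healthy` lines look like the output above; the helper name is made up, and the patterns are left unanchored in case the client pads those lines.

```shell
# Hypothetical helper: succeed only if every node reported by
# `docker info` (read from stdin) is marked Healthy.
all_nodes_healthy() {
  awk '
    # Pull the number that follows "Nodes: ".
    /Nodes: [0-9]/    { n = $0; sub(/.*Nodes: /, "", n); expected = n + 0 }
    # Count one Healthy status line per node.
    /Status: Healthy/ { healthy++ }
    END { exit !(expected > 0 && healthy == expected) }
  '
}

# Usage:
#   docker -H tcp://172.17.8.101:2375 info | all_nodes_healthy && echo "swarm ok"
```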
Our next step is to create an overlay network using the docker network command:
    $ docker -H tcp://172.17.8.101:2375 network create --driver overlay my-net
    614913b275dee43a63b48d08b4f5e52f7c0e531d70c63eeb8bb35624470da0c4
    $ docker -H tcp://172.17.8.101:2375 network ls
    NETWORK ID          NAME                DRIVER
    86ecb0cf32c6        core-02/none        null
    c7a291ed8366        core-01/host        host
    3747364c5961        core-03/none        null
    8245d6d3ac67        core-02/host        host
    614913b275de        my-net              overlay
    61ead145e9dd        core-01/bridge      bridge
    c9457c4f4588        core-03/bridge      bridge
    b8a6c75cb3b9        core-03/host        host
    bdc4d5ccd778        core-02/bridge      bridge
    66afdc892361        core-01/none        null
Finally, we’ll create a container on one host and then check that it is accessible from another:
Note: replace the node==XXXX argument with the hostname of one of your hosts, and make sure to use a different node for each docker command.
    $ docker -H tcp://172.17.8.101:2375 run -d --name=web --net=my-net \
        --env="constraint:node==core-01" nginx
    e0fe18c946a5692806608f939d4d6f31c670e3f42bf3942a77142bed2095983e
    $ docker -H tcp://172.17.8.101:2375 run -it --rm --net=my-net \
        --env="constraint:node==core-02" busybox wget -O- http://web
    Connecting to web (10.0.0.2:80)
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
If you’ve been following along you have successfully deployed a Highly Available Docker Swarm cluster. From here you could use a load balancer to load balance the Swarm Manager port (2375) or even use Round Robin DNS.
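If you don’t have a load balancer or DNS to hand, a crude alternative is client-side failover: try each manager in turn and use the first that answers. This is a sketch of that idea only, not a substitute for proper load balancing. `first_live_manager` and the `PROBE` hook are names I made up; the default probe simply runs `docker info` against the endpoint.

```shell
# Default probe: succeed if `docker info` works against the endpoint.
probe_with_docker() {
  docker -H "tcp://$1" info > /dev/null 2>&1
}

# PROBE is an assumption of this sketch: the name of a command that takes
# host:port and returns 0 when the endpoint is reachable.
PROBE="${PROBE:-probe_with_docker}"

# Hypothetical helper: print the first Swarm Manager endpoint that responds.
first_live_manager() {
  for endpoint in "$@"; do
    if "$PROBE" "$endpoint"; then
      printf '%s\n' "$endpoint"
      return 0
    fi
  done
  return 1
}

# Usage:
#   mgr="$(first_live_manager 172.17.8.101:2375 172.17.8.102:2375 172.17.8.103:2375)" \
#     && docker -H "tcp://$mgr" ps
```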
You may have noticed that there is no authentication or authorization on any of this, and anybody with a Docker binary and TCP access to your hosts could spin up Docker containers. This is fairly easily fixed by using Docker’s TLS certificate-based authentication.
To learn how to secure both Docker and Docker Swarm with TLS, read the follow-up post, Secure Docker with TLS.