Deploy Docker Swarm on AWS EC2 via CloudFormation templates - Step 7 - Docker Swarm

In this step we will create a Docker Swarm cluster.

This post is part of a thread that includes these steps:

  1. Network Setup
  2. Storage
  3. Roles
  4. Manager Instance
  5. Worker Launch Template
  6. Worker Instances
  7. Docker Swarm (this post)
  8. Cleanup

Docker Swarm

Manager Machine

IMPORTANT: On the manager machine, port 2377 (Docker Swarm) must be open and reachable from the other nodes so that they can join the swarm.
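
Before proceeding, you can probe the port from any worker node. A minimal check, assuming nc (netcat) is installed; <manager-private-ip> is a placeholder for your manager's private IP:

# <manager-private-ip> is a placeholder; substitute the manager's private IP
nc -z -v -w 5 <manager-private-ip> 2377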

We are using Docker Swarm. At least one machine must be designated as a "manager". The "manager" machine is where all cluster management operations are performed, including all stack deployments.

The manager machine on AWS is named manager.

Login to manager machine

./ssh/ssh-manager.sh

Switch to the worker user:

sudo su - worker

Clean Slate

It is recommended to start from a clean Docker installation.

Warning: This will destroy all your images and containers. There is no way to restore them!

# stop all containers
docker stop $(docker ps --all --quiet)
# remove all containers
docker rm $(docker ps --all --quiet)
# remove all images
docker rmi $(docker images --quiet)
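
On newer Docker releases, a single prune command achieves much the same clean slate; note that running containers still have to be stopped first. A sketch:

# stop everything first, then prune containers, images, volumes, and networks
docker stop $(docker ps --all --quiet)
docker system prune --all --volumes --force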

Docker Swarm

Init

Initialize Docker Swarm. This will make the machine the Swarm manager:

# network interface whose address the swarm will advertise
export dev=eth0
# extract the interface's IPv4 address (the /prefix is stripped)
export host_ip=$(ip -4 -o addr show $dev | perl -lane 'print $F[3]' | cut -d/ -f1)
docker swarm init --advertise-addr $host_ip --default-addr-pool 192.168.0.0/16
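
To confirm the swarm is active, you can query the local node state with standard docker info templating; it should print active:

docker info --format '{{.Swarm.LocalNodeState}}'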

Manager Node

The node on which we ran docker swarm init is now the manager node. By default the manager also runs service tasks, so it doubles as the first worker node.
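
If you prefer to keep the manager free of service tasks, it can be drained; here <manager-hostname> is a placeholder for the name shown by docker node ls:

# drain the manager so no service tasks are scheduled on it
docker node update --availability drain <manager-hostname>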

Worker Nodes

To add another worker to the swarm, run this on the manager and follow the printed instructions:

docker swarm join-token worker

To remove a node from the swarm, log in to that node and run:

docker swarm leave
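
A node that leaves still appears in docker node ls on the manager with status Down. To delete its entry, run this on the manager; <node-name> is a placeholder for the hostname shown by docker node ls:

# remove the stale entry for a node that has left
docker node rm <node-name>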

pssh

Create a working directory for the node lists:

mkdir -p ~/nodes
cd ~/nodes

Install parallel-ssh. Either of the following forks provides the pssh command used below.

From nplanel/parallel-ssh:

cd ~/nodes
git clone https://github.com/nplanel/parallel-ssh
cd parallel-ssh
python3 setup.py install --user

Or from lilydjwg/pssh:

cd ~/nodes
git clone https://github.com/lilydjwg/pssh
cd pssh
pip3 install --user .
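
Both methods install into your per-user site, so make sure ~/.local/bin is on your PATH, otherwise the pssh command will not be found:

export PATH="$HOME/.local/bin:$PATH"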

Simple test:

pssh \
  --host=worker-1.swift.internal --option="StrictHostKeyChecking=no" \
  --user=$USER --inline 'uptime'

Test that you can execute a simple command on all hosts:

cd ~/nodes

# list all worker nodes (adjust the range to your cluster size)
printf "%s\n" worker-{1..2} > hosts
pssh \
  --hosts=$HOME/nodes/hosts --option="StrictHostKeyChecking=no" \
  --user=$USER 'echo hi'
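
If the short names do not resolve on your VPC, regenerate hosts with the full internal names (the single-host test above used worker-1.swift.internal):

printf "%s.swift.internal\n" worker-{1..2} > hosts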

Add Nodes to Swarm

Run this on the manager machine. Copy the join command from the output:

docker swarm join-token worker

Paste the join command, which should look similar to this:

docker swarm join --token <very-long-token> <ip-address>:2377

Use pssh (parallel ssh) to execute the join command on a large number of nodes:

pssh --hosts=$HOME/nodes/hosts --inline \
'docker swarm join --token <very-long-token> <ip-address>:2377'    

Example (but note that your token will be different):

pssh --hosts=$HOME/nodes/hosts --option="StrictHostKeyChecking=no" --inline \
'docker swarm join --token SWMTKN-1-0nvgk6xtg0lcdcrb7a4mow6978g5tftb4wu5big52bbcpajq82-bq2j7uh6kos1s192h42a853so 10.0.10.175:2377'

Verify:

docker node ls
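
Every node, including the manager, should be listed with STATUS Ready. For a narrower view, the standard --format template can be used:

docker node ls --format '{{.Hostname}} {{.Status}} {{.ManagerStatus}}'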

Congratulations!

We are done with Step 7, Docker Swarm.

Next step: Step 8. Cleanup