High Availability Vault Cluster Setup with Consul

This tutorial demonstrates how to create a high-availability Vault cluster with Consul as the storage back end. Vault is mainly used to store passwords, keys, tokens, and certificates. To set up a high-availability cluster we need 2 Vault machines and 3 Consul machines; refer to the diagram below for more details.

Vault HA Architecture

From the Vault HA architecture, you can see that one Vault node is active and the other is passive, or standby. The active instance handles all requests (reads and writes), and standby nodes redirect requests to the active node.

We are going to set up a cluster similar to the one in the diagram. For this, provision 2 Vault machines and 3 Consul machines. I have provisioned all my machines in the Azure cloud. To provision multiple virtual machines you can use Ansible or Terraform. Please refer to my older posts about provisioning multiple VMs in the Azure cloud.

Provision using Ansible playbook

Provision using Terraform script

Server details

My 5 servers are up and running in the cloud. All servers are provisioned in the same VNET so they can communicate with each other using the private IP address or the VM name.

VM name              Private IP address
vm-vault-1           50.1.0.5
vm-vault-2           50.1.0.6
vm-stage-consul-1    50.1.4.4
vm-stage-consul-2    50.1.4.5
vm-stage-consul-3    50.1.4.6

Vault and Consul installation steps

First, log in to the Consul machine, download the Consul binary (version 1.9.1 here) from the official HashiCorp releases page, and move it into the /usr/local/bin/consul directory, so that the binary ends up at /usr/local/bin/consul/consul.

Step.1:

vm-stage-consul-1:~$ sudo mkdir -p /usr/local/bin/consul
vm-stage-consul-1:~$ wget https://releases.hashicorp.com/consul/1.9.1/consul_1.9.1_linux_amd64.zip
vm-stage-consul-1:~$ unzip consul_1.9.1_linux_amd64.zip
vm-stage-consul-1:~$ sudo mv consul /usr/local/bin/consul/
vm-stage-consul-1:/usr/local/bin/consul$ ls
consul
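
Before moving the binary into place, it may be worth checking the download against the checksum file that HashiCorp publishes next to each release. The following is only a sketch: it assumes the checksum file follows the <product>_<version>_SHA256SUMS naming used on releases.hashicorp.com, and the helper names release_url, checksum_url, and verify_release are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Consul or Vault).

# Print the download URL for a product and version, e.g. "consul 1.9.1".
release_url() {
    printf 'https://releases.hashicorp.com/%s/%s/%s_%s_linux_amd64.zip\n' "$1" "$2" "$1" "$2"
}

# Print the URL of the checksum file for the same release (assumed naming).
checksum_url() {
    printf 'https://releases.hashicorp.com/%s/%s/%s_%s_SHA256SUMS\n' "$1" "$2" "$1" "$2"
}

# Download the zip and its checksum file, then verify the zip with sha256sum.
# Returns non-zero if a download fails or the checksum does not match.
verify_release() {
    product=$1
    version=$2
    wget -q "$(release_url "$product" "$version")" &&
    wget -q "$(checksum_url "$product" "$version")" &&
    grep "${product}_${version}_linux_amd64.zip" "${product}_${version}_SHA256SUMS" | sha256sum -c -
}

# Usage (run before unzip):
#   verify_release consul 1.9.1
```

The same helper should work for the Vault download later in this tutorial by passing "vault 1.6.1" instead.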

Step.2:

Create a data directory to store the data (/var/consul/data) and a Consul configuration file at /usr/local/etc/consul/consul_s1.json as follows.

vm-stage-consul-1:~$ sudo mkdir -p /var/consul/data /usr/local/etc/consul
vm-stage-consul-1:/usr/local/etc/consul$ cat consul_s1.json
{
  "server": true,
  "node_name": "consul_s1",
  "datacenter": "dc1",
  "data_dir": "/var/consul/data",
  "bind_addr": "0.0.0.0",
  "client_addr": "0.0.0.0",
  "advertise_addr": "50.1.4.4",
  "bootstrap_expect": 3,
  "retry_join": ["50.1.4.4", "50.1.4.5", "50.1.4.6"],
  "ui": true,
  "log_level": "DEBUG",
  "enable_syslog": true,
  "acl_enforce_version_8": false
}

Notice that the server parameter is set to true to indicate that this instance will run in server mode.
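
The bootstrap_expect value of 3 tells each server to wait until three servers are present before the cluster bootstraps. Three is also the smallest server count that survives the loss of one node, because Consul servers use Raft and need a majority (quorum) to operate. The arithmetic can be sketched as follows; the function names are made up for this example.

```shell
#!/bin/sh
# Raft quorum arithmetic for a Consul server cluster.

# Minimum number of servers that must agree: floor(n / 2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while the cluster still has a quorum.
failure_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 5; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(failure_tolerance "$n")"
done
```

With the 3 servers used here, the quorum is 2, so the cluster keeps working if one Consul server goes down.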

Step.3:

Next, create a Consul systemd unit file at /etc/systemd/system/consul.service and a PID file at /var/run/consul/consul-server.pid.

sudo mkdir /var/run/consul
sudo touch /var/run/consul/consul-server.pid

$ cat /etc/systemd/system/consul.service 
### BEGIN INIT INFO
# Provides:          consul
# Required-Start:    $local_fs $remote_fs
# Required-Stop:     $local_fs $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Consul agent
# Description:       Consul service discovery framework
### END INIT INFO

[Unit]
Description=Consul server agent
Requires=network-online.target
After=network-online.target

[Service]
PIDFile=/var/run/consul/consul-server.pid
PermissionsStartOnly=true
ExecStart=/usr/local/bin/consul/consul agent \
    -config-file=/usr/local/etc/consul/consul_s1.json \
    -pid-file=/var/run/consul/consul-server.pid
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
KillSignal=SIGTERM
Restart=on-failure
RestartSec=42s

[Install]
WantedBy=multi-user.target

Perform Steps 1 to 3 on the remaining Consul servers, changing the following parameters.

File name: consul_s2.json, consul_s3.json, etc.

node_name: consul_s2 (in consul_s2.json), consul_s3 (in consul_s3.json)

advertise_addr: the server’s own private IP address.

In the consul.service file, change the -config-file path to match the name of your config file.
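
Since only the node name and the advertised address differ between the three servers, the config files could also be generated instead of edited by hand. The following is a sketch under that assumption; render_consul_server_config is a hypothetical helper, and the remaining values are copied from consul_s1.json above.

```shell
#!/bin/sh
# Hypothetical helper: print the server config for node number $1
# whose private IP address is $2. All other values match consul_s1.json.
render_consul_server_config() {
    cat <<EOF
{
  "server": true,
  "node_name": "consul_s$1",
  "datacenter": "dc1",
  "data_dir": "/var/consul/data",
  "bind_addr": "0.0.0.0",
  "client_addr": "0.0.0.0",
  "advertise_addr": "$2",
  "bootstrap_expect": 3,
  "retry_join": ["50.1.4.4", "50.1.4.5", "50.1.4.6"],
  "ui": true,
  "log_level": "DEBUG",
  "enable_syslog": true,
  "acl_enforce_version_8": false
}
EOF
}

# Usage on vm-stage-consul-2:
#   render_consul_server_config 2 50.1.4.5 | sudo tee /usr/local/etc/consul/consul_s2.json
```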

Once the Consul binary has been installed on all nodes, execute the commands below to start the service on all machines. To have the service start at boot as well, also run sudo systemctl enable consul.

vm-stage-consul-1:~$ sudo systemctl daemon-reload
vm-stage-consul-1:~$ sudo systemctl restart consul
vm-stage-consul-1:~$ sudo systemctl status consul
● consul.service - Consul server agent
   Loaded: loaded (/etc/systemd/system/consul.service; disabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-06-03 15:43:46 UTC; 6s ago
 Main PID: 4984 (consul)
    Tasks: 7 (limit: 2263)
   CGroup: /system.slice/consul.service
           └─4984 /usr/local/bin/consul/consul agent -config-file=/usr/local/etc/consul/consul_s1.json -pid-file=/var/run/consul/consul-server.pid
Jun 03 15:43:47 vm-stage-consul-1 consul[4984]:     2021-06-03T15:43:47.831Z [DEBUG] agent.server.serf.wan: serf: messageJoinType: consul_s1.dc1

To confirm that all our Consul servers are up and running, execute the command below on server 1.

vm-stage-consul-1:~$ /usr/local/bin/consul/consul members

Node       Address        Status  Type    Build  Protocol  DC   Segment
consul_s1  50.1.4.4:8301  alive   server  1.9.1  2         dc1  <all>
consul_s2  50.1.4.5:8301  alive   server  1.9.1  2         dc1  <all>
consul_s3  50.1.4.6:8301  alive   server  1.9.1  2         dc1  <all>

Here we can see that all three Consul servers are alive.
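
If you want to script this check, the members table can be filtered on its Status and Type columns. A minimal sketch, assuming the column layout shown above; count_alive_servers is a hypothetical helper, not a Consul command.

```shell
#!/bin/sh
# Hypothetical helper: count alive server agents in `consul members`
# output read from stdin (column 3 is Status, column 4 is Type).
count_alive_servers() {
    awk '$3 == "alive" && $4 == "server" { n++ } END { print n + 0 }'
}

# Usage on a Consul server:
#   /usr/local/bin/consul/consul members | count_alive_servers
```

For the cluster in this tutorial the expected count is 3; a lower number means at least one server has not joined or has failed.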

Set up Consul client agents on Vault nodes

Step.4:

To install the Consul client agent on the Vault nodes, log in to the first Vault server and execute the commands below. First, download the binary file.

vm-vault-1:~$ wget https://releases.hashicorp.com/consul/1.9.1/consul_1.9.1_linux_amd64.zip
vm-vault-1:~$ unzip consul_1.9.1_linux_amd64.zip
vm-vault-1:~$ sudo mkdir -p /usr/local/bin/consul
vm-vault-1:~$ sudo mv consul /usr/local/bin/consul/

Create a configuration file at /usr/local/etc/consul/consul_c1.json as follows. Note that I used c1 in the name because this is a Consul client agent.

vm-vault-1:~$ sudo mkdir /usr/local/etc/consul
vm-vault-1:~$ sudo vim /usr/local/etc/consul/consul_c1.json
{
  "server": false,
  "datacenter": "dc1",
  "node_name": "consul_c1",
  "data_dir": "/var/consul/data",
  "bind_addr": "50.1.0.5",
  "client_addr": "127.0.0.1",
  "retry_join": ["50.1.4.4", "50.1.4.5", "50.1.4.6"],
  "log_level": "DEBUG",
  "enable_syslog": true,
  "acl_enforce_version_8": false
}

Here “bind_addr” is the private IP address of the Vault server and not the Consul server IP address. Add all Consul server IPs under the “retry_join” field.

Copy the systemd unit file from Step.3 to /etc/systemd/system/consul.service, change the -config-file path to /usr/local/etc/consul/consul_c1.json, create the PID file, and start the Consul service with the same commands as before.

Log in to the second Vault server and perform the same steps. Don’t forget to change the properties below.

File name: consul_c2.json

node_name: consul_c2

bind_addr: 50.1.0.6 (the second Vault server’s private IP address)

All the changes related to Consul have been completed. Next, configure the Vault server.

Configure the Vault server

Step.5:

Download the Vault binary from the HashiCorp website, move it to /usr/local/bin/, and create the configuration file /etc/vault/vault_server.hcl with the content shown below.

vm-vault-1:~$ wget https://releases.hashicorp.com/vault/1.6.1/vault_1.6.1_linux_amd64.zip
vm-vault-1:~$ unzip vault_1.6.1_linux_amd64.zip
vm-vault-1:~$ sudo mv vault /usr/local/bin/
vm-vault-1:~$ sudo mkdir /etc/vault/
vm-vault-1:~$ sudo vim /etc/vault/vault_server.hcl

listener "tcp" {
  address          = "0.0.0.0:8200"
  cluster_address  = "50.1.0.5:8201"
  tls_disable      = "true"
}
storage "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
}

api_addr = "http://50.1.0.5:8200"
cluster_addr = "https://50.1.0.5:8201"
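
In this file, the storage address points at the local Consul client agent (127.0.0.1:8500), while cluster_address, api_addr, and cluster_addr carry the Vault node's own private IP address, so they differ on each Vault server. A sketch of generating the file per node, assuming everything else stays identical; render_vault_config is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print vault_server.hcl for the Vault node whose
# private IP address is $1. All other values match the config above.
render_vault_config() {
    cat <<EOF
listener "tcp" {
  address          = "0.0.0.0:8200"
  cluster_address  = "$1:8201"
  tls_disable      = "true"
}
storage "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
}

api_addr = "http://$1:8200"
cluster_addr = "https://$1:8201"
EOF
}

# Usage on vm-vault-2:
#   render_vault_config 50.1.0.6 | sudo tee /etc/vault/vault_server.hcl
```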

Once the configuration file is created, create a systemd unit file at /etc/systemd/system/vault.service as follows. Make sure that the PID file /var/run/vault/vault.pid is in place, as was done for Consul in Step.3.

vm-vault-1:~$ cat /etc/systemd/system/vault.service

### BEGIN INIT INFO
# Provides:          vault
# Required-Start:    $local_fs $remote_fs
# Required-Stop:     $local_fs $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Vault server
# Description:       Vault secret management tool
### END INIT INFO

[Unit]
Description=Vault secret management tool
Requires=network-online.target
After=network-online.target

[Service]
PIDFile=/var/run/vault/vault.pid
ExecStart=/usr/local/bin/vault server -config=/etc/vault/vault_server.hcl -log-level=debug

ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
KillSignal=SIGTERM
Restart=on-failure
RestartSec=42s
LimitMEMLOCK=infinity

[Install]
WantedBy=multi-user.target

Reload the systemd daemon and start the Vault service.

vm-vault-1:~$ sudo systemctl daemon-reload
vm-vault-1:~$ sudo systemctl start vault
vm-vault-1:~$ sudo systemctl status vault

● vault.service - Vault secret management tool
   Loaded: loaded (/etc/systemd/system/vault.service; disabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-06-03 17:01:27 UTC; 6s ago
 Main PID: 19648 (vault)
    Tasks: 6 (limit: 4074)
   CGroup: /system.slice/vault.service
           └─19648 /usr/local/bin/vault server -config=/etc/vault/vault_server.hcl -log-level=debug
Jun 03 17:01:28 vm-vault-1 vault[19648]: 2021-06-03T17:01:28.023Z [DEBUG] storage.consul: config path set: path=vault/

Perform Step.5 on Vault server 2 as well, replacing 50.1.0.5 with 50.1.0.6 in vault_server.hcl. If the Vault service starts without any issue on both servers, the Vault and Consul installation is complete. Log in to Consul server 1 and execute the command below to make sure that all the nodes are up.

vm-stage-consul-1:~$ /usr/local/bin/consul/consul members

Node       Address        Status  Type    Build  Protocol  DC   Segment
consul_s1  50.1.4.4:8301  alive   server  1.9.1  2         dc1  <all>
consul_s2  50.1.4.5:8301  alive   server  1.9.1  2         dc1  <all>
consul_s3  50.1.4.6:8301  alive   server  1.9.1  2         dc1  <all>
consul_c1  50.1.0.5:8301  alive   client  1.9.1  2         dc1  <default>
consul_c2  50.1.0.6:8301  alive   client  1.9.1  2         dc1  <default>

Next, initialize Vault. First export the Vault address, then run the init command.

vm-vault-1:~$ export VAULT_ADDR='http://127.0.0.1:8200'

vm-vault-1:~$ vault operator init
Unseal Key 1: KyHzE+WPqgN759d7hXNiEK2DJUIlgW1H7KvpiSdGjfmF
Unseal Key 2: N839Ijnn7KvtbFC8NrBS4alwFmO6w5b1rXLPFR7c1fcg
Unseal Key 3: idzt+yxUuVofVWrENX3mlb64VPgIOoixqsk8QU3fr00w
Unseal Key 4: igeYuAaXE84F78cH0ZXqxMnR5qsjJ0DVpGBFlYfQuskk
Unseal Key 5: qes2c+iHWlVVJekdY6tCeUtU3T/fKUAYhlLDn04o9n6A

Initial Root Token: s.f4ywih3cFm6uRg7sqM5Kj9mg

The above command generates 5 unseal keys and a root token. Store them securely, because any 3 of the 5 keys are needed to unseal Vault. Now that Vault is initialized, go ahead and unseal the first Vault server.

vm-vault-1:~$ vault operator unseal KyHzE+WPqgN759d7hXNiEK2DJUIlgW1H7KvpiSdGjfmF

Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       5
Threshold          3
Unseal Progress    1/3
Unseal Nonce       97605fdd-3942-9700-d0f7-0cb3b175bdc6
Version            1.6.1
Storage Type       consul
HA Enabled         true
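
The Unseal Progress row shows how many of the required key shares have been entered so far (1 of 3 here). If you script the unseal, the remaining count could be read from that row. A small sketch that relies only on the output format shown above; keys_remaining is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read `vault operator unseal` or `vault status`
# output from stdin and print how many more unseal keys are needed.
# Prints 0 when there is no "Unseal Progress" row (already unsealed).
keys_remaining() {
    awk '
        $1 == "Unseal" && $2 == "Progress" {
            split($3, p, "/")      # "1/3" -> p[1] = 1, p[2] = 3
            print p[2] - p[1]
            found = 1
        }
        END { if (!found) print 0 }
    '
}

# Usage:
#   vault status | keys_remaining
```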

Repeat the “vault operator unseal” command with 2 more keys to unseal Vault. Once done, execute the “vault status” command to see the status of Vault.

vm-vault-1:~$ vault status

Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    5
Threshold       3
Version         1.6.1
Storage Type    consul
Cluster Name    vault-cluster-3f751a14
Cluster ID      6deb80f5-ec50-99a3-8a50-f9ab6c293920
HA Enabled      true
HA Cluster      https://50.1.0.5:8201
HA Mode         active

Here we can see that vm-vault-1 is active. Export VAULT_ADDR on the second Vault server, unseal it using the same unseal keys (any 3 of them), and execute vault status. The second server will be in standby mode.

vm-vault-2:~$ vault status

Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           5
Threshold              3
Version                1.6.1
Storage Type           consul
Cluster Name           vault-cluster-3f751a14
Cluster ID             6deb80f5-ec50-99a3-8a50-f9ab6c293920
HA Enabled             true
HA Cluster             https://50.1.0.5:8201
HA Mode                standby
Active Node Address    http://50.1.0.5:8200

Both Vault servers are now running in HA mode: vm-vault-1 is active and vm-vault-2 is on standby.
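
The active and standby roles can also be checked over HTTP, which is handy for load balancer health checks. By default, Vault's /v1/sys/health endpoint answers 200 on the active node, 429 on an unsealed standby, 501 when Vault is not initialized, and 503 when it is sealed. A minimal sketch of mapping those codes; health_state is a hypothetical helper, and the curl line in the usage comment assumes the addresses used in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper: map the HTTP status code returned by Vault's
# /v1/sys/health endpoint (default codes) to a readable node state.
health_state() {
    case "$1" in
        200) echo "active" ;;
        429) echo "standby" ;;
        501) echo "not initialized" ;;
        503) echo "sealed" ;;
        *)   echo "unknown" ;;
    esac
}

# Usage:
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://50.1.0.6:8200/v1/sys/health)
#   health_state "$code"
```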
