
Setup HashiCorp Vault HA Cluster with Integrated Storage (Raft) and AWS KMS Auto Unseal on CentOS 7

Vault is a tool for securely accessing secrets. A secret is anything that you want to tightly control access to, such as API keys, passwords, or certificates. Vault provides a unified interface to any secret, while providing tight access control and recording a detailed audit log.

A modern system requires access to a multitude of secrets: database credentials, API keys for external services, credentials for service-oriented architecture communication, etc. Understanding who is accessing what secrets is already very difficult and platform-specific. Adding on key rolling, secure storage, and detailed audit logs is almost impossible without a custom solution. This is where Vault steps in.

The key features of Vault are:

  • Secure Secret Storage: Arbitrary key/value secrets can be stored in Vault. Vault encrypts these secrets prior to writing them to persistent storage, so gaining access to the raw storage isn't enough to access your secrets. Vault can write to disk, Consul, and more.

  • Dynamic Secrets: Vault can generate secrets on-demand for some systems, such as AWS or SQL databases. For example, when an application needs to access an S3 bucket, it asks Vault for credentials, and Vault will generate an AWS keypair with valid permissions on demand. After creating these dynamic secrets, Vault will also automatically revoke them after the lease is up.

  • Data Encryption: Vault can encrypt and decrypt data without storing it. This allows security teams to define encryption parameters and developers to store encrypted data in a location such as SQL without having to design their own encryption methods.

  • Leasing and Renewal: All secrets in Vault have a lease associated with them. At the end of the lease, Vault will automatically revoke that secret. Clients are able to renew leases via built-in renew APIs.

  • Revocation: Vault has built-in support for secret revocation. Vault can revoke not only single secrets, but a tree of secrets, for example all secrets read by a specific user, or all secrets of a particular type. Revocation assists in key rolling as well as locking down systems in the case of an intrusion.

1. Architecture Diagram#

2. System Requirements#

2.1. HashiCorp Vault Nodes#

Component           Description
Number of VMs       3
CPU                 2 Cores
Memory              4 GB
Disk Size           20 GB SSD
Operating System    CentOS 7 x64
File System         XFS
Privileges          ROOT access preferred

2.2. IP Allocation#

Component                  Description
VM IPs                     192.168.15.101 - 192.168.15.103
Virtual IP (Floating IP)   192.168.15.100 (For on-premises deployments ONLY)

2.3. DNS Entries#

IP               Hostname   FQDN
192.168.15.100   vault      vault.example.com
192.168.15.101   vault-1    vault-1.example.com
192.168.15.102   vault-2    vault-2.example.com
192.168.15.103   vault-3    vault-3.example.com

2.4. AWS KMS Information#

important

The provided AWS credentials must have permissions to perform kms:DescribeKey, kms:Encrypt, and kms:Decrypt actions on the given KMS ARN.

Component               Description
AWS_ACCESS_KEY_ID       AKIAUF222X2TAMFCVONW
AWS_SECRET_ACCESS_KEY   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
AWS_REGION              eu-west-2
KMS_KEY_ID              3t6265bd-31c0-456d-a4cb-a4d24dc28c1d
KMS ARN                 arn:aws:kms:eu-west-2:285216434864:key/3t6265bd-31c0-456d-a4cb-a4d24dc28c1d

3. Install and Configure a HashiCorp Vault HA Cluster#

3.1. Install prerequisites on ALL nodes#

3.1.1. Set server hostname.

# Example:
# sudo hostnamectl set-hostname vault-1.example.com
sudo hostnamectl set-hostname <hostname>

3.1.2. Install prerequisites.

# Clean YUM repository cache
sudo yum clean all

# Update packages
sudo yum update -y

# Install prerequisites
sudo yum install -y vim unzip curl ntp chrony net-tools yum-utils policycoreutils-python

3.1.3. Synchronize server time with the default NTP servers. If you have your own NTP servers, make sure to update /etc/chrony.conf accordingly.

# Set timezone to Asia/Colombo
sudo timedatectl set-timezone Asia/Colombo

# Enable NTP time synchronization
sudo timedatectl set-ntp true

3.1.4. Start and enable chronyd service.

# Start and enable chronyd service
sudo systemctl enable --now chronyd

# Verify if the service is started
sudo systemctl status chronyd

3.1.5. Display time synchronization status.

# Verify synchronisation state
sudo ntpstat

# Check Chrony source statistics
sudo chronyc sourcestats -v

3.1.6. Disable file access time logging and combat fragmentation to enhance XFS file system performance. Add noatime,nodiratime,allocsize=64m to all XFS volumes under /etc/fstab.

# Edit /etc/fstab
sudo vim /etc/fstab

# Modify XFS volume entries as follows
# Example:
UUID="03c97344-9b3d-45e2-9140-cbbd57b6f085"  /  xfs  defaults,noatime,nodiratime,allocsize=64m  0 0
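The new mount options only take effect after a remount (or the reboot in step 3.1.10). As a sketch, you can read back the options a mounted filesystem is actually using with findmnt (part of util-linux, assumed installed):

```shell
# Show the active mount options for the root filesystem.
# After the reboot in step 3.1.10, noatime and nodiratime should appear here.
findmnt -no OPTIONS /
```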

3.1.7. Tweaking the system for high concurrency and security.

cat <<"EOF" | sudo tee /etc/sysctl.d/00-sysctl.conf > /dev/null
#############################################################################################
# Tweak virtual memory
#############################################################################################

# Default: 30
# 0 - Never swap under any circumstances.
# 1 - Do not swap unless there is an out-of-memory (OOM) condition.
vm.swappiness = 30

# vm.dirty_background_ratio is used to adjust how the kernel handles dirty pages that must be flushed to disk.
# Default value is 10.
# The value is a percentage of the total amount of system memory, and setting this value to 5 is appropriate in many situations.
# This setting should not be set to zero.
vm.dirty_background_ratio = 5

# The total number of dirty pages that are allowed before the kernel forces synchronous operations to flush them to disk
# can also be increased by changing the value of vm.dirty_ratio, increasing it to above the default of 30 (also a percentage of total system memory).
# A vm.dirty_ratio value in-between 60 and 80 is a reasonable number.
vm.dirty_ratio = 60

# vm.max_map_count will calculate the current number of memory-mapped files.
# The minimum value for mmap limit (vm.max_map_count) is the number of open files ulimit (cat /proc/sys/fs/file-max).
# map_count should be around 1 per 128 KB of system memory. Therefore, max_map_count will be 262144 on a 32 GB system.
# Reference: https://docs.confluent.io/current/kafka/deployment.html
# Default: 65530
vm.max_map_count = 2097152

#############################################################################################
# Tweak file descriptors
#############################################################################################

# Increases the size of file descriptors and inode cache and restricts core dumps.
fs.file-max = 2097152
fs.suid_dumpable = 0

#############################################################################################
# Tweak network settings
#############################################################################################

# Default amount of memory allocated for the send and receive buffers for each socket.
# This will significantly increase performance for large transfers.
net.core.wmem_default = 25165824
net.core.rmem_default = 25165824

# Maximum amount of memory allocated for the send and receive buffers for each socket.
# This will significantly increase performance for large transfers.
net.core.wmem_max = 25165824
net.core.rmem_max = 25165824

# In addition to the socket settings, the send and receive buffer sizes for
# TCP sockets must be set separately using the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem parameters.
# These are set using three space-separated integers that specify the minimum, default, and maximum sizes, respectively.
# The maximum size cannot be larger than the values specified for all sockets using net.core.wmem_max and net.core.rmem_max.
net.ipv4.tcp_wmem = 20480 12582912 25165824
net.ipv4.tcp_rmem = 20480 12582912 25165824

# Increase the maximum total buffer-space allocatable.
# This is measured in units of pages (4096 bytes).
net.ipv4.tcp_mem = 65536 25165824 262144
net.ipv4.udp_mem = 65536 25165824 262144

# Minimum amount of memory allocated for the send and receive buffers for each socket.
net.ipv4.udp_wmem_min = 16384
net.ipv4.udp_rmem_min = 16384

# Enabling TCP window scaling by setting net.ipv4.tcp_window_scaling to 1 will allow
# clients to transfer data more efficiently, and allow that data to be buffered on the server side.
net.ipv4.tcp_window_scaling = 1

# Increasing the value of net.ipv4.tcp_max_syn_backlog above the default of 1024 will allow
# a greater number of simultaneous connections to be accepted.
net.ipv4.tcp_max_syn_backlog = 10240

# Increasing the value of net.core.netdev_max_backlog to greater than the default of 1000
# can assist with bursts of network traffic, specifically when using multigigabit network connection speeds,
# by allowing more packets to be queued for the kernel to process them.
net.core.netdev_max_backlog = 65536

# Increase the maximum amount of option memory buffers.
net.core.optmem_max = 25165824

# Number of times SYNACKs are retried for a passive TCP connection.
net.ipv4.tcp_synack_retries = 2

# Allowed local port range.
net.ipv4.ip_local_port_range = 2048 65535

# Protect against TCP time-wait assassination (RFC 1337).
# Default: net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_rfc1337 = 1

# Decrease the default timeout value for tcp_fin_timeout connections.
net.ipv4.tcp_fin_timeout = 15

# The maximum number of backlogged sockets.
# Default is 128.
net.core.somaxconn = 4096

# Turn on syncookies for SYN flood attack protection.
net.ipv4.tcp_syncookies = 1

# Avoid a smurf attack.
net.ipv4.icmp_echo_ignore_broadcasts = 1

# Turn on protection for bad ICMP error messages.
net.ipv4.icmp_ignore_bogus_error_responses = 1

# Turn on and log spoofed, source routed, and redirect packets.
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1

# Tells the kernel how many TCP sockets that are not attached to any
# user file handle to maintain. In case this number is exceeded,
# orphaned connections are immediately reset and a warning is printed.
# Default: net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_max_orphans = 65536

# Do not cache metrics on closing connections.
net.ipv4.tcp_no_metrics_save = 1

# Enable timestamps as defined in RFC 1323.
# Default: net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_timestamps = 1

# Enable select acknowledgments.
# Default: net.ipv4.tcp_sack = 1
net.ipv4.tcp_sack = 1

# Increase the tcp-time-wait buckets pool size to prevent simple DOS attacks.
# net.ipv4.tcp_tw_recycle has been removed from Linux 4.12. Use net.ipv4.tcp_tw_reuse instead.
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_reuse = 1

# The accept_source_route option causes network interfaces to accept packets with the Strict Source Route (SSR)
# or Loose Source Routing (LSR) option set.
# The following setting will drop packets with the SSR or LSR option set.
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0

# Turn on reverse path filtering.
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

# Disable ICMP redirect acceptance.
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0

# Disable sending of all IPv4 ICMP redirected packets.
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0

# Disable IP forwarding.
# IP forwarding is the ability for an operating system to accept incoming network packets on one interface,
# recognize that they are not meant for the system itself, and pass them on to another network.
net.ipv4.ip_forward = 0

# Disable IPv6.
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1

#############################################################################################
# Tweak kernel parameters
#############################################################################################

# Address Space Layout Randomization (ASLR) is a memory-protection process for operating systems that guards against buffer-overflow attacks.
# It helps to ensure that the memory addresses associated with running processes on systems are not predictable,
# thus flaws or vulnerabilities associated with these processes will be more difficult to exploit.
# Accepted values: 0 = Disabled, 1 = Conservative Randomization, 2 = Full Randomization
kernel.randomize_va_space = 2

# Allow for more PIDs (to reduce rollover problems).
kernel.pid_max = 65536
EOF

3.1.8. Reload all sysctl variables without rebooting the server.

sudo sysctl -p /etc/sysctl.d/00-sysctl.conf
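As a quick sanity check, you can read the live values back from /proc/sys, which mirrors the sysctl tree (`sysctl -n <key>` prints the same values):

```shell
# Read a few of the tuned values back from the kernel.
# On a host where the file above has been applied, these show
# 30, 2097152, and 4096 respectively.
cat /proc/sys/vm/swappiness
cat /proc/sys/vm/max_map_count
cat /proc/sys/net/core/somaxconn
```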

3.1.9. Create Local DNS records.

cat <<"EOF" | sudo tee /etc/hosts > /dev/null
# localhost
127.0.0.1     localhost        localhost.localdomain

# When DNS records are updated in the DNS server, remove these entries.
192.168.15.101 vault-1  vault-1.example.com
192.168.15.102 vault-2  vault-2.example.com
192.168.15.103 vault-3  vault-3.example.com
EOF

3.1.10. The servers need to be restarted before continuing further.

sudo reboot

3.2. HashiCorp Vault common configurations on ALL nodes#

3.2.1. Configure YUM repository for Vault.

# Configure YUM repository
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo

3.2.2. Install Vault.

# Install HashiCorp Vault
sudo yum -y install vault

3.2.3. Enable command auto-completion.

# Enable command auto-completion
vault -autocomplete-install
complete -C /usr/bin/vault vault

3.2.4. Copy the correct SSL certificate, SSL key and the CA certificate files under /opt/vault/tls

important
  • To generate a custom CA and CA signed SSL certificates, please follow this guide.
  • Other than the *.example.com domain, you must have 'localhost' as a Subject Alternative Name (SAN) since keepalived uses it for health checks.
# Copy the correct CA certificate, TLS certificate and TLS key
/opt/vault/tls/ca.pem
/opt/vault/tls/tls.crt
/opt/vault/tls/tls.key

# Set correct permissions
sudo chown -R vault:vault /opt/vault/tls
sudo chmod 0600 /opt/vault/tls/*

# Restore SELinux context
sudo restorecon -RvF /opt/vault/tls

# If you are using a self-signed certificate, make sure to configure it as a trusted root certificate
sudo cp /opt/vault/tls/ca.pem /etc/pki/ca-trust/source/anchors/ca.pem
sudo update-ca-trust
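One way to confirm the certificate actually carries the required 'localhost' SAN is openssl. The throwaway certificate generated below is only a stand-in for demonstration (the `-addext`/`-ext` flags need OpenSSL 1.1.1+, so run this from a modern workstation rather than the CentOS 7 node); point the final command at your real /opt/vault/tls/tls.crt:

```shell
# Generate a demo certificate with the SANs this guide requires
# (stand-in for your real certificate; paths under /tmp are throwaway).
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/demo.key -out /tmp/demo.crt -days 1 \
  -subj "/CN=vault-1.example.com" \
  -addext "subjectAltName=DNS:vault-1.example.com,DNS:localhost" 2>/dev/null

# Print the SAN extension; 'localhost' must appear, since keepalived
# health checks connect to https://localhost:8200
openssl x509 -in /tmp/demo.crt -noout -ext subjectAltName
```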

3.2.5. Open necessary firewall ports.

sudo firewall-cmd --permanent --add-port={8200,8201}/tcp
sudo firewall-cmd --reload

3.2.6. Configure AWS KMS Auto Unseal

important

If you are running Vault on an AWS EC2 instance, make sure to create an AWS role with the following policy and attach it to all Vault running EC2 nodes.

# Create an AWS role with the following policy
{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": [
            "kms:DescribeKey",
            "kms:Encrypt",
            "kms:Decrypt"
        ],
        "Resource": [
            "arn:aws:kms:eu-west-2:285216434864:key/3t6265bd-31c0-456d-a4cb-a4d24dc28c1d"
        ]
    }
}
important

If you are running Vault on an ON-PREMISE instance, make sure to create an IAM user with the following policy and configure a systemd drop-in with necessary environment variables.

# Create an IAM user with the following policy
{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": [
            "kms:DescribeKey",
            "kms:Encrypt",
            "kms:Decrypt"
        ],
        "Resource": [
            "arn:aws:kms:eu-west-2:285216434864:key/3t6265bd-31c0-456d-a4cb-a4d24dc28c1d"
        ]
    }
}

# Create the vault.service.d directory
sudo mkdir -p /etc/systemd/system/vault.service.d

# Create a systemd drop-in to apply necessary environment variables
cat <<"EOF" | sudo tee /etc/systemd/system/vault.service.d/override.conf > /dev/null
[Service]
Environment="AWS_ACCESS_KEY_ID=AKIAUF222X2TAMFCVONW"
Environment="AWS_SECRET_ACCESS_KEY=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"
Environment="AWS_REGION=eu-west-2"
EOF

# Restore SELinux context
sudo restorecon -RvF /etc/systemd/system/vault.service.d

# Reload systemd unit files
sudo systemctl daemon-reload

3.3. Configurations on VAULT-1 node.#

3.3.1. Add the configurations under /etc/vault.d/vault.hcl

# vault-1.example.com configurations
cat <<"EOF" | sudo tee /etc/vault.d/vault.hcl > /dev/null
disable_cache           = true
disable_mlock           = true
ui                      = true

listener "tcp" {
   address              = "0.0.0.0:8200"
   cluster_addr         = "0.0.0.0:8201"
   tls_client_ca_file   = "/opt/vault/tls/ca.pem"
   tls_cert_file        = "/opt/vault/tls/tls.crt"
   tls_key_file         = "/opt/vault/tls/tls.key"
   tls_disable          = false
}

storage "raft" {
   node_id              = "vault-1"
   path                 = "/opt/vault/data"

   retry_join {
      leader_api_addr   = "https://vault-1.example.com:8200"
   }
   retry_join {
      leader_api_addr   = "https://vault-2.example.com:8200"
   }
   retry_join {
      leader_api_addr   = "https://vault-3.example.com:8200"
   }
}

seal "awskms" {
   kms_key_id           = "3t6265bd-31c0-456d-a4cb-a4d24dc28c1d"
}

cluster_addr            = "https://vault-1.example.com:8201"
api_addr                = "https://vault.example.com:8200"
max_lease_ttl           = "10h"
default_lease_ttl       = "10h"
cluster_name            = "vault"
raw_storage_endpoint    = true
disable_sealwrap        = true
disable_printable_check = true
EOF

3.3.2. Set correct permissions.

# Set permissions
sudo chown vault:vault /etc/vault.d/vault.hcl
sudo chmod 0644 /etc/vault.d/vault.hcl

# Restore SELinux context
sudo restorecon -RvF /etc/vault.d

3.3.3. Start and enable vault.service.

# Start and enable vault.service
sudo systemctl enable --now vault.service

3.3.4. If there are any errors, please check systemd logs.

# Check error messages in journald
sudo journalctl -f -b --no-pager -u vault

3.4. Configurations on VAULT-2 node.#

3.4.1. Add the configurations under /etc/vault.d/vault.hcl

# vault-2.example.com configurations
cat <<"EOF" | sudo tee /etc/vault.d/vault.hcl > /dev/null
disable_cache           = true
disable_mlock           = true
ui                      = true

listener "tcp" {
   address              = "0.0.0.0:8200"
   cluster_addr         = "0.0.0.0:8201"
   tls_client_ca_file   = "/opt/vault/tls/ca.pem"
   tls_cert_file        = "/opt/vault/tls/tls.crt"
   tls_key_file         = "/opt/vault/tls/tls.key"
   tls_disable          = false
}

storage "raft" {
   node_id              = "vault-2"
   path                 = "/opt/vault/data"

   retry_join {
      leader_api_addr   = "https://vault-1.example.com:8200"
   }
   retry_join {
      leader_api_addr   = "https://vault-2.example.com:8200"
   }
   retry_join {
      leader_api_addr   = "https://vault-3.example.com:8200"
   }
}

seal "awskms" {
   kms_key_id           = "3t6265bd-31c0-456d-a4cb-a4d24dc28c1d"
}

cluster_addr            = "https://vault-2.example.com:8201"
api_addr                = "https://vault.example.com:8200"
max_lease_ttl           = "10h"
default_lease_ttl       = "10h"
cluster_name            = "vault"
raw_storage_endpoint    = true
disable_sealwrap        = true
disable_printable_check = true
EOF

3.4.2. Set correct permissions.

# Set permissions
sudo chown vault:vault /etc/vault.d/vault.hcl
sudo chmod 0644 /etc/vault.d/vault.hcl

# Restore SELinux context
sudo restorecon -RvF /etc/vault.d

3.4.3. Start and enable vault.service.

# Start and enable vault.service
sudo systemctl enable --now vault.service

3.4.4. If there are any errors, please check systemd logs.

# Check error messages in journald
sudo journalctl -f -b --no-pager -u vault

3.5. Configurations on VAULT-3 node.#

3.5.1. Add the configurations under /etc/vault.d/vault.hcl

# vault-3.example.com configurations
cat <<"EOF" | sudo tee /etc/vault.d/vault.hcl > /dev/null
disable_cache           = true
disable_mlock           = true
ui                      = true

listener "tcp" {
   address              = "0.0.0.0:8200"
   cluster_addr         = "0.0.0.0:8201"
   tls_client_ca_file   = "/opt/vault/tls/ca.pem"
   tls_cert_file        = "/opt/vault/tls/tls.crt"
   tls_key_file         = "/opt/vault/tls/tls.key"
   tls_disable          = false
}

storage "raft" {
   node_id              = "vault-3"
   path                 = "/opt/vault/data"

   retry_join {
      leader_api_addr   = "https://vault-1.example.com:8200"
   }
   retry_join {
      leader_api_addr   = "https://vault-2.example.com:8200"
   }
   retry_join {
      leader_api_addr   = "https://vault-3.example.com:8200"
   }
}

seal "awskms" {
   kms_key_id           = "3t6265bd-31c0-456d-a4cb-a4d24dc28c1d"
}

cluster_addr            = "https://vault-3.example.com:8201"
api_addr                = "https://vault.example.com:8200"
max_lease_ttl           = "10h"
default_lease_ttl       = "10h"
cluster_name            = "vault"
raw_storage_endpoint    = true
disable_sealwrap        = true
disable_printable_check = true
EOF

3.5.2. Set correct permissions.

# Set permissions
sudo chown vault:vault /etc/vault.d/vault.hcl
sudo chmod 0644 /etc/vault.d/vault.hcl

# Restore SELinux context
sudo restorecon -RvF /etc/vault.d

3.5.3. Start and enable vault.service.

# Start and enable vault.service
sudo systemctl enable --now vault.service

3.5.4. If there are any errors, please check systemd logs.

# Check error messages in journald
sudo journalctl -f -b --no-pager -u vault

3.6. Initialize the Vault cluster on VAULT-1 node with KMS auto-unseal.#

3.6.1. Before initializing the cluster, make sure to check cluster status.

vault status

You should get an output like below.

Key                      Value
---                      -----
Recovery Seal Type       awskms
Initialized              false
Sealed                   true
Total Recovery Shares    0
Threshold                0
Unseal Progress          0/0
Unseal Nonce             n/a
Version                  1.6.0
Storage Type             raft
HA Enabled               true

3.6.2. Initialize the vault cluster.

# Initialize the vault cluster with 5 recovery key shares and a key threshold of 3
vault operator init -recovery-shares=5 -recovery-threshold=3

If the command succeeds, you will get an output like below. Please make sure to record it.

# The vault cluster has been initialized with 5 key shares and a key threshold of 3
Recovery Key 1: c/ZBKhTm8mpb9KVkN8X1DP5mAuuHLWyIu2dLc5UN7lkT
Recovery Key 2: IAbLfsK7BKxlwsjDH1/VjkcSdKYa+e988rZ2U52V/bE/
Recovery Key 3: V5CvZMdXRbQS5hUtNCAsADcUabYXVJ3SpR1HeDIaX9Cx
Recovery Key 4: JJlReHb4TSPb+kbVnZxmab7fJxPMjhvXHsUtIRajfWgY
Recovery Key 5: DEZFEQBDUW9hVCSSfo0hbqy2Dv/Y/jfjgMEHej7eRp52

Initial Root Token: s.FLmDWb9rZs4KvdnJ9ZjfmLtu

Success! Vault is initialized

Recovery key initialized with 5 key shares and a key threshold of 3. Please
securely distribute the key shares printed above.

3.6.3. Verify if the cluster is initialized and unsealed.

vault status

You should get an output like below.

Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.6.0
Storage Type             raft
Cluster Name             vault
Cluster ID               37d8c72c-acfb-4b55-c2ac-9ac3247dd70f
HA Enabled               true
HA Cluster               https://vault-1.example.com:8201
HA Mode                  active
Raft Committed Index     52
Raft Applied Index       52

3.7. Join the VAULT-2 and VAULT-3 nodes to the cluster.#

3.7.1. When you initialize the VAULT-1 node, the other nodes should automatically initialize and unseal using raft replication. Please verify if the other cluster nodes are initialized and unsealed.

# Run this on both vault-2.example.com and vault-3.example.com nodes
vault status

You should get an output like below.

Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.6.0
Storage Type             raft
Cluster Name             vault
Cluster ID               37d8c72c-acfb-4b55-c2ac-9ac3247dd70f
HA Enabled               true
HA Cluster               https://vault-1.example.com:8201
HA Mode                  active
Raft Committed Index     52
Raft Applied Index       52

3.7.2. If the VAULT-2 and VAULT-3 cluster nodes are still uninitialized and sealed, please restart the vault service.

# Run this on both vault-2 and vault-3 nodes
sudo systemctl restart vault.service

3.7.3. Run the following command on any node and verify if the cluster is active.

# Export the vault token generated in step 3.6.2
export VAULT_TOKEN="s.FLmDWb9rZs4KvdnJ9ZjfmLtu"

# List raft peer nodes
vault operator raft list-peers

You should get an output like below.

Node       Address                     State       Voter
----       -------                     -----       -----
vault-1    vault-1.example.com:8201    leader      true
vault-2    vault-2.example.com:8201    follower    true
vault-3    vault-3.example.com:8201    follower    true
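All three nodes are voters, so raft needs a majority of them (the quorum, floor(n/2) + 1) to elect a leader and commit writes; that is why this guide deploys three nodes rather than two. A quick sketch (the `quorum` helper is ours, not a Vault command):

```shell
# Quorum size for a raft cluster of n voters: floor(n/2) + 1
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 3   # prints 2 -> a 3-node cluster survives one failed node
quorum 5   # prints 3 -> a 5-node cluster survives two failed nodes
```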

3.7.4. If there are any errors, please check systemd logs.

# Check error messages in journald
sudo journalctl -f -b --no-pager -u vault

4. Configure Cluster High Availability using Keepalived and Floating IPs#

important
  • If you are using a cloud load balancer, you can SKIP this step. You have to use load balancer health checks instead.
  • HashiCorp Vault health check URL https://localhost:8200/v1/sys/health
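The health endpoint encodes cluster state in its HTTP status code, which is what makes it usable from keepalived and load-balancer probes alike. A small sketch of the mapping (status codes as documented for Vault's /v1/sys/health API; the helper function name is ours):

```shell
# Map a /v1/sys/health HTTP status code to a human-readable state:
#   200 - initialized, unsealed, active
#   429 - unsealed but standby
#   501 - not initialized
#   503 - sealed
vault_health_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# Example probe (sketch): fetch just the status code from the local node
# code=$(curl -k -s -o /dev/null -w '%{http_code}' https://localhost:8200/v1/sys/health)
# vault_health_state "$code"
vault_health_state 200   # prints: active
```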

4.1. Install and configure Keepalived on ALL nodes.#

4.1.1. Install Keepalived package.

# Install Keepalived package
sudo yum install -y keepalived

4.1.2. Allow VRRP traffic on the firewall.

# Allow VRRP traffic on the firewall
sudo firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
sudo firewall-cmd --reload

4.1.3. Create the health check script under /usr/libexec/keepalived

# Create the health check script
cat <<"EOF" | sudo tee /usr/libexec/keepalived/vault-health > /dev/null
#!/bin/bash
if [ $# -ne 1 ]; then
  echo "WARNING: You must provide health check URL"
  exit 1
else
  CHECK_URL=$1
  CMD=$(/usr/bin/curl -k -I ${CHECK_URL} 2>/dev/null | grep "200 OK" | wc -l)
  if [ ${CMD} -eq 1 ]; then
    exit 0
  else
    exit 1
  fi
fi
EOF

4.1.4. Provide necessary permissions to the health check script.

# Make the health check script executable
sudo chmod +x /usr/libexec/keepalived/vault-health

# Restore SELinux context
sudo restorecon -RvF /usr/libexec/keepalived

4.1.5. Configure keepalived.

cat <<"EOF" | sudo tee /etc/keepalived/keepalived.conf > /dev/null
# Global definitions configuration block
global_defs {
    router_id LVS_LB
}

vrrp_script check_vault_health {
    script "/usr/libexec/keepalived/vault-health https://localhost:8200/v1/sys/health"
    interval 3
}

vrrp_instance VI_1 {
    state MASTER
    virtual_router_id 100
    priority 100
    advert_int 1

    # Please make sure to verify interface name using 'nmcli connection show'
    interface ens192

    virtual_ipaddress {
        # Floating IP
        192.168.15.100
    }

    track_interface {
        # Please make sure to verify interface name using 'nmcli connection show'
        ens192
    }

    track_script {
        check_vault_health
    }
}
EOF

4.1.6. Start and enable keepalived.service.

# Start and enable keepalived.service
sudo systemctl enable --now keepalived.service

# Verify if the service is started
sudo systemctl status keepalived.service

4.1.7. Check if the floating IP is assigned to a node.

# Run this command on ALL nodes. One node should show it is assigned
sudo ip addr | grep 192.168.15.100

5. Maintenance#

5.1. Backup and restore a vault cluster using raft snapshots.#

5.1.1. Create a snapshot of raft storage.

# Export the vault token generated in step 3.6.2
export VAULT_TOKEN="s.FLmDWb9rZs4KvdnJ9ZjfmLtu"

# Create a snapshot of raft storage
vault operator raft snapshot save raft-$(date +"%Y-%m-%d").snap
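To take snapshots on a schedule rather than by hand, a cron entry is one option. The schedule, the snapshot directory, and the token environment file below are assumptions for illustration, not part of this guide; adjust them to your environment:

```shell
# /etc/cron.d/vault-snapshot (sketch) - nightly raft snapshot at 01:30.
# Assumes /var/backups/vault exists and /root/.vault-token.env exports
# VAULT_TOKEN; note that % must be escaped as \% in crontab entries.
30 1 * * * root . /root/.vault-token.env && /usr/bin/vault operator raft snapshot save /var/backups/vault/raft-$(date +\%Y-\%m-\%d).snap
```

Keep the token file readable by root only, since it grants snapshot access to the whole cluster.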

5.1.2. Restore a snapshot of raft storage.

# Export the vault token generated in step 3.6.2
export VAULT_TOKEN="s.FLmDWb9rZs4KvdnJ9ZjfmLtu"

# Restore a snapshot of raft storage
vault operator raft snapshot restore -force raft-$(date +"%Y-%m-%d").snap

5.2. Migrate a vault cluster using raft snapshots.#

important
  • If you want to clone/migrate a vault cluster to a new cluster with a new AWS KMS key, please follow these steps.
  • Let's assume the NEW KMS_KEY_ID and KMS ARN values are "5m6268td-43c0-45d4-a4nc-a6d34mc38t2c" and "arn:aws:kms:eu-west-2:285216434864:key/5m6268td-43c0-45d4-a4nc-a6d34mc38t2c" respectively.

5.2.1. Please run these commands on any node of OLD vault cluster.

5.2.1.1. Create a snapshot of raft storage of the old vault cluster.

# Export the vault token generated in step 3.6.2
export VAULT_TOKEN="s.FLmDWb9rZs4KvdnJ9ZjfmLtu"

# Create a snapshot of raft storage
vault operator raft snapshot save raft-$(date +"%Y-%m-%d").snap

5.2.2. Please run these commands on NEW vault cluster.

5.2.2.1. To restore the cluster, you must have access to both OLD and NEW AWS KMS keys. Please create/update the IAM policy and attach it to the corresponding IAM User/Role.

# Example:
# Create an IAM user/role with the following policy
{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": [
            "kms:DescribeKey",
            "kms:Encrypt",
            "kms:Decrypt"
        ],
        "Resource": [
            "arn:aws:kms:eu-west-2:285216434864:key/3t6265bd-31c0-456d-a4cb-a4d24dc28c1d",
            "arn:aws:kms:eu-west-2:285216434864:key/5m6268td-43c0-45d4-a4nc-a6d34mc38t2c"
        ]
    }
}

5.2.2.2. Please make sure to use the NEW KMS key in the /etc/vault.d/vault.hcl.

seal "awskms" {
      kms_key_id = "5m6268td-43c0-45d4-a4nc-a6d34mc38t2c"
}

5.2.2.3. Initialize the NEW vault cluster.

# Initialize the vault cluster with 1 recovery key share and a key threshold of 1
vault operator init -recovery-shares=1 -recovery-threshold=1

If the command succeeds, you will get an output like below. Please make sure to record it.

# The vault cluster has been initialized with 1 key share and a key threshold of 1
Recovery Key 1: m/YhGRTm8mpb9KVkND5wQP5mAuuHLWyTrEwLc5UN7cbR

Initial Root Token: s.kCrcKb9rZs4bc5dnJ9ZjfmbNT

Success! Vault is initialized

Recovery key initialized with 1 key shares and a key threshold of 1. Please
securely distribute the key shares printed above.

5.2.2.4. Verify if the cluster is initialized and unsealed.

vault status

5.2.2.5. Now you are ready to import vault backup.

# Export the NEW cluster vault token generated in step 5.2.2.3
export VAULT_TOKEN="s.kCrcKb9rZs4bc5dnJ9ZjfmbNT"

# Restore the snapshot created in step 5.2.1.1
vault operator raft snapshot restore -force raft-$(date +"%Y-%m-%d").snap

5.2.2.6. Verify if the cluster is initialized and unsealed.

vault status

5.2.2.7. Remove the OLD KMS ARN entry from IAM policy.

# Updated IAM user/role policy
{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": [
            "kms:DescribeKey",
            "kms:Encrypt",
            "kms:Decrypt"
        ],
        "Resource": [
            "arn:aws:kms:eu-west-2:285216434864:key/5m6268td-43c0-45d4-a4nc-a6d34mc38t2c"
        ]
    }
}

6. References#

  1. Install and Configure Hashicorp Vault Server on Ubuntu / CentOS / Debian
  2. How To Install Hashicorp Vault On CentOS 7
  3. Keepalived: Configuring High Availability with IP Failover on CentOS/RHEL
  4. Integrated Storage (Raft) Backend
  5. Raft operator
  6. AWS KMS Seal
Last updated by Yasitha Bogamuwa