Set Up a Highly Available ETCD Cluster on CentOS 7
etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.
1. Architecture Diagram
2. System Requirements
2.1. ETCD Nodes

| Component | Description |
|---|---|
| Number of VMs | 3 |
| CPU | 2 Cores |
| Memory | 4 GB |
| Disk Size | 20 GB SSD |
| Operating System | CentOS 7 x64 |
| File System | XFS |
| Privileges | ROOT access preferred |
2.2. IP Allocation

| Component | Description |
|---|---|
| VM IPs | 10.101.15.101 - 10.101.15.103 |
2.3. DNS Entries

| IP | Hostname | FQDN |
|---|---|---|
| 10.101.15.101 | etcd-1 | etcd-1.cluster.local |
| 10.101.15.102 | etcd-2 | etcd-2.cluster.local |
| 10.101.15.103 | etcd-3 | etcd-3.cluster.local |
3. Install and Configure ETCD HA Cluster
3.1. Install prerequisites on ALL nodes

3.1.1. Set server hostname.
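For example, on the first node; use the matching FQDN from section 2.3 on each node:

```bash
hostnamectl set-hostname etcd-1.cluster.local
```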
3.1.2. Install prerequisites.
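The full package list is not critical; at a minimum, chrony is needed for the time synchronization steps below, so extend this with whatever else your environment requires:

```bash
yum install -y chrony
```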
3.1.3. Synchronize server time with the default NTP servers. If you have your own NTP servers, please make sure to update /etc/chrony.conf accordingly.
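The server names below are placeholders for your own NTP servers:

```bash
# /etc/chrony.conf -- replace the default pool entries with your NTP servers
server ntp1.example.com iburst
server ntp2.example.com iburst
```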
3.1.4. Start and enable chronyd service.
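For example:

```bash
systemctl start chronyd
systemctl enable chronyd
```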
3.1.5. Display time synchronization status.
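For example:

```bash
chronyc tracking
chronyc sources -v
```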
3.1.6. Disable file access time logging and combat fragmentation to enhance XFS file system performance. Add `noatime,nodiratime,allocsize=64m` to all XFS volumes under /etc/fstab.
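An example /etc/fstab entry; the UUID is a placeholder and the mount point depends on your disk layout:

```bash
# /etc/fstab -- append the options to every XFS volume (UUID is a placeholder)
UUID=<xfs-volume-uuid>  /  xfs  defaults,noatime,nodiratime,allocsize=64m  0 0
```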
3.1.7. Tweak the system for high concurrency and security.
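A sketch of an /etc/sysctl.d drop-in with commonly used values; the exact tunables and values here are assumptions, so adjust them to your workload:

```bash
cat <<EOF > /etc/sysctl.d/99-tuning.conf
# Widen the ephemeral port range for high connection churn (assumed value)
net.ipv4.ip_local_port_range = 1024 65535
# Allow a deeper accept queue under load (assumed value)
net.core.somaxconn = 32768
# Raise the system-wide open file limit (assumed value)
fs.file-max = 2097152
# Basic hardening: ignore ICMP redirects
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
EOF
```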
3.1.8. Reload all sysctl variables without rebooting the server.
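For example:

```bash
sysctl --system
```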
3.1.9. Create Local DNS records.
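Using the DNS entries from section 2.3, append the records to /etc/hosts on every node:

```bash
cat <<EOF >> /etc/hosts
10.101.15.101 etcd-1.cluster.local etcd-1
10.101.15.102 etcd-2.cluster.local etcd-2
10.101.15.103 etcd-3.cluster.local etcd-3
EOF
```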
3.1.10. The servers need to be restarted before continuing further.
3.2. ETCD common configurations on ALL nodes

3.2.1. Download and install the latest ETCD binary from GitHub.
important

You can download the latest ETCD binary from the etcd releases page on GitHub: https://github.com/etcd-io/etcd/releases
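A typical install sequence; the version pinned below is an assumption, so substitute the latest release:

```bash
ETCD_VER=v3.5.9   # assumed version; check the releases page for the latest
curl -L "https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz" \
  -o /tmp/etcd.tar.gz
tar xzf /tmp/etcd.tar.gz -C /tmp
cp "/tmp/etcd-${ETCD_VER}-linux-amd64/etcd" "/tmp/etcd-${ETCD_VER}-linux-amd64/etcdctl" /usr/local/bin/
etcd --version
```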
3.2.2. Create ETCD service user.
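For example, a no-login system account:

```bash
useradd --system --home-dir /var/lib/etcd --shell /sbin/nologin etcd
```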
3.2.3. Create ETCD directory structure and provide necessary permissions.
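For example:

```bash
mkdir -p /etc/etcd/tls /var/lib/etcd
chown -R etcd:etcd /etc/etcd /var/lib/etcd
chmod 700 /var/lib/etcd
```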
3.2.4. Copy the correct SSL certificate, SSL key, and CA certificate files under /etc/etcd/tls.
important
- To generate a custom CA and CA-signed SSL certificates, please follow this guide.
- In addition to the *.cluster.local domain, you must have 'localhost' as a Subject Alternative Name (SAN), since etcdctl uses it for internal API calls.
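A sketch with hypothetical file names (server.crt, server.key, ca.crt); substitute the names your CA produced:

```bash
cp server.crt server.key ca.crt /etc/etcd/tls/
chown -R etcd:etcd /etc/etcd/tls
chmod 600 /etc/etcd/tls/server.key
```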
3.2.5. Open necessary firewall ports.
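etcd serves client traffic on port 2379 and peer traffic on port 2380:

```bash
firewall-cmd --permanent --add-port=2379/tcp --add-port=2380/tcp
firewall-cmd --reload
```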
3.2.6. Create ETCD Systemd service.
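A minimal unit sketch, assuming etcd reads its flags as ETCD_* environment variables from /etc/etcd/etcd.conf (see 3.3.1):

```bash
cat <<EOF > /etc/systemd/system/etcd.service
[Unit]
Description=etcd distributed key-value store
Documentation=https://github.com/etcd-io/etcd
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
User=etcd
EnvironmentFile=/etc/etcd/etcd.conf
ExecStart=/usr/local/bin/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
```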
3.2.7. Restore SELinux context.
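Restore the default contexts on the binary, configuration, and data directories:

```bash
restorecon -Rv /usr/local/bin/etcd /etc/etcd /var/lib/etcd
```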
3.2.8. Reload systemd manager configurations.
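For example:

```bash
systemctl daemon-reload
```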
3.3. Configurations on ETCD-1 node

3.3.1. Add the configurations under /etc/etcd/etcd.conf.
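A sketch of /etc/etcd/etcd.conf for etcd-1, assuming environment-variable style configuration and the hypothetical certificate names from 3.2.4:

```bash
# /etc/etcd/etcd.conf (etcd-1)
ETCD_NAME="etcd-1"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://10.101.15.101:2380"
ETCD_LISTEN_CLIENT_URLS="https://10.101.15.101:2379,https://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://etcd-1.cluster.local:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://etcd-1.cluster.local:2379"
ETCD_INITIAL_CLUSTER="etcd-1=https://etcd-1.cluster.local:2380,etcd-2=https://etcd-2.cluster.local:2380,etcd-3=https://etcd-3.cluster.local:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_CERT_FILE="/etc/etcd/tls/server.crt"
ETCD_KEY_FILE="/etc/etcd/tls/server.key"
ETCD_TRUSTED_CA_FILE="/etc/etcd/tls/ca.crt"
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_FILE="/etc/etcd/tls/server.crt"
ETCD_PEER_KEY_FILE="/etc/etcd/tls/server.key"
ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/tls/ca.crt"
ETCD_PEER_CLIENT_CERT_AUTH="true"
```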
3.3.2. Set correct permissions.
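For example:

```bash
chown etcd:etcd /etc/etcd/etcd.conf
chmod 600 /etc/etcd/etcd.conf
```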
3.3.3. Start and enable etcd.service.
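For example; when bootstrapping, the first member waits until enough peers have joined to form a quorum:

```bash
systemctl start etcd.service
systemctl enable etcd.service
```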
3.3.4. If there are any errors, please check systemd logs.
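For example:

```bash
systemctl status etcd.service
journalctl -u etcd.service -f
```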
3.4. Configurations on ETCD-2 node

3.4.1. Add the configurations under /etc/etcd/etcd.conf.
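The file is the same as on etcd-1 (see 3.3.1) except for the member-specific values:

```bash
# /etc/etcd/etcd.conf (etcd-2) -- only these lines differ from etcd-1
ETCD_NAME="etcd-2"
ETCD_LISTEN_PEER_URLS="https://10.101.15.102:2380"
ETCD_LISTEN_CLIENT_URLS="https://10.101.15.102:2379,https://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://etcd-2.cluster.local:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://etcd-2.cluster.local:2379"
```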
3.4.2. Set correct permissions.
3.4.3. Start and enable etcd.service.
3.4.4. If there are any errors, please check systemd logs.
3.5. Configurations on ETCD-3 node

3.5.1. Add the configurations under /etc/etcd/etcd.conf.
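Again, only the member-specific values differ from the etcd-1 file in 3.3.1:

```bash
# /etc/etcd/etcd.conf (etcd-3) -- only these lines differ from etcd-1
ETCD_NAME="etcd-3"
ETCD_LISTEN_PEER_URLS="https://10.101.15.103:2380"
ETCD_LISTEN_CLIENT_URLS="https://10.101.15.103:2379,https://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://etcd-3.cluster.local:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://etcd-3.cluster.local:2379"
```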
3.5.2. Set correct permissions.
3.5.3. Start and enable etcd.service.
3.5.4. If there are any errors, please check systemd logs.
4. Maintenance
4.1. Verify ETCD Cluster Health

4.1.1. Check the health of the endpoints.
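For example, assuming the hypothetical certificate paths from 3.2.4:

```bash
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://etcd-1.cluster.local:2379,https://etcd-2.cluster.local:2379,https://etcd-3.cluster.local:2379 \
  --cacert=/etc/etcd/tls/ca.crt \
  --cert=/etc/etcd/tls/server.crt \
  --key=/etc/etcd/tls/server.key
```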
4.1.2. Print the status of the endpoints.
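For example, in table form:

```bash
ETCDCTL_API=3 etcdctl endpoint status --write-out=table \
  --endpoints=https://etcd-1.cluster.local:2379,https://etcd-2.cluster.local:2379,https://etcd-3.cluster.local:2379 \
  --cacert=/etc/etcd/tls/ca.crt \
  --cert=/etc/etcd/tls/server.crt \
  --key=/etc/etcd/tls/server.key
```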
4.2. Backup and Restore an ETCD Cluster

4.2.1. Create an ETCD snapshot.
important
Recovering a cluster first needs a snapshot of the keyspace from an etcd member.
A snapshot may either be taken from a live member with the etcdctl snapshot save command or by copying the member/snap/db file from an etcd data directory.
For example, the following command snapshots the keyspace to the file snapshot.db:
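The TLS paths are the hypothetical names from 3.2.4; the localhost endpoint works thanks to the SAN noted there:

```bash
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://localhost:2379 \
  --cacert=/etc/etcd/tls/ca.crt \
  --cert=/etc/etcd/tls/server.crt \
  --key=/etc/etcd/tls/server.key
```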
4.2.2. Restore an ETCD Cluster.
important
To restore a cluster, all that is needed is a single snapshot db file. A cluster restore with etcdctl snapshot restore creates new etcd data directories; all members should restore using the same snapshot. Restoring overwrites some snapshot metadata (specifically, the member ID and cluster ID); the member loses its former identity. This metadata overwrite prevents the new member from inadvertently joining an existing cluster. Therefore in order to start a cluster from a snapshot, the restore must start a new logical cluster.
Snapshot integrity may be optionally verified at restore time. If the snapshot is taken with etcdctl snapshot save, it will have an integrity hash that is checked by etcdctl snapshot restore. If the snapshot is copied from the data directory, there is no integrity hash and it will only restore by using the configuration flag --skip-hash-check=true.

A restore initializes a new member of a new cluster, with a fresh cluster configuration using etcd's cluster configuration flags, but preserves the contents of the etcd keyspace. Continuing from the previous example, the following creates new etcd data directories under /var/lib/etcd for a three-member cluster.
Before you execute the following commands, please make sure to back up the /var/lib/etcd directory.
4.2.2.1. Stop ETCD service and remove /var/lib/etcd directory on ALL nodes.
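For example:

```bash
systemctl stop etcd.service
rm -rf /var/lib/etcd
```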
4.2.2.2. Restore ETCD-1 node.
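A sketch using the cluster values from 3.3.1; a copy of snapshot.db must be present on the node:

```bash
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --name etcd-1 \
  --data-dir /var/lib/etcd \
  --initial-cluster etcd-1=https://etcd-1.cluster.local:2380,etcd-2=https://etcd-2.cluster.local:2380,etcd-3=https://etcd-3.cluster.local:2380 \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://etcd-1.cluster.local:2380
```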
4.2.2.3. Restore ETCD-2 node.
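The same command with etcd-2's identity:

```bash
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --name etcd-2 \
  --data-dir /var/lib/etcd \
  --initial-cluster etcd-1=https://etcd-1.cluster.local:2380,etcd-2=https://etcd-2.cluster.local:2380,etcd-3=https://etcd-3.cluster.local:2380 \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://etcd-2.cluster.local:2380
```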
4.2.2.4. Restore ETCD-3 node.
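And with etcd-3's identity:

```bash
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --name etcd-3 \
  --data-dir /var/lib/etcd \
  --initial-cluster etcd-1=https://etcd-1.cluster.local:2380,etcd-2=https://etcd-2.cluster.local:2380,etcd-3=https://etcd-3.cluster.local:2380 \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://etcd-3.cluster.local:2380
```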
4.2.2.5. Set necessary permissions and SELinux context on ALL nodes.
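For example:

```bash
chown -R etcd:etcd /var/lib/etcd
restorecon -Rv /var/lib/etcd
```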
4.2.2.6. Start ETCD service on ALL nodes.
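For example:

```bash
systemctl start etcd.service
```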
4.2.2.7. If there are any errors, please check systemd logs.