Proxmox VE HA Cluster with 2 Nodes + QDevice

Goal

Build a highly available Proxmox VE cluster consisting of:

* 2 hypervisor nodes
* 1 external QDevice (quorum server)
* shared storage (e.g. NFS)

The goal is a stable cluster with quorum and failover capability.

Architecture

* Node A (PVE)
* Node B (PVE)
* QDevice (separate system, e.g. Raspberry Pi)
* Shared Storage (NFS or equivalent)

Requirements

* Time synchronization (NTP)
* Consistent name resolution (DNS or /etc/hosts)
* Static IP addresses (no changing SLAAC addresses for cluster communication)
* Reliable network connectivity between all systems
* SSH access between nodes and QDevice (key-based)
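
The connectivity requirements can be checked before any cluster command is run. The following is a minimal sketch and not part of the original procedure: the function names are illustrative, port 22 stands for SSH, and 5403 is assumed to be the TCP port on which corosync-qnetd listens by default. Verify both against your own setup.

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed name resolution.
        return False

def preflight(targets, timeout=3.0):
    """Check every (host, port) pair and return a dict of results."""
    return {(host, port): tcp_reachable(host, port, timeout)
            for host, port in targets}
```

Called with placeholder names, for example `preflight([("pve-a", 22), ("pve-b", 22), ("qdevice", 22), ("qdevice", 5403)])`, it returns one boolean per target. A `False` entry points to a network, firewall, or name resolution problem that should be fixed before the cluster is created.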

Implementation (High-Level)

1. Prepare Storage

* Mount shared storage on both nodes
* Configure storage in Proxmox
* Ensure access from both nodes

2. Initialize Cluster

* Create cluster on Node A
* Join Node B to the cluster

3. Prepare QDevice

* Set up the QDevice system
* Install and start corosync-qnetd

4. Integrate QDevice

* Add the QDevice from one of the cluster nodes
* Initialize certificate-based communication

5. Restore / Migrate VMs

* Move or restore VMs to shared storage
* Verify functionality

6. Enable HA

* Define HA groups
* Assign VMs to groups
* Configure failover behavior

Pitfalls

Cluster join fails

* Node already contains VMs or old cluster state

* Solution:

  • clean the node (no guests, no leftover cluster configuration) before joining

QDevice setup fails

* Missing packages on nodes (e.g. corosync-qdevice)

* Incomplete initial setup

Certificate errors (TLS)

* QDevice logs:

SSL peer cannot verify your certificate

* Cause:

  • inconsistent certificates
  • repeated/failed setup attempts

* Solution:

  • remove QDevice
  • reset certificate database
  • perform clean setup

Permission errors on QDevice

* NSS DB not accessible

* Cause:

  • wrong ownership (root instead of service user)

* Solution:

  • assign correct ownership to service user
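
Ownership of the certificate database can be inspected before changing anything. This is a minimal sketch under stated assumptions: the directory `/etc/corosync/qnetd` and the service user `coroqnetd` are what Debian-based packages are assumed to use, and the function names are illustrative. Check the actual path and user on your QDevice.

```python
import os
import pwd

def uid_of(user):
    """Look up the numeric UID of a local user name."""
    return pwd.getpwnam(user).pw_uid

def paths_not_owned_by(root, expected_uid):
    """Return all paths under root (root included) not owned by expected_uid."""
    bad = []
    if os.lstat(root).st_uid != expected_uid:
        bad.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects the entry itself and does not follow symlinks.
            if os.lstat(path).st_uid != expected_uid:
                bad.append(path)
    return bad
```

On the QDevice, `paths_not_owned_by("/etc/corosync/qnetd", uid_of("coroqnetd"))` (both values assumed, see above) should return an empty list. Any path it reports is a candidate for the ownership fix.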

SSH issues during QDevice setup

* Root login via password disabled

* Key-based authentication not working

* Solution:

  • deploy SSH keys correctly
  • verify access
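
Key-based access can be verified non-interactively before the QDevice setup is attempted. A minimal sketch, assuming an OpenSSH client on the node; the function names and the host name used in the usage note are placeholders. `BatchMode=yes` makes `ssh` fail instead of falling back to a password prompt.

```python
import subprocess

def ssh_check_command(host, user="root"):
    """Build an ssh command that fails rather than asking for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt; fail if key auth does not work
        "-o", "ConnectTimeout=5",   # do not hang on an unreachable host
        f"{user}@{host}",
        "true",                     # trivial remote command
    ]

def key_login_works(host, user="root"):
    """Return True if a non-interactive SSH login to host succeeds."""
    try:
        result = subprocess.run(ssh_check_command(host, user),
                                capture_output=True, text=True)
    except FileNotFoundError:
        return False  # no ssh client installed on this machine
    return result.returncode == 0
```

Running `key_login_works("qdevice")` on each node, with the real QDevice address in place of the placeholder, shows whether the keys are deployed correctly.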

Network issues (cluster communication)

* Asymmetric routing

* Wrong interface selection

* Solution:

  • use dedicated network for cluster
  • ensure consistent routing paths

DNS / hostname issues

* Inconsistent forward/reverse resolution

* Solution:

  • ensure consistent name resolution
  • avoid dynamic addresses
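
Forward and reverse resolution can be compared with a few lines of code, run on every node and on the QDevice. A minimal sketch using the standard resolver; the function name and the matching rule (a short name may match the first label of a returned name) are choices made for this illustration.

```python
import socket

def forward_reverse_consistent(hostname,
                               resolve=socket.gethostbyname,
                               reverse=socket.gethostbyaddr):
    """Return (ok, detail); ok is True if hostname -> IP -> names leads back."""
    try:
        ip = resolve(hostname)
        primary, aliases, _ = reverse(ip)
    except OSError as exc:
        # gaierror and herror are both subclasses of OSError.
        return False, f"lookup failed: {exc}"
    names = {n.lower().rstrip(".") for n in [primary, *aliases]}
    short_names = {n.split(".")[0] for n in names}
    wanted = hostname.lower().rstrip(".")
    ok = wanted in names or wanted in short_names
    return ok, f"{hostname} -> {ip} -> {sorted(names)}"
```

The `resolve` and `reverse` parameters exist only so the check can be exercised without a real resolver. In normal use, call `forward_reverse_consistent("pve-a")` with your own host names and compare the results across all systems.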

QDevice reachable but unstable

* Logs show repeated disconnects

* Cause:

  • certificate or network issues

Best Practices

* Use static IP addresses for cluster communication
* Use dedicated network for Corosync and migration
* Validate shared storage before cluster setup
* Run QDevice on independent infrastructure
* Perform certificate setup cleanly and only once

Result

* Cluster with quorum (2 nodes + QDevice)
* HA-capable environment
* VMs can restart on surviving node after failure

Note

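A 2-node cluster without a QDevice cannot keep quorum when one node fails: the surviving node holds only one of two votes and will not take over HA resources. A QDevice is therefore mandatory for stable operation.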
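
The vote arithmetic behind this note can be worked through in a few lines. A minimal sketch, assuming one vote per node and one vote for the QDevice, with quorum defined as a strict majority of the expected votes; the function names are illustrative.

```python
def quorum_threshold(expected_votes):
    """Votes needed for quorum: a strict majority of all expected votes."""
    return expected_votes // 2 + 1

def has_quorum(votes_present, expected_votes):
    """True if the votes still present reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)

# Two nodes, no QDevice: after one node fails, 1 of 2 votes remains.
print(has_quorum(votes_present=1, expected_votes=2))  # False

# Two nodes plus QDevice: after one node fails, 2 of 3 votes remain.
print(has_quorum(votes_present=2, expected_votes=3))  # True
```

With two expected votes the threshold is 2, so a single failure is enough to lose quorum. With three expected votes the threshold is still 2, which is why one node plus the QDevice can keep running.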

public/proxmox/dual-node-ha.txt · Last modified: by gerson
