keepalived destination not reachable

surya · 18. August 2021

I am setting up a three-node cluster with a failover IP, cloud VLAN and keepalived on Ubuntu 20.04. I tried various options for configuring keepalived. The virtual IP address gets assigned, but it is not reachable from another node within the cluster.

/etc/netplan/keepalived.yaml

Code

network:
   version: 2
   bridges:
       ens01:
           dhcp4: no
           dhcp6: no
           accept-ra: no

/etc/keepalived/keepalived.conf

Code

vrrp_script chk_haproxy {
        script "killall -0 haproxy"
        interval 2
        rise 2
}

vrrp_instance VI_1 {
        interface ens192
        state BACKUP
        priority 100
        advert_int 1

        virtual_router_id 50
        unicast_src_ip 192.168.60.48
        unicast_peer {
                192.168.60.49
        }

        virtual_ipaddress {
                192.168.60.50/24 dev ens01
        }

        track_script {
                chk_haproxy
        }
}

I tried creating the network interface using systemd-networkd:

Code

[NetDev]
Name=ens01
Kind=dummy

I tried use_vmac, strict_mode off and a few other settings, but I am unable to ping the virtual IP address from another machine. There are no firewalls, and the nodes are able to talk to each other over the cloud VLAN network configuration. Do I need any additional routing settings?

michaeleifel · 18. August 2021

Hello,

Your config looks like it came from this blog post: https://chr4.org/posts/2019-01…an-slash-systemd-network/

Can you tell me what you are trying to achieve with your setup? Why are you using a dummy interface if you have a real one available? It's not quite clear to me whether you are trying to build an internal load balancer.

I use keepalived myself with the cloud VLAN and the failover IP for external access. In my case, I use the cloud VLAN for the communication between the keepalived nodes. This is a shortened version of my config. The nodes have 172.16.0.11 to 172.16.0.13 on the eth1 interface.

Code

#
# Ansible managed
#

global_defs {
}

...
vrrp_script chk_kubernetes_port {
  script "/usr/bin/nc -w 2 -zv localhost 6443"
  weight 3
  timeout 3
  user nagios
}
...

vrrp_instance Netcup {
  interface eth1
  state BACKUP
  priority 101
  virtual_router_id 51
  advert_int 5

  smtp_alert

  authentication {
    auth_type PASS
    auth_pass .....
  }

  virtual_ipaddress {
    XX.XX.XX.XX/32 dev eth0
  }

  virtual_ipaddress_excluded {
    XXXX:XXXX:XXXX:XXXX::1/64 dev eth0
  }

  preempt_delay 300


  track_script {
    chk_kubernetes_port
  }


  unicast_src_ip 172.16.0.11


  unicast_peer {
    172.16.0.12
    172.16.0.13
  }

  notify "/etc/keepalived/notifications.sh" 
  notify_master "/etc/netcup/keepalived_master_ipv4.sh && /etc/netcup/keepalived_master_ipv6.sh" 
}

surya · 19. August 2021

Hello michaeleifel, thanks very much for your reply.

I am trying to do something very similar to your setup: a failover for the Kubernetes control plane. I initially tried the cloud VLAN interface, and when that didn't work I tried the dummy interface as described in the link you mentioned.

I fixed the issue after looking at your configuration. I switched the failover IP to use eth0 and added unicast_src_ip and unicast_peer for the secondary interfaces and it worked!
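
A quick way to confirm which interface actually holds the virtual IP after a change like this is to parse the output of `ip -o -4 addr show`. The helper below is only a sketch: the function name `iface_for_ip` is made up for illustration, and the address in the usage line is the one from the first post.

```shell
#!/bin/bash
# Sketch: print the interface(s) that hold a given IPv4 address.
# Reads `ip -o -4 addr show` output on stdin, where field 2 is the
# interface name and field 4 is the address in CIDR notation.
iface_for_ip() {
  local ip="$1"
  awk -v ip="$ip" '{ split($4, a, "/"); if (a[1] == ip) print $2 }'
}
```

Usage on a node: `ip -o -4 addr show | iface_for_ip 192.168.60.50`. If this prints a dummy interface rather than the interface that carries the traffic, other machines will most likely not be able to reach the address.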

I tested the failover IP routing using the web service, and it works as expected when I shut down the master node. However, when the master node comes back, the virtual IP switches back to the master node but the failover IP doesn't get rerouted, because I only used the notify_fault script.

Code

notify_fault "/etc/keepalived/failover.sh"

I am looking into your script from here: https://forum.netcup.de/admini…ions-erkennen/#post154940

michaeleifel · 19. August 2021

Hello surya

Glad to hear that you found your issue.

I found that calling the shell script via keepalived isn't 100% reliable in error cases. Keepalived still manages who is "master" and who is "backup", but the IP switch itself is done by a separate shell script. It works as follows:

- Check if file '/tmp/keepalived.status' exists

Backup Nodes:

- Check if the string "BACKUP" is inside '/tmp/keepalived.status' and Keepalived is running, print info that node is not master

Master Nodes:

- Check if the string "MASTER" is inside '/tmp/keepalived.status' and Keepalived is running

- Check if DNS works, if the WSDL is reachable and if the node already has the floating IP

- If so, print information that node has IP

- If not, trigger failover

Bash

#!/bin/bash
# set -eux
set -eu

# Colors
Black='\033[0;30m'        # Black
Red='\033[0;31m'          # Red
RedBlink='\033[31;5m'     # Red Blink
Green='\033[0;32m'        # Green
GreenBlink='\033[32;5m'   # Green Blink
Brown='\033[0;33m'        # Brown
BrownBlink='\033[33;5m'   # Brown Blink
Blue='\033[0;34m'         # Blue
BlueBlink='\033[34;5m'    # Blue Blink
Purple='\033[0;35m'       # Purple
PurpleBlink='\033[35;5m'  # Purple Blink
NC='\033[0m' # No Color
Bold='\033[1m'
Blink='\033[5m'



# Netcup URLs
# https://www.netcup-wiki.de/wiki/Netcup_SCP_Webservice
DOMAIN="www.servercontrolpanel.de"
WSDL="https://www.servercontrolpanel.de/WSEndUser?wsdl"
API="https://www.servercontrolpanel.de/WSEndUser"
BACKUP="BACKUP"
MASTER="MASTER"
FILE="/tmp/keepalived.status"
KEEPALIVE_PROCS=$(ps uax|grep '/usr/local/sbin/keepalived'|grep -v grep|wc -l|tr -d "\n")

# Stupid check, i know
check-dns() {
  dig +short ${DOMAIN}
}

check-wsdl() {
  curl -s -o /dev/null -w '%{http_code}' -m 10 "${WSDL}"
}

# Execute SOAP Action getVServers to check reachability
check-api() {
  curl -s -o /dev/null -w '%{http_code}' -m 10 -H 'Content-Type: text/xml; charset=utf-8' -H 'SOAPAction:' -d @/etc/netcup/getVServers.xml -X POST ${API}
}

# Execute SOAP Action getVServerIPs to check current IPs
check-ips() {
  curl -s -w '%{http_code}' -m 10 -H 'Content-Type: text/xml; charset=utf-8' -H 'SOAPAction:' -d @/etc/netcup/getVServerIPs.xml -X POST ${API} | grep -o -i XX.XX.XX.XX | wc -l
}

# Failover Process
# StatusCode 200 indicates operation has been triggered
# StatusCode 500 indicates the operation isn't available, because the IP might already be switched

trigger-attach() {
  while true
  do
    STATUS=$(curl -s -o /dev/null -w '%{http_code}' -H 'Content-Type: text/xml; charset=utf-8' -H 'SOAPAction:' -d @/etc/netcup/attachIPRouting.xml -X POST ${API})
    if [ "$STATUS" -eq 200 ]; then
      echo -e "Failover: ✓ Attach is triggered!"
      break
    elif [ "$STATUS" -eq 500 ]; then
      echo -e "Attachment: ☢ Got $STATUS , seems wrong"
      echo -e "Attachment: ☹ Break back into main loop"
      break
    else
      echo -e "Attachment: ✗ Got $STATUS :( Not done yet..."
    fi
    sleep 10
  done
}

trigger-detach() {
  while true
  do
    STATUS=$(curl -s -o /dev/null -w '%{http_code}' -H 'Content-Type: text/xml; charset=utf-8' -H 'SOAPAction:' -d @/etc/netcup/detachIPRouting.xml -X POST ${API})
    if [ "$STATUS" -eq 200 ]; then
      echo -e "Detachment: ✓ Detach is triggered!"
      break
    elif [ "$STATUS" -eq 500 ]; then
      echo -e "Detachment: ☢ Got $STATUS , seems wrong"
      echo -e "Detachment: ☹ Break back into main loop"
      break
    else
      echo -e "Detachment: ✗ Got $STATUS :( Not done yet..."
    fi
    sleep 10
  done
}

### Real Stuff is down here

wait-for-dns() {
  while true
  do
    IP=$(check-dns)
    if [ -n "$IP" ]; then
      echo -e "DNS Check: ✓ SCP API resolves to $IP!"
      break
    else
      echo -e "DNS Check: ✗ Could not resolve $DOMAIN"
    fi
    sleep 10
  done
}

wait-for-wsdl() {
  while true
  do
    STATUS=$(check-wsdl)
    if [ "$STATUS" -eq 200 ]; then
      echo -e "WSDL Check: ✓ Netcup URL is reachable!"
      break
    else
      echo -e "WSDL Check: ✗ Got $STATUS :( Not done yet..."
    fi
    sleep 10
  done
}

wait-for-api() {
  while true
  do
    STATUS=$(check-api)
    if [ "$STATUS" -eq 200 ]; then
      echo -e "API Check: ✓ Netcup API is reachable!"
      break
    else
      echo -e "API Check: ✗ Got $STATUS :( Not done yet..."
    fi
    sleep 10
  done
}

main() {
  while true
  do
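    if [ ! -f "$FILE" ]; then
      echo "$FILE does not exist."
    fi
    # Refresh the process count on every pass; the value set at startup would otherwise go stale
    KEEPALIVE_PROCS=$(ps uax|grep '/usr/local/sbin/keepalived'|grep -v grep|wc -l|tr -d "\n")
    if grep -q "$MASTER" "$FILE" && [[ $KEEPALIVE_PROCS -gt 0 ]]; then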
      wait-for-dns
      wait-for-wsdl
      RESULT_STRING=$(check-ips)
      if [ "$RESULT_STRING" -eq 1 ]; then
        echo -e "DeadManSwitch: ☑ Node has already XX.XX.XX.XX"
      else
        echo -e "Failover: ☐ Change FailoverIP Routing to XX.XX.XX.XX${NC}"
        wait-for-api
        trigger-attach
      fi
    elif grep -q "$BACKUP" "$FILE"; then
      echo -e "DeadManSwitch: ☑ Node is not in master state"
    else
      echo -e "Panic: errors occurred!"
    fi
    sleep 60
  done
}

main
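
The loop above reads /tmp/keepalived.status, but the script that writes it (/etc/keepalived/notifications.sh in the config earlier in the thread) is not shown. The following is a hypothetical sketch of what such a notify script could look like, not the author's actual file. It assumes keepalived calls notify scripts with the arguments type, name, state and priority, so the state arrives as `$3`; the function name `record_state` and the `STATUS_FILE` override are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of a notify script that records the VRRP state.
# Assumption: keepalived invokes it as
#   <script> <GROUP|INSTANCE> <name> <MASTER|BACKUP|FAULT> <priority>
STATUS_FILE="${STATUS_FILE:-/tmp/keepalived.status}"

record_state() {
  local state="$1" tmp
  case "$state" in
    MASTER|BACKUP|FAULT)
      # Write to a temp file first and rename it, so a reader never
      # sees a half-written status file.
      tmp=$(mktemp "${STATUS_FILE}.XXXXXX") || return 1
      printf '%s\n' "$state" > "$tmp"
      mv "$tmp" "$STATUS_FILE"
      ;;
    *)
      echo "unknown state: $state" >&2
      return 1
      ;;
  esac
}

# Only act when called with the arguments keepalived passes.
if [ "$#" -ge 3 ]; then
  record_state "$3"
fi
```

With a file written like this, a FAULT state would end up in the "Panic" branch of the loop above, since only MASTER and BACKUP are matched there.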

surya · 19. August 2021

Hello michaeleifel

Thanks a lot for the explanation and the script. That is brilliant.

I did some testing with a simple notify script that invokes the routing web service and was able to simulate the loop condition. I will try to implement your script.

surya · 19. August 2021

Hello michaeleifel

I solved (I think!) the problem slightly differently, and it seems to do the job. keepalived only ever nominates one of the servers as the MASTER. I check whether the node is the MASTER and, if so, invoke the web service to reroute the floating IP to it.

Bash

#!/bin/bash
CURL_BODY='...'

# keepalived calls notify scripts with: $1 = type, $2 = name, $3 = new state
ENDSTATE=$3

case "$ENDSTATE" in
    "MASTER") CURL_OUTPUT=$(curl -s -H "Content-Type: text/xml; charset=utf-8" -H "SOAPAction:" -d "$CURL_BODY" -X POST "https://www.servercontrolpanel.de:443/SCP/WSEndUser")
        echo "$CURL_OUTPUT"
        exit 0
        ;;
esac
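
The script above fires the request once and does not look at the HTTP status. As a possible refinement, here is a sketch that retries a few times and interprets the status codes the way the earlier script in this thread describes them (200 = triggered, 500 = operation not available). The names `post_soap`, `reroute_with_retry`, `MAX_TRIES` and `RETRY_DELAY` are made up for illustration, and the SOAP body is still expected in `CURL_BODY`.

```shell
#!/bin/bash
# Sketch only: retry wrapper around the routing web service call.
API="https://www.servercontrolpanel.de:443/SCP/WSEndUser"  # URL from the post above
MAX_TRIES="${MAX_TRIES:-5}"       # hypothetical retry limit
RETRY_DELAY="${RETRY_DELAY:-10}"  # hypothetical pause in seconds

# Send the SOAP body given as $1 and print only the HTTP status code.
post_soap() {
  curl -s -o /dev/null -w '%{http_code}' -m 10 \
    -H 'Content-Type: text/xml; charset=utf-8' -H 'SOAPAction:' \
    -d "$1" -X POST "$API"
}

# Returns 0 when the reroute was triggered, 2 when the service reports
# the operation as unavailable, 1 when all attempts failed.
reroute_with_retry() {
  local body="$1" try status
  for ((try = 1; try <= MAX_TRIES; try++)); do
    status=$(post_soap "$body") || status="000"
    case "$status" in
      200) echo "reroute triggered"; return 0 ;;
      500) echo "reroute not available, IP may already be routed here"; return 2 ;;
      *)   echo "attempt $try: got $status, retrying" >&2; sleep "$RETRY_DELAY" ;;
    esac
  done
  return 1
}

# Only act when keepalived reports the MASTER state as third argument.
if [ "${3:-}" = "MASTER" ]; then
  reroute_with_retry "${CURL_BODY:-}"
fi
```

The retry limit keeps the notify script from blocking forever if the API is unreachable.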
