if you read my previous pieces about my home network, you know well that my core switch is a Nexus 93180YC-EX. you know… home, core switch.
anycasted services
at any point in time I have a number of DNS (and DHCP) servers available, all reachable via either 192.168.168.168 or 2001:470:xx:a6::168. no matter what is going on, at least one should be able to respond.
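a quick way to check that "at least one should be able to respond" is to send a plain DNS query to the anycast address and see whether anything answers. the sketch below is only an illustration using the Python standard library, not something from my actual setup: the function names, the `example.com` query name and the 2 second timeout are all assumptions, and it covers only the IPv4 address.

```python
import socket
import struct

ANYCAST_V4 = "192.168.168.168"  # anycast address from the text


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query (type A, class IN) for `name`."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server, name="example.com", timeout=2.0):
    """Return True if any server behind `server` answers the query over UDP."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable networks
            return False
    # A usable reply has at least a full header and echoes our query id.
    return len(reply) >= 12 and reply[:2] == query[:2]

# usage (needs network access to the anycast address):
#   print(probe(ANYCAST_V4))
```

it does not tell you which of the servers answered, only that the anycast address is alive from where you are standing.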
currently, in the “cluster” I have two VMs and two physical Raspberry Pi 4B+. all of them run FreeBSD 14.0-STABLE with the nsd, unbound and bird packages, the last one advertising the IPv4 and IPv6 addresses.
from the BGP perspective of the core Nexus, those advertisements look like this - for IPv4:
sw-core# sh bgp ipv4 unicast
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 36, Local Router ID is 192.168.33.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf Weight Path
*|i192.168.168.168/32 192.168.66.22 100 0 i
*|i 192.168.66.44 100 0 i
*>i 192.168.44.180 100 0 i
*|i 192.168.66.33 100 0 i
for IPv6:
sw-core# sh bgp ipv6 unicast
[...]
Network Next Hop Metric LocPrf Weight Path
*|i2001:470:xx:a6::168/128
2001:470:xx:66::22
100 0 i
*|i 2001:470:xx:66::44
100 0 i
*>i 2001:470:xx:444::180
100 0 i
*|i 2001:470:xx:66::33
100 0 i
and finally - what ends up in the actual routing table?
sw-core# sh ip route 192.168.168.168
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.168.168/32, ubest/mbest: 4/0
*via 192.168.44.180, [200/0], 1w5d, bgp-65055, internal, tag 65055
*via 192.168.66.22, [200/0], 4d09h, bgp-65055, internal, tag 65055
*via 192.168.66.33, [200/0], 3w3d, bgp-65055, internal, tag 65055
*via 192.168.66.44, [200/0], 1w1d, bgp-65055, internal, tag 65055
sw-core# sh ipv6 route 2001:470:xx:a6::168
IPv6 Routing Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
2001:470:xx:a6::168/128, ubest/mbest: 4/0
*via 2001:470:xx:66::22/128, [200/0], 4d09h, bgp-65055, internal, tag 65055
*via 2001:470:xx:66::33/128, [200/0], 3w3d, bgp-65055, internal, tag 65055
*via 2001:470:xx:66::44/128, [200/0], 1w1d, bgp-65055, internal, tag 65055
*via 2001:470:xx:444::180/128, [200/0], 1w5d, bgp-65055, internal, tag 65055
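if you would rather check this from a script than by eye, the output above is easy to parse. the sketch below is a minimal example in Python, assuming the output format shown above; `parse_route` is a hypothetical helper name, and the regular expressions were written against this one sample only, not against any official output specification.

```python
import re

# Sample taken from the IPv4 output above.
SAMPLE = """\
192.168.168.168/32, ubest/mbest: 4/0
    *via 192.168.44.180, [200/0], 1w5d, bgp-65055, internal, tag 65055
    *via 192.168.66.22, [200/0], 4d09h, bgp-65055, internal, tag 65055
    *via 192.168.66.33, [200/0], 3w3d, bgp-65055, internal, tag 65055
    *via 192.168.66.44, [200/0], 1w1d, bgp-65055, internal, tag 65055
"""


def parse_route(text):
    """Return (prefix, ubest, next_hops) from 'sh ip route <prefix>' output."""
    header = re.search(r"^(\S+), ubest/mbest: (\d+)/(\d+)", text, re.M)
    if header is None:
        raise ValueError("no route entry found in output")
    # Next hops follow '*via'; IPv6 entries carry a trailing '/128' we drop.
    hops = re.findall(r"^\s*\*via ([^,\s/]+)", text, re.M)
    return header.group(1), int(header.group(2)), hops


prefix, ubest, hops = parse_route(SAMPLE)
# ubest should match the number of installed next hops when ECMP is healthy.
print(prefix, ubest, hops, ubest == len(hops))
```

the useful part is the last comparison: if `ubest` drops below the number of servers you expect, one of them stopped advertising the address.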
by default, the end result is that traffic gets distributed unequally across the available servers. that’s because the default load-sharing algorithm uses only the source IP address and source TCP/UDP port for the hash, which for a couple of my 192.168/16 networks results in traffic being split roughly 1/2, 1/4, 1/8 and 1/8 across the four servers.
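to see why the choice of hash inputs matters, here is a toy model of hash-based ECMP in Python. it is a sketch under loud assumptions: SHA-256 stands in for whatever hash the switch hardware really uses, the client addresses and flow counts are made up, and the `seed` argument only loosely mirrors the role of the universal-id. it does not reproduce the Nexus algorithm, only the general idea that a flow's header fields are hashed and the result picks one of the equal-cost next hops.

```python
import hashlib
import random
from collections import Counter

NEXT_HOPS = ["192.168.44.180", "192.168.66.22", "192.168.66.33", "192.168.66.44"]
ANYCAST = "192.168.168.168"

# Hypothetical clients and how many DNS flows each one generates.
CLIENTS = {"192.168.10.5": 800, "192.168.20.7": 150, "192.168.30.9": 50}


def pick_next_hop(flow, fields, seed=0):
    """Hash the selected header fields and map the result onto a next hop."""
    key = "|".join(str(flow[f]) for f in fields) + f"|{seed}"
    digest = hashlib.sha256(key.encode()).digest()
    return NEXT_HOPS[int.from_bytes(digest[:4], "big") % len(NEXT_HOPS)]


def make_flows(clients, rng):
    """Generate one flow per query, each with a random ephemeral source port."""
    return [
        {"src": src, "dst": ANYCAST, "sport": rng.randint(1024, 65535), "dport": 53}
        for src, count in clients.items()
        for _ in range(count)
    ]


def distribution(flows, fields, seed=0):
    """Count how many flows land on each next hop for a given set of fields."""
    return Counter(pick_next_hop(flow, fields, seed) for flow in flows)


flows = make_flows(CLIENTS, random.Random(1))
coarse = distribution(flows, ("src",))
fine = distribution(flows, ("src", "dst", "sport", "dport"))
print("source address only:", dict(coarse))
print("addresses and ports:", dict(fine))
```

with only the source address in the hash, every flow from a busy client is pinned to a single server, so the split follows the clients rather than the servers. the more fields with real variety you feed into the hash, the closer the split gets to even.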
you have to actually customize ECMP:
sw-core(config)# ip load-sharing address source-destination port source-destination rotate 32 universal-id 9239194
…and after about a day the load distribution ended up almost ideal.
of course, you can verify ECMP operation on the Nexus directly:
sw-core# sh ip load-sharing
IPv4/IPv6 ECMP load sharing:
Universal-id (Random Seed): 94812191
Load-share mode : address source-destination port source-destination
Rotate: 32
summary
beyond ECMP (Equal-Cost MultiPathing) you can try more complicated setups, for example UCMP (Unequal-Cost MultiPathing). however, at least for starters, this approach should work well ;)