The Sysadminosaurus' IT blog: High Availability

Tuesday, September 9, 2014

Zen Load Balancer 3.0.3 Performance and Security Customization Part 5

Now, let's move on to NIC bonding. This is useful if one of our NICs goes dead; we obviously want to make sure if that happens, we have another standing by that will take over.

Many admins have a dedicated VLAN for cluster synchronization purposes. Others just connect the two nodes with a crossover cable. Either way, if one NIC goes down, all hell breaks loose. If it is the cluster synchronization NIC, both nodes think that the other node has gone down and both try to become master, causing havoc on the network. If it is any other NIC, your frontends and backends appear to be down because that NIC is dead.

So in that case, we employ NIC bonding. There are actually a few types of network bonding (quoted from the Linux kernel bonding documentation):

balance-rr or 0: Round-robin policy: Transmit packets in sequential order from the first available slave through the last. This mode provides load balancing and fault tolerance.
active-backup or 1: Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The bond's MAC address is externally visible on only one port (network adapter) to avoid confusing the switch.
In bonding version 2.6.2 or later, when a failover occurs in active-backup mode, bonding will issue one or more gratuitous ARPs on the newly active slave. One gratuitous ARP is issued for the bonding master interface and each VLAN interfaces configured above it, provided that the interface has at least one IP address configured. Gratuitous ARPs issued for VLAN interfaces are tagged with the appropriate VLAN id. This mode provides fault tolerance.
balance-xor or 2: XOR policy: Transmit based on the selected transmit hash policy. The default policy is a simple [(source MAC address XOR'd with destination MAC address) modulo slave count]. Alternate transmit policies may be selected via the xmit_hash_policy option. This mode provides load balancing and fault tolerance.
broadcast or 3: Broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.
802.3ad or 4: IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share the same speed and duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification. Slave selection for outgoing traffic is done according to the transmit hash policy, which may be changed from the default simple XOR policy via the xmit_hash_policy option. Note that not all transmit policies may be 802.3ad compliant, particularly in regards to the packet mis-ordering requirements of section 43.2.4 of the 802.3ad standard. Differing peer implementations will have varying tolerances for noncompliance. Prerequisites:

Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
A switch that supports IEEE 802.3ad Dynamic link aggregation.
Most switches will require some type of configuration to enable 802.3ad mode.

balance-tlb or 5: Adaptive transmit load balancing: channel bonding that does not require any special switch support. The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave. Prerequisites:

Ethtool support in the base drivers for retrieving the speed and duplex of each slave.

balance-alb or 6: Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond such that different peers use different hardware addresses for the server. Receive traffic from connections created by the server is also balanced. When the local system sends an ARP Request the bonding driver copies and saves the peer's IP information from the ARP packet. When the ARP Reply arrives from the peer, its hardware address is retrieved and the bonding driver initiates an ARP reply to this peer assigning it to one of the slaves in the bond. A problematic outcome of using ARP negotiation for balancing is that each time that an ARP request is broadcast it uses the hardware address of the bond. Hence, peers learn the hardware address of the bond and the balancing of receive traffic collapses to the current slave. This is handled by sending updates (ARP Replies) to all the peers with their individually assigned hardware address such that the traffic is redistributed. Receive traffic is also redistributed when a new slave is added to the bond and when an inactive slave is re-activated. The receive load is distributed sequentially (round robin) among the group of highest speed slaves in the bond. When a link is reconnected or a new slave joins the bond the receive traffic is redistributed among all active slaves in the bond by initiating ARP Replies with the selected mac address to each of the clients. The updelay parameter (detailed below) must be set to a value equal or greater than the switch's forwarding delay so that the ARP Replies sent to the peers will not be blocked by the switch. Prerequisites:

Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
Base driver support for setting the hardware address of a device while it is open. This is required so that there will always be one slave in the team using the bond hardware address (the curr_active_slave) while having a unique hardware address for each slave in the bond. If the curr_active_slave fails its hardware address is swapped with the new curr_active_slave that was chosen.
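
To make the default balance-xor policy quoted above more concrete, here is a minimal Python sketch of the formula exactly as the text states it: (source MAC XOR destination MAC) modulo slave count. It is only an illustration of the arithmetic, not the kernel's implementation; the function names and the peer MAC addresses are made up for the example.

```python
def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def xor_slave_index(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a slave index as (src MAC XOR dst MAC) modulo slave count."""
    if slave_count < 1:
        raise ValueError("a bond needs at least one slave")
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


if __name__ == "__main__":
    slaves = ["eth8", "eth3"]
    # Hypothetical peer MAC addresses, for illustration only
    for dst in ("00:1f:29:57:cf:fe", "00:1f:29:57:cf:ff"):
        idx = xor_slave_index("00:19:b9:e4:12:a3", dst, len(slaves))
        print(dst, "->", slaves[idx])
```

The same pair of MAC addresses always maps to the same slave, so all frames between two given hosts leave through one interface, while different peers can end up on different slaves.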

In this example we will employ the active-backup method. This is the safest method to use. Many of the guides you'll find online prefer link aggregation, since an aggregation group also increases the overall bandwidth of the resulting interface.

Let's suppose we want to bond eth8 and eth3 to an interface with the IP 172.16.0.8/22, eth9 and eth4 to an interface with the IP 172.16.4.8/22, and eth0 and eth5 to an interface with the IP 172.16.8.8/23:

root@zen-lb:~# apt-get install ifenslave-2.6
root@zen-lb:~# vi /etc/network/interfaces
auto lo
iface lo inet loopback
auto bond0
iface bond0 inet static
    address 172.16.0.8
    netmask 255.255.252.0
    network 172.16.0.0
    gateway 172.16.0.1
    slaves eth8 eth3
    bond-mode active-backup
    bond-miimon 100
    bond-primary eth8
auto bond1
iface bond1 inet static
    address 172.16.4.8
    netmask 255.255.252.0
    network 172.16.4.0
    slaves eth9 eth4
    bond-mode active-backup
    bond-miimon 100
    bond-primary eth9
auto bond2
iface bond2 inet static
    address 172.16.8.8
    netmask 255.255.254.0
    network 172.16.8.0
    slaves eth0 eth5
    bond-mode active-backup
    bond-miimon 100
    bond-primary eth0
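
It is easy to mistype a netmask or network line in stanzas like these, so here is a small sanity check, sketched in Python with the standard ipaddress module. It only verifies that each address/netmask pair really falls in the network declared next to it; the stanzas dictionary and the check_stanza name are made up for this example and are not part of Debian or Zen Load Balancer.

```python
import ipaddress

# (address, netmask, network) triples as written in the bond stanzas
stanzas = {
    "bond0": ("172.16.0.8", "255.255.252.0", "172.16.0.0"),
    "bond1": ("172.16.4.8", "255.255.252.0", "172.16.4.0"),
    "bond2": ("172.16.8.8", "255.255.254.0", "172.16.8.0"),
}


def check_stanza(address, netmask, network):
    """Return the network the address/netmask pair belongs to, or raise
    ValueError if it does not match the declared network line."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    if str(iface.network.network_address) != network:
        raise ValueError(f"{address}/{netmask} is in {iface.network}, not {network}")
    return iface.network


for name, (address, netmask, network) in stanzas.items():
    print(name, check_stanza(address, netmask, network))
```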

bond-primary is the NIC that will be our primary device.
bond-miimon is how often the link state will be polled.
So, in our case, every 100ms eth8 and eth3 will be polled; if eth8 is up, then this will serve our incoming and outgoing requests, otherwise eth3 will take charge.
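
The decision described above can be sketched in a few lines of Python. This is a simplified model of what happens at each poll, assuming only a primary preference and a per-slave link state; the pick_active function is hypothetical, and the real bonding driver does more than this (up/down delays, gratuitous ARPs and so on).

```python
def pick_active(slaves, link_up, primary=None):
    """Return the slave that should carry traffic, or None if all links are down.

    slaves  -- slave names in bond order, e.g. ["eth8", "eth3"]
    link_up -- dict mapping slave name to the link state seen at the last poll
    primary -- preferred slave (the bond-primary option), if any
    """
    # The primary wins whenever its link is up
    if primary is not None and link_up.get(primary, False):
        return primary
    # Otherwise fall back to the first slave that still has link
    for name in slaves:
        if link_up.get(name, False):
            return name
    return None


if __name__ == "__main__":
    slaves = ["eth8", "eth3"]
    print(pick_active(slaves, {"eth8": True, "eth3": True}, primary="eth8"))
    print(pick_active(slaves, {"eth8": False, "eth3": True}, primary="eth8"))
```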

root@zen-lb:~# rm /usr/local/zenloadbalancer/config/if_eth*
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/if_bond0_conf
bond0::172.16.0.8:255.255.252.0:up::
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/if_bond1_conf
bond1::172.16.4.8:255.255.252.0:up::
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/if_bond2_conf
bond2::172.16.8.8:255.255.254.0:up::
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/global.conf_conf
.....
#System Default Gateway
$defaultgw="172.16.0.1";
#Interface Default Gateway
$defaultgwif="bond0";
.....
#Also change the ntp server
.....
$ntp="0.europe.pool.ntp.org";
.....

You might also want to change these particular ports on your switch to portfast. That way, you won't have to wait for the forward delay (and as far as these particular ports go, forward delay is useless anyway) and the transition will be seamless.

All right, let's see if it all works:

root@zen-lb:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth8
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth8
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:b9:e4:12:a3

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:57:cf:fe
root@zen-lb:~# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth9
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth9
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:b9:e4:12:a5

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:0d:69:81
root@zen-lb:~# cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:57:cf:fd

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:0d:69:80

And if you disconnect, or otherwise bring down, any of the primary slave interfaces, you'll see that the backup slave takes over almost instantly (provided you set those ports to portfast on your switch).
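
If you would rather check this from a script than by eye, the status files shown above are simple "key: value" text. Here is a minimal Python sketch of a parser for them; parse_bonding_status is a made-up helper, and the sample text is a shortened copy of the bond0 output from this post, so on a real box you would read /proc/net/bonding/bond0 instead.

```python
def parse_bonding_status(text):
    """Split the text of a /proc/net/bonding/<bond> file into bond-level
    fields and per-slave fields."""
    bond = {}
    slaves = {}
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Slave Interface":
            # Every following field belongs to this slave
            current = value
            slaves[current] = {}
        elif current is None:
            bond[key] = value
        else:
            slaves[current][key] = value
    return bond, slaves


SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth8
MII Status: up

Slave Interface: eth8
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:b9:e4:12:a3

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:57:cf:fe
"""

bond, slaves = parse_bonding_status(SAMPLE)
print("active:", bond["Currently Active Slave"])
print("down:", [name for name, s in slaves.items() if s["MII Status"] != "up"])
```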

Wednesday, July 9, 2014

Zen Load Balancer 3.0.3 Performance and Security Customization Part 1

I'm a bit partial to Zen Load Balancer. As a matter of fact, I love it. It has many, many things ready to go from the start.

And it being just a Debian distro with the zenloadbalancer package on top, there's a lot you can do to customize it. The first thing we need to do is get rid of this:

My system has 16GB of memory but memory reported is just 3GB? Yup.
See, Zen Load Balancer is a 32-bit app and it is distributed with a 32-bit Debian distro.

Assuming we have a 64-bit system with more memory installed, we'll need to upgrade the kernel to a PAE one. This is both a performance and a security enhancement. It will allow us to use more memory and will also enable NX protection (provided that our BIOS and CPU support it too), because the NX bit is bit 63 of the page table entry and a 32-bit kernel only uses 64-bit page table entries when PAE is enabled.
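
A quick bit of arithmetic shows why PAE matters here. The Python sketch below assumes the classic PAE layout, with physical addresses widened from 32 to 36 bits; newer CPUs can expose more bits than that. The helper names are made up for the example.

```python
GIB = 2 ** 30


def addressable_gib(address_bits):
    """Physical memory reachable with the given number of address bits, in GiB."""
    return 2 ** address_bits // GIB


# The NX flag lives in bit 63 of a 64-bit page table entry
NX_BIT = 1 << 63


def is_no_execute(pte):
    """True if the page table entry has the NX bit set."""
    return bool(pte & NX_BIT)


print("32-bit physical addresses:", addressable_gib(32), "GiB")
print("36-bit physical addresses (PAE):", addressable_gib(36), "GiB")
print("NX set on entry 0x8000000000001000:", is_no_execute(0x8000000000001000))
```

A non-PAE 32-bit kernel tops out at 4 GiB of physical address space, and part of that range is typically taken up by devices, which would explain seeing only about 3GB.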

Editing our repos first:

root@zen-lb:~# vi /etc/apt/sources.list
#official repository for Debian
deb http://ftp.debian.org/debian/ stable main non-free
deb-src http://ftp.debian.org/debian/ stable main non-free
deb http://security.debian.org/ stable/updates main
deb-src http://security.debian.org/ stable/updates main
#official repository for Zen Load Balancer Updates
deb http://zenloadbalancer.sourceforge.net/apt/x86 v3/

#Let's add this repo as well to do a moderate PAE upgrade at first
deb http://security.debian.org/debian-security squeeze/updates main

Let's try to upgrade our kernel now:

root@zen-lb:~# apt-get update
....
....
....
Reading package lists... Done
W: GPG error: http://ftp.debian.org stable Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 8B48AD6246925553 NO_PUBKEY 6FB2A1C265FFB764
W: GPG error: http://security.debian.org stable/updates Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 8B48AD6246925553

Yeah ok, classic Debian thing:

root@zen-lb:~# gpg --keyserver pgpkeys.mit.edu --recv-key 6FB2A1C265FFB764
root@zen-lb:~# gpg -a --export 6FB2A1C265FFB764 | apt-key add -
root@zen-lb:~# gpg --keyserver pgpkeys.mit.edu --recv-key 8B48AD6246925553
root@zen-lb:~# gpg -a --export 8B48AD6246925553 | apt-key add -
root@zen-lb:~# apt-get update
Fetched 548 kB in 1s (444 kB/s)
Reading package lists... Done

One more time for the world:

root@zen-lb:~# apt-cache search linux-image
linux-image-2.6-486 - Linux for older PCs (dummy package)
linux-image-2.6-686 - Linux for modern PCs (dummy package)
linux-image-2.6-686-bigmem - Linux for PCs with 4GB+ RAM (dummy package)
linux-image-2.6-686-pae - Linux for modern PCs (dummy package)
linux-image-2.6-amd64 - Linux for 64-bit PCs (dummy package)
linux-image-486 - Linux for older PCs (meta-package)
linux-image-686 - Linux for modern PCs (dummy package)
linux-image-686-bigmem - Linux for PCs with 4GB+ RAM (dummy package)
linux-image-686-pae - Linux for modern PCs (meta-package)
linux-image-amd64 - Linux for 64-bit PCs (meta-package)
linux-image-rt-686-pae - Linux for modern PCs (meta-package), PREEMPT_RT
linux-image-3.2.0-4-486 - Linux 3.2 for older PCs
linux-image-3.2.0-4-686-pae - Linux 3.2 for modern PCs
linux-image-3.2.0-4-686-pae-dbg - Debugging symbols for Linux 3.2.0-4-686-pae
linux-image-3.2.0-4-amd64 - Linux 3.2 for 64-bit PCs
linux-image-3.2.0-4-rt-686-pae - Linux 3.2 for modern PCs, PREEMPT_RT
linux-image-3.2.0-4-rt-686-pae-dbg - Debugging symbols for Linux 3.2.0-4-rt-686-pae
linux-headers-2.6.32-5-486 - Header files for Linux 2.6.32-5-486
linux-headers-2.6.32-5-686 - Header files for Linux 2.6.32-5-686
linux-headers-2.6.32-5-686-bigmem - Header files for Linux 2.6.32-5-686-bigmem
linux-headers-2.6.32-5-amd64 - Header files for Linux 2.6.32-5-amd64
linux-headers-2.6.32-5-openvz-686 - Header files for Linux 2.6.32-5-openvz-686
linux-headers-2.6.32-5-vserver-686 - Header files for Linux 2.6.32-5-vserver-686
linux-headers-2.6.32-5-vserver-686-bigmem - Header files for Linux 2.6.32-5-vserver-686-bigmem
linux-headers-2.6.32-5-xen-686 - Header files for Linux 2.6.32-5-xen-686
linux-image-2.6.32-5-486 - Linux 2.6.32 for old PCs
linux-image-2.6.32-5-686 - Linux 2.6.32 for modern PCs
linux-image-2.6.32-5-686-bigmem - Linux 2.6.32 for PCs with 4GB+ RAM
linux-image-2.6.32-5-686-bigmem-dbg - Debugging infos for Linux 2.6.32-5-686-bigmem
linux-image-2.6.32-5-amd64 - Linux 2.6.32 for 64-bit PCs
linux-image-2.6.32-5-openvz-686 - Linux 2.6.32 for modern PCs, OpenVZ support
linux-image-2.6.32-5-openvz-686-dbg - Debugging infos for Linux 2.6.32-5-openvz-686
linux-image-2.6.32-5-vserver-686 - Linux 2.6.32 for modern PCs, Linux-VServer support
linux-image-2.6.32-5-vserver-686-bigmem - Linux 2.6.32 for PCs with 4GB+ RAM, Linux-VServer support
linux-image-2.6.32-5-vserver-686-bigmem-dbg - Debugging infos for Linux 2.6.32-5-vserver-686-bigmem
linux-image-2.6.32-5-xen-686 - Linux 2.6.32 for modern PCs, Xen dom0 support
linux-image-2.6.32-5-xen-686-dbg - Debugging infos for Linux 2.6.32-5-xen-686

Right, let's be conservative and upgrade to a 2.6 PAE kernel; we'll do a major upgrade later:

root@zen-lb:~# uname -a
Linux zen-lb 2.6.32-5-686 #1 SMP Wed Jan 12 04:01:41 UTC 2011 i686 GNU/Linux
root@zen-lb:~# apt-get install linux-image-2.6.32-5-686-bigmem
Get:1 http://ftp.debian.org/debian/ stable/main linux-base all 3.5 [34.3 kB]
Get:2 http://security.debian.org/debian-security/ squeeze/updates/main linux-image-2.6.32-5-686-bigmem i386 2.6.32-48squeeze6 [27.6 MB]
Get:3 http://ftp.debian.org/debian/ stable/main firmware-linux-free all 3.2 [20.7 kB]
Fetched 27.7 MB in 20s (1,366 kB/s)
Preconfiguring packages ...
(Reading database ... 18065 files and directories currently installed.)
Preparing to replace linux-base 2.6.32-30 (using .../linux-base_3.5_all.deb) ...
Unpacking replacement linux-base ...
Selecting previously deselected package linux-image-2.6.32-5-686-bigmem.
Unpacking linux-image-2.6.32-5-686-bigmem (from .../linux-image-2.6.32-5-686-bigmem_2.6.32-48squeeze6_i386.deb) ...
Selecting previously deselected package firmware-linux-free.
Unpacking firmware-linux-free (from .../firmware-linux-free_3.2_all.deb) ...
Processing triggers for man-db ...
Setting up linux-base (3.5) ...
Setting up linux-image-2.6.32-5-686-bigmem (2.6.32-48squeeze6) ...
Running depmod.
Running update-initramfs.
update-initramfs: Generating /boot/initrd.img-2.6.32-5-686-bigmem
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 2.6.32-5-686-bigmem /boot/vmlinuz-2.6.32-5-686-bigmem
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 2.6.32-5-686-bigmem /boot/vmlinuz-2.6.32-5-686-bigmem
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-2.6.32-5-686-bigmem
Found initrd image: /boot/initrd.img-2.6.32-5-686-bigmem
Found linux image: /boot/vmlinuz-2.6.32-5-686
Found initrd image: /boot/initrd.img-2.6.32-5-686
done
Setting up firmware-linux-free (3.2) ...
update-initramfs: deferring update (trigger activated)
Processing triggers for initramfs-tools ...
update-initramfs: Generating /boot/initrd.img-2.6.32-5-686-bigmem
root@zen-lb:~# reboot

All right, so did it work?

Looks good. And what about NX?

root@zen-lb:~# dmesg | grep ".*NX.*protection"
[    0.000000] NX (Execute Disable) protection: active

Cool. Let's go on then. Let's do a distro upgrade from Squeeze to Wheezy:

root@zen-lb:~# apt-get dist-upgrade
root@zen-lb:~# reboot

Indeed:

root@zen-lb:~# cat /etc/*release
PRETTY_NAME="Debian GNU/Linux 7 (wheezy)"
NAME="Debian GNU/Linux"
VERSION_ID="7"
VERSION="7 (wheezy)"
ID=debian
ANSI_COLOR="1;31"
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support/"
BUG_REPORT_URL="http://bugs.debian.org/"
root@zen-lb:~# cat /proc/version
Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.60-1+deb7u1

The first thing I'm going to do is tune my filesystem. To be honest, this is just a load balancer so I can afford to lose a few seconds of logs if the power goes down.

root@zen-lb:~# vi /etc/fstab
proc            /proc           proc    defaults        0       0
# / was on /dev/sdb3 during installation
UUID=b6016824-536e-43bc-8f1f-fbfd2fab146d /               ext4    noatime,nodiratime,nobarrier,nobh,commit=120,data=writeback,journal_async_commit,errors=remount-ro 0       1
# /boot was on /dev/sdb1 during installation
UUID=6d4bd9ca-ba29-4700-b90c-07c614d79f0e /boot           ext4    defaults        0       2
# swap was on /dev/sdb2 during installation
UUID=3cb4eb4d-b0ab-4b60-825b-fc0224356580 none            swap    sw              0       0
/dev/scd0       /media/cdrom0   udf,iso9660 user,noauto     0       0
/dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0

Go to single user mode and tune my root filesystem (mine is on /dev/sdb3):

root@zen-lb:~# init 1
root@zen-lb:~# tune2fs -O dir_index /dev/sdb3
root@zen-lb:~# umount -a
root@zen-lb:~# e2fsck -D /dev/sdb3

And now increase the number of open files limit:

root@zen-lb:~# vi /etc/security/limits.conf
....
* soft nofile 65536
* hard nofile 65536
# End of file

When regular users log in, they get an open files warning. Let's correct it by commenting out these few lines of code in /etc/profile:

root@zen-lb:~# vi /etc/profile
....
#if [ -f /etc/sysctl.conf ]; then
# FILEMAX=`grep "^fs.file-max.*=" /etc/sysctl.conf | awk -F'=' '{printf $2}'`
# if [ "$FILEMAX" != "" ]; then
#  ulimit -n $FILEMAX
# fi
#fi

Finally, let's update our repos to correctly receive wheezy updates:

root@zen-lb:~# vi /etc/apt/sources.list
#official repository for Debian
deb http://ftp.debian.org/debian wheezy main contrib non-free
deb-src http://ftp.debian.org/debian wheezy main contrib non-free
deb http://ftp.debian.org/debian wheezy-updates main contrib non-free

deb http://http.debian.net/debian wheezy main contrib non-free
deb-src http://http.debian.net/debian wheezy main contrib non-free

deb http://http.debian.net/debian wheezy-updates main contrib non-free
deb-src http://http.debian.net/debian wheezy-updates main contrib non-free

deb http://security.debian.org/ wheezy/updates main contrib non-free
deb-src http://security.debian.org/ wheezy/updates main contrib non-free

#official repository for Zen Load Balancer Updates
deb http://zenloadbalancer.sourceforge.net/apt/x86 v3/

All right, I guess we didn't do that much, but it's enough to call the end of Part 1.

Friday, April 25, 2014

pfSense and High Availability Part 3 - Gateway Failover (Multi-WAN)

With this method we ensure that if one of the gateways that pfSense uses fails, it will switch over to a working one. In this example, my internal network is 192.168.200.0/24, my primary gateway router's IP is 192.168.150.1 and my backup gateway router's IP is 192.168.100.1.

First of all, make sure your interfaces and gateways are set:

Now go to System, General Setup and either change the DNS servers so they reflect the ones in the corresponding networks or use public ones. You need to make sure you have at least one DNS server per gateway. In my example I use Google's DNS servers (8.8.8.8 and 8.8.4.4).

Time to create a Gateway Group. Go to System, Routing and click on the "Groups" tab. Select "+". Here we just say that we want to prefer the 192.168.150.1 gateway unless we experience packet loss or high latency, in which case we switch to 192.168.100.1.

The following comes straight from the pfSense documentation. There's no better way to describe what those options do.

Tiers

In a gateway group, you assign each gateway to a tier to determine when it is used. The lower tier numbers are preferred. If any two gateways are on the same tier, they will load balance. If they are on different tiers, they will do failover preferring the lower tier. If the tier is set to "Never" then the gateway is not considered part of this group.

Trigger Level

Member Down

Triggers when the monitor IP has 100% packet loss.

Packet Loss

Triggers only when there is packet loss to a gateway higher than its defined threshold.

High Latency

Triggers only when there is latency (delay) to a gateway higher than its defined threshold.

Packet Loss or High Latency

Triggers for either of the above conditions.
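
Read literally, the four trigger levels above boil down to a small decision function. The Python sketch below is only my reading of those descriptions, not pfSense's code; the gateway_triggered name and the threshold values used in the example are assumptions.

```python
def gateway_triggered(trigger, loss_pct, latency_ms, loss_threshold, latency_threshold):
    """Decide whether a gateway should be treated as failed for a given trigger level.

    loss_pct and latency_ms are the values measured against the monitor IP;
    the thresholds are the per-gateway limits configured by the admin.
    """
    member_down = loss_pct >= 100
    high_loss = loss_pct > loss_threshold
    high_latency = latency_ms > latency_threshold
    if trigger == "Member Down":
        return member_down
    if trigger == "Packet Loss":
        return high_loss
    if trigger == "High Latency":
        return high_latency
    if trigger == "Packet Loss or High Latency":
        return high_loss or high_latency
    raise ValueError(f"unknown trigger level: {trigger}")


if __name__ == "__main__":
    # Hypothetical thresholds: 20% loss, 500 ms latency
    print(gateway_triggered("Packet Loss or High Latency", 5, 600, 20, 500))
    print(gateway_triggered("Member Down", 25, 10, 20, 500))
```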

Load Balancing

When two gateways are on the same tier, they will load balance. This means that on a per-connection basis, traffic is routed over each WAN in a round-robin manner. If any gateway on the same tier goes down, it is removed from use and the other gateways on the tier continue to operate normally.

Failover

When two gateways are on different tiers, the lower tier gateway(s) are preferred. If a lower tier gateway goes down, it is removed from use and the next highest tier gateway is used.

Combinations

Because of the tier system, you can have any number of combinations of load balancing and failover that you like, such as One WAN that if it goes down fails to two load balancing WANs that if both go down fail to three load balancing WANs, and so on. The only limit is that there are only 5 tiers so such configurations can only go 5 levels deep.
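
The tier rules quoted above can be sketched as follows: take the gateways that are up, keep only those on the lowest tier, and hand out connections to them round-robin. This is a simplified Python model under those assumptions, with made-up function names; it is not how pfSense implements it internally.

```python
import itertools


def usable_gateways(tiers, is_up):
    """Return the gateways that should carry traffic right now.

    tiers -- dict mapping gateway to its tier number, or None for "Never"
    is_up -- dict mapping gateway to True/False
    """
    candidates = [gw for gw, tier in tiers.items()
                  if tier is not None and is_up.get(gw, False)]
    if not candidates:
        return []
    best = min(tiers[gw] for gw in candidates)
    return [gw for gw in candidates if tiers[gw] == best]


def assign_connections(connections, gateways):
    """Spread connections over the gateways of one tier, round-robin."""
    return dict(zip(connections, itertools.cycle(gateways)))


if __name__ == "__main__":
    tiers = {"192.168.150.1": 1, "192.168.100.1": 2}
    print(usable_gateways(tiers, {"192.168.150.1": True, "192.168.100.1": True}))
    print(usable_gateways(tiers, {"192.168.150.1": False, "192.168.100.1": True}))
```

With both gateways on tier 1 instead, usable_gateways would return both of them and assign_connections would alternate between the two.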

Our configuration was pretty straight-forward considering the options. We only said that we wanted to prefer the 192.168.150.1 gateway unless we experienced packet loss or high latency, in which case we switch to 192.168.100.1. Let's add another option here. If the 192.168.100.1 gateway fails, use 192.168.150.1. Better to have degraded service than no service at all, obviously.

So our configuration should now look something like this:

Honestly, just the first rule should do the trick but I have put the second one there for illustration purposes. Time to change our LAN rules so they actually use our newly configured interfaces:

Go to Firewall, Rules and click on the "LAN" tab. Edit your LAN rules. You need to go all the way down to Gateway and click on the "Advanced" button. Choose the first gateway group you created. Save and apply.

Now, in the Rules page again go to your newly edited rule and select "+", or "add a new rule based on this one". Go ahead and change the gateway to the other gateway group (if you have created any). Save and apply.

You know how it goes now. You need to do the same for every existing rule in your LAN rule set.

This is how my LAN rule set looks now. Yours should look similar.

If you have created load balancer pools in pfSense, you should also go to System, Advanced, click on the "Miscellaneous" tab and tick the "Allow default gateway switching" box under "Load Balancing".

All right, time for some tweaks!

Something that is considered good practice is to also change your gateway's monitor address. What do I mean by this? Well, by default, pfSense will ping your gateway and thus decide if it's up or down. But just because you can ping it, doesn't mean you can route through it. You might be able to ping its IP address, but its external interface might be experiencing problems. So what we can do is tell pfSense to ping something on the internet instead. That's a better method.

Go to System, Routing, Gateways and click on "e" to edit them. Put a public server's address there that you know accepts ICMP requests, such as Google's public DNS server.

In this page, you can also tweak settings such as response times to reflect what you consider a bad connection in order for the failover to happen (this can be found under "Advanced").

Do the same for your other gateway.

Check to see if everything works as intended. Your gateways' status can be seen under Status, Gateways.

Done!

Wednesday, April 23, 2014

pfSense and High Availability Part 2 - Node Failover

Now that we've covered network interface failover, time for the most popular high availability method; node failover.

First we need a dedicated network interface for this. Let's enable our interfaces for the job, select our IPs, subnets, etc. If subnetting is not exactly your forte, /30 is 2 hosts while /29 is 6. Just use an available network. I'll cover subnetting some time in the future. It's easy really. For my needs I selected the 192.168.250.0/30 network which leaves me with the 192.168.250.1 and 192.168.250.2 IP addresses available to assign to my nodes.
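
If you don't want to do the subnet math in your head, Python's standard ipaddress module will do it for you. A small sketch (usable_hosts is just a helper name I made up):

```python
import ipaddress


def usable_hosts(cidr):
    """Return the list of usable host addresses in a network given as CIDR."""
    return [str(host) for host in ipaddress.ip_network(cidr).hosts()]


for cidr in ("192.168.250.0/30", "192.168.250.0/29"):
    hosts = usable_hosts(cidr)
    print(cidr, "->", len(hosts), "hosts:", hosts[0], "to", hosts[-1])
```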

Save and apply. Now go to your firewall rules (Firewall, Rules), select the tab that responds to the interface you've selected for the task (OPT1 in my case) and allow everything (any) for that specific network (192.168.250.0/30 in my case).

Save and apply.

Now, to actually set up our VIP (Virtual IP). There are two ways we can do this: one is to go to System, High Avail. Sync and the other is to go to Firewall, Virtual IPs, CARP settings tab.

Go to the node that you intend to use as a master and check the "Synchronize states" box. Choose the network interface we've been working with for this (OPT1 in my case) and also insert the slave's IP address into the "pfsync Synchronize Peer IP" box if you want to avoid pfSense spamming multicast.
Synchronize Config to IP: Insert the backup node's IP address (192.168.250.2 in my case)
Remote System Username: Insert the system username
Remote System Password: Insert the system password
And then check everything you want to synchronize. I want everything, so I'll check everything.

Now, Save and go to your backup node. What we want to do is exactly the same, changing the IP in the "pfsync Synchronize Peer IP" box.
Synchronize Config to IP: Insert NOTHING. This should only be used in the master node.
Remote System Username: Insert NOTHING. This should only be used in the master node.
Remote System Password: Insert NOTHING. This should only be used in the master node.
And then check everything you want to synchronize. I want everything, so I'll check everything.

Now go back to your master node. Go to Firewall, Virtual IP addresses and select "+" to add one.
Let's begin with our WAN interface.
Type: CARP
Interface: WAN
IP address(es): Choose a new available IP, in the same subnet as the old one, which will be the Virtual IP of the cluster. In my case that is 192.168.0.9/24.
Virtual IP Password: Just choose a password for this.
VHID Group: Usually 1 is fine, but if you have systems that already use CARP in your network (such as Zen Load Balancer) you might want to change this.
Advertising Frequency: Leave it at 1/0 for the master.

Save and apply. Now let's go ahead and do the same for the rest of our interfaces, always keeping in mind that we should change our VHID groups to a unique number.

Save and apply. Our settings should look something like this:

Go to your master and backup nodes to see if everything is working through Status, CARP (failover). Sometimes you need to manually enable CARP if you see "Status DISABLED". No biggie.

Friday, March 28, 2014

pfSense and High Availability Part 1 - Network Interface Bonding (LAGG)

One of the problems people face when deploying pfSense is network interface bonding. It's not very straight-forward and in some ways counterintuitive.

Let me illustrate the problem:

We've already set up our external (0 for Cisco and WAN for pfSense aficionados) and internal (100 for Cisco and LAN for pfSense aficionados) interfaces. When we try to set up LAGG, these two interfaces do not appear available for setting up, although every other interface is. The problem lies in the fact that they are already in use. So how do we go about setting up network interface bonding in pfSense?

It's actually pretty simple. Let me illustrate:

First of all, for the sake of clarity my WAN and LAN interfaces are bce0 and bce1:

Go to Interfaces -> Assign -> LAGG and select "+":

Create a WAN LAGG bond out of the interfaces you'd ideally like it to consist of, minus the interface currently assigned to WAN. Ugh, I'm making it sound more complicated than it is.

To make it clearer, let's suppose you wanted to create a WAN bond consisting of bce0 and em3. What we would ideally like to do is choose bce0 and em3. Well, in our case we only select em3 (bce0 is not available to us anyway) and we create a LAGG team consisting solely of that one interface, silly as it may sound initially.

Save and repeat the process for the LAN LAGG team, creating a team using the interfaces we'd like the team to consist of except the currently used LAN interface.

Save and create the rest of your LAGG interfaces as you would usually.

Here's an idea of what we should roughly have when we're done with this process:

Now, go to "Interface Assignments":

Change the interface assignments to their LAGG interface counterparts, save and add any ones that are needed. Take a peek at mine:

Go to LAGG again:

Edit the WAN LAGG interface:

The previously unavailable WAN interface should be available to form our team now. Select as needed and save:

Repeat the process for the LAN interface:

Everything should be working:

In case your master interface priority is wrong, all you need to do is back up your configuration, open and edit your config.xml file, manually change the order of the members and upload it again.

For example:

    <laggs>
      <lagg>
        <members>em3,bce0</members>
        <descr><![CDATA[WAN_TEAM]]></descr>
        <laggif>lagg0</laggif>
        <proto>failover</proto>
      </lagg>
      <lagg>
        <members>em4,bce1</members>
        <descr><![CDATA[LAN_TEAM]]></descr>
        <laggif>lagg1</laggif>
        <proto>failover</proto>
      </lagg>
      <lagg>
        <members>em0,em5</members>
        <descr><![CDATA[CARP_TEAM]]></descr>
        <laggif>lagg2</laggif>
        <proto>failover</proto>
      </lagg>
    </laggs>

Now, I would like for my WAN bond to have bce0 as the master/primary interface, for LAN bce1 and for CARP em0. Therefore I edit like so:

    <laggs>
      <lagg>
        <members>bce0,em3</members>
        <descr><![CDATA[WAN_TEAM]]></descr>
        <laggif>lagg0</laggif>
        <proto>failover</proto>
      </lagg>
      <lagg>
        <members>bce1,em4</members>
        <descr><![CDATA[LAN_TEAM]]></descr>
        <laggif>lagg1</laggif>
        <proto>failover</proto>
      </lagg>
      <lagg>
        <members>em0,em5</members>
        <descr><![CDATA[CARP_TEAM]]></descr>
        <laggif>lagg2</laggif>
        <proto>failover</proto>
      </lagg>
    </laggs>

And re-upload to the server in question. Simple enough process.
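If you'd rather not hand-edit the XML, the reordering can be sketched as a small shell helper run against the downloaded backup. This is only a rough sketch: `swap_lagg_members` and the file names in the usage line are made up for illustration, and it assumes each `<members>` element sits on a single line exactly as shown above.

```shell
# Hypothetical helper (not part of pfSense): print a copy of a config.xml
# backup with one LAGG's member list rewritten.
# Arguments: backup file, current member list, desired member list.
swap_lagg_members() {
    in_file=$1
    old=$2
    new=$3
    # Only the text between the <members> tags changes; indentation is kept.
    sed "s|<members>${old}</members>|<members>${new}</members>|" "$in_file"
}
```

For example, `swap_lagg_members config-backup.xml em3,bce0 bce0,em3 > config-edited.xml` would write an edited copy and leave the original backup untouched, so you still have something to roll back to.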

Note: In pfSense 2.2 and above, LAGG using LACP in FreeBSD 10.0 and newer defaults to "strict mode" being enabled, which means the lagg does not come up unless your switch is speaking LACP.

This will cause your LAGG to stop functioning after the upgrade if your switch isn't using active mode LACP.
You can retain the lagg behavior of pfSense 2.1.5 and earlier versions by adding a new system tunable under System > Advanced, System Tunables tab, for the following:

net.link.lagg.0.lacp.lacp_strict_mode

With value set to 0. You can configure this in 2.1.5 before upgrading to 2.2, to ensure the same behavior on first boot after the upgrade. It will result in a harmless cosmetic error in the logs on 2.1.5 since the value does not exist in that version.
If you have more than one LAGG interface configured, you will need to enter a tunable for each since that is a per-interface option. So for lagg1, you would add the following.

net.link.lagg.1.lacp.lacp_strict_mode

Also with the value set to 0.
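With several LACP LAGGs, the tunable names to enter can be listed with a tiny helper like the one below. It is just a sketch: `lagg_strict_tunables` is a name I made up, and it only prints the names following the pattern above; you still add each one (with value 0) in the System Tunables tab yourself.

```shell
# Hypothetical helper: print the strict-mode tunable name for every lagg
# interface passed as an argument (lagg0, lagg1, ...).
lagg_strict_tunables() {
    for ifname in "$@"; do
        # Strip the "lagg" prefix to get the interface index: lagg1 -> 1
        idx=${ifname#lagg}
        echo "net.link.lagg.${idx}.lacp.lacp_strict_mode"
    done
}
```

Calling `lagg_strict_tunables lagg0 lagg1` prints one tunable name per line. Keep in mind this only matters for LAGGs that use LACP; the failover teams from the example above are not affected.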

Wednesday, March 26, 2014

MySQL Load Balancing Part 2

In my previous post, I tried to fix the haproxy flaws by swapping it with Zen Load Balancer. Here's the thing though: Zen Load Balancer introduces a flaw of its own; it uses pen for TCP load balancing, which can increase the CPU load to stupid levels. On a relatively mild benchmark that I performed on my MySQL cluster, I witnessed pen's process shoot to 60% CPU usage. Not good.

So what do we do? Well, if we use the best open-source solution for MySQL clustering, Percona XtraDB cluster, the answer is pretty simple: we keep Zen Load Balancer and all its goodies but just for our MySQL farm, we revert to haproxy and some nifty tools provided to us by Percona and we're set! Best of both worlds!

Here's what we do:

Log in to your MySQL console and create a user "clustercheckuser" with the following credentials:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 156322
Server version: 5.6.15-56-log Percona XtraDB Cluster (GPL), Release 25.4, Revision 731, wsrep_25.4.r4043

Copyright (c) 2009-2013 Percona LLC and/or its affiliates
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant process on *.* to 'clustercheckuser'@'localhost' identified by 'clustercheckpassword!';
Query OK, 0 rows affected (0.01 sec)
mysql> grant process on *.* to 'clustercheckuser'@'127.0.0.1' identified by 'clustercheckpassword!';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Now, Percona gives us two tools to work with: clustercheck and pyclustercheck (they do exactly the same thing, but pyclustercheck is written in Python and does not require the use of xinetd). Each one sends an HTTP 200 response if the node is up and synced with the cluster and an HTTP 503 if there is something wrong with it. Goodbye haproxy's buggy mysql-check! Nice to see you again, tried, tested and great httpchk!
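To get a feel for what such a check does, here is a heavily simplified sketch of the decision it makes. This is not Percona's actual script: `cluster_http_response` is a made-up name, the 503 body text is my own wording, and it assumes you have already read the node's `wsrep_local_state` status variable, where 4 means "Synced" in Galera.

```shell
# Simplified sketch of a clustercheck-style decision (not Percona's script).
# $1 is the node's wsrep_local_state value; 4 means "Synced".
cluster_http_response() {
    if [ "$1" = "4" ]; then
        status="200 OK"
        body="Percona XtraDB Cluster Node is synced."
    else
        status="503 Service Unavailable"
        body="Percona XtraDB Cluster Node is not synced."
    fi
    # The body goes out with a trailing CRLF, hence the +2 on the length.
    printf 'HTTP/1.1 %s\r\nContent-Type: text/plain\r\nConnection: close\r\nContent-Length: %d\r\n\r\n%s\r\n' \
        "$status" "$(( ${#body} + 2 ))" "$body"
}
```

The state itself could be fetched with something along the lines of `mysql -u clustercheckuser -p -Nse "SHOW STATUS LIKE 'wsrep_local_state'" | awk '{print $2}'` and passed in as the argument.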

Let's go ahead and configure everything required to run clustercheck. Change the 192.168.108.0/24 to the needs of your network:

[root@mysql1 ~]# vi /etc/xinetd.d/mysqlchk 
# default: on
# description: mysqlchk
service mysqlchk
{
# this is a config for xinetd, place it in /etc/xinetd.d/
        disable = no
        flags           = REUSE
        socket_type     = stream
        port            = 9200
        wait            = no
        user            = nobody
        server          = /usr/bin/clustercheck
        log_on_failure  += USERID
        only_from       = 192.168.108.0/24
        per_source      = UNLIMITED
}

Register port 9200 as mysqlchk in /etc/services, commenting out the existing wap-wsp TCP entry:

[root@mysql1 ~]# vi /etc/services
....
sun-as-jpda     9191/udp                # Sun AppSvr JPDA
mysqlchk        9200/tcp                # Percona mysqlchk
#wap-wsp         9200/tcp                # WAP connectionless session service
wap-wsp         9200/udp                # WAP connectionless session service
....

Now, let's go ahead and install xinetd:

[root@mysql1 ~]# yum -y install xinetd.x86_64
[root@mysql1 ~]# chkconfig xinetd on
[root@mysql1 ~]# service xinetd start

Check that it's up and working:

[root@mysql1 ~]# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      3842/master         
tcp        0      0 0.0.0.0:3306                0.0.0.0:*                   LISTEN      3704/mysqld         
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      2637/sshd           
tcp        0      0 0.0.0.0:4567                0.0.0.0:*                   LISTEN      3704/mysqld         
tcp        0      0 ::1:25                      :::*                        LISTEN      3842/master         
tcp        0      0 :::9200                     :::*                        LISTEN      20371/xinetd        
tcp        0      0 :::22                       :::*                        LISTEN      2637/sshd

As we can see, there is definitely a server listening on port 9200. Great, time to check our iptables rules:

[root@mysql1 ~]# iptables -L -v -n --line-numbers
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1     318K   62M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
2        2   120 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
3        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:3306 
4        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4444 
5        4   240 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4567 
6        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4568 
7        2   120 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           state NEW tcp dpt:22 
8        0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1     271K   28M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Right, these are the ports needed for Percona XtraDB cluster and SSH to work. Everything else is rejected. What I need to do is add a rule before my "reject all" rule. So, let's allow connections from my network to port 9200 by inserting a rule as the 8th rule in the INPUT chain:

[root@mysql1 ~]# iptables -I INPUT 8 -s 192.168.108.0/24 -p tcp -m tcp --dport 9200 -j ACCEPT
[root@mysql1 ~]# iptables -L -v -n --line-numbers
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1     318K   62M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
2        2   120 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
3        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:3306 
4        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4444 
5        4   240 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4567 
6        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4568 
7        2   120 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           state NEW tcp dpt:22 
8        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:9200 
9        0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1     271K   28M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
[root@mysql1 ~]# iptables-save > /etc/sysconfig/iptables
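The position 8 is specific to my rule set. If yours differs, the number of the "reject all" rule can be looked up instead of counted by hand. A minimal sketch, assuming the listing is produced without -v so that the target is the second column; `first_reject_rule` is a name I made up:

```shell
# Hypothetical helper: read the output of
#   iptables -L INPUT -n --line-numbers
# on stdin and print the number of the first REJECT rule.
first_reject_rule() {
    # Without -v the columns are: num target prot opt source destination
    awk '$2 == "REJECT" { print $1; exit }'
}
```

Usage would look something like `n=$(iptables -L INPUT -n --line-numbers | first_reject_rule)` followed by `iptables -I INPUT "$n" -s 192.168.108.0/24 -p tcp -m tcp --dport 9200 -j ACCEPT`, which inserts the new rule right before the reject.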

Check that everything is working (obviously substitute 192.168.108.20 with your node's IP address):

[root@mysql1 ~]# nc 192.168.108.20 9200
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40

Percona XtraDB Cluster Node is synced.
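With more than a couple of nodes, checking each one by hand gets old quickly. Here is a rough sketch of a loop that does it with curl instead of nc; `verdict_for_code` and `check_nodes` are made-up names, and it assumes curl is installed on the machine you run it from and that the machine is allowed through by the only_from and iptables rules above.

```shell
# Map the HTTP status code returned by the check port to a short verdict.
verdict_for_code() {
    case "$1" in
        200) echo "synced" ;;
        503) echo "not synced" ;;
        *)   echo "unreachable or unexpected ($1)" ;;
    esac
}

# Probe port 9200 on every node passed as an argument and print a verdict.
check_nodes() {
    for node in "$@"; do
        # -m 5 gives up after 5 seconds; -w prints only the status code.
        code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://${node}:9200/")
        echo "${node}: $(verdict_for_code "$code")"
    done
}
```

For example, `check_nodes 192.168.108.20 192.168.108.30` prints one line per node.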

Nice! Time to head over to our Zen Load Balancer and configure haproxy on it. First, remember to delete or at least stop your MySQL farm (if you have any). Install and configure haproxy:

root@zen-lb:~# apt-get update 
root@zen-lb:~# apt-get install haproxy
root@zen-lb:~# mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.orig
root@zen-lb:~# vi /etc/haproxy/haproxy.cfg
global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        chroot /usr/share/haproxy
        user haproxy
        group haproxy
        daemon
defaults
        log     global
        mode    http
        option  tcplog
        option  dontlognull
        retries 3
        option redispatch
        maxconn 50000
        timeout connect 3500ms
        timeout client 50000ms
        timeout server 50000ms

listen stats :445 #We set up our stats screen; remove this block if not wanted (it could be integrated below if that section used mode http). Now we can access the stats at http://LOAD_BALANCER_IP:445/haproxy using username: haproxy and password: haproxy_password
        mode http
        stats enable
        #stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /haproxy
        stats auth haproxy:haproxy_password

listen Percona_xtradb_cluster_read 192.168.104.10:3306
       balance roundrobin # Typical roundrobin method
       mode tcp #In this mode, the service relays TCP connections as soon as they're established, towards one or several servers. No processing is done on the stream. Two other options are: http and health
       option tcpka #Enable TCP keep-alives on both the client and server sides. This makes it possible to prevent long sessions from expiring on external layer 4 components such as firewalls and load-balancers.
       option httpchk #When option httpchk is specified, a complete HTTP request is sent once the TCP connection is established, and responses 2xx and 3xx are considered valid, while all other ones indicate a server failure, including the lack of any response. 
       server MySQL1 192.168.108.20:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL2 192.168.108.30:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL3 192.168.108.40:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL4 192.168.108.50:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL5 192.168.108.60:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL6 192.168.108.70:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL7 192.168.108.80:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL8 192.168.108.90:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL9 192.168.108.100:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1

Here, I have configured the farm to listen on 192.168.104.10, port 3306, and my Percona XtraDB cluster has 9 nodes: 192.168.108.20, 192.168.108.30, 192.168.108.40, 192.168.108.50, 192.168.108.60, 192.168.108.70, 192.168.108.80, 192.168.108.90, and 192.168.108.100. The maximum number of connections is 50,000. The connection timeout is 3.5s, while the response timeout is 50s. As before, you need to change these settings to the needs of your network. Set the response timeout too low and you'll get false positives, resulting in servers getting shut off and connections between a client and a working server being cut off, which results in unhappy clients; set it too high and your load balancer will be late shutting off traffic to dead servers, making your service seem unavailable to some clients, which also results in unhappy clients. Finally, it sends probes to check whether a node is up or down every 5 secs if the node has been marked as 'up' (inter 5000) and every 30 secs if it has been marked as 'down' (downinter 30000); it will mark a 'down' node as 'up' only after 5 successful probes (rise 5), but will mark an 'up' node as 'down' after a single unsuccessful probe (fall 1).
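To see what those probe settings mean in practice, the rough detection times can be worked out with a bit of shell arithmetic. Treat this as a back-of-the-envelope sketch: the function names are made up, probe timeouts are ignored, and the recovery figure assumes (as I read the HAProxy documentation) that once the first probe succeeds the remaining ones are spaced by fastinter, which falls back to inter when it is not set.

```shell
# Health check settings from the server lines above, in milliseconds.
inter=5000
downinter=30000
rise=5
fall=1

# Rough time for an 'up' node to be marked 'down':
# 'fall' failed probes, spaced 'inter' apart.
time_to_mark_down() {
    echo $(( fall * inter ))
}

# Rough time for a 'down' node to be marked 'up' again: up to 'downinter'
# until the first successful probe, then (rise - 1) more probes spaced
# 'inter' apart (assumption: fastinter is unset, so inter is used).
time_to_mark_up() {
    echo $(( downinter + (rise - 1) * inter ))
}
```

With the values above, a dead node stops receiving traffic after about 5 seconds, while a recovered node has to wait somewhere around 50 seconds before it gets traffic again. That asymmetry is deliberate: fail fast, recover cautiously.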

You will also want to change the stats screen variables. Some people remove the "listen stats" section altogether, but if you decide to keep it, you'll definitely want to change the password, which I have set to "haproxy_password", perhaps the user, which I have set to "haproxy" and maybe also the URI and the port (I have set it to http://192.168.104.10:445/haproxy).

We'll also change our haproxy memory usage settings. Change this according to your system's resources, I've set mine to use 2 gigs of RAM:

root@zen-lb:~# vi /etc/default/haproxy
# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
EXTRAOPTS="-de -m 2048"

Almost done. Time to create the necessary directories, make sure they have the correct permissions, start haproxy and arrange for Zen Load Balancer to automatically start and stop it in case of restarts, cluster failovers etc:

root@zen-lb:~# mkdir /usr/share/haproxy
root@zen-lb:~# chown haproxy:haproxy /usr/share/haproxy/
root@zen-lb:~# chown haproxy:haproxy /etc/haproxy/haproxy.cfg
root@zen-lb:~# chmod 640 /etc/haproxy/haproxy.cfg
root@zen-lb:~# service haproxy start
root@zen-lb:~# update-rc.d haproxy defaults
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/zlb-start
#make your own script in your favorite language, it will be called
#at the end of the procedure /etc/init.d/zenloadbalacer start
#and replicated to the other node if zen cluster is running.
service haproxy restart
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/zlb-stop
#make your own script in your favorite language, it will be called
#at the end of the procedure /etc/init.d/zenloadbalacer start
#and replicated to the other node if zen cluster is running.
service haproxy stop

Remember my relatively mild benchmark, where I witnessed pen's process shoot to 60% CPU usage? Care to guess what haproxy's usage is now under the same conditions? 20%. Great stuff.