Friday, April 25, 2014

pfSense and High Availability Part 3 - Gateway Failover (Multi-WAN)

With this method we ensure that if one of the gateways that pfSense uses fails, it will switch over to a working one. In this example, my internal network is 192.168.200.0/24, my primary gateway router's IP is 192.168.150.1 and my backup gateway router's IP is 192.168.100.1.


First of all, make sure your interfaces and gateways are set:



Now go to System, General Setup and either change the DNS servers so they reflect the ones in the corresponding networks or use public ones. You need to make sure you have at least one DNS server per gateway. In my example I use Google's DNS servers (8.8.8.8 and 8.8.4.4).


Time to create a Gateway Group. Go to System, Routing and click on the "Groups" tab. Select "+". Here we just say that we want to prefer the 192.168.150.1 gateway unless we experience packet loss or high latency, in which case we switch to 192.168.100.1.


The following comes straight from the pfSense documentation. There's no better way to describe what these options do.

Tiers

     In a gateway group, you assign each gateway to a tier to determine when it is used. The lower tier numbers are preferred. If any two gateways are on the same tier, they will load balance. If they are on different tiers, they will do failover preferring the lower tier. If the tier is set to "Never" then the gateway is not considered part of this group.

Trigger Level

  • Member Down
     Triggers when the monitor IP has 100% packet loss.
  • Packet Loss
     Triggers only when there is packet loss to a gateway higher than its defined threshold.
  • High Latency
     Triggers only when there is latency (delay) to a gateway higher than its defined threshold.
  • Packet Loss or High Latency
     Triggers for either of the above conditions.
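To make the four trigger levels concrete, here is a minimal Python sketch. It is my own illustration, not pfSense code. The function name `is_triggered` and the default thresholds (20% loss, 500 ms latency) are assumptions chosen for the example; use whatever thresholds you set on your gateways.

```python
from enum import Enum


class Trigger(Enum):
    MEMBER_DOWN = "Member Down"
    PACKET_LOSS = "Packet Loss"
    HIGH_LATENCY = "High Latency"
    LOSS_OR_LATENCY = "Packet Loss or High Latency"


def is_triggered(trigger, loss_pct, latency_ms,
                 loss_limit=20.0, latency_limit=500.0):
    """Return True when the chosen trigger level would fire.

    loss_limit and latency_limit are illustrative thresholds only.
    """
    lossy = loss_pct > loss_limit
    slow = latency_ms > latency_limit
    if trigger is Trigger.MEMBER_DOWN:
        # Only a completely unreachable monitor IP counts.
        return loss_pct >= 100.0
    if trigger is Trigger.PACKET_LOSS:
        return lossy
    if trigger is Trigger.HIGH_LATENCY:
        return slow
    # Packet Loss or High Latency: either condition is enough.
    return lossy or slow
```

With these example thresholds, 25% loss at 10 ms fires "Packet Loss" but not "High Latency", and it does not fire "Member Down" either because the monitor IP still answers some pings.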

Load Balancing

     When two gateways are on the same tier, they will load balance. This means that on a per-connection basis, traffic is routed over each WAN in a round-robin manner. If any gateway on the same tier goes down, it is removed from use and the other gateways on the tier continue to operate normally.

Failover

     When two gateways are on different tiers, the lower tier gateway(s) are preferred. If a lower tier gateway goes down, it is removed from use and the next highest tier gateway is used.

Combinations

     Because of the tier system, you can have any number of combinations of load balancing and failover that you like, such as One WAN that if it goes down fails to two load balancing WANs that if both go down fail to three load balancing WANs, and so on. The only limit is that there are only 5 tiers so such configurations can only go 5 levels deep.
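If the tier rules feel abstract, the selection logic described above can be sketched in a few lines of Python. This is only a model of the documented behaviour, not how pfSense implements it; `active_gateways` is a hypothetical helper, and `None` stands in for the "Never" tier.

```python
def active_gateways(tiers, up):
    """Pick the gateways a group would use right now.

    tiers: dict of gateway -> tier number (1-5), or None for "Never".
    up:    set of gateways currently considered up.
    """
    usable = {gw: t for gw, t in tiers.items()
              if t is not None and gw in up}
    if not usable:
        return []
    best = min(usable.values())  # lower tier numbers are preferred
    # Every gateway on the best available tier is used (load balancing).
    return sorted(gw for gw, t in usable.items() if t == best)


group = {"192.168.150.1": 1, "192.168.100.1": 2}
print(active_gateways(group, {"192.168.150.1", "192.168.100.1"}))
# ['192.168.150.1']
print(active_gateways(group, {"192.168.100.1"}))
# ['192.168.100.1']
```

Put both gateways on tier 1 and the function returns both of them, which is the load balancing case.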

Our configuration was pretty straightforward considering the options. We only said that we wanted to prefer the 192.168.150.1 gateway unless we experienced packet loss or high latency, in which case we switch to 192.168.100.1. Let's add a second gateway group that does the reverse: prefer 192.168.100.1 and, if that gateway fails, use 192.168.150.1. Better to have degraded service than no service at all, obviously.


So our configuration should now look something like this:


Honestly, just the first group should do the trick but I have put the second one there for illustration purposes. Time to change our LAN rules so they actually use our newly configured gateway groups:


Go to Firewall, Rules and click on the "LAN" tab. Edit your LAN rules. You need to go all the way down to Gateway and click on the "Advanced" button. Choose the first gateway group you created. Save and apply.


Now, in the Rules page again go to your newly edited rule and select "+", or "add a new rule based on this one". Go ahead and change the gateway to the other gateway group (if you have created any). Save and apply.


You know how it goes now. You need to do the same for every existing rule in your LAN rule set.

This is how my LAN rule set looks now. Yours should look similar.


If you have created load balancer pools in pfSense, you should also go to System, Advanced, click on the "Miscellaneous" tab and tick the "Allow default gateway switching" box under "Load Balancing".


All right, time for some tweaks!

Something that is considered good practice is to also change your gateway's monitor address. What do I mean by this? Well, by default, pfSense will ping your gateway and thus decide if it's up or down. But just because you can ping it, doesn't mean you can route through it. You might be able to ping its IP address, but its external interface might be experiencing problems. So what we can do is tell pfSense to ping something on the internet instead. That's a better method.

Go to System, Routing, Gateways and click on "e" next to a gateway to edit it. In the Monitor IP field, put the address of a public server that you know answers ICMP echo requests (ping), such as Google's public DNS server (8.8.8.8).


In this page, you can also tweak settings such as response times to reflect what you consider a bad connection in order for the failover to happen (this can be found under "Advanced").


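Do the same for your other gateway, but give it a different monitor address (8.8.4.4, for example), since pfSense expects each gateway to have its own.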
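A quick sanity check for the monitor addresses can be sketched in Python. It rests on two assumptions of mine: that a useful monitor address sits outside your private networks, and that each gateway should get its own monitor address. The helper name `check_monitor_ips` is made up for this example.

```python
import ipaddress


def check_monitor_ips(monitors):
    """Return a list of problems in a gateway -> monitor IP mapping."""
    problems = []
    seen = {}
    for gateway, monitor in monitors.items():
        addr = ipaddress.ip_address(monitor)
        if addr.is_private:
            # Pinging something local says nothing about the internet side.
            problems.append(f"{gateway}: monitor {monitor} is a private address")
        if monitor in seen:
            problems.append(
                f"{gateway}: monitor {monitor} already used by {seen[monitor]}"
            )
        else:
            seen[monitor] = gateway
    return problems


print(check_monitor_ips({"192.168.150.1": "8.8.8.8",
                         "192.168.100.1": "8.8.4.4"}))
# []
```

An empty list means nothing looked wrong. Reusing 8.8.8.8 for both gateways, or monitoring the gateway's own 192.168.x.x address, would each produce a message.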


Check to see if everything works as intended. Your gateways' status can be seen under Status, Gateways.




Done!

Wednesday, April 23, 2014

pfSense and High Availability Part 2 - Node Failover

Now that we've covered network interface failover, it's time for the most popular high availability method: node failover.

First we need a dedicated network interface for this. Let's enable our interfaces for the job, select our IPs, subnets, etc. If subnetting is not exactly your forte, a /30 gives you 2 usable host addresses while a /29 gives you 6. Just use an available network. I'll cover subnetting some time in the future. It's easy really. For my needs I selected the 192.168.250.0/30 network, which leaves me with the 192.168.250.1 and 192.168.250.2 IP addresses available to assign to my nodes.
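If you want to double-check the host counts, Python's standard `ipaddress` module will list the usable addresses of a network for you. A small sketch (the helper name `usable_hosts` is just for this example):

```python
import ipaddress


def usable_hosts(cidr):
    """Return the usable host addresses of a network as strings."""
    return [str(host) for host in ipaddress.ip_network(cidr).hosts()]


print(usable_hosts("192.168.250.0/30"))
# ['192.168.250.1', '192.168.250.2']
print(len(usable_hosts("192.168.250.0/29")))
# 6
```

The network and broadcast addresses are left out, which is why a /30 holds four addresses but only two of them can go on your nodes.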


Save and apply. Now go to your firewall rules (Firewall, Rules), select the tab that corresponds to the interface you've selected for the task (OPT1 in my case) and allow everything (any) for that specific network (192.168.250.0/30 in my case).


Save and apply.



Now, to actually set up our VIP (Virtual IP). There are two ways we can do this: one is to go to System, High Avail. Sync and the other is to go to Firewall, Virtual IPs and open the "CARP settings" tab.

Go to the node that you intend to use as a master and check the "Synchronize states" box. Choose the network interface we've been working with for this (OPT1 in my case) and also insert the slave's IP address into the "pfsync Synchronize Peer IP" box if you want to avoid pfSense spamming multicast.
Synchronize Config to IP: Insert the backup node's IP address (192.168.250.2 in my case)
Remote System Username: Insert the system username
Remote System Password: Insert the system password
And then check everything you want to synchronize. I want everything, so I'll check everything.


Now, Save and go to your backup node. What we want to do is exactly the same, except that the "pfsync Synchronize Peer IP" box now gets the master's IP address (192.168.250.1 in my case).
Synchronize Config to IP: Insert NOTHING. This should only be used in the master node.
Remote System Username: Insert NOTHING. This should only be used in the master node.
Remote System Password: Insert NOTHING. This should only be used in the master node.
And then check everything you want to synchronize. I want everything, so I'll check everything.


Now go back to your master node. Go to Firewall, Virtual IP addresses and select "+" to add one.
Let's begin with our WAN interface.
Type: CARP
Interface: WAN
IP address(es): Choose a new available IP, in the same subnet as the old one, which will be the Virtual IP of the cluster. In my case that is 192.168.0.9/24.
Virtual IP Password: Just choose a password for this. 
VHID Group: Usually 1 is fine, but if you have systems that already use CARP in your network (such as Zen Load Balancer) you might want to change this.
Advertising Frequency: Leave it at 1 (base) / 0 (skew) for the master.
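A quick word on what those two numbers do. CARP sends an advertisement every base + skew/256 seconds, and the node that advertises most often is preferred as master. Here is a small Python sketch of that arithmetic. The backup's skew of 100 is an assumed value for illustration, and the function names are my own.

```python
def advert_interval(base, skew):
    """Seconds between CARP advertisements: base + skew/256."""
    return base + skew / 256


def preferred_master(nodes):
    """nodes: dict of name -> (base, skew). Shortest interval wins."""
    return min(nodes, key=lambda name: advert_interval(*nodes[name]))


# The backup skew of 100 is an assumption for this example.
nodes = {"master": (1, 0), "backup": (1, 100)}
print(advert_interval(1, 0))    # 1.0
print(advert_interval(1, 100))  # 1.390625
print(preferred_master(nodes))  # master
```

So keeping 1/0 on the master simply makes sure nothing else on the segment advertises more often than it does.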


Save and apply. Now let's go ahead and do the same for the rest of our interfaces, always keeping in mind that we should change our VHID groups to a unique number.
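Why unique VHIDs? Each CARP VIP answers on a virtual MAC address of the form 00:00:5e:00:01:XX, where XX is the VHID in hex, so two groups sharing a VHID on the same segment would share a MAC. A minimal Python sketch to derive the MAC and spot duplicates (the helper names and the interface-to-VHID mapping are assumptions for this example):

```python
def carp_mac(vhid):
    """Virtual MAC address used by a CARP group with this VHID."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


def duplicate_vhids(vips):
    """vips: dict of interface -> VHID. Return VHIDs used more than once."""
    counts = {}
    for vhid in vips.values():
        counts[vhid] = counts.get(vhid, 0) + 1
    return sorted(v for v, n in counts.items() if n > 1)


print(carp_mac(1))                             # 00:00:5e:00:01:01
print(duplicate_vhids({"WAN": 1, "LAN": 2}))   # []
print(duplicate_vhids({"WAN": 1, "LAN": 1}))   # [1]
```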



Save and apply. Our settings should look something like this:


Go to your master and backup nodes to see if everything is working through Status, CARP (failover). Sometimes you need to manually enable CARP if you see "Status DISABLED". No biggie.