Friday, March 28, 2014

pfSense and High Availability Part 1 - Network Interface Bonding (LAGG)

One of the problems people face when deploying pfSense is network interface bonding. It's not very straightforward and in some ways counterintuitive.

Let me illustrate the problem:

We've already set up our external (0 for Cisco and WAN for pfSense aficionados) and internal (100 for Cisco and LAN for pfSense aficionados) interfaces. When we try to set up LAGG, these two interfaces do not appear in the list of available members, although every other interface does. The problem lies in the fact that they are already in use. So how do we go about setting up network interface bonding in pfSense?

It's actually pretty simple. Let me illustrate:

First of all, for the sake of clarity my WAN and LAN interfaces are bce0 and bce1:


Go to Interfaces -> Assign -> LAGG and select "+":


Create a WAN LAGG bond consisting of only the interface(s) that are currently available: that is, every interface you'd ideally like in the bond except the one currently assigned to WAN. Ugh, I'm making it sound more complicated than it is.

To make it clearer, let's suppose you wanted to create a WAN bond consisting of bce0 and em3. What we would ideally like to do is choose bce0 and em3. Well, in our case we only select em3 (bce0 is not available to us anyway) and we create a LAGG team consisting solely of that one interface, silly as it may sound initially.


Save and repeat the process for the LAN LAGG team, creating a team using the interfaces we'd like the team to consist of except the currently used LAN interface.


Save and create the rest of your LAGG interfaces as you would usually.


Here's an idea of what we should roughly have when we're done with this process:


Now, go to "Interface Assignments":


Change the interface assignments to their LAGG interface counterparts, save, and add any others that are needed. Take a peek at mine:


Go to LAGG again:


Edit the WAN LAGG interface:


The previously unavailable WAN interface should be available to form our team now. Select as needed and save:


Repeat the process for the LAN interface:


Everything should be working:


In case your master interface priority is wrong, all you need to do is back up your configuration, open your config.xml file, manually change the order of the members in each bond and upload it again.

For example:
    <laggs>
      <lagg>
        <members>em3,bce0</members>
        <descr><![CDATA[WAN_TEAM]]></descr>
        <laggif>lagg0</laggif>
        <proto>failover</proto>
      </lagg>
      <lagg>
        <members>em4,bce1</members>
        <descr><![CDATA[LAN_TEAM]]></descr>
        <laggif>lagg1</laggif>
        <proto>failover</proto>
      </lagg>
      <lagg>
        <members>em0,em5</members>
        <descr><![CDATA[CARP_TEAM]]></descr>
        <laggif>lagg2</laggif>
        <proto>failover</proto>
      </lagg>
    </laggs>

Now, I would like for my WAN bond to have bce0 as the master/primary interface, for LAN bce1 and for CARP em0. Therefore I edit like so:
    <laggs>
      <lagg>
        <members>bce0,em3</members>
        <descr><![CDATA[WAN_TEAM]]></descr>
        <laggif>lagg0</laggif>
        <proto>failover</proto>
      </lagg>
      <lagg>
        <members>bce1,em4</members>
        <descr><![CDATA[LAN_TEAM]]></descr>
        <laggif>lagg1</laggif>
        <proto>failover</proto>
      </lagg>
      <lagg>
        <members>em0,em5</members>
        <descr><![CDATA[CARP_TEAM]]></descr>
        <laggif>lagg2</laggif>
        <proto>failover</proto>
      </lagg>
    </laggs>

And re-upload to the server in question. Simple enough process.
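
If you have a lot of bonds to fix, working out the new member order can be scripted. What follows is only a rough Python sketch, not anything pfSense ships: the helper name promote_primary is made up for illustration, and the XML is the fragment from the example above rather than a full config.xml. It only computes the new <members> strings; you would still paste them into the file yourself.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the example above; a real config.xml wraps this
# in a larger document.
LAGGS_XML = """<laggs>
  <lagg>
    <members>em3,bce0</members>
    <descr><![CDATA[WAN_TEAM]]></descr>
    <laggif>lagg0</laggif>
    <proto>failover</proto>
  </lagg>
  <lagg>
    <members>em4,bce1</members>
    <descr><![CDATA[LAN_TEAM]]></descr>
    <laggif>lagg1</laggif>
    <proto>failover</proto>
  </lagg>
  <lagg>
    <members>em0,em5</members>
    <descr><![CDATA[CARP_TEAM]]></descr>
    <laggif>lagg2</laggif>
    <proto>failover</proto>
  </lagg>
</laggs>"""


def promote_primary(laggs_xml, wanted):
    """Return {laggif: members} with the wanted primary moved to the front.

    `wanted` maps a lagg name (e.g. "lagg0") to the member that should be
    listed first. Bonds without an entry keep their current order.
    """
    root = ET.fromstring(laggs_xml)
    result = {}
    for lagg in root.iter("lagg"):
        name = lagg.findtext("laggif")
        members = lagg.findtext("members").split(",")
        primary = wanted.get(name)
        if primary in members:
            members.remove(primary)
            members.insert(0, primary)
        result[name] = ",".join(members)
    return result


print(promote_primary(LAGGS_XML, {"lagg0": "bce0", "lagg1": "bce1", "lagg2": "em0"}))
```

The sketch deliberately does not write the file back: re-serializing with ElementTree would turn the CDATA sections into plain escaped text, so editing the <members> lines by hand is the safer route.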

Note: In pfSense 2.2 and above, LAGG using LACP in FreeBSD 10.0 and newer defaults to "strict mode" being enabled, which means the lagg does not come up unless your switch is speaking LACP.

This will cause your LAGG to stop functioning after the upgrade if your switch isn't using active mode LACP.
You can retain the lagg behavior of pfSense 2.1.5 and earlier versions by adding a new system tunable under System > Advanced, System Tunables tab, for the following:

net.link.lagg.0.lacp.lacp_strict_mode

With value set to 0. You can configure this in 2.1.5 before upgrading to 2.2, to ensure the same behavior on first boot after the upgrade. It will result in a harmless cosmetic error in the logs on 2.1.5 since the value does not exist in that version.
If you have more than one LAGG interface configured, you will need to enter a tunable for each since that is a per-interface option. So for lagg1, you would add the following.

net.link.lagg.1.lacp.lacp_strict_mode

Also with the value set to 0.
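
Since the tunable name only differs by the lagg index, the list for several LACP bonds can be generated. This is a small Python sketch with a made-up helper name (strict_mode_tunables); it just builds the name/value pairs to type into the System Tunables tab and does not talk to pfSense at all.

```python
def strict_mode_tunables(lagg_names):
    """Build (tunable, value) pairs that switch LACP strict mode off.

    `lagg_names` is a list such as ["lagg0", "lagg1"]; the numeric suffix
    becomes the index inside the tunable name.
    """
    tunables = []
    for name in lagg_names:
        if not name.startswith("lagg"):
            raise ValueError("not a lagg interface name: " + name)
        index = int(name[len("lagg"):])
        tunables.append(("net.link.lagg.%d.lacp.lacp_strict_mode" % index, "0"))
    return tunables


for tunable, value in strict_mode_tunables(["lagg0", "lagg1"]):
    print(tunable, "=", value)
```

Only bonds that actually use LACP need an entry; failover bonds like the ones in the example above are not affected by strict mode.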

Wednesday, March 26, 2014

MySQL Load Balancing Part 2

In my previous post, I tried to fix the haproxy flaws by swapping it with Zen Load Balancer. Here's the thing though: Zen Load Balancer introduces a flaw of its own. It uses pen for TCP load balancing, which can increase the CPU load to stupid levels. On a relatively mild benchmark that I performed on my MySQL cluster, I witnessed pen's process shoot to 60% CPU usage. Not good.

So what do we do? Well, if we use the best open-source solution for MySQL clustering, Percona XtraDB cluster, the answer is pretty simple: we keep Zen Load Balancer and all its goodies but just for our MySQL farm, we revert to haproxy and some nifty tools provided to us by Percona and we're set! Best of both worlds!

Here's what we do:

Log in to your MySQL console and create a user "clustercheckuser" with the following credentials:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 156322
Server version: 5.6.15-56-log Percona XtraDB Cluster (GPL), Release 25.4, Revision 731, wsrep_25.4.r4043

Copyright (c) 2009-2013 Percona LLC and/or its affiliates
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant process on *.* to 'clustercheckuser'@'localhost' identified by 'clustercheckpassword!';
Query OK, 0 rows affected (0.01 sec)

mysql> grant process on *.* to 'clustercheckuser'@'127.0.0.1' identified by 'clustercheckpassword!';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Now, Percona gives us two tools to work with: clustercheck and pyclustercheck (they do exactly the same thing, but pyclustercheck is written in Python and does not require the use of xinetd). Each one sends an HTTP 200 response if the node is up and synced and an HTTP 503 if there is something wrong with it. Goodbye, haproxy's buggy mysql-check! Nice to see you again, tried, tested and great httpchk!

Let's go ahead and configure everything required to run clustercheck. Change 192.168.108.0/24 to match your network:
[root@mysql1 ~]# vi /etc/xinetd.d/mysqlchk 

# default: on
# description: mysqlchk
service mysqlchk
{
# this is a config for xinetd, place it in /etc/xinetd.d/
        disable = no
        flags           = REUSE
        socket_type     = stream
        port            = 9200
        wait            = no
        user            = nobody
        server          = /usr/bin/clustercheck
        log_on_failure  += USERID
        only_from       = 192.168.108.0/24
        per_source      = UNLIMITED
}

Register port 9200/tcp for mysqlchk in /etc/services, commenting out the existing wap-wsp TCP entry:
[root@mysql1 ~]# vi /etc/services 
....
sun-as-jpda     9191/udp                # Sun AppSvr JPDA
mysqlchk        9200/tcp                # Percona mysqlchk
#wap-wsp         9200/tcp                # WAP connectionless session service
wap-wsp         9200/udp                # WAP connectionless session service
....

Now, let's go ahead and install xinetd:
[root@mysql1 ~]# yum -y install xinetd.x86_64
[root@mysql1 ~]# chkconfig xinetd on
[root@mysql1 ~]# service xinetd start

Check that it's up and working:
[root@mysql1 ~]# netstat -ntlp 
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      3842/master         
tcp        0      0 0.0.0.0:3306                0.0.0.0:*                   LISTEN      3704/mysqld         
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      2637/sshd           
tcp        0      0 0.0.0.0:4567                0.0.0.0:*                   LISTEN      3704/mysqld         
tcp        0      0 ::1:25                      :::*                        LISTEN      3842/master         
tcp        0      0 :::9200                     :::*                        LISTEN      20371/xinetd        
tcp        0      0 :::22                       :::*                        LISTEN      2637/sshd

As we can see, there is definitely a server listening on port 9200. Great, time to check our iptables rules:
[root@mysql1 ~]# iptables -L -v -n --line-numbers 
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1     318K   62M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
2        2   120 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
3        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:3306 
4        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4444 
5        4   240 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4567 
6        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4568 
7        2   120 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           state NEW tcp dpt:22 
8        0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1     271K   28M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  

Right, these are the ports needed for Percona XtraDB cluster to work, plus SSH. Everything else is rejected. What I need to do is add a rule before my "reject all" rule. So, I allow connections from my network to port 9200 by inserting a rule as the 8th rule in the INPUT chain:
[root@mysql1 ~]# iptables -I INPUT 8 -s 192.168.108.0/24 -p tcp -m tcp --dport 9200 -j ACCEPT
[root@mysql1 ~]# iptables -L -v -n --line-numbers 
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1     318K   62M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
2        2   120 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
3        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:3306 
4        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4444 
5        4   240 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4567 
6        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:4568 
7        2   120 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           state NEW tcp dpt:22 
8        0     0 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           tcp dpt:9200 
9        0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1     271K   28M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
[root@mysql1 ~]# iptables-save > /etc/sysconfig/iptables
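
The number 8 above comes from reading the listing by eye. If your rule set is longer, the position of the first REJECT rule can be pulled out of the `iptables -L -v -n --line-numbers` output instead. Here's a rough Python sketch of that; the function name is my own invention and the sample text is an abridged copy of the listing shown earlier.

```python
# Abridged copy of the INPUT and FORWARD chains shown earlier (rules 3-6 omitted).
SAMPLE = """\
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1     318K   62M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
2        2   120 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0
7        2   120 ACCEPT     tcp  --  *      *       192.168.108.0/24      0.0.0.0/0           state NEW tcp dpt:22
8        0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited
"""


def first_reject_position(listing, chain="INPUT"):
    """Return the rule number of the first REJECT rule in `chain`, or None."""
    in_chain = False
    for line in listing.splitlines():
        if line.startswith("Chain "):
            # Header looks like: "Chain INPUT (policy ACCEPT ...)"
            in_chain = line.split()[1] == chain
            continue
        fields = line.split()
        # With -v and --line-numbers the columns are: num, pkts, bytes, target, ...
        if in_chain and len(fields) >= 4 and fields[0].isdigit() and fields[3] == "REJECT":
            return int(fields[0])
    return None


print(first_reject_position(SAMPLE))
```

Inserting the new ACCEPT rule at the returned position pushes the REJECT rule down by one, which is exactly what the second listing above shows.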

Check that everything is working (obviously substitute 192.168.108.20 with your node's IP address):
[root@mysql1 ~]# nc 192.168.108.20 9200 
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40

Percona XtraDB Cluster Node is synced.
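
The same probe can be done from a script, which is handy for cron jobs or monitoring. This is a minimal Python sketch using only the standard library; node_is_synced is a name I picked for illustration, and it assumes the xinetd service answers with a well-formed HTTP response like the one shown above.

```python
import http.client


def node_is_synced(host, port=9200, timeout=3.0):
    """Return True if the clustercheck endpoint answers HTTP 200, else False."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a reply that is not valid HTTP:
        # treat all of these as "not healthy".
        return False
    finally:
        conn.close()
```

Calling node_is_synced("192.168.108.20") from a machine inside the allowed network should mirror what nc reports; a 503 or an unreachable port both come back as False.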

Nice! Time to head over to our Zen Load Balancer and configure haproxy on it. First, remember to delete or at least stop your MySQL farm (if you have one). Install and configure haproxy:
root@zen-lb:~# apt-get update 
root@zen-lb:~# apt-get install haproxy
root@zen-lb:~# mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.orig
root@zen-lb:~# vi /etc/haproxy/haproxy.cfg 
global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        chroot /usr/share/haproxy
        user haproxy
        group haproxy
        daemon
defaults
        log     global
        mode    http
        option  tcplog
        option  dontlognull
        retries 3
        option redispatch
        maxconn 50000
        timeout connect 3500ms
        timeout client 50000ms
        timeout server 50000ms

listen stats :445 #We set up our stats screen, remove block if not wanted or could be integrated below if mode was http. Now we can access the stats at http://LOAD_BALANCER_IP:445/haproxy using username: haproxy and password: haproxy_password
        mode http
        stats enable
        #stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /haproxy
        stats auth haproxy:haproxy_password

listen Percona_xtradb_cluster_read 192.168.104.10:3306
       balance roundrobin # Typical roundrobin method
       mode tcp #In this mode, the service relays TCP connections as soon as they're established, towards one or several servers. No processing is done on the stream. Two other options are: http and health
       option tcpka #Enable TCP keep-alives on both the client and server sides. This makes it possible to prevent long sessions from expiring on external layer 4 components such as firewalls and load-balancers.
       option httpchk #When option httpchk is specified, a complete HTTP request is sent once the TCP connection is established, and responses 2xx and 3xx are considered valid, while all other ones indicate a server failure, including the lack of any response. 
       server MySQL1 192.168.108.20:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL2 192.168.108.30:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL3 192.168.108.40:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL4 192.168.108.50:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL5 192.168.108.60:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL6 192.168.108.70:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL7 192.168.108.80:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL8 192.168.108.90:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1
       server MySQL9 192.168.108.100:3306 check port 9200 inter 5000 downinter 30000 rise 5 fall 1

Here, I have configured the farm to listen on 192.168.104.10, port 3306, and my Percona XtraDB cluster has 9 nodes: 192.168.108.20, 192.168.108.30, 192.168.108.40, 192.168.108.50, 192.168.108.60, 192.168.108.70, 192.168.108.80, 192.168.108.90, and 192.168.108.100. The maximum connections are 50,000. The connection timeout is 3.5s, while the response timeout is 50s. As before, you need to change these settings to the needs of your network. Set the response timeout too low and you'll get false positives, resulting in servers getting shut off and connections between a client and a working server being cut off, which results in unhappy clients. Set it too high and your load balancer will be late shutting off traffic to dead servers, making your service seem unavailable to some clients, which also results in unhappy clients. Finally, haproxy sends probes to check whether a node is up or down every 5 secs if the node has been marked as 'up' (inter 5000) and every 30 secs if it has been marked as 'down' (downinter 30000). It will mark a 'down' node as 'up' only after 5 consecutive successful probes (rise 5), but will mark an 'up' node as 'down' after a single unsuccessful probe (fall 1).
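
To get a feel for what "rise 5 fall 1" means in practice, here's a toy Python model of the counting logic. It is my own simplification and not haproxy's code: it only replays a list of probe results and ignores the probe intervals entirely.

```python
def track_health(probes, rise=5, fall=1, up=True):
    """Replay probe results (True = success) and return the state after each.

    A node that is up goes down after `fall` consecutive failures; a node
    that is down comes back after `rise` consecutive successes.
    """
    states = []
    streak = 0  # consecutive probes that contradict the current state
    for ok in probes:
        if ok == up:
            streak = 0
        else:
            streak += 1
            if streak >= (fall if up else rise):
                up = not up
                streak = 0
        states.append(up)
    return states


# One failed probe takes the node out; it needs five good ones to return.
print(track_health([True, False, True, True, True, True, True]))
# -> [True, False, False, False, False, False, True]
```

The asymmetry is the point of these settings: a misbehaving node is dropped immediately, while a recovering node has to prove itself for a while before it gets traffic again.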

You will also want to change the stats screen variables. Some people remove the "listen stats" section altogether, but if you decide to keep it, you'll definitely want to change the password, which I have set to "haproxy_password", perhaps the user, which I have set to "haproxy" and maybe also the URI and the port (I have set it to http://192.168.104.10:445/haproxy).

We'll also change our haproxy memory usage settings. Change this according to your system's resources; I've set mine to use 2 gigs of RAM:
root@zen-lb:~# vi /etc/default/haproxy 
# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
EXTRAOPTS="-de -m 2048"

Almost done. Time to create the necessary directories, make sure they have the correct permissions, start haproxy and arrange for Zen Load Balancer to start and stop it automatically in case of restarts, cluster failovers etc:
root@zen-lb:~# mkdir /usr/share/haproxy
root@zen-lb:~# chown haproxy:haproxy /usr/share/haproxy/
root@zen-lb:~# chown haproxy:haproxy /etc/haproxy/haproxy.cfg
root@zen-lb:~# chmod 640 /etc/haproxy/haproxy.cfg
root@zen-lb:~# service haproxy start
root@zen-lb:~# update-rc.d haproxy defaults
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/zlb-start
#make your own script in your favorite language, it will be called
#at the end of the procedure /etc/init.d/zenloadbalacer start
#and replicated to the other node if zen cluster is running.
service haproxy restart
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/zlb-stop
#make your own script in your favorite language, it will be called
#at the end of the procedure /etc/init.d/zenloadbalacer stop
#and replicated to the other node if zen cluster is running.
service haproxy stop

Remember when I mentioned my relatively mild benchmark and that I witnessed pen's process shoot to 60% CPU usage? Care to guess what haproxy's usage is now under the same conditions? 20%. Great stuff.

Tuesday, March 18, 2014

MySQL Load Balancing Part 1

MySQL is one of the most popular databases out there. Unfortunately, when they need a service that is highly available, most people just use Google and are happy to copy and paste a haproxy configuration. This is bad though. Really bad.

See, haproxy has its flaws.

First of all, it does not support SSL/HTTPS. At the time of writing, the development version offers some experimental support, which means no one would use that in a production environment.

Second flaw, and more related to what we want to do: if you want to use it to check a MySQL cluster, you'll need to create a user 'haproxy' able to log in from your load balancer IP without a password. Ouch. Which brings us to flaw number three:

To balance this, haproxy needs only usage privileges (which really means no privileges at all), so it can only check if it can connect to your MySQL server and nothing else. That means that there are a number of cases where your cluster will be down, but that will go unnoticed.

Fourth flaw: option mysql-check user haproxy (the default method of checking MySQL node availability in haproxy) is buggy.

Fifth flaw: Its stats screen works, and it's fine if you want to see whether a server is up, down, or in a transitional state, but that's about it.

Sixth flaw: It doesn't have any built-in way to perform clustering with another haproxy, so you have to use corosync/pacemaker to achieve this, something that at the time of writing -and in my own experience- is buggy.

So what can we do? Well, we can use my favorite load balancer, which just so happens to be free and open-source: Zen Load Balancer!

What is great about the Zen Load Balancer is that not only can it do straightforward TCP checks to the service we need, but if there is a Nagios plugin for it, we can use that to check its health instead!

So, let's go ahead and install our MySQL libs first, since our Nagios plugin needs them:
root@zen-lb:~# apt-get install libmysqlclient18

Now, install the check_mysql Nagios plugin:

root@zen-lb:~# apt-get update
root@zen-lb:~# apt-get install nagios-plugins-standard
root@zen-lb:~# cp /usr/lib/nagios/plugins/check_mysql /usr/local/zenloadbalancer/app/libexec/.
root@zen-lb:~# chmod 755 /usr/local/zenloadbalancer/app/libexec/check_mysql

Cleanup:

root@zen-lb:~# apt-get remove nagios nagios-plugins-standard
root@zen-lb:~# apt-get autoremove

Now, provided that you have created a MySQL user called "zen" with password "zenpassword" that has access to a schema called "mydatabase", these are the steps you need to take to create your MySQL server farm on Zen Load Balancer:
a) Go to Manage->Farms and choose "Add new Farm".
b) Choose a name for your farm and Profile: TCP
c) Select the network interface/IP you'd like your farm to listen on and its port (usually 3306)
d) Go ahead and select "Edit"

Go ahead and edit the farm's parameters to suit the needs of your network. The settings and what they do are really straightforward.
As always you need to remember: set the response timeout too low and you'll get false positives, resulting in servers getting shut off and connections between a client and a working server being cut off, which results in unhappy clients. Set it too high and your load balancer will be late shutting off traffic to dead servers, making your service seem unavailable to some clients, which also results in unhappy clients.
Now here's where we tell Zen to check our MySQL database using Nagios:

- Check "Use FarmGuardian to check Backend Servers".
- Fill in the "Check every secs" box with how often you want Zen to query your MySQL.
- Fill in the "Command to check" box with:
check_mysql -H HOST -P 3306 --user=zen --password=zenpassword -d mydatabase
Of course, you need to change the user, password and database to whatever you have already set up. "HOST" is a variable that stands for the IP of each real server Zen needs to check, once you have defined them. Likewise, we could have used "-P PORT" instead of "-P 3306".
- Check "Enable farmguardian logs" if you want to have more control and be able to debug (logs will be at /usr/local/zenloadbalancer/logs/).
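
To picture what the HOST and PORT variables do, here's a tiny Python sketch of the substitution. This is not FarmGuardian's actual code, just my own illustration of the idea; expand_check is a made-up name and the backend IP is only an example value.

```python
import shlex

# The command from the list above, using PORT instead of a fixed 3306.
TEMPLATE = "check_mysql -H HOST -P PORT --user=zen --password=zenpassword -d mydatabase"


def expand_check(template, host, port):
    """Replace the standalone HOST and PORT tokens and return an argv list."""
    values = {"HOST": host, "PORT": str(port)}
    return [values.get(token, token) for token in shlex.split(template)]


print(expand_check(TEMPLATE, "192.168.108.20", 3306))
```

Running the resulting command by hand against one backend is a quick way to confirm the zen user and database are set up correctly before enabling the check on the farm.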

After that, go to the "Edit real IP servers configuration" section and add your real servers to your server farm. 

Monday, February 24, 2014

Dell PowerEdge, pfSense and RAID status monitoring using ports/pkg

The Dell PowerEdge is a thing of beauty. Not only can you find a second-hand one for next to nothing, but they are great for acting as load balancers, firewalls and routers. With the right tools and enough expertise you can build systems that the big players sell for tens, or sometimes hundreds of thousands of dollars. All you need to have is a second-hand Dell PowerEdge, a few Intel NICs, the right open-source software and you're set.

So I have this Dell PowerEdge box that acts as a firewall, with pfSense 2.1 installed on it. How do we go about monitoring its RAID status? The answer is pretty straightforward: install megacli on a virtual machine, create a package out of it, transfer it to our pfSense box and install it there.

a) Check your FreeBSD version on your webconfigurator dashboard and, depending on whether you have the 32 or the 64-bit version of pfSense installed, download the right .iso from http://ftp.freebsd.org/pub/FreeBSD/. For instance, I have pfSense 2.1, which is based on FreeBSD 8.1 and is 64-bit. Therefore, I downloaded this: http://ftp.freebsd.org/pub/FreeBSD/ISO-IMAGES-amd64/8.1/FreeBSD-8.1-RELEASE-amd64-livefs.iso.
b) Create a virtual machine on your local PC using your favourite virtualization solution. Make sure it's 32-bit if your pfSense is 32-bit or 64-bit if your pfSense is 64-bit.
c) Install using the downloaded .iso.
d) Install all available packages; we might need to use this method again in the future for other packages.
e) If the FreeBSD installation throws an error about not being able to find your distribution in the available mirrors, that would be because it's an older one. Just choose ftp://ftp-archive.freebsd.org/pub/FreeBSD-Archive/old-releases/i386/ if you have a 32-bit system or ftp://ftp-archive.freebsd.org/pub/FreeBSD-Archive/old-releases/amd64/ if you have a 64-bit system as the URL of the FreeBSD distribution on the remote ftp site.
f) Time for the actual work:
freebsd# freebsd-update fetch
freebsd# freebsd-update install
freebsd# portsnap fetch
freebsd# portsnap extract
freebsd# portsnap fetch update
freebsd# cd /usr/ports/sysutils/megacli/
freebsd# make package-recursive
Creating bzip'd tar ball in '/usr/ports/sysutils/megacli/work/megacli-8.07.07.tbz'

g) Go to your pfSense Webconfigurator, System, Advanced, Admin Access and make sure "Enable Secure Shell" is checked.
h) Upload the created bzip'd tar ball to your pfSense using SFTP or SCP.
i) SSH to your pfSense box and choose 8 (shell).
j) I have uploaded the package to the /root directory, so:
[2.1-RELEASE][root@pfsense.mynetwork.net]/root(1): pkg_add /root/megacli-8.07.07.tbz
[2.1-RELEASE][root@pfsense.mynetwork.net]/root(2): pkg_info
bsdinstaller-2.0.2013.0911 BSD Installer mega-package
gettext-0.18.3      GNU gettext package
libiconv-1.14_1     A character set conversion library
megacli-8.07.07     SAS MegaRAID FreeBSD MegaCLI

k) Now, you can go to your pfSense webconfigurator: System, Packages, Available Packages and install "mailreport".
l) Then go to System, Advanced, Notifications and configure your mail server SMTP settings.
m) Finally, go to Status, Email Reports and create a new mail report.
n) Configure it, save it and then edit it so that it executes the /usr/local/sbin/MegaCli -LDInfo -Lall -aALL and the /usr/local/sbin/MegaCli -PDList -aALL commands.

Done! Now every day you'll get a status report for your firewall's disks!

Note 1: This method uses the deprecated ports/pkg method. I will describe the method required using the newer pkgng system in a future post.
Note 2: Megacli does not work with the older SAS 6/iR cards; for those you will need the mptutil package. The method is the same.

Friday, February 21, 2014

Monitor your HP Smart Array RAID controller from XenServer 6.2

I have a few HP ProLiant servers that I use as XenServer hosts. The problem is that I want to know the health of my disks and my RAID arrays, and of course I can't get those from the XenServer guests.

So here's what we do:

Just google "knowledge base search hp support center" (HP changes their links too often for me to give you the current link, it will probably change in a few months), follow that link and then choose "Search HP Support Center" in Knowledge Base.

When you're there, search for "HP Array Configuration Utility CLI for Linux" and select the most recent 32-bit version (XenServer Dom0 is 32-bit).

Get its URL and download it from your XenServer, or download it to your PC and sftp it to your XenServer. The most recent version at the time of writing was this:

http://ftp.hp.com/pub/softlib2/software1/pubsw-linux/p414707558/v71530/hpacucli-9.10-22.0.i386.rpm

Time for the fun part. Let's log into our XenServer:

[root@server ~]# wget http://ftp.hp.com/pub/softlib2/software1/pubsw-linux/p414707558/v71530/hpacucli-9.10-22.0.i386.rpm


From the README.txt:

Description
-----------

  The Array Configuration Utility CLI is a commandline-based disk
   configuration program for Smart Array Controllers and
   RAID Array Controllers.
 
* All other product names mentioned herein may be trademarks of their
  respective companies.

Supported Controllers

  Smart Array products:
     Smart Array 5312 Controller
     Smart Array 5302 Controller
     Smart Array 5304 Controller
     Smart Array 532 Controller
     Smart Array 5i Controller 
     Smart Array 641 Controller
     Smart Array 642 Controller
     Smart Array 6400 Controller
     Smart Array 6400 EM Controller
     Smart Array 6i Controller
     Smart Array P600 Controller
     Smart Array P400 Controller
     Smart Array P400i Controller
     Smart Array E200 Controller
     Smart Array E200i Controller
     Smart Array P800 Controller
     Smart Array E500 Controller
     Smart Array P700m Controller
     Smart Array P410i Controller
     Smart Array P411 Controller
     Smart Array P212 Controller
     Smart Array P712m Controller
     Smart Array B110i SATA RAID
     Smart Array P812 Controller
     Smart Array P220i Controller
     Smart Array P222 Controller
     Smart Array P420 Controller
     Smart Array P420i Controller
     Smart Array P421 Controller
     Smart Array P822 Controller
     Dynamic Smart Array B320i RAID
     Dynamic Smart Array B120i RAID

  MSA products:
     MSA500 Controller
     MSA500 G2 Controller
     MSA1000 Controller    
     MSA1500 CS Controller
     MSA20 Controller  

[root@server ~]# yum install -y --nogpgcheck hpacucli-9.10-22.0.i386.rpm
[root@server ~]# hpacucli controller slot=1 physicaldrive all show
Smart Array P410 in Slot 1

   array A

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 1 TB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 1 TB, OK)

   array B

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 1 TB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 1 TB, OK)

   array C

      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA, 1 TB, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA, 1 TB, OK)

   array D

      physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA, 1 TB, OK)
      physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SATA, 1 TB, OK)

   array E

      physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SATA, 1 TB, OK)
      physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SATA, 1 TB, OK)
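
Reading that listing by eye is fine for one box, but for a daily check it's easier to let a script shout when a drive is not OK. Below is a rough Python sketch that parses output in the format shown above. The function name is my own, and it assumes the status is always the last comma-separated field inside the parentheses, as it is in my output.

```python
import re

# Matches lines such as:
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 1 TB, OK)
DRIVE_RE = re.compile(r"physicaldrive\s+(\S+)\s+\((.*)\)")

# Two lines taken from the output above.
SAMPLE = """\
   array A

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 1 TB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 1 TB, OK)
"""


def failed_drives(output):
    """Return [(drive_id, status)] for every physical drive not reporting OK."""
    bad = []
    for line in output.splitlines():
        match = DRIVE_RE.search(line)
        if match:
            drive_id, details = match.groups()
            status = details.split(",")[-1].strip()
            if status != "OK":
                bad.append((drive_id, status))
    return bad


print(failed_drives(SAMPLE))
```

Feeding it the text captured from `hpacucli controller slot=1 physicaldrive all show` gives an empty list while everything is healthy, so a cron job only needs to send mail when the list is non-empty.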

And a few helpful commands:
[root@server ~]# hpacucli help
CLI Syntax
   A typical ACU CLI command line consists of three parts: a target device, 
   a command, and a parameter with values if necessary. Using angle brackets to
   denote a required variable and plain brackets to denote an optional 
   variable, the structure of a typical ACU CLI command line is as follows:

      <target> <command> [parameter=value]

   <target> is of format:
      [controller all|slot=#|wwn=#|chassisname="AAA"|
                  serialnumber=#|chassisserialnumber=#|ctrlpath=#:# ]
      [array all|<id>]
      [physicaldrive all|allunassigned|[#:]#:#|[#:]#:#-[#:]#:#]
      [ssdphysicaldrive all|allunassigned|[#:]#:#|[#:]#:#-[#:]#:#]
      [logicaldrive all|#]
      [enclosure all|#:#|serialnumber=#|chassisname=#]
      [licensekey all|<key>]
      [ssdinfo]
      Note 1: The #:#:# syntax is only needed for systems that
              specify port:box:bay. Other physical drive targeting
              schemes are box:bay and port:id.
      Note 2: The chassisserialnumber is known in ACU as the
              RAID Array Serial Number. The chassisname is known
              in ACU as the RAID Array ID.
      Note 3: ctrlpath=#:# maps to "smart enclosure hooked up to 
              host bus adapter slot:host bus adapter port"

   Example targets:
      controller slot=5
      controller chassisname="Lab C"
      controller serialnumber=P21DA2322S
      controller chassisserialnumber=9J3CJN71XDCH
      controller wwn=500308B300701011
      controller slot=7 array A
      controller slot=5 logicaldrive 5
      controller slot=5 physicaldrive 1:5
      controller slot=5 physicaldrive 1E:2:3
      controller slot=5 ssdphysicaldrive all
      controller slot=5 enclosure 4E:1 show
      controller slot=5 licensekey XXXXX-XXXXX-XXXXX-XXXXX-XXXXX

For detailed command information type any of the following: 
   help add
   help create
   help delete
   help diag
   help modify
   help remove
   help shorthand
   help show
   help target
   help rescan
   help version

Help also accepts commonly used CLI parameters and ACU keywords. Adding
additional keywords will further filter the help output. Examples: 
   help ssp        (shows all ssp help including show and modify commands)
   help ssp modify (restricts ssp help to only modify commands)
   help migrate
   help expand   
   help extend   
   help <keyword> <keyword> ... <keyword>

   Please note that beginning with ACU/ACUCLI version 8.55, the term 
   "stripe size" has been replaced by "strip size."  This is a change
   of labeling and does not signify a change in functionality.  When 
   distributing data across multiple physical drives (striping) the
   "strip size" is the amount of data that is written to each physical
   drive.  The "full stripe size" refers to the combined size of all
   the strips across all physical drives,  excluding parity-only drives.

Done!