Monday, December 15, 2014

Windows System error 1219, multiple connections to a server or shared resource by the same user

Here's a funny one I ran into earlier.

So I try to mount a SAMBA share on a Windows system. I get this error:



Wait, are you telling me that I am restricted to having only one share subdirectory connected per PC? What happens if I need two? Now that's some crappy engineering right there Batman!

Oh, I know what this is. It's probably one of those Windows "features" and if I do the same thing from the command line, it'll just work, right? Right? Wrong.

F:\Documents and Settings\user>net use Q: \\192.168.0.254\DIR2 /USER:shareuser /PERSISTENT:NO
The password is invalid for \\192.168.0.254\DIR2.

Enter the password for 'shareuser' to connect to '192.168.0.254':
Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed. Disconnect all previous connections to the server or shared resource and try again..

Through some digging I managed to find out that this is by design. Windows will connect to a share once; if you try to connect again, it will block you. If you want to connect to a different subdirectory of your SAMBA server, you will need to disconnect from the one you are currently connected to.
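
If you just want to drop the existing connection before mapping the new one, the command line can list and delete it; net use with no arguments shows what's currently mapped, and the /DELETE switch (or net use * /DELETE for everything) removes it:

F:\Documents and Settings\user>net use
F:\Documents and Settings\user>net use * /DELETE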

Fret not though, for this is Windows: home of the crappiest, nastiest pieces of engineering that ever came into existence.

Workaround? Just edit your %WINDIR%\System32\drivers\etc\hosts file and add some more entries that correspond to your share's IP address. For instance:

192.168.0.254 foo
192.168.0.254 bar
192.168.0.254 foobar
192.168.0.254 fubar

And now I can connect to my share using my new aliases:

F:\Documents and Settings\user>net use Q: \\fubar\DIR2 /USER:shareuser /PERSISTENT:NO
The password is invalid for \\fubar\DIR2.

Enter the password for 'shareuser' to connect to 'fubar':
The command completed successfully.

Oh Windows how I hate thee with all my passion.

Friday, December 5, 2014

Recover BMC administrator password on HP Proliant DL180 G6

This is from my notes. I don't have an HP Proliant DL180 G6 server in front of me, but I trust my notes and I think this description should be as accurate as it can be. You'll need a PC with an internet connection, a USB drive and a jumper connector. You might also need a Linux Live CD (the instructions here assume that you have a RHEL-based Live CD).

a) Disconnect AC power from the system
b) Insert jumper on connector J27, D group pins 
c) Update BMC firmware using HP ROMPaq Firmware Upgrade for HP ProLiant G6 Lights-Out 100 Remote Management (For USB Key-Media).
    Detailed information is available at the following Web page: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=3884343&swItem=MTX-bd27c5aa4f134285aa4825e143&prodNameId=3884344&swEnvOID=1005&swLang=13&taskId=135&mode=4&idx=1
d) Copy the ipmitool.exe binary from http://www.intel.com/design/servers/ipmi/ipmi_tool.htm onto the USB drive
e) Remove AC power and remove the jumper from J27, D group pins.
f) After BMC recovery is complete, boot from the USB key again and do:
ipmitool.exe 20 18 47 03 02 61 64 6d 69 6e 00 00 00 00 00 00 00 00 00 00 00
g) Remove USB drive and shutdown.
h) If the administrator user was "admin", you can now log in as admin/admin. If you still can't log in, proceed as follows (the following instructions are for a RHEL Live CD):
  1. Get a Linux Live CD
  2. yum update
  3. modprobe ipmi_msghandler
  4. modprobe ipmi_devintf
  5. modprobe ipmi_si
  6. yum search ipmi
  7. yum -y install ipmitool
  8. ipmitool user list
  9. note down the name of user ID #3 and log in with that as the username and "admin" as the password (see the sketch below if you then need to change it).
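
If you need to set that account's password to something of your own afterwards, ipmitool can also do it in-band from the same Live CD; user ID 3 here is just the one from the listing above:

ipmitool user set password 3 MyNewPassword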

Wednesday, December 3, 2014

PCI-DSS, nginx access logs, and credit card masking

PCI-DSS states that the card number must not be displayed in full—no more than the first six and the last four digits may be visible on a screen, a receipt, or on any other media used by the organization.

Sometimes your web server might leave access logs that contain this sort of information though. Here's an nginx access log snippet:

1.2.3.4 - - [02/Dec/2014:14:58:54 +0200] "GET /ccsite/?card_number=5100+1500+0000+0001&amount=10000&cvv=123&holder_name=N.Ame&expiration_month=1&expiration_year=2015&description=Something&_method=POST HTTP/1.1" 200 395 "https://partnersite.com/apps/ccapp?dothis.js" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0"
1.2.3.4 - - [02/Dec/2014:16:07:46 +0200] "password=12345678&username=my_username" 200 416 "-" "Dalvik/1.6.0 (Linux; U; Android 4.3; C1905 Build/15.4.A.1.9)" 

Ouch, we shouldn't be writing stuff like this to our logs!

So what do we do?

Well, in this example I'm going to install the ngx_http_perl_module and use regular expressions to mask out the credit card and CVV.

First of all, let's see if we have Perl installed. In order for Perl to recompile the modified modules during reconfiguration, it should be built with the -Dusemultiplicity=yes or -Dusethreads=yes parameters, and to make Perl leak less memory at run time, it should also be built with the -Dusemymalloc=no parameter.

Let's check if we're good to go:

[root@webserver ~]# perl -V:usemultiplicity -V:usemymalloc
usemultiplicity='define';
usemymalloc='n';

Let's install some prerequisites to embed our Perl:

[root@webserver ~]# yum install perl-CPAN perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-ExtUtils-Embed perl-devel 

Great. Now, let's back up our nginx configuration files and recompile nginx:

[root@webserver ~]# cp -rf /opt/nginx/conf/ /root/nginx_conf_bak

OK, now we should configure nginx as always, but with the addition of --with-http_perl_module. So in my case it is:

[root@webserver nginx-1.6.2]# ./configure --add-module=../naxsi/naxsi-master/naxsi_src/ --prefix=/opt/nginx --error-log-path=/var/log/nginx/nginx_error.log --http-log-path=/var/log/nginx/nginx_access.log --user=www-data --group=www-data --with-http_addition_module --with-http_geoip_module --with-http_gzip_static_module --with-http_stub_status_module --with-http_realip_module --without-mail_pop3_module --without-mail_smtp_module --without-mail_imap_module --without-http_memcached_module --without-http_ssi_module --without-http_uwsgi_module --without-http_scgi_module --with-http_perl_module

If, like me, you like to change the nginx headers and version to spoof the server as something else, say IIS, then you should also make sure that the NGINX_VERSION define in src/core/nginx.h consists of numbers and dots only, for example "3.4.1" rather than "3.4.1 (Unix)", otherwise you'll encounter compilation problems.

Let's proceed to the installation of our new nginx binary that will now support the nginx perl module:

[root@webserver nginx-1.6.2]# make
[root@webserver nginx-1.6.2]# make install

Let's restore our old config:
[root@webserver ~]# cd /opt/nginx/conf/
[root@webserver conf]# rm -rf *
[root@webserver conf]# cp -rf /root/nginx_conf_bak/* .
[root@webserver conf]# chown -R www-data:www-data *

Time to do some substitutions using regular expressions on our logs. This needs to be in our http block:

[root@webserver ~]# vi nginx.conf
....
http {
perl_set $anonymize_data '
                sub {
                        my $r = shift;
                        my $req =  $r->request_body;
                        $req =~ s/\?card_number\=(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12}|(?:2131|1800|35\d{3})\d{11})/\?card_number\=XXXX-XXXX-XXXX-XXXX/g;
                        $req =~ s/\&card_number\=(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12}|(?:2131|1800|35\d{3})\d{11})/\&card_number\=XXXX-XXXX-XXXX-XXXX/g;
                        $req =~ s/\?cvv\=\d\d\d/\?cvv\=XXX/g;
                        $req =~ s/\&cvv\=\d\d\d/\&cvv\=XXX/g;
                        $req =~ s/password\=\w+/password\=XXXXXXXX/g;
                        return $req;
                } ';

Notice that I cover both scenarios: the card_number arriving as the first parameter (?card_number) or as a later one (&card_number). I do the same for the CVV. Replace "card_number" and "cvv" with whatever variable names your requests use.
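
By the way, if you want to sanity-check this kind of substitution before reloading nginx, a quick test on the shell does the trick. The pattern below is a deliberately simplified one just for the test, not the card-brand regex above:

echo 'GET /ccsite/?card_number=5100+1500+0000+0001&amount=10000&cvv=123 HTTP/1.1' | \
perl -pe 's/([?&])card_number=[^&\s]+/$1card_number=XXXX-XXXX-XXXX-XXXX/g; s/([?&])cvv=\d{3,4}/$1cvv=XXX/g'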

Finally, you will need to replace your "$request" with "$anonymize_data" in your log_format, like so:

log_format main '$remote_addr - $remote_user [$time_local] "$anonymize_data" $status $body_bytes_sent "$http_referer" "$http_user_agent"';
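
Then make sure your access_log actually uses that format; in my case (the path comes from the configure line earlier):

access_log /var/log/nginx/nginx_access.log main;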

And now the card number, the CVV and the user's password get masked.

1.2.3.4 - - [02/Dec/2014:14:58:54 +0200] "GET /ccsite/?card_number=XXXX-XXXX-XXXX-XXXX&amount=10000&cvv=XXX&holder_name=N.Ame&expiration_month=1&expiration_year=2015&description=Something&_method=POST HTTP/1.1" 200 395 "https://partnersite.com/apps/ccapp?dothis.js" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0"
1.2.3.4 - - [02/Dec/2014:16:07:46 +0200] "password=XXXXXXXX&username=my_username" 200 416 "-" "Dalvik/1.6.0 (Linux; U; Android 4.3; C1905 Build/15.4.A.1.9)" 

Friday, November 14, 2014

Free Hypervisors Comparison

I often read debates and opinions on the internet concerning the pros and cons of the free versions of the three most widely-spread hypervisors.

Here's my view on them:


ESXi:


Pros:

  • It's the most popular bare-metal hypervisor, so whatever problem you encounter you can just google for the answer.  
  • Supports a huge number of Guest OSes 

Cons:

  • It's not free; this is not even a clever marketing technique, it's a bait and switch, plain and simple. Let me elaborate here. ESXi, the hypervisor part, is free. Great. But its management tools aren't, so you can hardly do anything with it on its own. If you want to actually install a guest OS, you'll need the vSphere client, which costs thousands of dollars, or VMware Workstation, which you can use to connect to the ESXi remotely (a solution that will set you back a few hundred dollars instead).
  • I have found its command line to be obtuse. I know, eye of the beholder and all, but it would be much easier to learn if it had completion for its commands as well as the command parameters, just like XenServer.


Hyper-V:


Pros:

  • If you're planning to install a Windows OS, look no further. Microsoft has done a great job with the optimization and performance of Windows-based VMs. 
 
Cons:

  • Even though it is dead simple to actually install the hypervisor, you can't do anything unless you have set up and configured users, domains, firewall rules, etc. from the command line. If you're not already familiar with PowerShell, this can be a daunting and time-consuming task (actually, it is a time-consuming task even if you are familiar with PowerShell). Luckily, there is a script out there that takes care of most of these tasks, reducing the time needed to actually set everything up. After that, you'll need to install the RSAT tools and Hyper-V tools on your PC. Finally, if you're running Windows 8, for some strange reason Microsoft has decided that you will need the Pro edition to manage your Hyper-V server.
  • It does not provide zero-downtime live VM migration  
  • Dynamic Memory is only supported with VMs running Windows Vista SP1 and above, or Windows Server 2008 SP2/2008R2 SP1, 2003 R2 SP2 and 2003 SP2.
  • If you're planning to use Linux, you'll need to install the legacy network adapters, and after you're done with the installation of the Linux Integration Services, you'll need to substitute the network adapters for better ones; a somewhat minor irritation, I know.


XenServer:


Pros:

  • It's a fully-fledged operating system based on Red Hat 5, which means you can tweak it and do whatever you want. For instance, you will need specialized software to take backups of your VMs if you're running Hyper-V or ESXi. You can use such software with XenServer too if you want, but personally I've just written a Perl script (a rough sketch of the idea follows at the end of this section), and it works great. And guess what: my backups are written to a GlusterFS volume. If I had ESXi, I would have to use the insecure and deprecated NFS protocol for that; on XenServer, I just installed GlusterFS like on any other Linux node.
  • Its command line is unbelievably good. And extremely easy to learn. You can do everything with it.
  • The free version is not a watered-down version of the paid one. Almost anything you can do with the paid version of XenServer you can also do with the free version, except for features like GPU virtualization, in-memory read caching, support and patching (you'll have to patch from the command line if you're using the free version).

Cons:

  • XenServer used to have a great networking management interface called the Distributed Virtual Switch Controller. Unfortunately, this tool was deprecated as soon as the product went open source. Of course, since XenServer is a fully-fledged Linux distro, it is trivial to work around any network-related issue by switching the networking stack to the Linux bridge and taking it from there.
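
Since I mentioned the backup script above: mine is written in Perl, but the gist of snapshot-and-export backups on XenServer boils down to a handful of xe calls. Here's a rough shell sketch of the idea (the VM name and backup path are placeholders, and this is not my actual script):

#!/bin/bash
# Snapshot a running VM, export the snapshot to an .xva file, then remove the snapshot.
VM_UUID=$(xe vm-list name-label="my-vm" params=uuid --minimal)
SNAP_UUID=$(xe vm-snapshot uuid="$VM_UUID" new-name-label="my-vm-$(date +%F)")
# Snapshots are created as templates; clear the flag so the snapshot can be exported like a VM.
xe template-param-set is-a-template=false uuid="$SNAP_UUID"
xe vm-export vm="$SNAP_UUID" filename="/mnt/backups/my-vm-$(date +%F).xva"
xe vm-uninstall uuid="$SNAP_UUID" force=true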

Sunday, November 2, 2014

Zen Load Balancer Performance Tests Part 2

All right. Time to upgrade the entire thing as described in the "Zen Load Balancer 3.0.3 Performance and Security Customization" series.

In summary:

Zen Load Balancer 3.0.3 Performance and Security Customization Part 5 was skipped; no NIC bonding was necessary.
Zen Load Balancer 3.0.3 Performance and Security Customization Part 4 was implemented, with the 1GB+ recommended settings and none of the high-memory-usage settings.
Zen Load Balancer 3.0.3 Performance and Security Customization Part 3 was skipped; there's no point in checking for invalid packets or rate limiting on what is just a test server.
Zen Load Balancer 3.0.3 Performance and Security Customization Part 2 was implemented.
Zen Load Balancer 3.0.3 Performance and Security Customization Part 1 was implemented (no filesystem tuning though). The system's memory is 2GB, so the only remaining benefit is the NX protection, which isn't really needed on a test server (if anything, it should add a little overhead).

Let's go over my Zen Load Balancer's stats again:

Memory: 2048MB
CPU #1: 3.0GHz

I decided to use G-WAN as the backend, mainly because:

a) I hadn't used it before and it was a good opportunity to do so;
b) They brag about ridiculously high throughput; you have to at least check whether it's even partly true, and if it is, all the better, as I'll stress Zen even more;
c) They have an enormous number of haters, from webserver fanbois who refuse to accept that a different webserver than the one they use might be better, to company plants filing tickets for non-existent vulnerabilities to try to prove that G-WAN isn't as good as its publishers claim. This alone can make me install software just out of spite.

So G-WAN it is, then!

The webpage that I will serve will be generated by this little C++ servlet here:
root@gwan:/opt/gwan_linux64-bit/0.0.0.0_8080/#0.0.0.0/csp# cat bench.cpp
// ============================================================================
// C++ servlet example to benchmark Zen and G-WAN Web Application Server   (http://gwan.ch/)
// ----------------------------------------------------------------------------
// bench.cpp: Concatenate a bunch of "Hello world!" until it gets to 400KB (encodings included).
//
// This code is based on the hello.cpp example by Thomas Meitz (gwan at jproxx dot com)
// ============================================================================
// imported functions:
//   get_reply(): get a pointer on the 'reply' dynamic buffer from the server
//   xbuf_cat(): like strcat(), but it works in the specified dynamic buffer
// ----------------------------------------------------------------------------
#include "gwan.h" // G-WAN definitions
#include <string>

using namespace std;

class Hello
{
public:
   void whatsup(xbuf_t* reply, string& str)
   {
      xbuf_cat(reply, (char*)str.c_str());
   }
};
// ----------------------------------------------------------------------------
int main(int argc, char *argv[])
{
   string h("Hello World! ");
   do {
      h += h;
   } while (h.length() < 240000);
   Hello hello;
   hello.whatsup(get_reply(argv), h);

   return 200; // return an HTTP code (200:'OK')
}
// ============================================================================
// End of Source Code
// ============================================================================


These were the farm settings:



Tests with Zen Load Balancer, customized as per the recommendations in my blog

The test will be the same as before; we'll send 10, 50 and 100 requests concurrently and 10000 in total, all trying to access the output of our script.

The tests will be run 5 times each. After that, we'll take the median value at each point in time so as to smooth out any statistical aberrations.


D:\apache\Apache24\bin>abs -c10 -n10000 -f TLS1 https://192.168.0.99:443/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 1554214 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            443
SSL/TLS Protocol:       TLSv1,RC4-SHA,1024,128

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      10
Time taken for tests:   110.865 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      4262600020 bytes
HTML transferred:       4259840000 bytes
Requests per second:    90.20 [#/sec] (mean)
Time per request:       110.865 [ms] (mean)
Time per request:       11.086 [ms] (mean, across all concurrent requests)
Transfer rate:          37547.54 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        4   13   6.9     11     106
Processing:    45   98  24.3     97     756
Waiting:        1   11   8.0     10     103
Total:         57  111  25.0    109     775

Percentage of the requests served within a certain time (ms)
  50%    109
  66%    114
  75%    117
  80%    120
  90%    127
  95%    134
  98%    146
  99%    161
 100%    775 (longest request)

Whoa, got to say, this looks extremely promising. Let's do this four more times, find the median values and create a chart out of it.

This is our Zen's CPU usage for 10 concurrent connections:


This is not bad at all. No excessive CPU usage, and good performance as well for an HTTPS farm; apachebench reported around 90 requests per second.

For 50 concurrent connections:

Insane. Once again, no excessive CPU usage, and good performance for an HTTPS farm; apachebench reported almost 89 requests per second!

For 100 concurrent connections:

And this seals it. With our recommended performance upgrades, Zen is both faster and uses a lot less CPU. Just compare this with our previous results! For the record, apachebench reported around 80 requests per second.

Wednesday, October 8, 2014

Zen Load Balancer Performance Tests Part 1

These are my Zen Load Balancer's stats:

Memory: 2048MB
CPU #1: 3.0GHz

I decided to use G-WAN as the backend, mainly because:

a) I hadn't used it before and it was a good opportunity to do so;
b) They brag about ridiculously high throughput; you have to at least check whether it's even partly true, and if it is, all the better, as I'll stress Zen even more;
c) They have an enormous number of haters, from webserver fanbois who refuse to accept that a different webserver than the one they use might be better, to company plants filing tickets for non-existent vulnerabilities to try to prove that G-WAN isn't as good as its publishers claim. This alone can make me install software just out of spite.

So G-WAN it is, then!

The webpage that I will serve will be generated by this little C++ servlet here:
root@gwan:/opt/gwan_linux64-bit/0.0.0.0_8080/#0.0.0.0/csp# cat bench.cpp
// ============================================================================
// C++ servlet example to benchmark Zen and G-WAN Web Application Server   (http://gwan.ch/)
// ----------------------------------------------------------------------------
// bench.cpp: Concatenate a bunch of "Hello world!" until it gets to 400KB (encodings included).
//
// This code is based on the hello.cpp example by Thomas Meitz (gwan at jproxx dot com)
// ============================================================================
// imported functions:
//   get_reply(): get a pointer on the 'reply' dynamic buffer from the server
//   xbuf_cat(): like strcat(), but it works in the specified dynamic buffer
// ----------------------------------------------------------------------------
#include "gwan.h" // G-WAN definitions
#include <string>

using namespace std;

class Hello
{
public:
   void whatsup(xbuf_t* reply, string& str)
   {
      xbuf_cat(reply, (char*)str.c_str());
   }
};
// ----------------------------------------------------------------------------
int main(int argc, char *argv[])
{
   string h("Hello World! ");
   do {
      h += h;
   } while (h.length() < 240000);
   Hello hello;
   hello.whatsup(get_reply(argv), h);

   return 200; // return an HTTP code (200:'OK')
}
// ============================================================================
// End of Source Code
// ============================================================================


These were the farm settings:



Tests with stock Zen Load Balancer 3.0.5 community edition

All right, let's begin with stock Zen. No updates and no extra software, except what we'll be using to collect our statistical data (nmon) and any packages needed to install vmware tools.
Also, ipv4 forwarding was set to 1 and I set up NAT to serve my backend, which was in a different network: iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE.

The test will be a little insane: we'll send 10, 50 and 100 requests concurrently and 10000 in total, all trying to access the output of our script.

The tests will be run 5 times each. After that, we'll take the median value at each point in time so as to smooth out any statistical aberrations.

But first, let's do a few simple HTTP tests to see the difference in performance and resource usage:
D:\apache\Apache24\bin>ab -c10 -n10000 http://192.168.0.99:80/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 1554214 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            80

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      10
Time taken for tests:   112.649 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      4262600000 bytes
HTML transferred:       4259840000 bytes
Requests per second:    88.77 [#/sec] (mean)
Time per request:       112.649 [ms] (mean)
Time per request:       11.265 [ms] (mean, across all concurrent requests)
Transfer rate:          36952.96 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    4  60.4      0    3016
Processing:    16  108 184.0     94    3126
Waiting:        0   23  78.1     16    3047
Total:         16  112 193.4     94    3126

Percentage of the requests served within a certain time (ms)
  50%     94
  66%    109
  75%    125
  80%    125
  90%    156
  95%    172
  98%    203
  99%    266
 100%   3126 (longest request)

That's pretty sweet performance right there! Our webserver serves a dynamically generated 420KB page at almost 89 requests per second, without any caching mechanism to help. Impressive. I'd go as far as to say that Zen is a bottleneck here.

So let's rinse and repeat 4 more times, calculate the median and create a graph.

This is our Zen's CPU usage for 10 concurrent connections:


For the record, apachebench reported around 84 requests per second on average.

For 50 concurrent connections:



For the record, apachebench reported around 78 requests per second on average.

Let's finish this with 100 concurrent connections:


For the record, apachebench reported around 75 requests per second on average.

As you can see, nothing dramatic, and CPU usage rarely exceeds 45%. Time to see what happens if we use HTTPS instead.

D:\apache\Apache24\bin>abs -c10 -n10000 https://192.168.0.99:443/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 1554214 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            443
SSL/TLS Protocol:       TLSv1,RC4-SHA,1024,128

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      10
Time taken for tests:   129.153 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      4262600018 bytes
HTML transferred:       4259840000 bytes
Requests per second:    77.43 [#/sec] (mean)
Time per request:       129.153 [ms] (mean)
Time per request:       12.915 [ms] (mean, across all concurrent requests)
Transfer rate:          32230.65 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   37 165.0     31    3079
Processing:    16   92 212.3     78    3157
Waiting:        0   17  86.0     16    3049
Total:         16  129 268.0    109    3172

Percentage of the requests served within a certain time (ms)
  50%    109
  66%    109
  75%    109
  80%    125
  90%    125
  95%    141
  98%    141
  99%    375
 100%   3172 (longest request)

Wow, our performance is close to what apachebench reported for 50 concurrent connections on plain HTTP requests!

So let's rinse and repeat 4 more times, calculate the median and create a graph.

This is our Zen's CPU usage for 10 concurrent connections:



The CPU usage cost is staggering; we rarely see it drop below 80%. For the record, apachebench reported almost 85 requests per second on average.

For 50 concurrent connections:


Look at that CPU usage! Most of the time is at 100%! For the record, apachebench reported almost 90 requests per second on average.

For 100 concurrent connections:


Considering the previous results, honestly I expected even worse! For the record, apachebench reported around 74 requests per second on average.

Tests with stock Zen Load Balancer 3.0.5 community edition + PCRE + TCMalloc + HOARD

If PCRE, tcmalloc (from the Google perftools package) and/or Hoard are available, Pound will link against them. This will provide a significant performance boost and is highly recommended.

Let's see the performance boost for ourselves, then.

root@zen-lb:~# apt-get install make cmake g++ libpcrecpp0 libpcre3-dev libpcre3 libpcre++0 libpcre++-dev libtcmalloc-minimal0 libgoogle-perftools0 libgoogle-perftools-dev
root@zen-lb:~# mkdir hoard
root@zen-lb:~# cd hoard/
root@zen-lb:~/hoard# wget --no-check-certificate https://github.com/emeryberger/Hoard/releases/download/3.10/Hoard-3.10-source.tar.gz
root@zen-lb:~/hoard# gunzip Hoard-3.10-source.tar.gz 
root@zen-lb:~/hoard# tar -xf Hoard-3.10-source.tar 
root@zen-lb:~/hoard# cd Hoard/src
root@zen-lb:~/hoard/Hoard/src# make linux-gcc-x86
root@zen-lb:~/hoard/Hoard/src# cp libhoard.so /usr/lib/.

Add this to our /etc/profile so that the hoard library is loaded:
root@zen-lb:~/hoard/Hoard/src# vi /etc/profile
export LD_PRELOAD=/usr/lib/libhoard.so
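
Note that /etc/profile only affects login shells; if you want the preload to apply system-wide, daemons started at boot included, /etc/ld.so.preload is the alternative (use it with care):

root@zen-lb:~/hoard/Hoard/src# echo /usr/lib/libhoard.so >> /etc/ld.so.preload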

Test that it's loaded:
root@zen-lb:~# ldd /bin/ls
        linux-gate.so.1 =>  (0xb7767000)
        /usr/lib/libhoard.so (0xb7725000)
        libselinux.so.1 => /lib/libselinux.so.1 (0xb7704000)
        librt.so.1 => /lib/i686/cmov/librt.so.1 (0xb76fa000)
        libacl.so.1 => /lib/libacl.so.1 (0xb76f3000)
        libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb75ac000)
        libdl.so.2 => /lib/i686/cmov/libdl.so.2 (0xb75a8000)
        libpthread.so.0 => /lib/i686/cmov/libpthread.so.0 (0xb758f000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7499000)
        libm.so.6 => /lib/i686/cmov/libm.so.6 (0xb7473000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7455000)
        /lib/ld-linux.so.2 (0xb7768000)
        libattr.so.1 => /lib/libattr.so.1 (0xb7450000)


This is our Zen's CPU usage for 10 concurrent connections:


Well, it might just be me, but I am seeing a difference, even if a small one. For the record, apachebench reported around 78 requests per second on average.

This is our Zen's CPU usage for 50 concurrent connections:


Wow, now that's a big difference. The CPU rarely goes above 80%. That's even better than 10 concurrent connections without our optimization libraries installed! For the record, apachebench reported around 72 requests per second on average. It looks like with our new libraries installed, there's a CPU/requests per second tradeoff.

For 100 concurrent connections:

The stress is noticeably less on the CPU. For the record, apachebench reported almost 75 requests per second on average.

Honestly, I wasn't planning on doing any HTTP tests for the stock+libs Zen, as the CPU load isn't too large and therefore there isn't much benefit to be had, but just out of curiosity I did a series of tests for 50 concurrent HTTP connections to see the difference in requests per second; it was around 76. That's a 14 requests/second drop. Not sure if I can call that a performance enhancement!

Finally, if you're curious, here are a few benchmarks from our Zen LB to the backend:

root@zen-lb:~# ab -c100 -n10000 http://192.168.0.99:80/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            80

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      100
Time taken for tests:   37.843 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      4262600000 bytes
HTML transferred:       4259840000 bytes
Requests per second:    264.25 [#/sec] (mean)
Time per request:       378.429 [ms] (mean)
Time per request:       3.784 [ms] (mean, across all concurrent requests)
Transfer rate:          109999.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    7  12.2      6     221
Processing:    37  335 558.7    217    8559
Waiting:        1   71 232.8     17    1659
Total:         37  342 559.0    225    8574

Percentage of the requests served within a certain time (ms)
  50%    225
  66%    250
  75%    293
  80%    328
  90%    413
  95%   1172
  98%   1786
  99%   3626
 100%   8574 (longest request)

root@zen-lb:~# ab -c50 -n10000 http://192.168.0.99:80/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            80

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      50
Time taken for tests:   36.863 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      4262600000 bytes
HTML transferred:       4259840000 bytes
Requests per second:    271.28 [#/sec] (mean)
Time per request:       184.313 [ms] (mean)
Time per request:       3.686 [ms] (mean, across all concurrent requests)
Transfer rate:          112924.62 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   4.3      0      83
Processing:    27  169 338.1     84   11517
Waiting:        3   66 227.5     19    3014
Total:         27  171 338.3     85   11517

Percentage of the requests served within a certain time (ms)
  50%     85
  66%     97
  75%    111
  80%    129
  90%    305
  95%    363
  98%   1357
  99%   1545
 100%  11517 (longest request)

root@zen-lb:~# ab -c10 -n10000 http://192.168.0.99:80/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            80

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      10
Time taken for tests:   38.634 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      4262600000 bytes
HTML transferred:       4259840000 bytes
Requests per second:    258.84 [#/sec] (mean)
Time per request:       38.634 [ms] (mean)
Time per request:       3.863 [ms] (mean, across all concurrent requests)
Transfer rate:          107746.80 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       2
Processing:    23   38   5.7     38      77
Waiting:        4    9   2.6      9      38
Total:         23   39   5.7     38      78

Percentage of the requests served within a certain time (ms)
  50%     38
  66%     40
  75%     42
  80%     43
  90%     45
  95%     48
  98%     53
  99%     57
 100%     78 (longest request)


Tuesday, September 9, 2014

Zen Load Balancer 3.0.3 Performance and Security Customization Part 5

Now, let's move on to NIC bonding. This is useful if one of our NICs goes dead; we obviously want to make sure that, if that happens, another one is standing by to take over.

Many admins have a dedicated VLAN for cluster synchronization purposes. Some others just connect two nodes using a crossover cable. That means that if one NIC goes down, all hell breaks loose: if it is the cluster synchronization NIC, then both nodes think that the other node has gone down and they both try to become masters, causing havoc on the network; in any other case, your frontends and backends appear to be down because the NIC is dead.

So in that case, we employ NIC bonding. There are actually a few types of network bonding (from here): 
  • balance-rr or 0: Round-robin policy: Transmit packets in sequential order from the first available slave through the last.  This mode provides load balancing and fault tolerance.
  • active-backup or 1: Active-backup policy: Only one slave in the bond is active.  A different slave becomes active if, and only if, the active slave fails. The bond's MAC address is externally visible on only one port (network adapter) to avoid confusing the switch.
    In bonding version 2.6.2 or later, when a failover occurs in active-backup mode, bonding will issue one or more gratuitous ARPs on the newly active slave. One gratuitous ARP is issued for the bonding master interface and each VLAN interface configured above it, provided that the interface has at least one IP address configured.  Gratuitous ARPs issued for VLAN interfaces are tagged with the appropriate VLAN id. This mode provides fault tolerance.
  • balance-xor or 2: XOR policy: Transmit based on the selected transmit hash policy.  The default policy is a simple [(source MAC address XOR'd with destination MAC address) modulo slave count].  Alternate transmit policies may be selected via the xmit_hash_policy option. This mode provides load balancing and fault tolerance.
  • broadcast or 3: Broadcast policy: transmits everything on all slave interfaces.  This mode provides fault tolerance.
  • 802.3ad or 4: IEEE 802.3ad Dynamic link aggregation.  Creates aggregation groups that share the same speed and duplex settings.  Utilizes all slaves in the active aggregator according to the 802.3ad specification. Slave selection for outgoing traffic is done according to the transmit hash policy, which may be changed from the default simple XOR policy via the xmit_hash_policy option. Note that not all transmit policies may be 802.3ad compliant, particularly in regards to the packet mis-ordering requirements of section 43.2.4 of the 802.3ad standard.  Differing peer implementations will have varying tolerances for noncompliance. Prerequisites:
    1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
    2. A switch that supports IEEE 802.3ad Dynamic link aggregation.
    3. Most switches will require some type of configuration to enable 802.3ad mode.
  • balance-tlb or 5: Adaptive transmit load balancing: channel bonding that does not require any special switch support.  The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave.  Incoming traffic is received by the current slave.  If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave. Prerequisites:
    1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
  • balance-alb or 6: Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special switch support.  The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond such that different peers use different hardware addresses for the server. Receive traffic from connections created by the server is also balanced. When the local system sends an ARP Request the bonding driver copies and saves the peer's IP information from the ARP packet.  When the ARP Reply arrives from the peer, its hardware address is retrieved and the bonding driver initiates an ARP reply to this peer assigning it to one of the slaves in the bond. A problematic outcome of using ARP negotiation for balancing is that each time that an ARP request is broadcast it uses the hardware address of the bond.  Hence, peers learn the hardware address of the bond and the balancing of receive traffic collapses to the current slave.  This is handled by sending updates (ARP Replies) to all the peers with their individually assigned hardware address such that the traffic is redistributed.  Receive traffic is also redistributed when a new slave is added to the bond and when an inactive slave is re-activated.  The receive load is distributed sequentially (round robin) among the group of highest speed slaves in the bond. When a link is reconnected or a new slave joins the bond the receive traffic is redistributed among all active slaves in the bond by initiating ARP Replies with the selected mac address to each of the clients. The updelay parameter (detailed below) must be set to a value equal or greater than the switch's forwarding delay so that the ARP Replies sent to the peers will not be blocked by the switch. Prerequisites:
    1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
    2. Base driver support for setting the hardware address of a device while it is open. This is required so that there will always be one slave in the team using the bond hardware address (the curr_active_slave) while having a unique hardware address for each slave in the bond. If the curr_active_slave fails its hardware address is swapped with the new curr_active_slave that was chosen.

In this example we will employ the active-backup method, which is the safest one to use. Many people prefer link aggregation instead, since an aggregation group increases the overall bandwidth of the resulting interface.

Let's suppose we want to bond eth8 and eth3 to an interface with the IP 172.16.0.8/22, eth9 and eth4 to an interface with the IP 172.16.4.8/22, and eth0 and eth5 to an interface with the IP 172.16.8.8/23:

root@zen-lb:~# apt-get install ifenslave-2.6
root@zen-lb:~# vi /etc/network/interfaces
auto lo
iface lo inet loopback
auto bond0
iface bond0 inet static
    address 172.16.0.8
    netmask 255.255.252.0
    network 172.16.0.0
    gateway 172.16.0.1
    slaves eth8 eth3
    bond-mode active-backup
    bond-miimon 100
    bond-primary eth8
auto bond1
iface bond1 inet static
    address 172.16.4.8
    netmask 255.255.252.0
    network 172.16.4.0
    slaves eth9 eth4
    bond-mode active-backup
    bond-miimon 100
    bond-primary eth9
auto bond2
iface bond2 inet static
    address 172.16.8.8
    netmask 255.255.254.0
    network 172.16.8.0
    slaves eth0 eth5
    bond-mode active-backup
    bond-miimon 100
    bond-primary eth0

bond-primary is the NIC that will be our primary device.
bond-miimon is how often the link state will be polled.
So, in our case, eth8 and eth3 will be polled every 100ms; if eth8 is up, it will serve our incoming and outgoing traffic, otherwise eth3 will take charge.

root@zen-lb:~# rm /usr/local/zenloadbalancer/config/if_eth*
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/if_bond0_conf
bond0::172.16.0.8:255.255.252.0:up::
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/if_bond1_conf
bond1::172.16.4.8:255.255.252.0:up::
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/if_bond2_conf
bond2::172.16.8.8:255.255.254.0:up::
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/global.conf_conf
.....
#System Default Gateway
$defaultgw="172.16.0.1";
#Interface Default Gateway
$defaultgwif="bond0";
.....
#Also change the ntp server
.....
$ntp="0.europe.pool.ntp.org";
.....

You might also want to set these particular ports to portfast on your switch. That way, you won't have to wait for the forward delay (which is useless on these ports anyway) and the transition will be seamless.
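
On a Cisco switch, for example, that's roughly the following (the interface name is just a placeholder):

conf t
interface GigabitEthernet0/1
 spanning-tree portfast
end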

All right, let's see if it all works:

root@zen-lb:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth8
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth8
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:b9:e4:12:a3

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:57:cf:fe

root@zen-lb:~# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth9
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth9
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:b9:e4:12:a5

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:0d:69:81

root@zen-lb:~# cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:57:cf:fd

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:0d:69:80

And if you try to disconnect, or otherwise bring down, any of the primary slave interfaces, you'll see that the backup slave takes over almost instantly (provided you set those ports to portfast on your switch).
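
A quick way to watch the failover happen, using bond0 from the example above (the Currently Active Slave line should flip to eth3 while eth8 is down):

root@zen-lb:~# ip link set eth8 down
root@zen-lb:~# grep "Currently Active Slave" /proc/net/bonding/bond0
root@zen-lb:~# ip link set eth8 up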

Thursday, September 4, 2014

Zen Load Balancer 3.0.3 Performance and Security Customization Part 4

Time to fine-tune our IP stack:

root@zen-lb:~# vi /etc/sysctl.conf
# Performance:
# Turn down swappiness, 0 means no swap on modern kernels, number is percent of free system memory swap will kick in.
vm.swappiness = 2
# Contains, as a percentage of total system memory, the number of pages at which a process which is generating disk writes will start writing out dirty data.
# Defaults to 10 (percent of RAM). Consensus is that 10% of RAM when RAM is say half a GB (so 10% is ~50 MB) is a sane value on spinning disks, but it can be MUCH worse when RAM is larger, say 16 GB (10% is ~1.6 GB), as that's several seconds of writeback on spinning disks. A more sane value in this case is 3 (16*0.03 ~ 491 MB).
vm.dirty_ratio = 3
# Contains, as a percentage of total system memory, the number of pages at which the background kernel flusher threads will start writing out dirty data.
# Defaults to 5 (percent of RAM). It may be just fine for small memory values, but again, consider and adjust accordingly for the amount of RAM on a particular system.
vm.dirty_background_ratio = 2
# Will not change the number of System V IPC message queue resources allowed
# Will not change Kernel semaphores (kernel.sem [semmsl semmns semopm semmni], kernel.shmmax and kernel.shmmin)

# Network Performance:
# Turn off TCP prequeue processing
net.ipv4.tcp_low_latency = 1
# Reuse time-wait sockets, better than recycling
net.ipv4.tcp_tw_reuse = 1
# Fast recycling of TIME-WAIT sockets (recycling rather than reusing). Default value is 0. It should not be changed without advice/request of technical experts.
net.ipv4.tcp_tw_recycle = 0
# Maximum time-to-live of entries. Unused entries will expire after this period of time if there is no memory pressure on the pool.
net.ipv4.inet_peer_maxttl = 5
# How often to send out keepalive messages when keepalive is enabled. Default is 7200 seconds.
net.ipv4.tcp_keepalive_time = 512
# How frequent probes are retransmitted, when a probe isn't acknowledged. Default is 75 seconds
net.ipv4.tcp_keepalive_intvl = 15
# Number of keepalive probes to send until the server decides that the connection is broken.
net.ipv4.tcp_keepalive_probes = 5
# Number of outstanding syn requests allowed. This setting tells the system when to start using syncookies. When you have more TCP connection requests in your queue than this number, the system will start using syncookies. Note that syncookies can have an impact on performance.
net.ipv4.tcp_max_syn_backlog = 36000
# Size of the listen queue
net.core.somaxconn = 36000
# Maximum number of timewait sockets held by the system simultaneously.
net.ipv4.tcp_max_tw_buckets = 100000
# Increase TCP default and max receive/send buffer size
net.core.rmem_default = 16777216
net.core.rmem_max = 16777216
net.core.wmem_default = 16777216
net.core.wmem_max = 16777216
# Same for UDP
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
# Increase the maximum amount of option memory buffers
net.core.optmem_max= 20480
# Increase Linux autotuning TCP receive/send buffer limit
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Increase the length of the packets queue waiting on an interface until the kernel is ready to process them
# The backlog of pending connections allows the server to hold connections it’s not ready to accept, and this allows it to withstand a larger slow HTTP attack, as well as gives legitimate users a chance to be served under high load. However, a large backlog also prolongs the attack, since it backlogs all connection requests regardless of whether they’re legitimate. If the server supports a backlog, I recommend making it reasonably large to so your HTTP server can handle a small attack.
net.core.netdev_max_backlog = 30000
# This setting determines the time that must elapse before TCP/IP can release a closed connection and reuse its resources.
net.ipv4.tcp_fin_timeout = 30
# Turn connection accounting on
net.netfilter.nf_conntrack_acct = 1
# Maximum number of tracked connections; the toll is 300-350 bytes of unswapped RAM per connection. The hash table should accordingly be hashsize = conntrack_max / 8, which is why we set options ip_conntrack hashsize=25000 and options nf_conntrack hashsize=25000 in modprobe.conf
net.ipv4.netfilter.ip_conntrack_max = 200000
# Dynamically-assigned ports range; bear in mind that in theory IANA has officially designated the range 49152 - 65535 for dynamic port assignment. The default linux range for modern kernels is 32768 - 61000.
net.ipv4.ip_local_port_range = 10000 65535
# Now that we've increased the ports, we need to increase the number of file handles as well. This parameter should be at least twice as big as the number of network connections you expect to support. We should also raise the maximum number of open files a user can have in /etc/security/limits.conf (an example follows after this file).
fs.file-max = 1048576
# Increase the number of allowed mmapped files
vm.max_map_count = 1048576
# This setting determines the number of SYN+ACK packets sent in part 2 of a 3-way-handshake before the kernel gives up on the connection. Default is 5.
net.ipv4.tcp_synack_retries = 3
# Number of times initial SYNs for a TCP connection attempt will be retransmitted. This is only the timeout for outgoing connections. Default is 5.
net.ipv4.tcp_syn_retries = 3
# This defines how often an answer to a TCP connection request is retransmitted before it gives up. This is only the timeout for incoming connections. Default is 3.
net.ipv4.tcp_retries1 = 3
# Determines how the TCP stack should behave for memory usage; each count is in memory pages (typically 4KB).
net.ipv4.tcp_mem = 50576 64768 98152
#net.ipv4.tcp_mem = 128000 200000 262144 # Use this for 1Gb+ connections
# The TCP window scale option is an option to increase the receive window size allowed in TCP above its former maximum value of 65535 bytes. See IETF RFC 1323.
# Linux kernels from 2.6.8 have enabled TCP Window Scaling by default
net.ipv4.tcp_window_scaling = 1
# How many times to retry before killing a TCP connection closed by our side. Default 0.
net.ipv4.tcp_orphan_retries = 0
# Security:
# Debian does not have kernel.exec-shield, check that you have NX (Execute Disable) protection: active with dmesg | grep protection. To have NX protection, your BIOS, your CPU, your OS must support it and you must have a 32-bit PAE or 64 bit kernel (NX bit works on the 63rd bit of the address)
#kernel.exec-shield = 1
# Turn on protection and randomize stack, vdso page and mmap + randomize brk base address.
kernel.randomize_va_space = 2
# tcp_syncookies with appropriate tcp_synack_retries and tcp_max_syn_backlog can mitigate SYN flood attacks. Note That without SYN cookies, a much larger value for tcp_max_syn_backlog is required. Default is 1.
net.ipv4.tcp_syncookies = 1
# Protect against tcp time-wait assassination hazards
net.ipv4.tcp_rfc1337 = 1
# Timestamps can provide security by protecting against wrapping sequence numbers (at gigabit speeds) but they also allow uptime detection. Definitely enable for Gb+ speeds, up to the admin to decide what to do for slower speeds.
# 1 is the default value but it has some overhead, use 0 for slightly better performance.
net.ipv4.tcp_timestamps = 0
#net.ipv4.tcp_timestamps = 1 # Use this for 1Gb+ connections
# Source address verification, helps protect against spoofing attacks.
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1    
# Usually, we'd want to disable IP forwarding but our LB is also a router so no choice here:
net.ipv4.ip_forward = 1
# Log martian packets
# This is a router, it will receive martians all the time, better turn this off. Otherwise, we'd want to turn this on.
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.default.log_martians = 0   
# Ignore echo broadcast requests to prevent being part of smurf attacks (default)
net.ipv4.icmp_echo_ignore_broadcasts = 1
# Ignore *all* echo requests, including on localhost (default 0). Enabling it is paranoid really.
net.ipv4.icmp_echo_ignore_all = 0
# Ignore bogus icmp errors (default)
net.ipv4.icmp_ignore_bogus_error_responses = 1
# IP source routing (insecure, disable it) (default)
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0 
net.ipv6.conf.all.accept_source_route = 0
net.ipv6.conf.default.accept_source_route = 0
# Send redirects: Usually, we'd want to disable it but we're a LB aka router:
net.ipv4.conf.all.send_redirects = 1
net.ipv4.conf.default.send_redirects = 1
# ICMP only accept secure routing redirects (we could deny redirects altogether actually).
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0 
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0 
net.ipv4.conf.all.secure_redirects = 1
net.ipv4.conf.default.secure_redirects = 1
# Disable IPv6 router solicitations:
net.ipv6.conf.default.router_solicitations = 0
# Do not accept Router Preference in RA
net.ipv6.conf.default.accept_ra_rtr_pref = 0
# Do not learn Prefix Information in Router Advertisement
net.ipv6.conf.default.accept_ra_pinfo = 0
# Will not accept Hop Limit settings from a router advertisement
net.ipv6.conf.default.accept_ra_defrtr = 0
# Do not assign a global unicast address to an interface according to router advertisements
net.ipv6.conf.default.autoconf = 0
# Do not send neighbor solicitations
net.ipv6.conf.default.dad_transmits = 0
# Only one global unicast IPv6 address per interface
net.ipv6.conf.default.max_addresses = 1
# And after all this, we disable IPv6 awwww :(                  
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
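
As mentioned in the fs.file-max comment above, the per-user open-file limits should be raised too, and the new sysctl values need to be loaded without a reboot. Something along these lines (the nofile value is just an example matching fs.file-max):

root@zen-lb:~# vi /etc/security/limits.conf
*       soft    nofile  1048576
*       hard    nofile  1048576
root@zen-lb:~# sysctl -p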


If you have memory to spare, you can replace the corresponding settings with:

.... 
net.core.rmem_max=1677721600
net.core.rmem_default=167772160
net.core.wmem_max=1677721600
net.core.wmem_default=167772160
net.core.optmem_max= 2048000
....
net.ipv4.tcp_rmem= 1024000 8738000 1677721600
net.ipv4.tcp_wmem= 1024000 8738000 1677721600
net.ipv4.tcp_mem= 1024000 8738000 1677721600
net.ipv4.udp_mem= 1024000 8738000 1677721600
....


Monday, August 18, 2014

Zen Load Balancer 3.0.3 Performance and Security Customization Part 3

Time to do a little network tweaking.

Increase our iptables connection tracking numbers:

ipt_recent parameters:

Note that by default these values are used by ipt_recent module:
ip_list_tot=100     Number of addresses remembered per table
ip_pkt_list_tot=20     Number of packets per address remembered
ip_list_hash_size=0     Hash table size. 0 means to calculate it based on ip_list_tot, default: 512
ip_list_perms=0644     Permissions for /proc/net/ipt_recent/* files
 
root@zen-lb:~# vi /etc/modprobe.d/ipt_recent.conf
options ipt_recent ip_list_tot=3000 ip_pkt_list_tot=100
options xt_recent ip_list_tot=3000 ip_pkt_list_tot=100
options ip_conntrack hashsize=25000
options nf_conntrack hashsize=25000

In some kernels ipt_recent is xt_recent and ip_conntrack is nf_conntrack, so I've included them all. It won't hurt, though you may see a warning at runtime about the unused options.

root@zen-lb:~# iptables -F
root@zen-lb:~# modprobe -r ipt_recent
root@zen-lb:~# modprobe ipt_recent  
root@zen-lb:~# modprobe -r xt_recent
root@zen-lb:~# modprobe xt_recent
root@zen-lb:~# cat /sys/module/xt_recent/parameters/ip_list_tot
3000
root@zen-lb:~# cat /sys/module/xt_recent/parameters/ip_pkt_list_tot
100

Cool. It worked. Let's get to our iptables rules. Don't forget that your load balancer is a router, so we're going to start off with NAT. We assume that eth0 is our external interface.

root@zen-lb:~# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

Now, we create two new iptables chains: one to log and drop illegal packets and another for rate limiting purposes. The latter is especially useful if we have a webserver running.

root@zen-lb:~# iptables -N LOGDROP
root@zen-lb:~# iptables -N RATELIMIT

Our "always accept from the loopback interface" and "always accept related and established connections" rules follow:

root@zen-lb:~# iptables -A INPUT -i lo -j ACCEPT
root@zen-lb:~# iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

OK, time to separate the wheat from the chaff. Disallow any type of invalid packets by sending them to our new chain, LOGDROP:

root@zen-lb:~# iptables -A INPUT -m state --state INVALID -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp ! --syn -m state --state NEW -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp --tcp-flags ALL ALL -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp --tcp-flags ALL NONE -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp --tcp-flags SYN,FIN SYN,FIN -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp --tcp-flags SYN,RST SYN,RST -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp --tcp-flags FIN,RST FIN,RST -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp --tcp-flags ACK,FIN FIN -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp --tcp-flags ACK,PSH PSH -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp --tcp-flags ACK,URG URG -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp --tcp-flags SYN,ACK SYN,ACK -m state --state NEW -j LOGDROP
root@zen-lb:~# iptables -A INPUT -p tcp --tcp-flags ALL SYN,RST,ACK,FIN,URG -j LOGDROP

Following this, we should add any hosts we trust, such as incoming connections from our VPN and generally any incoming connections that we don't want rate limited:

root@zen-lb:~# iptables -A INPUT -s 172.16.200.0/24 -j ACCEPT

Now, assuming that we have a webserver running, I am going to allow access to ports 80 (http) and 443 (https). As you can see, this is not unconditional access, since any new connection goes through my RATELIMIT chain first.

root@zen-lb:~# iptables -A INPUT -p tcp -m multiport --dports 80,443 -m state --state NEW -j RATELIMIT

Next, we can add some hosts that are trusted, but not completely; say some partners. Since this rule comes after the RATELIMIT rule above, they will still be rate limited on the web ports, just in case.

root@zen-lb:~# iptables -A INPUT -s 192.168.200.0/24 -j ACCEPT

Now, we'll add ports 80 and 443 once again. Why? Well, if our client hasn't hit our rate limit, they're going to return from our RATELIMIT chain, so we want to accept that.

root@zen:~# iptables -A INPUT -m tcp -p tcp --dport 80 -j ACCEPT
root@zen-lb:~# iptables -A INPUT -m tcp -p tcp --dport 443 -j ACCEPT

And the obligatory "if it doesn't fit any of our aforementioned rules, kill it with fire":

root@zen-lb:~# iptables -A INPUT -j REJECT --reject-with icmp-host-prohibited

Now, once again, our load balancer is a router, so we need to enable IP forwarding on it. But for security purposes we'll only allow it to forward packets from our network to specific hosts, such as NTP, DNS, and apt-get/yum update servers. In this example, we assume that the subnets behind our load balancer are 172.16.104.0/22 and 172.16.108.0/24.

root@zen-lb:~# iptables -A FORWARD -s 172.16.104.0/22,172.16.108.0/24 -d 8.8.8.8,8.8.4.4 -j ACCEPT #DNS
root@zen-lb:~# iptables -A FORWARD -s 172.16.104.0/22,172.16.108.0/24 -d 62.1.38.19,62.1.38.25,140.211.169.197,152.19.134.146,66.35.62.166,66.135.62.201,209.132.181.16,67.203.2.67,85.236.55.6,213.175.193.206,195.154.241.117,74.121.199.234 -j ACCEPT #Oracle Linux, EPEL, Remi and Percona Update Servers
root@zen-lb:~# iptables -A FORWARD -s 172.16.104.0/22,172.16.108.0/24 -d 193.93.167.241,79.107.99.220,83.212.114.205,193.239.214.226,83.212.118.71,193.164.227.145,194.177.210.54,83.212.96.50 -j ACCEPT #Red Hat NTP Servers
root@zen-lb:~# iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
root@zen-lb:~# iptables -A FORWARD -j REJECT --reject-with icmp-host-prohibited

Anything that originates from our network should be allowed:

root@zen-lb:~# iptables -A OUTPUT -j ACCEPT

Our LOGDROP chain:

root@zen-lb:~# iptables -A LOGDROP -j LOG --log-prefix "INVALID PACKET:"
root@zen-lb:~# iptables -A LOGDROP -j DROP

And finally our RATELIMIT chain:

root@zen-lb:~# iptables -A RATELIMIT -m recent --set --name RATELIMIT --rsource
root@zen-lb:~# iptables -A RATELIMIT -m recent --rcheck --seconds 5 --hitcount 80 --name RATELIMIT --rsource -j LOG --log-prefix "EXCEEDED RATE:"
root@zen-lb:~# iptables -A RATELIMIT -m recent --rcheck --seconds 5 --hitcount 80 --name RATELIMIT --rsource -j DROP

As you can see, this is extremely liberal: it takes 80 new connection attempts within 5 seconds from the same IP address before packets start being dropped. You need to tweak the values according to the needs of your webserver.
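
You can also peek at what the recent module has tracked so far; the file name matches the --name we gave the rule (on older kernels it lives under /proc/net/ipt_recent/ instead):

root@zen-lb:~# cat /proc/net/xt_recent/RATELIMIT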

Let's make the changes permanent:

root@zen-lb:~# iptables-save > /etc/iptables.up.rules
root@zen-lb:~# vi /etc/init.d/iptables_fw
#!/bin/sh
### BEGIN INIT INFO
# Provides:          iptables_init
# Required-Start:    $local_fs $network
# Required-Stop:
# Default-Start:     2 3 4 5 
# Default-Stop:      0 1 6
# Short-Description: Firewall script
# Description:       Start iptables-based firewall
### END INIT INFO
#
iptables-restore < /etc/iptables.up.rules
 
root@zen-lb:~# iptables-restore < /etc/iptables.up.rules
root@zen-lb:~# chmod 755 /etc/init.d/iptables_fw
root@zen-lb:~# iptables -F
root@zen-lb:~# service iptables_fw start
root@zen-lb:~# update-rc.d iptables_fw defaults

And finally, tell the kernel that it should allow forwarding as well:

root@zen-lb:~# echo "1" > /proc/sys/net/ipv4/ip_forward
root@zen-lb:~# vi /etc/sysctl.conf
# Uncomment the next line to enable packet forwarding for IPv4
net.ipv4.ip_forward=1