Wednesday, October 8, 2014

Zen Load Balancer Performance Tests Part 1

These are my Zen Load Balancer's specs:

Memory: 2048MB
CPU #1: 3.0GHz

I decided to use G-WAN as the backend, mainly because:

a) I hadn't used it before and it was a good opportunity to do so;
b) They brag about ridiculously high throughput. You know you have to at least check whether it's partly true, and if it is, all the better, as I'll stress Zen even more;
c) They have an enormous number of haters, from other webserver fanbois who refuse to accept that a webserver other than the one they use might be better, to company plants who file tickets for non-existent vulnerabilities to try to prove that G-WAN isn't as good as its publishers claim. This alone can make me install software just out of spite.

So G-WAN it is, then!

The webpage that I will serve will be generated by this little C++ servlet here:
root@gwan:/opt/gwan_linux64-bit/0.0.0.0_8080/#0.0.0.0/csp# cat bench.cpp
// ============================================================================
// C++ servlet example to benchmark Zen and G-WAN Web Application Server   (http://gwan.ch/)
// ----------------------------------------------------------------------------
// bench.cpp: Double a "Hello World! " string until it is at least 240000 bytes long
//            (the reply body ends up at 425984 bytes, about 416 KiB).
//
// This code is based on the hello.cpp example by Thomas Meitz (gwan at jproxx dot com)
// ============================================================================
// imported functions:
//   get_reply(): get a pointer on the 'reply' dynamic buffer from the server
//   xbuf_cat(): like strcat(), but it works in the specified dynamic buffer
// ----------------------------------------------------------------------------
#include "gwan.h" // G-WAN definitions
#include <string>

using namespace std;

class Hello
{
public:
   void whatsup(xbuf_t* reply, string& str)
   {
      xbuf_cat(reply, (char*)str.c_str());
   }
};
// ----------------------------------------------------------------------------
int main(int argc, char *argv[])
{
   string h("Hello World! ");
   do {
      h += h; // double the string on every pass
   } while (h.length() < 240000);
   Hello hello;
   hello.whatsup(get_reply(argv), h);

   return 200; // return an HTTP code (200:'OK')
}
// ============================================================================
// End of Source Code
// ============================================================================
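If you're wondering why a loop that stops at 240000 produces a much bigger page, here's a minimal sketch of the arithmetic. The function name doubled_length is my own and is not part of G-WAN; it just replays the servlet's do/while loop and reports the final length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of gwan.h): doubles `seed` until it is at
// least `threshold` bytes long, exactly like the do/while loop in bench.cpp,
// and returns the final length in bytes.
std::size_t doubled_length(const std::string& seed, std::size_t threshold)
{
    assert(!seed.empty());           // an empty seed would never grow
    std::string h(seed);
    do {
        h += h;                      // same doubling step as the servlet
    } while (h.length() < threshold);
    return h.length();
}
```

"Hello World! " is 13 bytes long, so after 14 doublings the string is 212992 bytes (still under the threshold) and after the 15th it jumps to 425984 bytes, which is the Document Length that apachebench reports further down.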


These were the farm settings:



Tests with stock Zen Load Balancer 3.0.5 community edition

All right, let's begin with stock Zen. No updates and no extra software, except what we'll be using to collect our statistical data (nmon) and any packages needed to install vmware tools.
Also, IPv4 forwarding was set to 1 and I set up NAT to reach my backend, which was on a different network: iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE.

The test will be a little insane: we'll send 10000 requests in total at concurrency levels of 10, 50 and 100, all requesting the output of our servlet.

Each test will be run 5 times. After that, we'll take the median of the 5 runs at each point in time, so as to eliminate any statistical aberrations.
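To make the "median at each point in time" idea concrete, here's a rough sketch of how it could be computed. The layout is an assumption of mine: runs[r][t] holds the CPU usage of run r at sample t, and the names median and median_per_sample are hypothetical. I did not necessarily produce the graphs this way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of one set of samples (taken by value so a copy gets sorted).
double median(std::vector<double> v)
{
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t mid = v.size() / 2;
    return v.size() % 2 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

// runs[r][t] = CPU usage (%) of run r at sample index t (assumed layout).
// Returns one median per sample index, truncated to the shortest run.
std::vector<double> median_per_sample(const std::vector<std::vector<double>>& runs)
{
    assert(!runs.empty());
    std::size_t n = runs.front().size();
    for (const auto& r : runs) n = std::min(n, r.size());

    std::vector<double> out;
    out.reserve(n);
    for (std::size_t t = 0; t < n; ++t) {
        std::vector<double> column;          // all runs at the same instant
        for (const auto& r : runs) column.push_back(r[t]);
        out.push_back(median(column));
    }
    return out;
}
```

With 5 runs, a single run that spikes at some instant cannot move the median at that instant, which is why I prefer it to the mean here.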

But first, let's do a few simple HTTP tests to see the difference in performance and resource usage:
D:\apache\Apache24\bin>ab -c10 -n10000 http://192.168.0.99:80/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 1554214 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            80

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      10
Time taken for tests:   112.649 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      4262600000 bytes
HTML transferred:       4259840000 bytes
Requests per second:    88.77 [#/sec] (mean)
Time per request:       112.649 [ms] (mean)
Time per request:       11.265 [ms] (mean, across all concurrent requests)
Transfer rate:          36952.96 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    4  60.4      0    3016
Processing:    16  108 184.0     94    3126
Waiting:        0   23  78.1     16    3047
Total:         16  112 193.4     94    3126

Percentage of the requests served within a certain time (ms)
  50%     94
  66%    109
  75%    125
  80%    125
  90%    156
  95%    172
  98%    203
  99%    266
 100%   3126 (longest request)

That's pretty sweet performance right there! Our webserver serves a dynamically generated 416 KiB (425984-byte) page at almost 89 requests per second, without any caching mechanism to help. Impressive. I'd go as far as to say that Zen is a bottleneck here.
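In case you want to sanity-check apachebench's summary lines, here's a small sketch that derives them from the raw totals. The struct and function names are mine, and I'm assuming that a "Kbyte" in ab's transfer rate means 1024 bytes.

```cpp
#include <cassert>

// Hypothetical container for the figures ab prints in its summary.
struct AbSummary {
    double requests_per_second;  // "Requests per second"
    double ms_per_request;       // "Time per request" (mean)
    double ms_per_request_all;   // "Time per request" (mean, across all concurrent requests)
    double kib_per_second;       // "Transfer rate", assuming 1 Kbyte = 1024 bytes
};

AbSummary summarize(double seconds, long long requests,
                    int concurrency, long long total_bytes)
{
    assert(seconds > 0 && requests > 0 && concurrency > 0);
    AbSummary s;
    s.requests_per_second = requests / seconds;
    s.ms_per_request_all  = seconds * 1000.0 / requests;
    s.ms_per_request      = s.ms_per_request_all * concurrency;
    s.kib_per_second      = total_bytes / 1024.0 / seconds;
    return s;
}
```

Feeding it the run above (112.649 seconds, 10000 requests, concurrency 10, 4262600000 bytes) gives about 88.77 requests per second and 112.649 ms per request. The transfer rate comes out within a fraction of a Kbyte/sec of the reported figure, since ab works with a more precise elapsed time than the one it prints.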

So let's rinse and repeat 4 more times, calculate the median and create a graph.

This is our Zen's CPU usage for 10 concurrent connections:


For the record, apachebench reported around 84 requests per second on average.

For 50 concurrent connections:



For the record, apachebench reported around 78 requests per second on average.

Let's finish this with 100 concurrent connections:


For the record, apachebench reported around 75 requests per second on average.

As you can see, nothing dramatic, and CPU usage rarely exceeds 45%. Time to see what happens if we use HTTPS instead.

D:\apache\Apache24\bin>abs -c10 -n10000 https://192.168.0.99:443/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 1554214 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            443
SSL/TLS Protocol:       TLSv1,RC4-SHA,1024,128

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      10
Time taken for tests:   129.153 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      4262600018 bytes
HTML transferred:       4259840000 bytes
Requests per second:    77.43 [#/sec] (mean)
Time per request:       129.153 [ms] (mean)
Time per request:       12.915 [ms] (mean, across all concurrent requests)
Transfer rate:          32230.65 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   37 165.0     31    3079
Processing:    16   92 212.3     78    3157
Waiting:        0   17  86.0     16    3049
Total:         16  129 268.0    109    3172

Percentage of the requests served within a certain time (ms)
  50%    109
  66%    109
  75%    109
  80%    125
  90%    125
  95%    141
  98%    141
  99%    375
 100%   3172 (longest request)

Wow, our performance is close to what apachebench reported for 50 concurrent connections on plain HTTP requests!
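To put a number on the HTTPS penalty for this single run, here's a tiny sketch of the relative change between two throughput figures. The name percent_change is hypothetical.

```cpp
#include <cassert>

// Relative change from `before` to `after`, in percent.
// Negative values mean throughput went down.
double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

Going from 88.77 requests per second over HTTP to 77.43 over HTTPS is a drop of a little under 13%.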

So let's rinse and repeat 4 more times, calculate the median and create a graph.

This is our Zen's CPU usage for 10 concurrent connections:



The CPU usage cost is staggering; we rarely see it drop below 80%. For the record, apachebench reported almost 85 requests per second on average.

For 50 concurrent connections:


Look at that CPU usage! Most of the time it's at 100%! For the record, apachebench reported almost 90 requests per second on average.

For 100 concurrent connections:


Considering the previous results, honestly I expected even worse! For the record, apachebench reported around 74 requests per second on average.

Tests with stock Zen Load Balancer 3.0.5 community edition + PCRE + TCMalloc + HOARD

Pound's documentation notes: "If PCRE, tcmalloc (from the Google perftools package) and/or Hoard are available Pound will link against them. This will provide a significant performance boost and is highly recommended."

Let's see the performance boost for ourselves then.

root@zen-lb:~# apt-get install make cmake g++ libpcrecpp0 libpcre3-dev libpcre3 libpcre++0 libpcre++-dev libtcmalloc-minimal0 libgoogle-perftools0 libgoogle-perftools-dev
root@zen-lb:~# mkdir hoard
root@zen-lb:~# cd hoard/
root@zen-lb:~/hoard# wget --no-check-certificate https://github.com/emeryberger/Hoard/releases/download/3.10/Hoard-3.10-source.tar.gz
root@zen-lb:~/hoard# gunzip Hoard-3.10-source.tar.gz 
root@zen-lb:~/hoard# tar -xf Hoard-3.10-source.tar 
root@zen-lb:~/hoard# cd Hoard/src
root@zen-lb:~/hoard/Hoard/src# make linux-gcc-x86
root@zen-lb:~/hoard/Hoard/src# cp libhoard.so /usr/lib/.

Add this to our /etc/profile so that the hoard library is loaded:
root@zen-lb:~/hoard/Hoard/src# vi /etc/profile
export LD_PRELOAD=/usr/lib/libhoard.so

Test that it's loaded:
root@zen-lb:~# ldd /bin/ls
        linux-gate.so.1 =>  (0xb7767000)
        /usr/lib/libhoard.so (0xb7725000)
        libselinux.so.1 => /lib/libselinux.so.1 (0xb7704000)
        librt.so.1 => /lib/i686/cmov/librt.so.1 (0xb76fa000)
        libacl.so.1 => /lib/libacl.so.1 (0xb76f3000)
        libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb75ac000)
        libdl.so.2 => /lib/i686/cmov/libdl.so.2 (0xb75a8000)
        libpthread.so.0 => /lib/i686/cmov/libpthread.so.0 (0xb758f000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7499000)
        libm.so.6 => /lib/i686/cmov/libm.so.6 (0xb7473000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7455000)
        /lib/ld-linux.so.2 (0xb7768000)
        libattr.so.1 => /lib/libattr.so.1 (0xb7450000)
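If you'd rather not eyeball the ldd output every time, here's a minimal sketch that scans captured text for a library name. The function name lists_library is my own, and the text is assumed to be whatever ldd printed (captured by other means).

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if any line of `ldd_output`
// mentions `library` (for example "libhoard.so").
bool lists_library(const std::string& ldd_output, const std::string& library)
{
    std::istringstream in(ldd_output);
    std::string line;
    while (std::getline(in, line)) {
        if (line.find(library) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

Passing it the output above together with "libhoard.so" would return true, because of the /usr/lib/libhoard.so line.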


This is our Zen's CPU usage for 10 concurrent connections:


Well, it might just be me, but I am seeing a difference, albeit a small one. For the record, apachebench reported around 78 requests per second on average.

This is our Zen's CPU usage for 50 concurrent connections:


Wow, now that's a big difference. The CPU rarely goes above 80%. That's even better than 10 concurrent connections without our optimization libraries installed! For the record, apachebench reported around 72 requests per second on average. It looks like with our new libraries installed, there's a CPU/requests per second tradeoff.

For 100 concurrent connections:

The stress is noticeably less on the CPU. For the record, apachebench reported almost 75 requests per second on average.

Honestly, I wasn't planning on doing any HTTP tests for the stock+libs Zen, as the CPU load isn't too large and therefore there isn't much to gain, but just out of curiosity I did a series of tests for 50 concurrent HTTP connections to see the difference in requests per second; it was around 76. That's a 14 requests/second drop. Not sure if I can call that a performance enhancement!

Finally, if you're curious, here are a few benchmarks from our Zen LB to the backend:

root@zen-lb:~# ab -c100 -n10000 http://192.168.0.99:80/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            80

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      100
Time taken for tests:   37.843 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      4262600000 bytes
HTML transferred:       4259840000 bytes
Requests per second:    264.25 [#/sec] (mean)
Time per request:       378.429 [ms] (mean)
Time per request:       3.784 [ms] (mean, across all concurrent requests)
Transfer rate:          109999.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    7  12.2      6     221
Processing:    37  335 558.7    217    8559
Waiting:        1   71 232.8     17    1659
Total:         37  342 559.0    225    8574

Percentage of the requests served within a certain time (ms)
  50%    225
  66%    250
  75%    293
  80%    328
  90%    413
  95%   1172
  98%   1786
  99%   3626
 100%   8574 (longest request)

root@zen-lb:~# ab -c50 -n10000 http://192.168.0.99:80/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            80

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      50
Time taken for tests:   36.863 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      4262600000 bytes
HTML transferred:       4259840000 bytes
Requests per second:    271.28 [#/sec] (mean)
Time per request:       184.313 [ms] (mean)
Time per request:       3.686 [ms] (mean, across all concurrent requests)
Transfer rate:          112924.62 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   4.3      0      83
Processing:    27  169 338.1     84   11517
Waiting:        3   66 227.5     19    3014
Total:         27  171 338.3     85   11517

Percentage of the requests served within a certain time (ms)
  50%     85
  66%     97
  75%    111
  80%    129
  90%    305
  95%    363
  98%   1357
  99%   1545
 100%  11517 (longest request)

root@zen-lb:~# ab -c10 -n10000 http://192.168.0.99:80/?bench.cpp
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.99 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        G-WAN
Server Hostname:        192.168.0.99
Server Port:            80

Document Path:          /?bench.cpp
Document Length:        425984 bytes

Concurrency Level:      10
Time taken for tests:   38.634 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      4262600000 bytes
HTML transferred:       4259840000 bytes
Requests per second:    258.84 [#/sec] (mean)
Time per request:       38.634 [ms] (mean)
Time per request:       3.863 [ms] (mean, across all concurrent requests)
Transfer rate:          107746.80 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       2
Processing:    23   38   5.7     38      77
Waiting:        4    9   2.6      9      38
Total:         23   39   5.7     38      78

Percentage of the requests served within a certain time (ms)
  50%     38
  66%     40
  75%     42
  80%     43
  90%     45
  95%     48
  98%     53
  99%     57
 100%     78 (longest request)
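With this many apachebench reports lying around, pulling out the one figure I keep quoting gets tedious. Here's a rough sketch that extracts it from a captured report. The function name requests_per_second is hypothetical, and it assumes the line starts with "Requests per second:" as in the outputs above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: looks for the "Requests per second:" line in a
// captured ab report and stores the number in `out`.
// Returns false if the line is missing or no number follows the label.
bool requests_per_second(const std::string& ab_output, double& out)
{
    const std::string key = "Requests per second:";
    std::istringstream in(ab_output);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0) {   // line starts with key
            std::istringstream value(line.substr(key.size()));
            return static_cast<bool>(value >> out);    // skips the padding
        }
    }
    return false;
}
```

Run against the three reports above, it would return 264.25, 271.28 and 258.84 requests per second. That is the backend's throughput when Zen talks to it directly, roughly three times what the client saw going through the farm.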