Since all the CPUs in new servers can run in 64-bit mode, and most Linux distributions are also available in a 64-bit version, I was wondering why we still install 32-bit operating systems.

So I took two Dell PowerEdge 1950 servers, each with two quad-core Intel Pentium 4 Xeon 2 GHz processors and 2 GB of RAM, and installed the Debian 4.0 (Etch) Linux distribution on both. The first one runs the usual i386 version; the second, the amd64 version.


Do not confuse EM64T with IA-64. IA-64 is the name of the architecture used by Itanium processors (now rarely used), developed by Intel in collaboration with HP. EM64T is the 64-bit instruction set you get when you buy a server with an Intel processor. Officially, it is different from AMD's. In practice, the differences are small: apart from subtle differences in the instruction sets (one implements 3DNow!, the other SSE3), the two are almost compatible.
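On Linux, a quick way to check whether an x86 CPU can run in 64-bit mode (whichever name the vendor uses for the extension) is to look for the `lm` (long mode) flag in /proc/cpuinfo. The flag is reported even when a 32-bit i386 kernel is installed. The Python sketch below is an illustration only and was not part of the original tests. The function name `cpu_supports_64bit` is made up for this example.

```python
import platform


def cpu_supports_64bit(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo text lists 'lm'.

    On x86, Linux reports the 'lm' (long mode) flag for CPUs that can run
    64-bit code, whether the vendor calls the extension AMD64 or EM64T.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the whole word 'lm', not substrings of other flag names.
        if sep and key.strip() == "flags" and "lm" in value.split():
            return True
    return False


if __name__ == "__main__":
    # Machine type of the running kernel: 'x86_64' on an amd64 install,
    # typically 'i686' on an i386 install.
    print("kernel architecture:", platform.machine())
    try:
        with open("/proc/cpuinfo") as f:
            print("CPU supports 64-bit mode:", cpu_supports_64bit(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available (not running on Linux?)")
```

Comparing the two lines of output on the i386 server would show a 64-bit-capable CPU running a 32-bit kernel.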

The tests:

The tests were performed in two areas where high CPU performance is required (databases and SSL requests) and in two forms: a generic benchmark, and a home-made test that aims to be more representative of real-life use:

  • First, MySQL performance, using sql-bench (provided by MySQL), then SQL statistics queries on a database of 1,507,500 entries extracted from a production MySQL Radius log server.
  • Next, SSL performance, using the OpenSSL built-in speed test, followed by a (quickly written) home-made script that issues an increasing number of concurrent HTTPS requests for a 14 KB text file and measures the time spent.

Results:

  • sql-bench (only the final totals are shown; times in seconds):

          Seconds   usr      sys     cpu      tests
x86       741.00    209.91   52.16   262.07   3101541
AMD64     697.00    200.73   47.11   247.84   3101541

The AMD64 version completes the whole benchmark in 697 s instead of 741 s, which makes it about 6% faster.
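A gain like this can be expressed in two ways: as the share of run time saved, or as the extra work done per second. The two figures differ slightly. The small Python helper below computes both from the wall-clock totals in the table above. The function name `percent_faster` is hypothetical.

```python
def percent_faster(baseline_s: float, candidate_s: float) -> tuple[float, float]:
    """Compare two run times (in seconds) of the same benchmark.

    Returns (time_saved_pct, throughput_gain_pct):
      - time_saved_pct: how much shorter the candidate run is than the baseline
      - throughput_gain_pct: how much more work per second the candidate does
    """
    time_saved = (baseline_s - candidate_s) / baseline_s * 100
    throughput_gain = (baseline_s / candidate_s - 1) * 100
    return time_saved, throughput_gain


# sql-bench wall-clock totals: x86 (baseline) vs AMD64 (candidate).
saved, gain = percent_faster(741.00, 697.00)
print(f"time saved: {saved:.1f}%, throughput gain: {gain:.1f}%")
```

For these numbers this gives about 5.9% less time, or about 6.3% more throughput. Either way, the result rounds to the 6% quoted.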

  • MySQL statistics queries on a Radius log database:

This test consisted of statistics queries on a large database of more than 1,500,000 entries (authentication logs). Such queries can take up to a few hours on the production server. On the 64-bit version, the performance gain was between 7 and 9%.
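The home-made statistics test is not reproduced here. As a minimal sketch of the idea, the Python snippet below times one aggregate query, on the assumption that the same query would be run identically on both installs and the elapsed times compared. It uses the standard-library sqlite3 module as a stand-in for MySQL. The `auth_log` table, its columns and the query are all hypothetical, and the table is far smaller than the real Radius log database.

```python
import sqlite3
import time


def timed_stats_query(conn: sqlite3.Connection, sql: str):
    """Run one statistics query and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


# Hypothetical table standing in for the Radius authentication log;
# the real test ran on MySQL with a production schema not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_log (username TEXT, reply TEXT)")
conn.executemany(
    "INSERT INTO auth_log VALUES (?, ?)",
    [(f"user{i % 100}", "Access-Accept" if i % 10 else "Access-Reject")
     for i in range(10_000)],
)

rows, elapsed = timed_stats_query(
    conn,
    "SELECT reply, COUNT(*) FROM auth_log GROUP BY reply ORDER BY reply",
)
print(rows, f"{elapsed:.4f} s")
```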

  • OpenSSL speed test:

The openssl speed command produces the results that are partially reproduced below (and can be downloaded in full here). The first table shows the throughput of each algorithm for each block size, in thousands of bytes per second (the unit openssl speed reports):



Algorithm       Block size   x86          AMD64
md5             16 B         15545.38     17522.78
                64 B         53533.36     59208.34
                256 B        147117.06    156269.54
                1024 B       263072.06    268992.85
                8192 B       346791.94    340670.17
sha1            16 B         15871.30     16940.88
                64 B         51804.36     52854.33
                256 B        133061.97    129182.46
                1024 B       216565.45    202151.25
                8192 B       267272.19    239897.69
rc4             16 B         190981.69    210924.71
                64 B         206862.82    228013.25
                256 B        242711.81    234577.24
                1024 B       243926.33    234083.00
                8192 B       187067.05    234422.27
des cbc         16 B         44435.86     35326.76
                64 B         45948.19     37143.30
                256 B        46544.04     37248.51
                1024 B       46392.98     36891.56
                8192 B       46533.29     36806.66
des ede3        16 B         16987.46     13823.47
                64 B         17193.93     14062.53
                256 B        17318.91     14128.98
                1024 B       17249.13     14097.86
                8192 B       17317.89     14101.16
blowfish cbc    16 B         72394.74     64395.70
                64 B         76733.30     69898.14
                256 B        78098.43     71347.97
                1024 B       77994.44     71499.09
                8192 B       78460.25     71496.29
aes-256 cbc     16 B         60277.60     78290.24
                64 B         73271.71     81663.38
                256 B        77612.20     81707.46
                1024 B       77750.18     81189.89
                8192 B       79151.10     81781.20
sha256          16 B         8619.05      11062.15
                64 B         20638.02     27701.01
                256 B        37677.31     52057.09
                1024 B       47335.34     67080.53
                8192 B       51524.95     72832.60

The second table shows the number of sign and verify operations per second:


                  sign/s                verify/s
Key size          x86       AMD64       x86        AMD64
rsa   512 bits    1694.4    3430.2      22094.7    45158.5
      1024 bits   389.5     942.4       8539.2     17302.3
      2048 bits   72.8      167.0       2804.6     5734.0
      4096 bits   12        26.1        857.4      1711.6
dsa   512 bits    2367.2    5464.9      1944.7     4701.1
      1024 bits   897.6     2002.8      740.2      1675.0
      2048 bits   285.2     631.4       234.9      517.8

Note that these tests were made using only one core.

In 64-bit mode, the asymmetric functions (RSA and DSA) perform clearly better, at roughly 2 to 2.4 times as many operations per second. For the symmetric algorithms the picture is mixed. 32-bit mode is ahead on DES, triple DES and Blowfish, while 64-bit mode is ahead on AES-256 and on the SHA-256 hash. It is important to note that asymmetric algorithms (used in public-key cryptography) are very CPU-intensive.
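To see how large the gap is in each case, divide the AMD64 result by the x86 result from the tables. The short Python sketch below does this for a hand-picked subset of rows. The selection is arbitrary, and higher is better in both tables, so a ratio above 1 favours 64-bit mode.

```python
# Selected single-core results copied from the two openssl speed tables.
# Each entry: (label, x86 value, AMD64 value); higher is better for all.
RESULTS = [
    ("rsa 1024 sign/s", 389.5, 942.4),
    ("rsa 2048 sign/s", 72.8, 167.0),
    ("dsa 1024 sign/s", 897.6, 2002.8),
    ("des cbc 8192 B", 46533.29, 36806.66),
    ("blowfish cbc 8192 B", 78460.25, 71496.29),
    ("aes-256 cbc 8192 B", 79151.10, 81781.20),
    ("sha256 8192 B", 51524.95, 72832.60),
]


def speedup(x86: float, amd64: float) -> float:
    """AMD64 result divided by x86 result: above 1 means 64-bit mode wins."""
    return amd64 / x86


for label, x86, amd64 in RESULTS:
    print(f"{label:22s} {speedup(x86, amd64):.2f}x")
```

With these rows the asymmetric operations come out at about 2.2x to 2.4x. DES and Blowfish come out below 1 (about 0.79x and 0.91x), AES-256 is at about 1.03x and SHA-256 at about 1.41x.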

  • HTTPS concurrent requests:

For this test, I wrote a script that issues an increasing number of concurrent HTTPS requests (downloading a 14 KB text file) and logs the time needed to answer them. The TLS/SSL cipher suite used was the default one, DHE-RSA-AES256-SHA.

This aims to be a more realistic test, and the AMD64 version is consistently about 15-20% faster, even under high load. The key exchange at the beginning of the SSL handshake explains this. With this cipher suite it relies on asymmetric algorithms (DHE key agreement and an RSA signature), where 64-bit mode easily outperforms 32-bit mode. The difference may have been less pronounced for longer sessions downloading more data, since the bulk transfer uses the symmetric AES-256 and SHA-1 algorithms, where the two modes are much closer.
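The original script is not reproduced here. Below is a minimal Python sketch of the same idea. It assumes a thread pool is an acceptable way to generate concurrent requests: for each concurrency level, it fires that many simultaneous HTTPS downloads and records the wall-clock time. The URL and the list of levels are hypothetical. If the test server uses a self-signed certificate, `urlopen` would also need an `ssl` context passed through its `context` argument.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def fetch(url: str) -> int:
    """Download the whole resource over HTTPS and return its size in bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return len(resp.read())


def run_level(get: Callable[[str], int], url: str, concurrency: int) -> float:
    """Fire `concurrency` simultaneous requests; return the wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every response and re-raises the first error, if any.
        list(pool.map(get, [url] * concurrency))
    return time.perf_counter() - start


def ramp(get: Callable[[str], int], url: str,
         levels: Iterable[int]) -> list[tuple[int, float]]:
    """Run each concurrency level in turn and record how long it took."""
    return [(n, run_level(get, url, n)) for n in levels]


if __name__ == "__main__":
    # Hypothetical URL and levels: adapt them to the server under test.
    url = "https://server.example/test-14k.txt"
    for n, secs in ramp(fetch, url, [1, 5, 10, 20, 50]):
        print(f"{n:4d} concurrent requests: {secs:.3f} s")
```

The download function is passed in as a parameter, so the timing logic can be exercised without a live server. When benchmarking for real, the client should run on a separate machine so that it does not compete with the server for CPU.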

Final Thoughts:

In some cases, AMD64 performs better than the same CPU in 32-bit mode. However, the AMD64 (or EM64T) architecture has some limitations: any software you intend to use that is distributed as binary-only must be available for this architecture. This should not be a big problem for most servers, but it can be one on a workstation (the Flash player, for example). Moreover, a 64-bit system theoretically requires more memory, since pointers and long integers take twice as much space (8 bytes instead of 4), but this should not have a big impact in practice. On the other hand, in 64-bit mode the OS can natively address much more memory.
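To see the pointer and long sizes mentioned above on a given install, Python's ctypes module reports the sizes of the underlying C types for the running Python build. This small check was not part of the original benchmark. On i386 Linux both sizes are 4 bytes. On amd64 Linux (the LP64 model) both are 8 bytes. 64-bit Windows, by contrast, keeps long at 4 bytes.

```python
import ctypes
import platform
import struct

# Sizes in bytes of a C pointer and a C long for the running Python build.
pointer_size = ctypes.sizeof(ctypes.c_void_p)
long_size = ctypes.sizeof(ctypes.c_long)

print(f"{platform.machine()}: pointer={pointer_size} bytes, long={long_size} bytes")
# Same pointer check without ctypes: "P" is the struct format code for void *.
print("struct pointer size:", struct.calcsize("P"))
```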

To conclude, you should seriously consider installing an AMD64 distribution on your new servers (whether they use Intel or AMD processors) if they are dedicated to tasks where CPU performance matters.



Update: This article is quite outdated, and 64-bit mode can now bring big improvements. For example, according to some Splunk people, searches on 64-bit Splunk can be up to ten times faster than on 32-bit Splunk (I have not yet been able to test this myself).

August 29, 2007, 9:07 am
