John Fremlin's blog: The Klone webserver now the winner with 9k requests/second

Posted 2009-06-24 07:35:00 GMT

After all the interest in teepeedee2, a few other projects came out of the woodwork claiming to have better performance. They didn't. But the very interesting Klone webserver from Koan Logic does.

It uses a language for dynamic pages that looks similar to PHP but is actually more or less inline C code, which is compiled to native code. This means it can achieve the same sort of performance as tpd2, which also compiles to native code. In fact, it can get much better performance: I measured 9k requests/s.

To avoid unnecessary forks, I added the following to kloned.conf:

prefork.max_requests_per_child  100000001

I started kloned like this:

$ schedtool -a 0 -e ./kloned
(I also tried the -F flag but that made the performance worse for some reason.)

$ schedtool -a 1 -e ab -n 100000 -c10 http://localhost:8080/?name=John
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests

Server Software:        klone/2.2.0prv4
Server Hostname:        localhost
Server Port:            8080

Document Path:          /?name=John
Document Length:        23 bytes

Concurrency Level:      10
Time taken for tests:   11.016 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      22800000 bytes
HTML transferred:       2300000 bytes
Requests per second:    9078.00 [#/sec] (mean)
Time per request:       1.102 [ms] (mean)
Time per request:       0.110 [ms] (mean, across all concurrent requests)
Transfer rate:          2021.27 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       8
Processing:     0    1   1.1      1      66
Waiting:        0    1   1.0      1      66
Total:          1    1   1.1      1      66

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%     66 (longest request)

Strangely, an earlier run with exactly the same setup was slower than tpd2. I guess it's related to the new moon.

Requests per second:    5739.27 [#/sec] (mean)
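The summary figures that ab prints can be cross-checked from the raw counts. The sketch below shows the arithmetic only; the function names are mine for illustration and are not taken from ApacheBench's source.

```c
#include <assert.h>

/* Throughput: completed requests divided by wall-clock seconds. */
static double requests_per_second(long requests, double seconds)
{
    assert(seconds > 0.0);
    return requests / seconds;
}

/* Mean time one client waits for a response, in milliseconds.
 * With `concurrency` requests in flight at once, each request takes
 * `concurrency` times longer than the average gap between completions. */
static double ms_per_request(long requests, double seconds, int concurrency)
{
    assert(requests > 0);
    return concurrency * seconds * 1000.0 / requests;
}
```

With the run above, requests_per_second(100000, 11.016) gives roughly 9077.7 and ms_per_request(100000, 11.016, 10) gives roughly 1.10. ab prints 9078.00, presumably because it divides by the unrounded elapsed time rather than the 11.016 it displays. Comparing the two runs, 9078.00 / 5739.27 is roughly 1.58, so the fast run served about 58% more requests per second.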

The KL1 language used by kloned is quite low-level. To do the simple Hello benchmark, Stefano Barbato generously guided me through the process of getting kloned up and running (there were a few build issues he quickly resolved) and came up with this script:

<%
    enum { MAX_NAME_LEN = 100 };
    char enc_name[5*MAX_NAME_LEN + 1] = "no name given";

    /* get the 'name' value */
    const char *name = request_get_arg(request, "name");

    /* if 'name' is set and it's short enough to fit in 'enc_name', encode it */
    if (name != NULL && strlen(name) <= MAX_NAME_LEN)
        u_htmlncpy(enc_name, name, strlen(name), HTMLCPY_ENCODE);
%>

<h1>hello <%= enc_name %></h1>

Compare this with the more compact tpd2 syntax, where such a page is

(defpage "/hello" (name) :create-frame nil
  (<h1 "Hello " name))

and the :create-frame nil is optional (it just says that, as an optimisation, a new session or frame should not be created to keep track of the user).

However, the benchmark is not about these subjective aesthetic considerations, and kloned can achieve much faster rates than teepeedee2 for the simple Hello ${name} test according to my figures.

The architecture is rather interesting and, in fact, not what I expected. I assumed that to make a high performance webserver one needed to handle many connexions in a single thread and therefore to use epoll and transform the network protocol handling into a state machine (with continuation passing style this is quite elegant).
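To make the single-threaded design concrete, here is a minimal sketch of a per-connection state machine driven by epoll. It is not teepeedee2's code (which is Lisp and uses continuation passing style); it uses an explicit state enum instead, it is Linux-only, and the names conn, watch_conn, step_conn and event_loop are invented for illustration. A real server would also put the sockets in non-blocking mode and handle partial reads and writes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

enum conn_state { CONN_READING, CONN_WRITING };

struct conn {
    int fd;
    enum conn_state state;
};

/* Register a connection with the epoll set, initially waiting for input. */
static int watch_conn(int epfd, struct conn *c)
{
    struct epoll_event ev;

    assert(c != NULL);
    c->state = CONN_READING;
    ev.events = EPOLLIN;
    ev.data.ptr = c;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Advance one connection by one step of its state machine.
 * Returns 1 once the connection is finished and closed, 0 otherwise. */
static int step_conn(int epfd, struct conn *c)
{
    static const char reply[] = "<h1>hello</h1>\n";
    char buf[1024];
    struct epoll_event ev;

    if (c->state == CONN_READING) {
        if (recv(c->fd, buf, sizeof buf, 0) <= 0) {
            close(c->fd);               /* peer closed or error */
            return 1;
        }
        /* Request consumed: now wait until the socket is writable. */
        c->state = CONN_WRITING;
        ev.events = EPOLLOUT;
        ev.data.ptr = c;
        epoll_ctl(epfd, EPOLL_CTL_MOD, c->fd, &ev);
        return 0;
    }

    send(c->fd, reply, sizeof reply - 1, 0);
    close(c->fd);                       /* also drops it from the epoll set */
    return 1;
}

/* One thread serves all registered connections until `remaining` are done. */
static void event_loop(int epfd, int remaining)
{
    struct epoll_event events[16];

    while (remaining > 0) {
        int n = epoll_wait(epfd, events, 16, -1);
        if (n < 0)
            break;
        for (int i = 0; i < n; ++i)
            remaining -= step_conn(epfd, events[i].data.ptr);
    }
}
```

The point of the design is that no connection ever blocks the thread: each one is only touched when the kernel reports it ready, so a slow client costs one small struct rather than a whole process.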

However, klone does not do this. Each process keeps handling one HTTP connexion at a time. There are a number of options for handling more connexions. For example, in SERVER_MODEL_PREFORK if a request takes longer than one second then another child can be spawned automatically.

The normal old-style Apache model is to handle a single request or a single HTTP session in one process and to exit the process at the end. Simply keeping the process alive for subsequent clients seems obvious in hindsight, and it gives high performance on this simple benchmark. If your code is not a stateful mess, then it's a great idea. Kloned compromises a little by restarting fresh after a default of 10k requests (I raised this limit for the benchmark).
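The long-lived worker model can be sketched in a few lines of POSIX C. This is my own illustration of the general pre-fork technique, not KLone's implementation: the function names, the fixed reply and the `max_requests` parameter (standing in for the per-child request limit) are all assumptions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a TCP socket listening on 127.0.0.1:`port` (0 picks a free port).
 * Returns the descriptor, or -1 on error. */
static int listen_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Serve connections strictly one at a time, then return after
 * `max_requests` of them so the process can be replaced by a fresh one. */
static void worker_loop(int listen_fd, long max_requests)
{
    static const char reply[] =
        "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>hello</h1>\n";
    char buf[4096];

    for (long served = 0; served < max_requests; ++served) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            break;
        /* Blocking read: this process does nothing else in the meantime. */
        if (recv(conn, buf, sizeof buf, 0) > 0)
            send(conn, reply, sizeof reply - 1, 0);
        close(conn);
    }
}

/* Fork `n` workers that share `listen_fd`; the kernel hands each new
 * connection to one worker blocked in accept(). Returns how many started. */
static int spawn_workers(int listen_fd, int n, long max_requests)
{
    int started = 0;

    assert(n > 0);
    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            worker_loop(listen_fd, max_requests);
            _exit(0);
        }
        ++started;
    }
    return started;
}
```

A parent would call listen_loopback, then spawn_workers, and then sit in wait(), starting a replacement whenever a child exits. The fork cost is paid once per `max_requests` connections instead of once per connection, which is where the speed on a benchmark like this one comes from.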

The strength of teepeedee2 is that it is very competitive in speed, has the ability to keep many concurrent long-term connexions open to do the AJAX push for comment updates among other things, and can leverage the power of functional and meta-programming to deliver innovative applications. Looks like I need to make a new benchmark :-)

PS. Something not relevant to the benchmark but of note is that the KL1 script contains a buffer overflow vulnerability: the call to u_htmlncpy does not take the length of the output buffer, and " expands to &quot;, which is 6 characters, not the 5 per input character that the buffer allows for.
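To spell out the arithmetic: enc_name holds 5*100 + 1 = 501 bytes, but a name made of 100 double quotes encodes to 600 bytes plus the terminator. One way to close the hole is an escaping routine that is told the size of the destination. The sketch below is a hypothetical helper of my own, not part of KLone's API, and the set of characters it escapes is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a KLone function): HTML-escape `src` into `dst`,
 * never writing more than `dst_size` bytes including the terminating NUL.
 * Returns 0 on success; returns -1 and leaves `dst` empty if the escaped
 * text would not fit. */
static int html_escape_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = 0;

    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return -1;

    for (; *src != '\0'; ++src) {
        char single[2] = { *src, '\0' };
        const char *rep;
        size_t len;

        switch (*src) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;   /* 6 bytes: the worst case */
        case '\'': rep = "&#39;";  break;
        default:   rep = single;   break;
        }

        len = strlen(rep);
        if (len >= dst_size - used) {       /* keep one byte for the NUL */
            dst[0] = '\0';
            return -1;
        }
        memcpy(dst + used, rep, len);
        used += len;
    }
    dst[used] = '\0';
    return 0;
}
```

With this shape the caller no longer has to guess the worst-case expansion: html_escape_bounded(enc_name, sizeof enc_name, name) either fits or fails cleanly, and the page can fall back to the default text on failure.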

PPS. The sarcastic low-level language bashers needn't think this proves anything because the bug is obvious to any serious C programmer.

UPDATE 20090809 — teepeedee2 is now quite a bit faster than kloned.

plus one!

Posted 2009-06-29 13:02:32 GMT by lor
