Posted 2009-12-21 11:43:00 GMT
Planning the architecture — Your typical software
project fulfils many functions. Before starting to optimise, you have
to decide which of those functions you wish to make faster. For example, in
tpd2 you can define new URLs (slowly) and request them over HTTP
(quickly). Once you have decided on a set of performance requirements,
you can consider whether it is possible to achieve them with your
system. This means you need to figure out a realistic numeric value
for each requirement. Saying "as fast as possible"
probably means that you have no way of quantifying the benefit of faster
performance, and so are wasting your energy pointlessly
optimising something that doesn't need to go fast.
Hopefully, your existing design already meets your performance requirements. In that case, fantastic. You don't need anything more. Implement it knowing you needn't worry about performance.
For an example of a sensible requirement that can invalidate a design: suppose you want to serve 10k requests/s, and it takes around 300µs to start a new thread. Starting one thread per request would then cost 10,000 × 300µs = 3 CPU-seconds of thread creation per wall-clock second, which is impossible on a single core. There is no point starting out with a design that is based around one thread for each request.
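To make the arithmetic concrete, here is the same estimate as a snippet (a sketch using only the figures quoted above):

```lisp
;; Thread-per-request budget check; figures from the example above.
(let* ((requests-per-second 10000)
       (thread-start-seconds 300e-6)  ; ~300us to start a thread
       ;; CPU-seconds spent just creating threads, per wall-clock second:
       (thread-overhead (* requests-per-second thread-start-seconds)))
  thread-overhead)
;; => 3.0 -- three seconds of work per second
```

Any result above 1.0 here means a single core cannot even keep up with the thread creation, let alone the requests themselves.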
You have to estimate the minimum amount of actual work your approach will require. This requires common sense or experience. Is the bottleneck resource going to be network bandwidth? Disk seeks? Division operations on the CPU? JPEG image decodes? Is there enough RAM?
A very common source of unnecessary work is in changing from one data representation to another at module boundaries (for example, YUV to RGB image conversions). Often it can dominate the actual, useful work. When I was hacking on embedded video codecs (H.264 etc.), simply copying and converting took a large slice of processing. The same with Unicode conversion in early versions of tpd2.
If you are unsure about what is important and want to test your theories, then run a few mock-up experiments with made-up data. The time macro is very helpful for this. For example,
CL-USER> (let ((s (make-string 1000000 :initial-element #\x)))
           (time (sb-ext:octets-to-string
                  (sb-ext:string-to-octets s :external-format :utf-8)
                  :external-format :utf-8))
           nil)
Evaluation took:
  0.146 seconds of real time
  0.150000 seconds of total run time (0.150000 user, 0.000000 system)
  [ Run times consist of 0.040 seconds GC time, and 0.110 seconds non-GC time. ]
  102.74% CPU
  260,730,414 processor cycles
  21,773,968 bytes consed
So it takes approximately 260 cycles per character to convert a string to UTF-8 octets and back (including GCing the resulting mess). Suppose each request involves 1kB of data; at 10k requests/s that is 10MB/s to process. Converting it all to UTF-8 and back with SBCL's slow string conversion would cost 260 cycles/byte × 10MB/s = 2.6 billion cycles per second — an entire 2.6 GHz core spent just on that. So we cannot afford to do it, and we can figure that out before barrelling ahead down an impossible path.
PS. If you want to run more automated experiments, you can get cycle counts with this macro (be careful to repeat tests until you are confident that the results are statistically significant).
(defmacro with-returning-cycles (&body body)
  (alexandria:with-unique-names (h0 l0 h1 l1)
    `(multiple-value-bind (,h0 ,l0) (sb-impl::read-cycle-counter)
       (locally ,@body)
       (multiple-value-bind (,h1 ,l1) (sb-impl::read-cycle-counter)
         (sb-impl::elapsed-cycles ,h0 ,l0 ,h1 ,l1)))))
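For instance, to time the conversion from the earlier experiment (a sketch; the returned cycle count will vary between runs and machines):

```lisp
;; Measure one UTF-8 round trip; returns elapsed CPU cycles.
(with-returning-cycles
  (let ((s (make-string 1000 :initial-element #\x)))
    (sb-ext:octets-to-string
     (sb-ext:string-to-octets s :external-format :utf-8)
     :external-format :utf-8)))
```

Run it many times and look at the distribution of results, not a single sample.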