John Fremlin's blog: How to optimise Common Lisp under SBCL: don't profile (draft)

Posted 2010-01-10 13:55:00 GMT

Don't worry about performance, then profile and look at the hotspots, changing your algorithm or tweaking them. I hear variations of this frequently and I completely disagree if performance is actually important. However, in the case where you haven't any strong performance requirements, it might let you make your program faster without much effort.

Firstly, profilers (unless part of an emulator) tend to produce results that can be very misleading if you don't take into account how they work. SBCL's instrumentation profiler is particularly prone to this, and its statistical profiler shares issues with all statistical profilers.

Secondly, once you find out from the profile that function #'process-stuff is taking a lot of time you should figure out whether or not it is realistic to expect to make it much faster, before buckling down to do something about it (see planning). And profiles reports don't tell you directly that you are using the wrong datastructure, especially if you have inlined all the inefficient accesses to it.

I think it is better to plan a good architecture rather than try to catch impressive sounding speedups by tweaking hotspots. If you can make your program twice as fast, how much faster is it possible for it to go? How fast does it need to go? A profile report doesn't do much towards addressing these questions.

Imagine building a house and wanting to make it energy efficient. You could look at it through a supers-duper infrared camera, and then double glaze the windows where heat is leaking in or out. But seeing as you're building it from scratch, why not consider energy efficiency from the start? After you've built it, you shouldn't be able to see any heat disparities with your amazing camera. Analogously, the profile report should just be a way to check for performance bugs, not a starting point.

You should be able to figure out the algorithms and structures that are important for the performance of your program without the profiler. The profiler can help build evidence to support or disprove your hypotheses, but is no substitute for thinking about the performance problem holistically, and blind reliance on the profiler can lead to ignore important algorithmic and data-structural changes and even to fiddle with things that are utterly irrelevant.

In most Lisps, I like using the handy and portable time macro with benchmarks of small parts of a program. It generally gives a reasonable picture of the amount of processing time and amount of garbage generated.

Under SBCL, it reports cycle counts (from the RDTSC instruction), which are nice to see as they remind you how much processing your algorithm uses. However, modern processors have a constant frequency for this clock independently of the frequency at which the CPU cores may be clocked. Therefore if your CPU is scaling back to save power, your cycle counts will be not related to CPU cycles. The cycle counts are just a measure of elapsed wall-time in units (hopefully but not necessarily) of a processor cycle. That means, for example, that other applications running on the computer may affect the cycle count.

The run-time, gathered from getrusage, and GC statistics are for every thread in the Lisp process, not just the things you are running inside the time macro.

Running under SLIME, a constant stream of garbage is being generated in the SLIME thread, so don't worry about small amounts of garbage (measured in the "bytes consed" line), even if you think your program should generate no garbage.

CL-USER> (time (loop repeat 500000000))
under SLIME
Evaluation took:
  0.259 seconds of real time
  0.260000 seconds of total run time (0.260000 user, 0.000000 system)
  100.39% CPU
  466,290,369 processor cycles
  12,608 bytes consed
and at the console
Evaluation took:
  0.258 seconds of real time
  0.260000 seconds of total run time (0.260000 user, 0.000000 system)
  100.78% CPU
  463,174,965 processor cycles
  0 bytes consed

Note that using (time ...) has some measurable overhead, so repeat your benchmark in a loop inside (time ...) until it takes at least a significant fraction of a second, and the results seem stable.

In summary, the results are a rough guide, not a perfect cut-and-dried final answer, but I recommend using (time ...) to make experiments on the performance of components of your program and to measure the effectiveness of any optimisations you take. It is less intrusive than the profilers, much simpler and much less misleading.

To be continued . . . (some examples of the misleading results from the profilers).

Post a comment