John Fremlin's blog: How to optimise Common Lisp under SBCL: object pools

Posted 2009-12-21 11:41:00 GMT

Suppose your program is spending a long time in the garbage collector. This issue is likely to affect any high throughput Lisp program, unless carefully designed to avoid it.

The best way to reduce pressure on the GC and the memory hierarchy is to avoid using so much memory. If you have run out of ideas for changing your algorithm, the next best thing is to use dynamic-extent allocation to put object on the stack. However, this is not always convenient or even possible.

One workaround when you have a good idea of the object lifetime is to make an object pool. Instead of freeing objects, keep them referenced in a pool. When you need a new one, request it from the pool and only allocate if empty. This seems like it should reduce the GC pressure and it often does.

Note however, that on SBCL it may sometimes make your program slower. Here is a program that demonstrates this.

(defmacro with-returning-cycles (&body body)
  (alexandria:with-unique-names (h0 l0 h1 l1)
   `(multiple-value-bind (,h0 ,l0) 
      (multiple-value-bind (,h1 ,l1)
	(sb-impl::elapsed-cycles ,h0 ,l0 ,h1 ,l1)))))
(defun make-buf ()
  (make-array 20))
(defun touch-buffers (n)
  (let ((buffers (loop repeat n collect (get-buf))))
    (let ((x (random #x100)))
      (loop for a in buffers
	    do (setf (aref a 0) x)))
    (mapc 'drop-buf buffers)))
  (defvar *pool* nil)
  (defun get-buf ()
    (or (pop *pool*) (make-buf)))
  (defun drop-buf (buf)
    (push buf *pool*)))
  (defun get-buf ()
  (defun drop-buf (buf)
    (declare (ignore buf))))
(defun benchmark (function &key (count 1000) (repeats 5))
  (let ((totals (make-array repeats :initial-element 0)))
    ; warm up
    (loop repeat repeats do 
	  (funcall function))
    (loop repeat count
	  (loop for i below repeats do
		(incf (aref totals i) (with-returning-cycles 
		   (funcall function)))))
    (map 'vector (lambda (x) (round x count)) totals)))

Using the non-pooled version, the cycle counts stay fairly consistent though the first run after a GC is slightly slower.

CL-USER> (gc :full t) (benchmark (lambda () (touch-buffers 10000)))
#(4335098 3977324 4140443 4134816 4212069)
CL-USER> (gc :full t) (benchmark (lambda () (touch-buffers 10000)))
#(4111324 3750190 3726014 3897913 4096875)

However, with the pooled version, the first run after a GC suffers numerous page faults marking the pooled objects in oldspace non-write-protected. So the first run is twice as slow as the naive version! Of course, later runs are much faster.

CL-USER> (gc :full t) (benchmark (lambda () (touch-buffers 10000)))
#(7105889 1335499 1268755 1283744 1298589)
CL-USER> (gc :full t) (benchmark (lambda () (touch-buffers 10000)))
#(8959631 1586089 1519643 1525521 1546846)

This post demonstrates the need for measuring whether your optimisation is beneficial in a real test case, and that you might need to consider the effects of oldspace write protection. (Having a LIFO pool mitigates this to a certain extent, and is probably a good idea from a memory hierarchy perspective provided there is one pool per core.)


Post a comment