John Fremlin's blog: How to optimise Common Lisp under SBCL: builtins

Posted 2009-12-23 23:00:00 GMT

Common Lisp exports nearly a thousand symbols. There are numerous builtin functions, and often one function performs multiple tasks. For example, the reduce function does either a left fold (foldl) or a right fold (foldr) depending on its arguments, and operates on all sorts of sequence types. It is hardly surprising that many of these myriad cases do not have efficient implementations.

In the ideal world, employing builtin functions would be good for performance as well as clarity. This happily is often the case in the real world. For example, using replace with well declared parameter types can be nearly an order of magnitude faster at copying bytes than a loop on declared simple-arrays in Lisp with safety 0.

On the other hand, you may come across nasty surprises. For example, the assoc function does not inline its :test parameter. As a full function call is quite flexible in Lisp, it is important to avoid them in tight loops. For a micro-benchmark,

(declaim (inline inline-assoc))
(defun inline-assoc (item alist &key (test #'eql))
  (declare (optimize speed))
  (loop for cons in alist
	for (a . b) = cons
	do (when (funcall test item a) (return cons))))
 
(defun assoc-test (item alist)
  (declare (optimize speed))
  (loop for i below 5 thereis 
	(assoc item alist 
	       :test (lambda (x y) (= (+ x i) y)))))
(defun inline-assoc-test (item alist)
  (declare (optimize speed))
  (loop for i below 5 thereis 
	(inline-assoc item alist 
		      :test (lambda (x y) (= (+ x i) y)))))

In this case the inline-assoc-test is nearly three times faster and, very importantly, does not cons

CL-USER> (let ((alist (list (cons 3 'found)))) (time (loop repeat 1000000 do (assoc-test 1 alist))))
Evaluation took:
  0.225 seconds of real time
  0.230000 seconds of total run time (0.200000 user, 0.030000 system)
  [ Run times consist of 0.050 seconds GC time, and 0.180 seconds non-GC time. ]
  102.22% CPU
  404,691,552 processor cycles
  112,013,248 bytes consed

The inline version is much more efficient. The fact that it does not cons at all makes a huge difference to the asymptotic complexity as the throughput increases.

 
CL-USER> (let ((alist (list (cons 3 'found)))) (time (loop repeat 1000000 do (inline-assoc-test 1 alist))))
Evaluation took:
  0.076 seconds of real time
  0.060000 seconds of total run time (0.060000 user, 0.000000 system)
  78.95% CPU
  136,060,587 processor cycles
  0 bytes consed

What does this mean in practice? Try to use the builtin functions first, as sometimes they are much more efficient than what you can easily write yourself in portable Lisp. But if one of your critical functions is unexpectedly slow, don't get stuck with the assumption that the calls to the builtins are fast. Trust, but verify.

UPDATE 20100109: fix bug spotted by Nathan Froyd.

Post a comment