John Fremlin's blog: Building your own checksum and listening

Posted 2009-03-21 07:14:00 GMT

A contractor at a company I used to work for was tasked with testing our audio codecs. He wanted to make a checksum of the PCM (raw .wav) output of decoders to check that they consistently gave the same output, without storing all the output.

To do this he proposed building his own checksum.

He asked for input on building his checksum algorithm. I suggested using the CRC-16 polynomial checksum function I'd made. He didn't want to do that, and decided to use something with XOR. We pointed out that this might not be a good idea: if you build your own checksum out of simple functions, it can get stuck, because the right kind of data cancels out and leaves it giving a constant output. In fact, that had just happened to me, and it was why I had written the CRC-16 routine.
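For readers who have not seen one, a bitwise CRC-16 can be sketched in a few lines. This is not the routine from the story, which is not reproduced here. The polynomial (0x1021) and the initial value (0xFFFF) are assumptions, taken from the common CRC-16/CCITT-FALSE variant, and the name `crc16_ccitt` is made up for illustration.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021 (CCITT-FALSE parameters, assumed)."""
    for byte in data:
        # Fold the next byte into the top 8 bits of the 16-bit register.
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                # Top bit set: shift left and XOR in the polynomial.
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


if __name__ == "__main__":
    # Standard check string for this CRC variant.
    print(hex(crc16_ccitt(b"123456789")))  # 0x29b1
    # A run of silence (all-zero bytes) does not drive the register to zero.
    print(crc16_ccitt(bytes(1000)) != 0)   # True
```

With a non-zero starting value, zero input keeps stirring the register instead of flushing it, so a long silent stretch cannot pin the result at a constant.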

I was the youngest person at the company, so one might forgive him for not listening to me and my experience. However, one of the most senior and respected engineers there suggested that if you really wanted to build your own checksum, then use addition, multiplication and reduction (mod p) where p was a prime. He wasn't listened to either.
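A minimal sketch of that suggestion might look like the following. The constants are assumptions chosen for illustration: p = 65521 is the largest prime below 2^16, and the multiplier 31 and the additive offset 1 are arbitrary. The function name `modp_checksum` is likewise hypothetical.

```python
from typing import Iterable

P = 65521        # prime modulus (assumed; largest prime below 2**16)
MULTIPLIER = 31  # arbitrary multiplier, must not be 0 or 1 mod P
OFFSET = 1       # added to every sample so that zeros still change the state


def modp_checksum(samples: Iterable[int]) -> int:
    """Checksum built from addition, multiplication and reduction mod a prime."""
    h = 0
    for s in samples:
        # Multiply the running state, add the sample plus an offset, reduce mod P.
        h = (h * MULTIPLIER + s + OFFSET) % P
    return h


if __name__ == "__main__":
    # Silence of different lengths gives different checksums.
    print(modp_checksum([0] * 10) != modp_checksum([0] * 11))  # True
    # Reordering samples changes the result as well.
    print(modp_checksum([1, 2]) != modp_checksum([2, 1]))      # True
```

Because every step multiplies the state by a number that is invertible mod p, earlier samples are never simply shifted out of the result the way they are in a plain shift register.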

Programmers are an opinionated bunch. It's practically impossible to change their minds once they've settled on an idea. Trying to teach the functional style of programming to determined PHP users simply doesn't stick; they don't want to get it. Similarly, trying to introduce the meta-programming style to Java programmers (who can generally appreciate functional programming quite easily) just doesn't work.

Take something more straightforward and important like whether features or stability should be prioritised in a product. The boss can shout, he can scream, he can write angry emails. Even the biggest boss can shout and scream, but the programmer will by and large simply ignore him. The boss is powerless. The programmer's course is fixed. It cannot be changed. He can be stopped but he cannot be turned.

The contractor spent a few days investigating why his simple bit-shift and XOR checksum returned 0 for several streams. After about a week he figured out that if there was enough silence in the audio, then the checksum would go to zero.
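The contractor's exact routine is not shown here, so the following is only a guess at the general shape of the problem: a 16-bit register that is shifted left by one bit and XORed with each sample. The function name `shift_xor_checksum` and the 16-bit width are assumptions.

```python
from typing import Iterable


def shift_xor_checksum(samples: Iterable[int]) -> int:
    """Naive checksum: shift the 16-bit state left one bit, XOR in the sample."""
    h = 0
    for s in samples:
        # Bits shifted past the top of the register are simply lost.
        h = ((h << 1) ^ s) & 0xFFFF
    return h


if __name__ == "__main__":
    silence = [0] * 16  # sixteen zero samples
    # Two different streams that both end in silence collapse to the same value.
    print(shift_xor_checksum([0x1234, 0xBEEF] + silence))  # 0
    print(shift_xor_checksum([0x0001] + silence))          # 0
```

Each zero sample pushes one more bit of history off the end of the register, so after sixteen of them nothing of the earlier audio is left and the checksum is stuck at zero.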

Learning through failure wastes a lot of time.
