John Fremlin's blog

ToqPiq: XKCD comics on the Qualcomm Toq smartwatch

Posted 2014-11-02 23:44:15 GMT

During the YC Hacks Hackathon, I made an app for the Qualcomm Toq SmartWatch that shows XKCD comics on the watch. It's pretty rough and ready.

Sad that the API from Qualcomm does not include more functionality (doesn't even allow arbitrary aspect ratios for the photos shown on the watch!). I've thrown the hack on GitHub, have fun!

Post a comment

How to scale a consumer web product launch (load shedding)

Posted 2014-11-01 22:02:55 GMT

The situation: you've come up with a cool new app, have beta-tested it for a while and are ready to open it up to the public. How much server capacity will you need? There's no way to estimate the loadspike - one hit post on reddit or a partner promotion could get you 100k uniques over a few hours and if you've got the killer app you think you have that might just be the start of the onslaught. To control the load, you can have an invite flow where you gather customer contact information and then notify them that they can use the product, but this is likely to cause more of a drop-off than immediately wowing people.

There are three engineering problems of scale here: the obvious one is making the software you run efficient - modern servers can handle 10k+ rps per core - for efficient requests. This requires monitoring and measurement. The second obvious one is ensuring a good asymptotic response to additional hardware resources - adding servers should make it possible to serve more customers, but if you have a single database shard that is authoritative then adding more API servers might not help at all. Again, monitoring and measurement are essential to verify that the characterization of the system fits the model you have of it under the real load it experiences.

The third problem is making sure that in the event of a failure of the scale plan, that the user experience is still professional. For example, if things are overloaded or broken, disable marketing campaigns and notifications drawing people to the product. This may seem obvious but its easy to forget about Adwords when product is down.

The minimal example of a load shedding UX is the famous Twitter fail whale - a static landing page (served perhaps from separate reliable infrastructure) that you can redirect people to. Shifting load can actually work well for highly motivated users: if someone really wants your product then they will come back tomorrow, so downtime mostly loses the chance to convert people who were just curious into customers. To try to minimize the loss, the landing page can ask for contact information so that you can notify people when the service is fixed. Having a count of customer sign-ups might help turn negative publicity around downtime into a story about popularity.

Depending on the product, it may be possible to implement a less debilitating load shedding: request fewer results from the server, do less ranking, depend on client-side caches and reduce polling frequency. Plan for this by changing client code to not retry aggressively and have server-side timeouts on queued requests: if a request is in a queue for longer than the customer is likely to wait for it, just fail it rather than spending compute resources.

Even with a good plan, and a well characterized system whose response under load is well understood, an influx of publicity will bring a new class of semi-malicious or malicious use. People will poke around at your API and use unexpected classes of devices to access the product. This is one example demonstrating that auto-scaling can be a really bad idea: if someone is DDoSing you, then adding scale to handle that just makes the DDoS more expensive. Some mischievous attackers specialize in finding poisonous requests that will consume many server resources while very few client ones. Be ready to react with reliable dashboards to give you the information you need, and do not depend on automatic systems. A load spike can turn into a delightful opportunity to exercise your disaster recovery plan when servers overheat and lock up :)

Post a comment

Redundant video bitstream

Posted 2014-10-22 17:48:58 GMT

Video has the most computationally heavy compression in widespread use. At the same time, layer after layer of abstraction with a view to incredible flexibility, where each group dreams of a container format that will carry any codec, leads to an incredible duplication of metadata.

For example, how many times is the video resolution encoded in an MP4 file with a typical MPEG4 AVC payload? The instinctive reaction of each group of architects at each point in the stack is to redefine the description of the video in some slightly inadequate schema that is not quite a superset of the others, so all are needed. For VOD or realtime video, the resolution is eagerly repeated again and again in each extra transport layer. Certainly for a standard MP4 stream, the resolution must be in the H.264 Sequence Parameter Set used by the video codec. But then also in the avcC box - the configuration of that codec in the MPEG4 stream format, and then also in the track header tkhd box. So I count at least three. Are there more?

Why is this bad? The information is unnecessary, in aggregate over the billions of videos made in the world it wastes huge amounts of not only video storage but mental bandwidth in standards texts, and a video file can now be created that is inconsistent so every implementation needs error handling for this case.

This is, I suppose, an example of a sort of bikeshedding: an issue so trivial (the resolution of a video is easily understood) so that each abstraction is eager to take responsibility for it, whereas real practical issues, like the relative placement of the indices inside the file, are left to fall through the cracks.

Post a comment

Activating the Dynamixel MX-28T actuator

Posted 2014-07-29 16:59:37 GMT

In our quest to make a robotic hand, we procured an awesome actuator, the Dynamixel MX-28T.

This marvelous device can not only spin but also go to a precise goal position, and hold it against varying torque, using its mighty 72MHz processor to adjust as necessary.

We wanted to use the usb2ax to control it. To our surprise this is actually impossible out of the box, as oddly enough, the MX 28T is programmed to start off listening at 57142.9 baud and the usb2ax can only use 1MHz. This is all the more bizarre because the base clock serial clock of the MX 28T is 1MHz or 2MHz depending how you think about it.

I guess we could have tried to do something cool with a serial port as this is quite close to 57600, but we didn't and we got another controller to program the poor MX-28T which lives on a nice bus to change register 4 to value 1 so we could talk to it.

Then everything was hunky dory. The servo is surprisingly strong!

Post a comment

The mmap pattern

Posted 2014-04-30 16:39:01 GMT

1 watching live

There are many choices in software engineering that are visible only to the developers on the project: for example, the separation of responsibilities into different parts of the program are (hopefully) invisible to the user. These choices can descend into a question of personal taste and I personally lean towards simplicity. My experience has shown that people tend to create complex generic interface hierarchies, only to have them hide a single implementation. What's the point in that? On the other hand, there are architectural decisions that affect the way the program behaves. For example, whether processing occurs on a server or mobile device ends up changing how the system can be used.

I want to bring up an underused architectural choice: the defined memory layout persisted to disk by the operating system via mmap. All malloc memory on modern UN*X systems is mmap'd. But explicitly choosing a file and mapping that into the memory space of the program means it can persist across crashes. And it doesn't just persist in terms of storage on a backing device, but the OS is aware the pages are mapped into memory, so on start-up there will be no disk read, no serialization delay and everything is ready. In many systems the effort of reading in data, whatever trivial transformation is being performed, can be extremely costly in terms of CPU time (instructions and dcache thrashing). With a mmap'd file, this time can be reduced to nothing.

One very key architectural decision for a system is the degree of reliability that it should possess. This is an explicit trade-off between the rapidity of development (in particular the level of indoctrination needed before new contributors are able to augment the feature set) and the operational stability. By preserving state explicitly to memory backed files, several classes of unexpected events causing the program to crash can be recovered from with minimal disruption. The benefit here is that the system can be developed rapidly with code reviews focusing on data integrity and paying less attention to complex interactions that can lead to crashes.

Modern CPUs have the ability to arbitrarily (generally at a 4kB granularity) map memory addresses as visible to a program to physical RAM addresses. The technique I am advocating here is a way of exploiting this hardware functionality in conjunction with operating system support via the mmap call (that turns a file into a range of memory addresses). It is quite possible to share mmap regions across processes so this gives a very high bandwidth unsynchronized interprocess communication channel. Additionally, the regions can be marked read-only (another nice capability afforded by CPUs) so data-corruption failure cases can be avoided entirely.

The main implementation difficulty with using a mmap'd region is that pointers into it must be relative to its base address. Suppose one were to try to persist a linked list into such a region. Each next pointer in the list is relative to the base address of the region. There are multiple approaches: store the base address in a separate (maybe global) variable, and add it each time, or try to mmap each region to a well-known start address (and fail if it cannot obtain that address). Generally with some trickery it is possible to exploit the memory remapping capabilities of the underlying CPU to reduce this overhead (i.e. store the base offset in a segment or other register). Each of these alternatives has advantages and disadvantages which can be debated; in practice, once the idea of persisting state to mmap files is introduced into an architecture, there are various reasons to try to use multiple regions (e.g. to support fixed-sized and non-fixed-size records, and to enable atomic exchange of multiple versions of the state).

Though there are low-level opportunities, this technique can actually be extremely beneficial in garbage-collected scripting languages where dealing with large amounts of data is generally inefficient. By mmap'ing a region and then accessing into it, the overhead of creating multitudes of small interlinked items can be reduced hugely. Large amounts of data can be processed without garbage collection delays. Additionally, the high cost of text-processing can be paid just once, when first building up the data-structure and later manipulations of it can proceed very rapidly, and interactive exploration becomes very convenient. The instant availability of data can reinvigorate machine learning work-flows where iteration speed from experiment to experiment is a constraining factor.

Despite the advantages, this technique is not widely exploited, which is why I'm writing about it. For Lisp there is manardb, and in industry there are several very large systems of the order of petabytes of RAM which use this idea heavily. Consider it for your next project!

Damn, this was trivial using PL/1 on Multics back in the mid 70s.

Posted 2014-05-01 05:56:25 GMT by Anonymous from

MSVC has support for __based pointers, maybe there is something like this in gcc / clang?

Posted 2014-05-01 07:11:48 GMT by Dimiter "malkia" Stanev

With modern 64/48 bit memory space, could the application choose *exactly* where to map the file, thereby avoiding the need for base pointers or other fixups.

Posted 2014-05-01 09:13:15 GMT by Chris Dew

"pointer swizzling" is one term for the fixup technique. The Apple Newton's OS is the only major user I can think of, what are some others?

Posted 2014-05-01 10:08:23 GMT by abrasive

Good read. I'm a big fan of using mmap(). One thing to note is that you need to be careful with your file formats so they are aligned properly.

A good way to work around this is to preprocess files for the specific architecture it's currently on. Key points to keep in mind are native type width, alignment.

Posted 2014-05-01 11:22:24 GMT by Steven


There are multiple approaches: store the base address in a separate (maybe global) variable, and add it each time, or try to mmap each region to a well-known start address (and fail if it cannot obtain that address).


ASLR is a conspiracy created by "security" professionals to make programs run slowly. Eschew it~! : )

Posted 2014-05-01 14:15:03 GMT by babycakes

Yes, a standard implementation of mmap can put the memory at a fixed address. It's got a lot of options; see the man page. This doesn't even require more than 32 bits of address space, just some coordination with your linker to ensure that the region you want to use for mmap'd pages is not otherwise used. In fact, this is essentially what the runtime linker/loader does when you load a dynamically linked library.

Posted 2014-05-01 14:41:49 GMT by Anonymous from

"Despite the advantages, this technique is not widely exploited"

Are you sure? Image-based systems such as Smalltalk and some Lisps have been around, if not very common, since the 70s. Your suggestion seems to be very closely related.

"By preserving state explicitly to memory backed files, several classes of unexpected events causing the program to crash can be recovered from with minimal disruption."

This might indeed be true for events such as abrupt power failure. However, if the crash is due to corrupted memory, you would not want to restart from such a damaged state. I don't see that there's an obvious way to differentiate these two cases.

Posted 2014-05-01 15:29:45 GMT by Michael Schürig

Post a comment

Costs of energy from different sources by kWh

Posted 2014-04-24 17:02:09 GMT

How to compare the cost of energy? Each different source is traditionally traded in different units (kWh, therm, gallon). Here I take the most recent information from the Bureau of Labor Statistics for San Francisco, Oakland, San Jose (the Bay Area) and digest it into kWh.

Energy source Cost per kWh in USD cents Relative cost Efficiency estimate Cost per usable kWh Notes
Natural gas (utility) 4.5 1 25% (18?) 29.3 kWh per therm, efficiency estimate based on CNG vehicle which is unfair comparison
Petrol (gasoline) 11.1 2.5 25-30% 44.4 33.4 kWh per gallon
Electricity (mains) 22.1 4.9 90% 24.6 Most convenient source of energy!

The cost of obtaining energy from solar installations is becoming competitive with grid electricity but is still much more expensive than other energy sources (particularly natural gas!).

Deciding in practice between which energy source to use is tricky. For example, if you have a plug-in hybrid car, should you charge it from the mains or from a petrol station? That depends on its efficiency per mile traveled from the different sources (normally much higher for electricity) and also on the charging efficiency, as optimistically 10-20% but sometimes even more of the mains charge is wasted and not stored in the battery.

EDIT 20140424: Added estimates of usable energy in terms of efficiency of a motor. For heating, much higher efficiency can be obtained from the cheaper sources like gas (e.g. some people estimate 70-80%).

Post a comment

etckeeper tracking server config

Posted 2014-04-20 07:08:40 GMT

There's an awesome tool to keep UN*X /etc directories under revision control. In theory this is where all the system configuration should be. Of course it tends to leak out, but it's a start :)

One missing piece is the list of installed packages: surely this is the main overview of a systems configuration?

Anyway, that's easy to add to etckeeper, here's the script that I use

set -x
set -e
apt-get install -qy etckeeper git
etckeeper uninit -f
perl -pi -e 's/VCS="bzr"/VCS="git"/' /etc/etckeeper/etckeeper.conf 
cat > /etc/etckeeper/post-install.d/00-vii-etc-package-list <<EOF
#! /bin/sh
etckeeper list-installed > /etc/etckeeper/vii-installed-packages.list
chmod +x /etc/etckeeper/post-install.d/00-vii-etc-package-list
etckeeper init

and it's easy to run on a new server with cat ~/Junk/ | ssh root@newserver bash -s

Of course one could set up chef or puppet or something but with just a handful of machines the goal of better configuration management is not automation but clarity. This is a debugging tool so you can figure out what happened when something broke and potentially helpful for reverting.

Post a comment

Rotating an image with OpenCV and Python

Posted 2014-04-05 21:07:57 GMT

1 watching live

OpenCV is the most widely used open-source vision library. It lets you detect faces in photographs or video feeds with very little code.

There are a few tutorials on the Internet explaining how to use an affine transform to rotate an image with OpenCV -- they don't at all handle the issue that rotating a rectangle inside its own bounds will generally cut off the corners, so the shape of the destination image needs to be changed. That's a bit sad, as doing it properly is very little code:

def rotate_about_center(src, angle, scale=1.):
    w = src.shape[1]
    h = src.shape[0]
    rangle = np.deg2rad(angle)  # angle in radians
    # now calculate new image width and height
    nw = (abs(np.sin(rangle)*h) + abs(np.cos(rangle)*w))*scale
    nh = (abs(np.cos(rangle)*h) + abs(np.sin(rangle)*w))*scale
    # ask OpenCV for the rotation matrix
    rot_mat = cv2.getRotationMatrix2D((nw*0.5, nh*0.5), angle, scale)
    # calculate the move from the old center to the new center combined
    # with the rotation
    rot_move =, np.array([(nw-w)*0.5, (nh-h)*0.5,0]))
    # the move only affects the translation, so update the translation
    # part of the transform
    rot_mat[0,2] += rot_move[0]
    rot_mat[1,2] += rot_move[1]
    return cv2.warpAffine(src, rot_mat, (int(math.ceil(nw)), int(math.ceil(nh))), flags=cv2.INTER_LANCZOS4)

The affine transformation of the rotation has to be combined with the affine transformation of translation, from the center of the original image to the center of the destination image. An affine transformation in 2D is a 2x2 matrix A and a translation 2x1 vector a - it takes a source point p = (x,y) to a destination: Ap + a. Combining two transforms Ap + a and Bp + b, doing A first then B, one gets B(Ap + a) + b - another affine transform with matrix BA and vector Ba + b.

In this case, we are combining a rotation with a translation; A translation as an affine transform has the identity 2x2 matrix I and a movement vector m, so is represented by Ip + m, and we want to first translate to the new center, then rotate about that, so we take the rotation Rp + r after applying Ip + m, which gives Rp + Rm + r, which explains why we have to only add two coefficients.

PS. Sadly, numpy interprets the multiplication operator * not as matrix multiplication if it considers the inputs to be vectors of vectors rather than matrices, so we have to explicitly write

PPS. We use the Lanczos interpolation which is generally good for scaling up but not for scaling down very small; that should be adapted given the application.

PPPS. The interaction with Python is much improved with the cv2 module, but there are inescapably some rough edges as numpy has a different co-ordinate ordering than OpenCV. Also, for some reason OpenCV persists in using units like degrees instead of radians, and so on. In numpy, the co-ordinates in an image array are accessed in [y,x] order, as in vertical increasing downwards first, followed by horizontal increasing rightwards second. In OpenCV, sizes are given as (width, height), the opposite order.

from skimage.transform import rotate

rotate(image, angle)

Posted 2014-04-06 01:56:10 GMT by Anonymous from

thanks that was very helpful and well explained

Posted 2014-04-14 11:56:38 GMT by Anonymous from

You are a very persuasive writer. I can see this in your article. You have a way of writing compelling information sparks much interest. Thank you!

Posted 2014-05-28 07:31:31 GMT by barbara

Thanks, it is good and informative and having good work. Once again thanks

Posted 2014-06-20 08:20:06 GMT by Anonymous from

A good blog always comes-up with new and exciting information and while reading I have feel that this blog is really have all those quality that qualify a blog to be a good one.I learn a lot from it. I wanna appreciate you for this great effort.

Posted 2014-06-27 01:59:26 GMT by Hopy2 from

Post a comment

Dynamic Lisp blog entry demo: rationalize

Posted 2014-03-13 08:01:18 GMT

Given a floating point number, how to go to its representation as a rational?

1.0471975 = 1/3π

Try another number:

Thanks to the RATIONALIZE function in Common Lisp. See the SBCL source.

Updated to not show d0 at the end

Posted 2014-03-13 08:03:49 GMT by John Fremlin


Posted 2014-03-19 07:01:40 GMT by Anonymous from

<script type=javascript>alert("what")</script>

Posted 2014-03-19 07:03:31 GMT by Anonymous from

Post a comment

Enterprise software wheel of life

Posted 2013-09-26 05:43:02 GMT

I've worked at big companies for a while and when planning a software project you need to figure out how to be a organisational team player and fit with all those other teams and their roadmaps. Here's a handy guide to how well another team's project will help yours:

Official state Apparent suitability for your project,
after meeting the other team
Actual state
Development 100% fit, can accommodate your capricious feature requests, designed to scale while consistently providing low latency, beautiful UX in the next quarter/half Non-existent, vaporware, not used for anything
Production Team tells you to go away until next quarter/half, will not discuss your use-case Bug-ridden mess failing at its first use-case
Deprecated Unmaintained, so no team to talk toYears of consistent operation for real use-cases, could do easily do yours if it weren't scheduled to be retired in the next quarter/half

The kicker being, of course, that the next quarter/half never seems to come around.

A ring of truth perhaps, and why is this?

I would say, the typical incentive structure primarily: proposing a project, you need to make the business case, and once that's locked in (the production stage), you don't want to compromise that by taking on something else — as the first case determines how you'll be evaluated. And once it's working, you will have many requests to fix the tough issues that have small wider benefit — but which are important to the users of the system harming your relationship with them — so it's time to create another project.

How to fix it? Do not emphasize project ownership (outcomes ownership instead), reward engineers who are willing to get their hands dirty across traditional team boundaries, and let them participate in evaluating the performance of the people who work on those other teams. In nearly every business there are teams with conflicting goals, and often directly conflicting, but it is possible to foster a culture of technical collaboration despite that.

A little thought at the beginning of a project in terms of its design can have a huge effect on the lives of everybody who has to work with it, and a little thought about the way people are included even more so. It's too natural for engineers, and engineering managers, to think there are no major feature requests for their system, simply because they have never spent time with the people who interact with the system every day.

Post a comment

Older entries (87 remaining)