John Fremlin's blog: The mmap pattern

Posted 2014-04-30 16:39:01 GMT


There are many choices in software engineering that are visible only to the developers on the project: for example, the separation of responsibilities into different parts of the program is (hopefully) invisible to the user. These choices can descend into questions of personal taste, and I personally lean towards simplicity. In my experience, people tend to create complex generic interface hierarchies, only to have them hide a single implementation. What's the point in that? On the other hand, there are architectural decisions that affect the way the program behaves. For example, whether processing occurs on a server or on a mobile device changes how the system can be used.

I want to bring up an underused architectural choice: a defined memory layout persisted to disk by the operating system via mmap. Much of the memory handed out by malloc on modern UN*X systems is already obtained through mmap. But explicitly choosing a file and mapping it into the address space of the program means the state can persist across crashes. And it doesn't just persist in terms of storage on a backing device: the OS knows which pages of the file are already in memory, so on start-up, as long as those pages are still cached, there will be no disk read, no deserialization delay, and everything is ready. In many systems the effort of reading in data, however trivial the transformation being performed, can be extremely costly in terms of CPU time (instructions and dcache thrashing). With a mmap'd file, this time can be reduced to almost nothing.
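As a minimal sketch of state that outlives the process, assuming Python's standard mmap and struct modules: the function below keeps a single counter in a mapped file and increments it on each call. The names bump_counter and counter.bin and the one-integer layout are made up for illustration.

```python
import mmap
import os
import struct
import tempfile

COUNTER = struct.Struct("<Q")  # one unsigned 64-bit little-endian integer


def bump_counter(path):
    """Map the file at `path`, increment the counter stored in it, return the new value."""
    # Create the backing file on first use, sized to hold exactly one counter.
    if not os.path.exists(path) or os.path.getsize(path) < COUNTER.size:
        with open(path, "wb") as f:
            f.write(b"\x00" * COUNTER.size)
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), COUNTER.size) as region:
            (value,) = COUNTER.unpack_from(region, 0)
            COUNTER.pack_into(region, 0, value + 1)
            region.flush()  # ask the OS to write the dirty page back to the file
            return value + 1


def demo_counter():
    # Hypothetical file name; each call maps the file afresh, as a restarted process would.
    path = os.path.join(tempfile.mkdtemp(), "counter.bin")
    return [bump_counter(path) for _ in range(3)]
```

Nothing is serialized or parsed here: the value is read from and written to the mapped bytes in place, and the next run picks up where the last one stopped.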

A key architectural decision for a system is the degree of reliability that it should possess. This is an explicit trade-off between the speed of development (in particular the level of indoctrination needed before new contributors are able to augment the feature set) and operational stability. By preserving state explicitly in memory-mapped files, a system can recover with minimal disruption from several classes of unexpected events that crash the program. The benefit is that the system can be developed rapidly, with code reviews focusing on data integrity and paying less attention to the complex interactions that can lead to crashes.

Modern CPUs can map the memory addresses visible to a program onto physical RAM addresses arbitrarily (generally at a 4kB granularity). The technique I am advocating here is a way of exploiting this hardware functionality in conjunction with operating system support via the mmap call (which turns a file into a range of memory addresses). It is quite possible to share mmap regions across processes, which gives a very high-bandwidth, unsynchronized interprocess communication channel. Additionally, the regions can be marked read-only (another nice capability afforded by CPUs), so that a stray write is rejected rather than silently corrupting the data.
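A rough illustration of both points, assuming Python's standard mmap module. Two mappings of the same file stand in for two processes, and the file name shared.bin is hypothetical. A write through the writable mapping is visible through the other one, and the read-only mapping refuses writes (in Python this surfaces as a TypeError).

```python
import mmap
import os
import tempfile


def demo_shared_and_readonly():
    path = os.path.join(tempfile.mkdtemp(), "shared.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * mmap.PAGESIZE)

    with open(path, "r+b") as wf, open(path, "rb") as rf:
        # Two independent mappings of the same file, standing in for two processes.
        writer = mmap.mmap(wf.fileno(), 0, access=mmap.ACCESS_WRITE)
        reader = mmap.mmap(rf.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            writer[0:5] = b"hello"
            seen = bytes(reader[0:5])  # the write shows up without any explicit copy
            try:
                reader[0:5] = b"oops!"
                rejected = False
            except TypeError:
                rejected = True  # the read-only mapping refuses modification
            return seen, rejected
        finally:
            reader.close()
            writer.close()
```

Note that nothing here synchronizes the two sides; a real reader and writer would still need some protocol to agree on when the data is complete.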

The main implementation difficulty with using a mmap'd region is that pointers into it must be relative to its base address, because the region may be mapped at a different address on each run. Suppose one were to try to persist a linked list into such a region. Each next pointer in the list is stored relative to the base address of the region. There are multiple approaches: store the base address in a separate (maybe global) variable and add it each time, or try to mmap each region to a well-known start address (and fail if that address cannot be obtained). With some trickery it is generally possible to exploit the memory remapping capabilities of the underlying CPU to reduce this overhead (i.e. store the base offset in a segment or other register). Each of these alternatives has advantages and disadvantages which can be debated. In practice, once the idea of persisting state to mmap files is introduced into an architecture, there are various reasons to use multiple regions (e.g. to support fixed-size and non-fixed-size records, and to enable atomic exchange of multiple versions of the state).
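The first approach (offsets from the base, added on each access) can be sketched as follows, assuming Python's standard mmap and struct modules. The header and node layouts, the function names and the file name list.bin are invented for this example; offset 0 is used as the "null" pointer, which is safe because the header occupies the start of the region.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<QQ")  # (offset of head node, offset of next free byte)
NODE = struct.Struct("<qQ")    # (value, offset of next node); offset 0 means end of list
REGION_SIZE = 4096


def create(path):
    """Create a zero-filled region file holding an empty list."""
    with open(path, "wb") as f:
        f.write(b"\x00" * REGION_SIZE)
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), REGION_SIZE) as region:
        HEADER.pack_into(region, 0, 0, HEADER.size)


def push(region, value):
    """Prepend a node; its next 'pointer' is an offset from the region base."""
    head, free = HEADER.unpack_from(region, 0)
    if free + NODE.size > len(region):
        raise MemoryError("region is full")
    NODE.pack_into(region, free, value, head)
    HEADER.pack_into(region, 0, free, free + NODE.size)


def values(region):
    """Walk the list by following offsets, wherever the region happens to be mapped."""
    offset, _ = HEADER.unpack_from(region, 0)
    out = []
    while offset:
        value, offset = NODE.unpack_from(region, offset)
        out.append(value)
    return out


def demo_list():
    path = os.path.join(tempfile.mkdtemp(), "list.bin")
    create(path)
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), REGION_SIZE) as region:
        for v in (1, 2, 3):
            push(region, v)
    # Map the file again, possibly at a different address; the offsets remain valid.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
        return values(region)
```

Because every link is an offset rather than an address, the same file can be mapped anywhere, or by several processes at different addresses, without any fixup pass.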

Though there are low-level opportunities, this technique can actually be extremely beneficial in garbage-collected scripting languages, where dealing with large amounts of data is generally inefficient. By mmap'ing a region and then reading from it directly, the overhead of creating multitudes of small interlinked items can be reduced hugely. Large amounts of data can be processed without garbage collection delays. Additionally, the high cost of text processing can be paid just once, when first building up the data structure; later manipulations of it can proceed very rapidly, and interactive exploration becomes very convenient. The instant availability of data can reinvigorate machine learning workflows where iteration speed from experiment to experiment is a constraining factor.
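A small sketch of the "parse once, then map" workflow, assuming Python's standard mmap and array modules. The function names and the file name samples.bin are hypothetical, and the file holds raw native doubles, so it is only meant to be read back on the same kind of machine that wrote it.

```python
import mmap
import os
import tempfile
from array import array


def build(path, text):
    """Pay the text-parsing cost once: store whitespace-separated numbers as raw doubles."""
    with open(path, "wb") as f:
        array("d", (float(tok) for tok in text.split())).tofile(f)


def total(path):
    """Count and sum the stored doubles straight out of the mapping, with no re-parsing."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
        # cast() reinterprets the mapped bytes as doubles without copying them.
        with memoryview(region) as raw, raw.cast("d") as samples:
            return len(samples), sum(samples)


def demo_samples():
    path = os.path.join(tempfile.mkdtemp(), "samples.bin")
    build(path, "1.5 2.5 4.0")
    return total(path)
```

The mapped data is never turned into a list of long-lived objects, so the garbage collector has nothing extra to track however large the file grows.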

Despite the advantages, this technique is not widely exploited, which is why I'm writing about it. For Lisp there is manardb, and in industry there are several very large systems of the order of petabytes of RAM which use this idea heavily. Consider it for your next project!

Damn, this was trivial using PL/1 on Multics back in the mid 70s.

Posted 2014-05-01 05:56:25 GMT by Anonymous from 174.17.212.195

MSVC has support for __based pointers, maybe there is something like this in gcc / clang?

http://msdn.microsoft.com/en-us/library/57a97k4e.aspx

Posted 2014-05-01 07:11:48 GMT by Dimiter "malkia" Stanev

With modern 64/48 bit memory space, could the application choose *exactly* where to map the file, thereby avoiding the need for base pointers or other fixups?

Posted 2014-05-01 09:13:15 GMT by Chris Dew

"pointer swizzling" is one term for the fixup technique. The Apple Newton's OS is the only major user I can think of, what are some others?

Posted 2014-05-01 10:08:23 GMT by abrasive

Good read. I'm a big fan of using mmap(). One thing to note is that you need to be careful with your file formats so they are aligned properly.

A good way to work around this is to preprocess files for the specific architecture they're currently on. Key points to keep in mind are native type width and alignment.

Posted 2014-05-01 11:22:24 GMT by Steven

'''

There are multiple approaches: store the base address in a separate (maybe global) variable, and add it each time, or try to mmap each region to a well-known start address (and fail if it cannot obtain that address).

'''

ASLR is a conspiracy created by "security" professionals to make programs run slowly. Eschew it~! : )

Posted 2014-05-01 14:15:03 GMT by babycakes

Yes, a standard implementation of mmap can put the memory at a fixed address. It's got a lot of options; see the man page. This doesn't even require more than 32 bits of address space, just some coordination with your linker to ensure that the region you want to use for mmap'd pages is not otherwise used. In fact, this is essentially what the runtime linker/loader does when you load a dynamically linked library.

Posted 2014-05-01 14:41:49 GMT by Anonymous from 174.52.89.43

"Despite the advantages, this technique is not widely exploited"

Are you sure? Image-based systems such as Smalltalk and some Lisps have been around, if not very common, since the 70s. Your suggestion seems to be very closely related.

"By preserving state explicitly to memory backed files, several classes of unexpected events causing the program to crash can be recovered from with minimal disruption."

This might indeed be true for events such as abrupt power failure. However, if the crash is due to corrupted memory, you would not want to restart from such a damaged state. I don't see that there's an obvious way to differentiate these two cases.

Posted 2014-05-01 15:29:45 GMT by Michael Schürig
