John Fremlin's blog

A single point of failure is ok

Posted 2016-10-05 03:11:30 GMT

Making big systems out of many computers, people often end up with lower reliability than with a single computer. Amusingly, they may also be slower. There's a big temptation to avoid a single point of failure by introducing multiple points of failure: one computer is actually quite unlikely to fail, but with many machines, failures are common. If one assumes that the failures are uncorrelated, and that there's some way to transparently switch over, then having multiple machines might make sense, and it's an obvious goal. Who wants to admit that a single hard drive breaking took down a big website for a few hours?

Embarrassing though that would be, in attempting to make it impossible for a single machine to take things down, engineers actually build systems so complex that the bugs in them take things down far more often than a single machine ever would. The chance of failure increases with software complexity and is likely to be correlated between machines. Distributed systems are much more complex by their nature, so there is a correspondingly high software engineering cost to making them reliable. With many machines there are many failures, and working around all their complicated, correlated consequences can keep a big team happily in work and on-call.

A typical example of adding unreliability in the name of reliability is the use of distributed consensus, often embodied by ZooKeeper. Operationally, if the system is ever misconfigured or runs out of disk space, ZooKeeper will aggressively stop working. It offers guarantees on avoiding inconsistency but not on achieving uptime, so perhaps this is the right choice. Unfortunately, the Paxos algorithm is vulnerable to never reaching consensus when hosts are coming in and out of action, which makes sense given that consensus needs participants to stick around. In human affairs we deputize a leader to take charge when a quick decision is needed. Having a single old-school replicated SQL DB to provide consistency is not hip, but it would typically get more 9s of uptime and be more manageable in a crisis.

This can be hard to grasp in heavily virtualized environments, where the connection between the services and the systems they run on is deliberately weak, but there's often actually one place where a single point of failure is fine: the device the person is using to connect to the system. And in fact it's unavoidable. After all, if the phone you're using crashes, you can't expect to keep using a remote service without reconnecting. Other failures are less acceptable.

By an end-to-end argument, the retries and recovery should therefore be concentrated in the machines people operate directly, and any other reliability measures should be seen purely as performance optimisations. Simplicity isn't easy for junior engineers, eager to make their names with a heavily acronymed agglomeration of frameworks and a many-tiered architecture, but it leads to really great results.
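As a minimal sketch of what concentrating recovery in the client can look like (Python; the request callable standing in for the remote operation is hypothetical):

    import random
    import time

    def call_with_retries(request, attempts=5, base_delay=0.2):
        """Retry a remote call from the client, with exponential backoff and jitter."""
        for attempt in range(attempts):
            try:
                return request()  # any callable performing the remote operation
            except (ConnectionError, TimeoutError):
                if attempt == attempts - 1:
                    raise  # give up and surface the failure to the person using the device
                time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))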


Bad unit tests impart a false sense of security

Posted 2016-06-21 12:45:38 GMT

Testing improves software. So much so that a lack of unit tests is called technical debt, and blanket statements from celebrated engineers like "Any programmer not writing unit tests for their code in 2007 should be considered a pariah" are uncontroversial. When a defect is noticed in software it's easy to say it could have been found by better testing, and often it's simple to add a test that would catch its recurrence. Done well, tests can be very helpful. However, they can also be harmful: in particular when they cause people to be overly confident about their understanding of the consequences of a change.

A good test
— covers the code that runs in production
— tests behaviour that actually matters
— does not fail for spurious reasons or when code is refactored

For example, I made a change to the date parsing function in Wine; here adding a unit test to record the externally defined behaviour is uncontroversial.

Tests do take time. The MS paper suggests that they add about 15-35% more development time. If correctness is not a priority (and it can be reasonable for it not to be) then adding automatic tests could be a bad use of resources: the chance of the project surviving might be low and depend only on a demo, so taking on technical debt is actually the right choice. More importantly, tests take time from other people: especially if some subjective and unimportant behaviour is enshrined in a test, then the poor people who come later to modify the code will suffer. This is especially true for engineers who aren't confident making sweeping refactorings, so that adding or removing a parameter from an internal function is turned into (for them) a tiresome project. The glib answer is not to accept contributions from these people, but that's really sad — it means rejecting people from diverse backgrounds with specialised skills (just not fluent coding) who would contribute meaningfully otherwise.

Unit tests in particular can enshrine a sort of circular thinking: a test is defined as the observed behaviour of a function, without considering whether that behaviour is the right behaviour. For example, this change I made to Pandas involved more changes to test code than to the real code that people will use. This balance of effort means less time is spent on improving the behaviour.

In my experience, the worst effect of automatic tests is the shortcut they give to engineers — that a change is correct if the tests pass. Without tests, it's obvious that one must think hard about the correctness of a change and try to validate it: with tests, this validation step is easy to rationalise. In this way, bugs are shipped to production that would have been easy to catch by just running the software once in a setting closer to the production one.

It's hard to write a good test and so, so much easier to write a bad test that is tautologically correct and avoids all behaviour relevant to production. These bad tests are easy to skip in code review as they're typically boring to read, but they give a warm fuzzy feeling that things are being tested, when they're not. Rather than counting raw test coverage as a metric, we should measure how much of the code that actually runs in production is covered; unfortunately, these are not the same thing. False confidence from irrelevant tests measurably reduces reliability.
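To make the distinction concrete, a sketch (Python, with a hypothetical parse_price function): the first test merely restates the implementation and would pass even if the behaviour were wrong, while the second pins down behaviour that matters externally.

    import unittest

    def parse_price(text):
        # hypothetical production function
        return round(float(text.replace("$", "").replace(",", "")), 2)

    class TautologicalTest(unittest.TestCase):
        def test_parse_price(self):
            # restates the implementation, so it can never disagree with it
            expected = round(float("$1,234.50".replace("$", "").replace(",", "")), 2)
            self.assertEqual(parse_price("$1,234.50"), expected)

    class BehaviourTest(unittest.TestCase):
        def test_strips_currency_formatting(self):
            # externally meaningful behaviour: formatted input maps to a plain number
            self.assertEqual(parse_price("$1,234.50"), 1234.5)

        def test_plain_number_passes_through(self):
            self.assertEqual(parse_price("19.99"), 19.99)

    if __name__ == "__main__":
        unittest.main()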


Java: What a horrible virtual machine

Posted 2016-04-18 04:36:35 GMT

JVM bashing is an activity rarely supported by facts. "I don't actually know the details. I mean Java I really don't care about. What a horrible language. What a horrible VM. So, I am like whatever, you are barking about all this crap, go away. I don't care." This quote from Linus Torvalds upsets people: there is a school of thought that participation medals should be handed to everybody in the race and that one should never be nasty at all, so all criticism is wrong and should never be listened to. Given that Linus himself is an exceptional technical leader who started multiple huge billion-dollar industries (Linux and Git), this attitude is extraordinarily arrogant. Is there another person whose technical opinion on these subjects one should respect more?

In its defense, an argument is advanced about the JVM: that it must be good, just because so many resources have been dedicated to it. Unfortunately, software doesn't work like that. Even experienced big software companies that are accustomed to managing big projects can pour billions of development dollars into duds and this Wikipedia list of failed custom projects is salutary reading. There are other VMs, and while the JVM makes bold promises and did have a brief competitive period, it is now effectively a monoculture around the Oracle implementation (OpenJDK).

Lack of dynamic memory allocation. When starting the Sun (Oracle) or OpenJDK JVM, people pass an -Xmx flag saying how much memory it should use. This is crazy: decades ago in FORTRAN people had to predeclare the maximum size of their data structures, and dynamic memory allocation with malloc was a big deal (FORTRAN 90 standardized dynamic allocation). It definitely makes sense: a program should scale its memory usage according to the amount of memory it needs. Declaring the overall space usage is admittedly better than going through and annotating each array, but it's incredible to me that we are still discussing this in 2016. The default value is 256MB, which is crazily low given how memory hungry Java programs typically are (a text editor probably uses more), and insane when running on a server with 256GB of RAM. The trouble with raising it is that the JVM will then, by default, not worry much about freeing up unused heap memory unless it is close to its limit. There is a hilarious question on Serverfault where a poor ex-JVM refugee is introduced to the concept of dynamic memory allocation (the default in .NET is 60% of RAM, which is so, so, so much more sensible). There are smaller VMs that have this issue (Common Lisp's SBCL, for example), and other VMs that try to employ sensible heuristics (like Haskell's GHC, which just has a suggested heap size). There are indeed pros and cons to the different approaches, and a mature VM should implement sensible heuristics by default and allow configuration. The JVM does not even attempt the former.

Lacking ABI and poor bytecode design. The virtual machine lacks basic features like generics and unsigned arithmetic, has poor support for dynamic languages, lacks value types (a huge performance issue), and so on. Some of these issues are being addressed, e.g. the invokedynamic op, but it's telling that JavaScript V8 can beat the JVM on some microbenchmarks.

Poor foreign function interface. Even small VMs like SBCL Common Lisp or Haskell have high performance, easy interfaces to C code; the third-party JNA makes a better interface than the JVM's built-in JNI. Sharing data structures back and forth with native code is a big deal, and the CLR from Microsoft invests a huge amount of effort into PInvoke. The JVM should too!

Lack of performance isolation. Ideally, a VM would let you run untrusted code. This is what JavaScript VMs in browsers do really well. The CLR has a concept of AppDomain, a sort of .NET container, which can be configured to limit memory usage and other things. Even PHP lets you limit memory usage per request. In a multithreaded JVM application, one bad request can OutOfMemoryError other requests on the same machine and there's no way to stop it. You can't even track the memory usage of a thread in a multithreaded program. Also, the JVM does not allow fork()ing, so you can't use OS-level isolation.

Another issue, unfortunately without supporting links, is that in my experience JVM deployments get stuck on old versions. Every enterprise I've worked in that uses Java has had some specific old version of the JVM (sometimes incredibly specific, like 1.5.0_05) that they were stuck on and could not upgrade from, causing the usual problems with not being able to use new tools. Almost always the version used was no longer supported and weird installers for it were stored in odd places. Upgrades are always hard, but this is something that the FreeBSD, Linux and even Microsoft Windows operating systems do better, and Intel does better with real, physical machines. Virtual machines were sold as more flexible and manageable than physical ones! In my limited experience with it, the Microsoft CLR does a much better job here. This is exactly something one would expect a mature VM with big development budgets to really care about.

It's great that the JVM ecosystem is improving. Lambdas and invokedynamic are good steps forward; but we need more! The concept of a virtual machine promised so much, it's now hard not to find the Oracle VM disappointing.


Android app shenanigans in 2016

Posted 2016-01-25 04:56:09 GMT

Discussing privacy and apps, my friend Jinyang told me about a study he'd worked on called Who Knows What About Me? A Survey of Behind the Scenes Personal Data Sharing to Third Parties by Mobile Apps. This made me curious about what my own phone was doing. Fortunately, on Android you can gain administrator access to your device (root) through semi-supported mechanisms, and then use standard Linux sysadmin tools to figure out what's going on. The excellent SSHelper by Paul Lutus lets one log in conveniently via ssh. It was snowing here in NYC, so I had plenty of time over the weekend to dig in.

First, I went through my Android Google Play Store app history and tried to install all the apps I'd ever used, around 400 in total. I ended up with only 181 installed apps in /data/app though, and 48 in /system/app, as the Play Store crashed a few times.

Then I had a look at what services were actively listening for network connections (by running netstat -l -p -W). These programs wait for external parties to connect to the phone in some way. That's great in the case of the SSHelper program that I installed, because it's exactly what I wanted it for, but other programs are doing it without my consent and it's unclear for whose benefit.

Disabling information leak from Samsung SAP on port 8230. There was also a com.samsung.accessory.framework listening on port 8230. It turns out that this service is related to my Samsung watch, and if you connect to the port it'll give the model of my phone without authentication: XT1575;motorola;Moto X Pure;SWatch;SAP_... Given that the Samsung software running on the watch is written so sloppily that you sometimes have to reboot it to see the correct time, and that the watch is set to connect via Bluetooth anyway, I don't want to let anybody on the Internet have a go at vandalising my phone through this unnecessary service. It's pretty easy to disable by running su iptables -A INPUT -p tcp --dport 8230 -m state --state NEW,ESTABLISHED -j DROP on the phone. This doesn't seem to affect the behaviour of the watch.

Local Facebook HTTP servers. There are two servers running on the phone from Facebook main app and Messenger, on ports 38551 and 38194 claiming to be GenericHttpServer. These are only accessible to apps on the phone. I won't comment more on these as I used to work at Facebook.

Local Android services. There are several processes like the Android debugging daemon running locally on port 5037, and the Low Memory Killer Daemon, and the Zygote app starting daemon and so on listening on UN*X sockets.

To see traffic totals, I ran grep [0-9] /proc/uid_stat/*/* after a reboot to dump the traffic usage. The uids can be linked to apps via /data/system/packages.xml, which I did via a quick Python script (there are some uids shared between packages). Oddly enough, my LIFX light app seemed to be all over the Internet. Snapchat was using the most data, but I have a fairly active account (@vii) that's open to non-friends, so please message away. Another heavy app was S Health, especially annoying as I had turned off sync for it in settings. The id shared by com.google.android.gsf, com.google.android.gms, com.google.android.backuptransport and com.google.android.gsf.login was also very active. Looking at netstat -p -W showed com.google.android.gms.persistent in regular contact with Google IPs (1e100.net). I set up traffic dumps with mitmproxy, which showed polling of Google servers apparently about the location service and checking login status on https://android.clients.google.com/auth.
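A sketch of that kind of script (assuming packages.xml and the /proc/uid_stat tree have been pulled off the phone with adb; attribute names like userId can vary between Android versions):

    import glob
    import os
    import xml.etree.ElementTree as ET

    def uid_to_packages(packages_xml="packages.xml"):
        # map each uid (or shared uid) to the package names that use it
        uids = {}
        for pkg in ET.parse(packages_xml).getroot().iter("package"):
            uid = pkg.get("userId") or pkg.get("sharedUserId")
            if uid:
                uids.setdefault(uid, []).append(pkg.get("name"))
        return uids

    def traffic_by_uid(uid_stat_dir="uid_stat"):
        # each /proc/uid_stat/<uid>/{tcp_snd,tcp_rcv} file holds a byte count
        totals = {}
        for path in glob.glob(os.path.join(uid_stat_dir, "*", "*")):
            uid = os.path.basename(os.path.dirname(path))
            with open(path) as f:
                totals[uid] = totals.get(uid, 0) + int(f.read().strip() or 0)
        return totals

    if __name__ == "__main__":
        names = uid_to_packages()
        for uid, total in sorted(traffic_by_uid().items(), key=lambda kv: -kv[1]):
            print(total, uid, ",".join(names.get(uid, ["?"])))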

Stop apps running in the background unless they benefit you. The practice of many apps, even from fairly reputable companies (the Amazon Shopping, Bloomberg and Etsy apps, for example), of waking up and using the Internet in the background is very damaging to battery life. These apps are communicating for their own interests, not mine, as far as I can see. The general pattern is to send up as much as can be gleaned about your phone as possible (for example, the Kindle app sends up tons of OpenGL information), which is great for developers wanting to understand their install base. It's easy and convenient to crack down on them with the Greenify app, which unfortunately is itself an app and does its own tracking (quis custodiet ipsos custodes?). However, from the command line the dumpsys power command shows the apps busy in the background or holding wakelocks, so you can do it by hand if you want.

The main contribution of the original paper that Jinyang co-authored was an analysis of the sorts of information that apps shared with their owners. It seems his methodology did not allow identifying which apps were responsible for the network traffic, and indeed this is theoretically hard because an app can ask another app for something, but it's at least possible to figure out the app that made the network call. This can actually be done quite robustly and unintrusively with Android and iptables, by giving each app (uid) a separate IP address: use ifconfig wlan0:$uid $uid_ip to create an IP address for the uid, and iptables POSTROUTING SNAT --to-source $uid_ip to mark its traffic as coming from that IP. Unfortunately, this was a little fiddly because I never mirrored the setup to IPv6 (I just disabled IPv6 via /proc/sys/net/ipv6/conf/all/disable_ipv6).
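A sketch of the setup, printing the commands rather than running them (the interface name, address range and the owner-match flags are my assumptions to check against your device before running the output as root):

    # Give each app uid its own source IP so traffic dumps can be attributed per app.
    def per_uid_commands(uids, iface="wlan0", subnet="10.99"):
        for index, uid in enumerate(sorted(uids), start=1):
            ip = "{}.{}.{}".format(subnet, index // 250, index % 250 + 1)
            # alias interface carrying this uid's address
            yield "ifconfig {}:{} {} up".format(iface, index, ip)
            # rewrite the source address of packets owned by this uid
            yield ("iptables -t nat -A POSTROUTING -o {} -m owner --uid-owner {} "
                   "-j SNAT --to-source {}".format(iface, uid, ip))

    if __name__ == "__main__":
        for command in per_uid_commands([10012, 10034, 10107]):  # example uids
            print(command)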

Looking at a few games, they would eat a surprising amount of traffic. For example, RopeFly used >50MB just starting up, asking androidads21.adcolony.com for assets, hitting a plethora of tracking feedback links for measurementapi.com and then downloading a ton of video ad content from cloudfront, which it didn't show me.

My investigation was done over the snowy weekend in New York, and there's obviously a lot more to dig into here: watching more apps over a longer time with the one-IP-per-app tracing, using an mitmproxy-like tool with support for SPDY and HTTP/2, and disentangling some obvious shenanigans (for example, Foursquare was using some sort of obfuscation for its logs).

Despite having been involved in mobile app development for years, I was very surprised at how battery and data unfriendly popular apps are. The scheduled polling and dumping of device state might be convenient for managing the operational aspects of an app, but it costs the install base battery life and mobile data: the tiny data caps even on unlimited lines in the US make the second a real issue, despite the low traffic cost to the people receiving the tracking data. After installing the apps, my phone heated up and my battery drained incredibly fast (almost as bad as the old days with an iPhone 5), but the battery tracking in the Android settings menu was very slow to assign blame to any culprit and hugely underestimated the overall impact the apps had.

Some ideas for our friends working on the Android platform (and of course, huge thanks to them for bringing Linux to our pockets):

— more aggressively attribute the battery cost for using mobile data connections and keeping connections open (seems to be accounted under non-app headings now);

— attribute the battery cost for apps that use wifi while not charging;

— if all that's difficult, why not, by default, prevent apps from waking up in the background without the user's explicit consent? This should be a big permission with an easy toggle. There are a few apps, like podcast downloaders, that improve the user experience by waking up in the background (and that's great). Most don't. Until then, I guess we can install Greenify.

Let me know your tips, tricks and Android app advice! My phone is back to a reasonable temperature now — but what have I missed?


High Output Management by Andy Grove

Posted 2016-01-15 03:29:13 GMT

High Output Management by Andy Grove, CEO of Intel, was re-released with a new foreword by Ben Horowitz, calling it a masterpiece. The concepts of objectives, key results, and one-on-ones it describes are all very standard practice now. Intel has a long history of industry-leading innovation (and of course cut-throat business tactics), and Grove is widely respected (though not as widely quoted as his predecessor Gordon Moore).

The book announces its premise that writing a compiler is a process, just like cooking an egg for breakfast — maybe true, in that compilers are well studied software products, with well defined inputs and outputs, but most software development is not really like this: if a thing has already been written and understood, then why replicate it? Writing a compiler is hugely expensive and one would much prefer to adapt an existing one.

Grove describes how to deal with the comfortable situation that a manager fully knows a process (like cooking a fixed breakfast) and just has to train up workers. In this benign environment, Grove suggests that the task specific competence of the employee be estimated, so that the manager can adjust the level of detail in delegating and monitoring tasks. One idiosyncratic demand is that the manager shield customers from the consequences of the employee's inexperience: i.e., learning from mistakes (that affect customers) is not accepted. This actually makes much more logical sense than the typical corporate schizophrenia of asking people to pretend to trust someone who is messing up a project and likely to fail to meet agreed goals, with the understanding that the only lever over this individual is probably to encourage them to leave (if only by not allowing them career progression) — unlikely to be the best choice for the business if the poor performer has proved value in another area. More importantly, it doesn't at all address a typical research situation in technology where nobody knows how to solve a problem and a manager can't just step in and show the right way.

On the other hand, the very paternalistic approach to management espoused has a warm human side, in that Grove emphasizes the importance of training and one-on-one meetings with subordinates. Under his leadership, Intel agreed (very expensively) to replace all Pentiums that suffered from a floating point bug so unlikely to be hit as to be almost theoretical. He paints the decision as one of corporate values and customer trust, but also notes that employees were facing questions from friends and family about the issue: their personal identities were tied up in it.

Grove reiterates that it is future needs that should be focused on, rather than current deficiencies. Intel's business has very long and expensive planning cycles, as it innovates on transistor technologies requiring whole new manufacturing processes, followed by multi-year productionisation of chips even once their functional design has been fully finalised in great detail, so I was hoping that Grove might provide some insight into how to drive this uncertain process. It seems he is very pleased with his identification of the power of the Internet, and he talks at length about the need to identify trends, in particular to follow the logical consequences of those trends to their conclusions and to anticipate the shifts in power relationships that will occur. Then he urges vigilance in catching and not discounting early warnings that something is changing; as CEO he did this by making sure the concerns of lower-ranking employees could filter up, holding townhalls and so on.

It was disappointing not to have a chapter about the Itanium (Itanic), a massively bold investment in a new sort of chip that had huge repercussions across the industry and contributed to AMD beating Intel to brief market leadership in desktop processors. Grove claimed to have trouble understanding the pros and cons of CISC versus RISC as well, which is odd given his technical training. But having foresight for secular market trends is probably more valuable than predicting the outcome of technoreligious schisms.

All in all an exceptionally well written book, with a wonderfully clear purpose, followed up with a homework section to try to force its message into practice.


The bonds of SQL

Posted 2015-06-17 05:44:45 GMT

Question for a SQL test: write a query to return the top five sales for each day in the database. It's easy to express this query for a given day, and many databases have extensions for writing it in general, but it can't be expressed portably across the common SQL implementations. And this query falls squarely into the core use-case that SQL is touted to solve.
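For concreteness, this is the kind of query those extensions allow (since standardised as window functions, though support still varies); a sketch using Python's sqlite3 against a hypothetical sales(day, amount) table, needing SQLite 3.25 or later:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("2015-06-15", a) for a in (10, 40, 30, 70, 20, 60, 50)] +
                     [("2015-06-16", a) for a in (5, 25, 15, 45, 35, 55)])

    # top five sales per day via ROW_NUMBER() OVER (PARTITION BY ...)
    rows = conn.execute("""
        SELECT day, amount FROM (
            SELECT day, amount,
                   ROW_NUMBER() OVER (PARTITION BY day ORDER BY amount DESC) AS rn
            FROM sales) AS ranked
        WHERE rn <= 5
        ORDER BY day, amount DESC""").fetchall()

    for row in rows:
        print(row)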

The No-SQL key-value store movement exemplified by databases like MongoDB is often lambasted for ignoring the lessons of history. SQL, a venerable ANSI standard, represents that history and provides a well known language and a protocol to more or less decouple the application from the database implementation. People with diverse roles and backgrounds interact with SQL and are well-versed in its peculiarities: from analysts to database administrators to web front end developers.

Despite this, for another example, there is no simple query that can 'insert this value for a key, or update that key if already present' atomically. Some SQL implementations provide extensions for this elementary and very common task (like MySQL's ON DUPLICATE KEY UPDATE), and it is possible with stored procedures, at the risk of losing performance to exception handling.
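As an illustration of how implementation-specific this remains, a sketch using Python's sqlite3 (the ON CONFLICT form needs SQLite 3.24 or later; PostgreSQL is similar, while MySQL spells it ON DUPLICATE KEY UPDATE):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

    def upsert(key, value):
        # insert the key, or atomically update it if it already exists
        conn.execute(
            "INSERT INTO kv (k, v) VALUES (?, ?) "
            "ON CONFLICT(k) DO UPDATE SET v = excluded.v",
            (key, value))

    upsert("a", "first")
    upsert("a", "second")  # updates the existing row
    print(conn.execute("SELECT * FROM kv").fetchall())  # [('a', 'second')]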

SQL is designed for 'relational' databases: that is, each row in a table expresses a relation and so must logically be unique. The adherence to this concept is why SQL cannot answer the simple sales query, and why No-SQL databases are justifiable not just on grounds of performance and scalability: they often fit the problem domain better. When a design requirement fails to fit the use case it should be re-evaluated: relational databases are very handy for some sense of purity but as systems like Hive demonstrate, things more or less work without pure relational semantics.

SQL imposes weird design constraints on a general purpose database: people add dummy 'id' columns to give each record a relational uniqueness. As with all failed designs, it's true that relational databases have real advantages in many cases, but the choice to demand these strict semantics should lie with the user, and a standardisation of the syntax for avoiding them would mean that SQL could deliver on its promise of portability across database implementations.

We all benefit from a common language, and tying SQL to one database implementation dogma inspires the proliferation of No-SQL mini-languages, each with a learning curve and lacking features. It's time to wrest the familiar syntax from the constraints of an ultimately failed design and admit that non-relational No-SQL techniques have real benefits to deliver.


Judging innovative software

Posted 2015-06-15 22:00:00 GMT

There's an old aphorism: execution matters more than ideas. In software I think that's very wrong — I'll elaborate but the question here is how can you evaluate an idea for a piece of software before it's implemented and in production testing? It's definitely possible to a certain extent, and this is an important skill.

Firstly, let me define what I mean by an idea. I want to differentiate between ideas and desired outcomes. An inexpensive autonomous flying car or a wonderful app that can transcribe your thoughts are both exercises in wishful thinking. They're science fiction, indubitably of immense value if they could be created, but definitely there is no clear path to an implementation. An idea in software is a method of implementation, something like trace compilation or the Bitcoin blockchain.

A software idea rarely enables some new capability. Generally there is a way to replicate the software's function in some other way, for example, by paying people to do it manually, or by constructing specialised physical machines. A software idea is about changing the balance of resources needed to achieve a capability. For example, with trace compilation, you can get the benefits of explicitly typed machine code without having to do costly static analysis. A software idea is generally about performance, albeit potentially about a huge shift in performance characteristics (e.g. enabling large-scale de-centralised trustworthy but anonymised financial transactions).

Is it worth investing the development effort in a new software idea? When you come up with a new idea, people will inevitably attack it. As Ben Horowitz says, "Big companies have plenty of great ideas, but they do not innovate because they need a whole hierarchy of people to agree that a new idea is good in order to pursue it. If one smart person figures out something wrong with an idea–often to show off or to consolidate power–that’s usually enough to kill it." How can you, the inventor, yourself decide if your idea is worth investing your time in further developing, when you and others can find issues with your new scheme?

There are classes of attacks on any new idea, that are essentially more about newness rather than the idea. For example:

— it's not been done before [there's an inexhaustible supply of inertia, entropy and lethargy in the world]

— it will be hard to manage operationally [only if you for some reason deliberately choose to not develop the necessary production monitoring tools]

— it will not work in production at a specific scale - without any actual issue being identified [quite insidious, because to counter it, you'd have to develop the project sufficiently that it could be put into production]

Most new software ideas are experimented with or thought about in people's free time, and bringing them out of the whiteboard stage takes a huge amount of effort, so these attacks can stifle a project immediately. I believe they should be disregarded as much as feasible, and instead the discussion should center on the idea itself rather than on its novelty.

A very valid reason to dismiss a project is the existence of an alternative method with better performance. Quantitative estimates are essential here. [One way to strangle a project at birth is to require such detailed projections that it must already exist before its creation can be justified.] Beyond this first order inspection, Hints for Computer System Design by Butler Lampson illustrates a series of practical considerations.

The cost of development of a system does (and should) factor very much into the decision about whether to pursue it. This is unfortunately entirely dependent on the particular people who will create it. One trick is to force very short timelines for prototypes (hackathons, etc.) but that severely constrains the scope and there is a huge natural tendency for the offspring of prototypes to be coerced into production - casting doubt on the original implementor and the idea itself. Some people can give realistic estimates of development time and others cannot; take the best guess at the distribution of development resources that will be required to achieve a specific level of benefit.

Once you've thought and fought through the above, the actual implementation might be relatively straightforward. Note that generally a new idea uses a particular resource much more heavily than it was used previously. For example, a new image processing scheme might rely on CUDA GPU computations or the SSSE3 PSHUFB instruction, where before only the scalar CPU instruction set was used. This will inevitably cause unexpected interactions when deployed at scale by changing the system's characteristics (in this case for example by drawing more electrical power). The ability to handle these issues is a reflection of the degree of technical stagnation the wider system already faces (e.g. aging compilers, fixed JVM versions, etc.) and generally the necessary fixes will benefit even the old system. That sometimes makes the arguments about them easier to overcome.

Programming the actual implementation is relatively trivial once the broader picture has been set. The quality of the implementation should be easy to measure given the discussions around the quantification of the benefit of the new approach, and once measured things naturally improve - lighting the path is harder than following it, and ideas themselves definitely have a social value beyond their first implementation.


Grep orientated programming

Posted 2015-03-23 03:31:32 GMT

One key indicator of a software project's amenability to change is its greppability. Projects that are not greppable take longer to modify and discourage casual contributions, and casual contributions are valuable not only in the open source world but also in the enterprise, where the consequences of being hard to casually modify show up as competing solutions or meetings about trivialities.

What is greppability? Grep is the name of a program for searching text. Wiktionary defines greppable as a format suitable for searching. I don't think this really captures the issue for software, where the source code is almost always in a simple text format: greppability is determined by the code's structure.

Greppability is the ease with which one can navigate a body of source code just by searching for simple text keywords. From determining what code caused an output to tracing all the callers of a function, there's plenty that can be done by text searching, or not, depending on how names are used and how the project is structured. And the easier it is, the faster and more reliably new developers can be productive.

For example, a Microsoft style error message might be Action Failed Error Number: 2950. This is incredibly ungreppable: in a large codebase, the words Action Failed or Error Number are likely to occur very frequently all over, and even the number 2950 is likely to appear often. Therefore even a very skilled developer with full access to the source code will, on encountering this error, have a great deal of difficulty determining the place where it was generated. A highly greppable alternative would be to include a distinctive keyword like access_macro_vba_fn_error in the message, which hopefully will appear only in places in the source code that are relevant.
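A small illustration (Python, with hypothetical names) of raising such a greppable error:

    # Hypothetical example: the distinctive token makes grepping the codebase
    # (or searching the web) for this exact failure straightforward.
    class MacroExecutionError(Exception):
        pass

    def run_vba_macro(macro_name):
        raise MacroExecutionError(
            "access_macro_vba_fn_error (2950): action failed running %r" % macro_name)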

Naming is obviously a key issue. Don't take advantage of separate namespaces to call different things by the same string: if a function is called getName then searching for it is likely to throw up hundreds of unrelated hits. But if it were called something more specific, like getWidgetName, then instantly it's easier to figure out what is calling it and consequently the ramifications of changing its behaviour, reducing the incidence of unpredictable bugs.

Constant indirection is the enemy of greppability. It might be that the MS codebase (that I haven't seen) has something like const int kErrorAccessVBAMacroFun = 2950 in it. Once you've found 2950 is tied to this constant, then you have to grep again for the name of this constant to find out where it is used - making the process tiresomely more convoluted.

As a Lisper, it's sad to admit but dynamic code generation, introspection and macros can be the enemy of greppability. For example, there could be a DEFINE_ERROR(AccessVBAMacroFun, AccessErrorBase + 50) or something that would mean anybody grepping for kErrorAccessVBAMacroFun would have a hard time tying it to 2950. There might be an awesome error database tool but how will someone new know about it? While it might seem like good separation of concerns and neat code, it's not greppable. The fact that a new developer, however skilled, can't easily figure out which software caused the effects if he or she came at it from its external output is bad in itself.

Take a minute to think about greppability: with a clever code generation or dynamic database scheme, try to have some keyword or string from the generated output appear in the source code, maybe just in comments.

Making code more greppable can cost nothing, but it opens up another tool to people on the project, and a tool especially easy for unsophisticated people or simple automation to use. Grep for the win!

This is actually why I think it's a good idea for error messages to contain a short identifier that can be searched for. This helps not just when searching the source code for that message, but also, for example, when searching the web for others encountering the same error, possibly with a differently worded message due to internationalization and localization. As a bonus, this gives you a convenient identifier for adding internationalization and localization to your software with something like gettext. This also extends to warning messages and other types of messages, but it's especially useful for error messages, as those are the most common kind of message for which one would want to find out what caused the message to appear and what can be done to prevent that condition from occurring.

Posted 2015-03-23 17:27:26 GMT by inglorion

my thoughts exactly!!!

Thanks for putting it out there...

Posted 2015-03-25 10:05:54 GMT by Anonymous from 148.87.67.201

Fremlin, instead of writing this on your blog, an appropriate response would entail three words and one exclamation point.

Posted 2015-04-10 10:07:59 GMT by Anonymous from 24.160.38.124


The Right Price for the Price is Right: Optimal Bidding

Posted 2015-01-25 03:11:29 GMT

The famous TV game show The Price Is Right had an excellent one-bid game where four players would take turns to guess the price of an item (no guesses can be repeated). Any guess higher than the item price (an overbid) was discarded, and then the closest remaining lower guess determined the winner. Sometimes there was a bonus for an exact bid too.

This game has been studied extensively. When I first heard the rules, I wrongly intuited that the overbid condition would make it easier for the first player: quite the opposite.

The first simplification to make thinking about the problem easier is to assume that all participants share the same model of the distribution of the price. Provided they are not sure about the price, this reduces to guessing the closest value at or below the roll of a fair many-sided die or the spin of a roulette wheel.

Suppose we first consider the case where, if all contestants overbid, nobody wins the prize and the game is not repeated. The strategy of the last player is then very easy: pick either one more than a previous bid, or the lowest possible bid, whichever has the most probability weight between it and the next bid above it (or infinity, if it would be the highest bid).
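A sketch of that last-player rule for the uniform 1-36 wheel used below (Python; the helper names are mine):

    # Last player's best response under a shared uniform model on 1..36: try one
    # more than each previous bid and the minimum possible bid, and take whichever
    # candidate wins on the most prices.
    def best_last_bid(previous_bids, lowest=1, highest=36):
        candidates = {lowest} | {b + 1 for b in previous_bids if b + 1 <= highest}
        candidates -= set(previous_bids)  # bids cannot be repeated

        def winning_prices(bid):
            higher = [b for b in previous_bids if b > bid] + [highest + 1]
            return min(higher) - bid  # prices from this bid up to the next bid above

        return max(candidates, key=winning_prices)

    print(best_last_bid([28, 19, 10]))  # 1, winning on the nine prices 1..9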

Working backwards by induction, the second-from-last player must consider that the last player will take almost all the probability mass from his bid, by bidding just above it, if that bid has the largest probability region above it. Therefore he or she must choose a bid that leaves another, more tempting region for the last player to take. This implies that each preceding player must take at most their fair share: the first player must bid at min{b : P(X ≥ b) ≤ 1/n}, so for four players on a roulette wheel numbered 1-36, he or she would bid exactly a quarter of the way from the top, at 28 (winning on the 9 numbers 28-36).

[Figure: the numbers 1 to 36 laid out as a strip, illustrating a first bid of 28 winning on 28-36]

Suppose he or she bid anything less - even just one less, 27. Then he or she would have 10 winning numbers, but the next player might be tempted to take the range 28-36 which has 9 winning numbers and leaves the big region 1-26 to be safely split between the third and fourth.

[Figure: the numbers 1 to 36, illustrating a first bid of 27 and the tempting 28-36 range for the second player]

Bidding below instead, the lowest the second player could safely go is 18, again winning on only nine numbers. As there is no advantage to this, he or she might as well go for 28.

We therefore assume the first player picks 28, so the second player will take 19-27 (nine numbers), leaving the third player to take 10-18, nine numbers.

[Figure: the numbers 1 to 36, illustrating bids of 28, 19 and 10 splitting the range into nine-number blocks]
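A quick exhaustive check of this split (a Python sketch, assuming a uniform price on 1-36 and bids of 28, 19, 10 and 1 in that order):

    # With a uniform price on 1..36 and bids of 28, 19, 10 and 1, each bidder
    # wins on exactly nine prices, i.e. an equal quarter share.
    bids = [28, 19, 10, 1]
    wins = {b: 0 for b in bids}
    for price in range(1, 37):
        under = [b for b in bids if b <= price]  # overbids are discarded
        if under:
            wins[max(under)] += 1  # closest remaining bid not above the price
    print(wins)  # {28: 9, 19: 9, 10: 9, 1: 9}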

Consequently everybody will get an equal share. This analysis differs markedly from The Price Is Right, But Are the Bids? An Investigation of Rational Decision Theory by Berk, Hughson, and Vandervonde. In their version they fail to consider the discrete nature of the problem and assume that in cutting off another player (guessing just above them) one receives that player's entire probability mass. However, of course, one cannot repeat the exact guess, so there is a little that the previous player gets to keep.

As another difference, they consider a game where, in the case that everybody overbids, the bidding is restarted (with an implicit maximum of the previous lowest bid, and in the same player order). In this version, the last player has an incentive not to bid 1, as the game will then restart. For all possible prices below the lowest bid, the last player then wins with some probability P_lastwin after a restart. This gives a probability mass bonus of P_lastwin × P(min(b_i) > X) to the interval obtained by cutting off another player. Assuming, as this paper does, that the space of bids is continuous and not discrete, the last player would cut off the best previous player's interval (in this continuous-bid version each bid has measure zero, so the cut takes the entire probability mass), getting at least 1/3 of the mass if not overbid, and otherwise gambling on a restart. This implies that P_lastwin is 1/3.

In practice, contestants are very nervous about overbidding and in 54% of cases the winning last bet is to cut off the highest bid.


ToqPiq: XKCD comics on the Qualcomm Toq smartwatch

Posted 2014-11-02 23:43:56 GMT

During the YC Hacks Hackathon, I made an app for the Qualcomm Toq SmartWatch that shows XKCD comics on the watch. It's pretty rough and ready.

It's sad that the API from Qualcomm does not include more functionality (it doesn't even allow arbitrary aspect ratios for the photos shown on the watch!). I've thrown the hack up on GitHub, have fun!

