John Fremlin's blog: Grep orientated programming

Posted 2015-03-23 03:31:32 GMT

One key indicator of a software project's amenability to change is its greppability. Projects that are not greppable take longer to modify and discourage casual contributions - and casual contributions are valuable not only in the open source world but also in the enterprise, where the consequences of code being hard to modify casually show up as competing solutions or meetings about trivialities.

What is greppability? Grep is the name of a program for searching text. Wiktionary defines greppable as a format suitable for searching. I don't think this really captures the issue for software, where the source code is almost always in a simple text format - there, greppability is determined by the code's structure.

Greppability is the ease with which one can navigate a body of source code just by searching for simple text keywords. From determining what code caused an output to tracing all the callers of a function, there's plenty that text searching makes possible - or not, depending on how names are used and how the project is structured. And the easier it is, the faster and more reliably new developers can become productive.

For example, a Microsoft style error message might be Action Failed Error Number: 2950 - this is incredibly ungreppable. In a large codebase, the words Action Failed or Error Number are likely to occur very frequently all over, and even the number 2950 is likely to appear often. Therefore even a very skilled developer with full access to the source code will, on encountering this error, have a great deal of difficulty in determining the place where it was generated. A highly greppable alternative would be to include a distinctive keyword like access_macro_vba_fn_error in the message - this will hopefully appear only in places in the source code that are relevant.
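
To make the difference concrete, here is a minimal sketch of a substring search over a made-up four-file codebase. The file names, their contents and the grep helper are all invented for illustration; only the message text and the access_macro_vba_fn_error keyword come from the example above.

```python
# Hypothetical miniature codebase: file name -> source text.
# Every file name and line of "source" here is invented for illustration.
SOURCES = {
    "macro_runner.txt": 'fail("Action Failed Error Number: 2950 access_macro_vba_fn_error")',
    "net_client.txt": 'fail("Action Failed Error Number: 12")',
    "page_layout.txt": "max_width = 2950",
    "print_queue.txt": 'fail("Action Failed Error Number: 7")',
}


def grep(keyword, sources=SOURCES):
    """Return the sorted names of files whose text contains keyword."""
    return sorted(name for name, text in sources.items() if keyword in text)


if __name__ == "__main__":
    # Generic words and bare numbers hit several files; the keyword hits one.
    for keyword in ("Action Failed", "2950", "access_macro_vba_fn_error"):
        print(keyword, "->", grep(keyword))
```

Even in this toy, Action Failed matches three files and 2950 matches two, while the distinctive keyword matches only the file that generates the message.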

Naming is obviously a key issue. Don't take advantage of separate namespaces to call different things by the same string - if a function is called getName then a quick text search for it is likely to throw up hundreds of unrelated hits. But if it were called something more specific, like getWidgetName, then instantly it's easier to figure out what is calling it and consequently the ramifications of changing its behaviour - reducing the incidence of unpredictable bugs.

Constant indirection is the enemy of greppability. It might be that the MS codebase (which I haven't seen) has something like const int kErrorAccessVBAMacroFun = 2950 in it. Once you've found that 2950 is tied to this constant, you have to grep again for the name of the constant to find out where it is used - making the process tiresomely more convoluted.
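
The extra hop can be sketched as follows. This is only an illustration under stated assumptions: the two "files", the ReportFailure call and the find_use_sites helper are hypothetical, and kErrorAccessVBAMacroFun is the guessed constant name from the paragraph above, not anything known about the real codebase.

```python
import re

# Hypothetical two-file codebase. ReportFailure and both file names are
# invented; kErrorAccessVBAMacroFun is the guessed name from the post.
SOURCES = {
    "errors.h": "const int kErrorAccessVBAMacroFun = 2950;",
    "macro.cc": "ReportFailure(kErrorAccessVBAMacroFun);",
}


def grep(pattern, sources):
    """Return the sorted names of files whose text matches the regex pattern."""
    return sorted(name for name, text in sources.items() if re.search(pattern, text))


def find_use_sites(number, sources):
    """Follow number -> constant name -> uses, recording the hits of each hop."""
    first_hop = grep(re.escape(str(number)), sources)
    hops = [first_hop]
    for filename in first_hop:
        # Look for a constant definition that binds a name to the number.
        match = re.search(r"const int (\w+) = %d;" % number, sources[filename])
        if match:
            users = grep(re.escape(match.group(1)), sources)
            # Second hop: files using the name, minus where it was defined.
            hops.append([name for name in users if name not in first_hop])
    return hops


if __name__ == "__main__":
    print(find_use_sites(2950, SOURCES))
```

Searching for 2950 finds only errors.h; the file that actually reports the failure turns up only after a second search for the constant's name.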

As a Lisper, I'm sad to admit it, but dynamic code generation, introspection and macros can be the enemy of greppability. For example, there could be a DEFINE_ERROR(AccessVBAMacroFun, AccessErrorBase + 50) or something that would mean anybody grepping for kErrorAccessVBAMacroFun would have a hard time tying it to 2950. There might be an awesome error database tool, but how will someone new know about it? While it might seem like good separation of concerns and neat code, it's not greppable. The fact that a new developer, however skilled, can't easily figure out which code caused an effect when coming at it from its external output is bad in itself.

Take a minute to think about greppability - with a clever code generation or dynamic database scheme, try to have some keyword or string from the generated output appear in the source code, maybe just in comments.
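
As a rough sketch of that advice, here is a Python stand-in for the DEFINE_ERROR idea. The define_error helper and the ERRORS table are hypothetical, and the base value of 2900 is an assumption inferred from AccessErrorBase + 50 yielding 2950. The point is the comment above the call, which spells out the generated name and number so that a text search for either lands on this line.

```python
# Assumed base value: inferred from "AccessErrorBase + 50" giving 2950.
ACCESS_ERROR_BASE = 2900


def define_error(table, name, code):
    """Hypothetical stand-in for DEFINE_ERROR: registers kError<name> in table."""
    table["kError" + name] = code


ERRORS = {}

# Generates kErrorAccessVBAMacroFun = 2950
# (shown to users as "Action Failed Error Number: 2950").
define_error(ERRORS, "AccessVBAMacroFun", ACCESS_ERROR_BASE + 50)


if __name__ == "__main__":
    print(ERRORS)
```

Without the comment, neither kErrorAccessVBAMacroFun nor 2950 appears anywhere in the source text, even though both exist at run time.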

Making code more greppable can cost nothing, yet it opens up another tool to people on the project - a tool that is especially easy for unsophisticated people or simple automation to use. Grep for the win!

This is actually why I think it's a good idea for error messages to contain a short identifier that can be searched for. This helps not just when searching the source code for that message, but also, for example, when searching the web for others encountering the same error, possibly with a differently worded message due to internationalization and localization. As a bonus, this gives you a convenient identifier for adding internationalization and localization to your software with something like gettext. This also extends to warning messages and other types of messages, but it's especially useful for error messages, as those are the most common kind of message for which one would want to find out what caused the message to appear and what can be done to prevent that condition from occurring.

Posted 2015-03-23 17:27:26 GMT by inglorion

my thoughts exactly!!!

Thanks for putting it out there...

Posted 2015-03-25 10:05:54 GMT by Anonymous from 148.87.67.201

Fremlin, instead of writing this on your blog, an appropriate response would entail three words and one exclamation point.

Posted 2015-04-10 10:07:59 GMT by Anonymous from 24.160.38.124

>As a Lisper, it's sad to admit but dynamic code generation, introspection and macros can be the enemy of greppability.

Similarly a C++ template, parameterised by a type T, that assumed T had a member function called getName would also be negative for greppability.

Posted 2017-01-09 13:41:09 GMT by Paul Delhanty
