John Fremlin's blog2021-03-03T00:00:00ZBuilding software in the cloud2021-03-03T00:00:00Z/build-in-the-cloud<div class='blog-entry-story'><p>Compute is elastic now, so running 1 job for 1000 minutes costs as much as 1000 jobs for 1 minute. The primary resource to conserve in the software life-cycle is developer time &mdash; or more precisely developer attention, which behaves non-linearly: make someone wait five minutes and they'll go off and do something else. Ideally, you want to speed up the observe-orient-decide-act loop with quick feedback.</p><p>The quickest feedback comes from compilation and small fast tests (not getting into religious wars about what qualifies a test to be a unit test). However, this feedback is very incomplete, and the faster you can get to a real test (e.g. a canary) the better. <a href="https://instagram-engineering.com/continuous-deployment-at-instagram-1e18548f01d1">Instagram boasts about rapid deploy cycles</a>.</p><p>How to reconcile this with the complex dependency graph of modern software deployment? It's just not possible to force everybody into the same repository or build system. A change in one library might affect a big graph of downstream projects. Companies that try to run monorepos often have software straggling into GitHub.</p><p>There is a broadening radius of passing tests and releases as code is pulled into more and more deployments. Rather than fight this reality, a build system should acknowledge it. Branches should melt into each other when tests pass and releases are signed off. Builds should happen rapidly, cache their inputs, and make it immediate and fast to start images for a given build. </p></div>Run anything on NixOS with Docker passthrough2021-02-15T07:33:41Z/docker-passthrough<div class='blog-entry-story'><p>NixOS is very opinionated. Suppose you want to run the latest Android Studio or a Python project with weird pip installs. It will work on Ubuntu and take work on NixOS.</p><p>Here is a Dockerfile that builds an image sharing your home directory, so you can use apps installed in Ubuntu.</p><p><pre>
FROM ubuntu:devel
ENV DEBIAN_FRONTEND=noninteractive
RUN perl -pi -e 's/# deb-src/deb-src/' /etc/apt/sources.list
RUN dpkg --add-architecture i386 \
    &amp;&amp; apt-get update \
    &amp;&amp; apt-get install -qy \
        cmake g++ git \
        build-essential \
        tar python tzdata sudo bash-completion \
        gdb openssh-server rsync dpkg-dev clangd default-jdk \
        libc6:i386 libncurses5:i386 libstdc++6:i386 lib32z1 libbz2-1.0:i386 \
    &amp;&amp; apt-get build-dep -qy plasma-desktop clangd libc6:i386 default-jdk \
    &amp;&amp; apt-get dist-upgrade -qy \
    &amp;&amp; apt-get clean -qy
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' &gt;&gt; /etc/sudoers

ARG VII_USER_NAME
ARG VII_USER_ID

RUN useradd --shell /bin/bash -u ${VII_USER_ID} -o -c "dockerpassthrough" -m ${VII_USER_NAME}
RUN adduser ${VII_USER_NAME} sudo

USER ${VII_USER_NAME}
WORKDIR /home/${VII_USER_NAME}
</pre></p><p>To build: <pre>
docker build --build-arg VII_USER_ID=$(id -u) --build-arg VII_USER_NAME=$USER . -t vii-ubuntu-passthrough
</pre></p><p>To run a shell: <pre>
docker run --mount type=bind,source=$HOME,target=$HOME --mount type=bind,source=/tmp,target=/tmp --net=host -e DISPLAY=$DISPLAY -it vii-ubuntu-passthrough:latest bash
</pre> </p></div>Documentation namespaces2020-10-10T00:00:00Z/documentation-namespaces<div class='blog-entry-story'><p>Python packages have multiple names: the pip install name, like scikit-learn, and the name they are imported by, sklearn.</p>
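<p>The mapping between the two is only discoverable at runtime. A minimal sketch, assuming Python 3.10 or later (where importlib.metadata gained packages_distributions) and that scikit-learn is installed: <pre>
# Map top-level import names to the pip distributions that provide them.
from importlib.metadata import packages_distributions

# packages_distributions() returns {import name: [distribution names]},
# so 'sklearn' maps to ['scikit-learn'].
print(packages_distributions().get('sklearn'))
</pre></p>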
<p>Not to pick on Python, it's the same for other systems: in Java there is the jarfile or Maven name, like compile group: 'commons-io', name: 'commons-io', and then the import name, like org.apache.commons.io. In C++ there is also no restriction on the namespaces a library might export.</p><p>When designing software it is super easy to come up with namespaces for nearly the same thing; for example, Docker logically has a distinction between <a href="https://docs.docker.com/glossary/">images and containers</a>. This <a href="https://stackoverflow.com/questions/23735149/what-is-the-difference-between-a-docker-image-and-a-container">makes sense but it's confusing</a>. It means tools to manage these objects need to be created for each kind of object: there's a docker rm command to remove a container and a docker rmi command to remove an image.</p><p>One problem is introducing new concepts and kinds of things. There is a cost to that, borne by everybody who comes in contact with the system and has to become familiar with the new kinds of things and the relationships between them.</p><p>A second problem is that, depending on context, a name from one namespace may be preferred over another, but the context may not be clear between the person conveying the information and the person receiving it. When metrics are emitted, are they indexed under container or image? Sometimes it is clear which should be chosen and sometimes not.</p><p>Third, people have to painfully discover the relationships between names in different namespaces, which may not be documented. To talk about a library in a build file you need to use one name, and to import it another. Golang does a good job here by having a simple transformation between the names. In general, however, transformations like camelCase to snake_case, where one or the other is used depending on context, can be just as confusing as having multiple namespaces. </p></div>USB replug script in bash2020-09-09T00:00:00Z/bash-replug-usb<div class='blog-entry-story'><p>USB devices, especially webcams, frequently get wedged and stop responding to the host computer. On Linux, you can reset them in software, directing an unplug and replug event by writing to unbind and bind.</p><p>This script maps from the USB port path, which can be complex, to the USB vendor id and product id. Unfortunately, lsusb does not report the port path.</p><p>Without a device to reset, it prints something like this <pre>
1-10 -> 8087:0029 unknown_product
1-8 -> 06cb:00c4 unknown_product
1-9 -> 30c9:0002 HP HD Camera
9-1 -> feed:1307 Ergodox EZ
9-2 -> 046d:085e Logitech BRIO
</pre> And with a device, <pre>
$ sudo bash usbreplug.sh 046d:085e
replugging 9-2 -> 046d:085e Logitech BRIO
</pre></p><p>For a given /dev/video device you can find the product and vendor IDs from the udevadm info -x -q all --name /dev/video0 output - assuming it is connected by USB.</p><p>And now - the script <pre>
#!/bin/bash
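# Each USB device appears under /sys/bus/usb/devices/ with a port path
# name like 9-2 (bus 9, port 2). Writing that port path to
# /sys/bus/usb/drivers/usb/unbind detaches the device from its driver,
# and writing it to .../bind re-attaches it - the software equivalent
# of unplugging and replugging the cable.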
set -o errexit -o pipefail -o nounset
shopt -s extglob

targetVendorProduct="${1:-}"
sleepDelay=0.5

cd /sys/bus/usb/devices/
for portPath in +([0-9])-+([0-9\.]); do
    function portInfo() {
        cat $portPath/$1 2&gt;/dev/null || echo "unknown_$1"
    }
    idProduct=$(portInfo idProduct)
    idVendor=$(portInfo idVendor)
    product="$(portInfo product)"
    if [ -z "$targetVendorProduct" ]; then
        echo "$portPath -&gt; $idVendor:$idProduct $product"
    elif [ "$targetVendorProduct" == "$idVendor:$idProduct" ]; then
        echo "replugging $portPath -&gt; $idVendor:$idProduct $product"
        echo $portPath &gt; /sys/bus/usb/drivers/usb/unbind || echo "unbind $portPath failed" &gt;&amp;2
        sleep ${sleepDelay}
        echo $portPath &gt; /sys/bus/usb/drivers/usb/bind
    fi
done
</pre> </p></div>Exactly-Once in multithreaded async Python2020-08-01T00:00:00Z/python-async-thread-exactly-once<div class='blog-entry-story'><p>Python's built-in coroutines with asyncio are complex to use, and their relationship with other threading mechanisms is unclear. <a href="https://nullprogram.com/blog/2020/07/30/">Here's an example of an attempt</a> to create exactly-once semantics.</p><p>With multiple OS threads in the same Python process, each with its own asyncio event loop, at most one thread and therefore at most one event loop may execute Python code at any time under CPython. If you have IO-heavy code or code that can run outside Python, the convenience of a single consistent Python environment may outweigh the cost of having to serialize all Python execution.</p><p>One nuance is that under asyncio all futures are bound to one event loop. You can't wait on a future from one event loop while running in another, or else you get an error like <pre>
RuntimeError: Task ... got Future ... attached to a different loop
</pre></p><p>Having loops wait for each other is complex to debug. However, it is possible to achieve quite easily by having the future you're waiting for call back to the loop you're in via call_soon_threadsafe. This primitive allows calling across loops safely.</p><p>The future publishes the loop it is on. If it's not on the current loop, create a future on the current loop that waits for the original one. We could create at most one future per loop by caching these.</p>
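<p>For handing a whole coroutine to a loop running on another thread, asyncio also provides run_coroutine_threadsafe, which returns a concurrent.futures.Future that any thread can block on. A minimal sketch (the setup coroutine and the daemon-thread loop here are illustrative): <pre>
import asyncio
import threading

async def setup():
    # stands in for any one-time initialisation work
    await asyncio.sleep(1)
    return 'ready'

# run a second event loop on its own daemon thread
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

# schedule the coroutine on the other thread's loop; the returned
# concurrent.futures.Future can be waited on from any thread
fut = asyncio.run_coroutine_threadsafe(setup(), loop)
print(fut.result())  # blocks this thread until the coroutine finishes
</pre></p>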
<p>Here's a demo implementation. <pre>
import asyncio
import threading
from contextlib import contextmanager


async def one_time_setup():
    with p('one_time_setup'):
        await asyncio.sleep(1)


future = None

async def demo_exactly_once():
    global future
    if not future:
        future = asyncio.create_task(one_time_setup())
    with p('waiting'):
        local_loop = asyncio.get_event_loop()
        if future.done() or future.get_loop() == local_loop:
            return await future
        else:
            local_future = local_loop.create_future()
            # the done callback receives the finished task; hand its
            # result back to this loop via call_soon_threadsafe
            future.add_done_callback(
                lambda fut: local_loop.call_soon_threadsafe(
                    lambda: local_future.set_result(fut.result()))
            )
            return await local_future


@contextmanager
def p(msg):
    """Simple reporting mechanism including loop and thread information"""
    def event(stage):
        print(f"{msg} {stage} from thread: {threading.get_ident()}, "
              f"event loop: {id(asyncio.get_running_loop())}")
    event('start')
    try:
        yield
    finally:
        event('end')


def worker():
    new_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(new_loop)
    new_loop.run_until_complete(demo_exactly_once())

for thread in [threading.Thread(target=worker) for _ in range(2)]:
    thread.start()
</pre> </p></div>Roles in tech2020-04-11T00:00:00Z/roles-in-tech<div class='blog-entry-story'><p>The Google and post-Google generation of Silicon Valley technology companies force a conflicting position on skills. We want software engineers to be able to think about product choices and product managers to be able to design systems. But then we prohibit some roles from pushing code and stop others from making product decisions.</p><p>A product designer who can write code is more productive than one who can't, other things being equal, as understanding the system end to end and being able to make small tweaks without asking for help is valuable. However, we generally decide it's not a requirement. Ideally, everybody could do everything.</p><p>Of course without maturity, knowing how to do someone else's job leads to snide remarks about how easy it is and overly prescriptive advice &mdash; removing agency and the feeling of ownership. In our imagined ideal world, everybody could do everything but also trust and respect each other.</p><p>Suppose we were in this ideal world. Would there be specialisation of roles? One day a product specialist negotiates a timeline for migrating off a backend system. The next day a data scientist draws out the screens for a new application flow. They could do this in our imaginary situation where everybody has all the skills, but specialisation is still useful for two key reasons: <a href="https://www.nytimes.com/2019/07/14/opinion/multitasking-brain.html">individuals are bad at multitasking</a>, and if everybody is doing everything, everybody has to be in the loop. That imposes inefficient communication overhead.</p><p>It also sets up difficult incentives. A backend engineer would love the business to be simple, and in fact should advocate for simplicity, as the cost of complexity is borne most heavily by this role. A product manager must achieve a compromise and so should not weigh too strongly towards one avenue.</p><p>In deciding how to split work there are three considerations to trade off.</p><p><h3>Context switch cost</h3> People are proud of their ability to multitask and feel pleasure in thinking they can hold the overall picture as well as master details. However, studies consistently show that this proficiency is overestimated.</p><p>Different responsibilities have naturally different quality and timeliness tradeoffs, feedback loops, activities and tools.
In my experience the highest context-switch disturbance comes from changing from an uncontrolled task where the feedback cycle is long (e.g. a pitch deck to powerful people) to one where it is short and there is limited control, like debugging why an error is thrown unexpectedly from a backend system.</p><p>The context switch is also paid by everybody who interacts with the person multitasking, and is especially confusing if the multitasker is trying to play coordinating roles.</p><p><h3>Cost of involving multiple people</h3> Correcting a spelling mistake on an unimportant untranslated webpage is my example of a simple task that should be achievable by anybody on the team. Inventing process where a task like this flows through different disciplines and is prioritised by many different people is a waste of time.</p><p>A typical structural problem is that few people are familiar with the analytics systems. To answer straightforward questions like <q>how many customers bought products last week</q>, multiple roles have to be involved. This is very damaging, as the high cost means people end up with wrong pictures of how things are going.</p><p><h3>Sustainable allocation of work across the team</h3> Typically more effective people are paid more than their coworkers, but not as much as the difference in effectiveness, so it's objectively cheaper for the firm to give them all the tasks. But their time should be prioritised to concentrate on more impactful tasks, and other people need the opportunity to learn.</p><p>Given that, and taking as an example a typical consumer-facing web virtual goods storefront project, what are the roles and expectations for them? Fundamentally, it is necessary to decide on the purchase and refund flows, etc., the pricing of items on the store, and then reliably implement that.</p><p>We therefore need product designers to think through the user needs and lay out a broad view of how someone should experience the storefront. Frontend software engineers are responsible for making the detailed choices and implementing a system that works well for customers using the most popular devices. Backend software engineers are responsible for ensuring that the datastores powering the customer accounts, shopping cart and purchase records are reliably run, easy to audit, and easy to use for the frontend engineers. Prices are set by the data scientists.</p><p>These roles are absolutely essential and cannot be dispensed with. This does not mean that other roles are not valuable or critical to the long-term success of a business. It's just that one could still make a website without a payroll department.</p><p>They also are hard to combine into a single person.</p><p><h3>Product designers</h3> lay out how users will interact with the software produced and how it fits into their activities. The product designer might use storyboards or Photoshop and need to wait for months for objective feedback, which very often focuses on implementation details out of their control. Therefore design critique and subjective feedback is very important.</p><p>The closest essential role is frontend engineer, but the work is very different in terms of immediate feedback and tools. The frontend engineer might be experimenting with ways to make scrolling smooth.</p><p>To further separate the role, some companies employ UX researchers to orchestrate customer research.
This usefully separates out a very different set of tools (running surveys and studies).</p><p>The output of a designer is a story of how the software should interact with customers. Bad designers miss major consequences, produce impractical, incoherent workflows and confuse customers by misprioritising functionality. Great designers lay out a happy relationship between customer and software, simplify or remove workflows altogether, and surface functionality so obvious to customers that it feels natural, as if there were no design.</p><p><h3>Software engineers</h3> design, write and operate the software. There are many specialisations, from Site Reliability or Production Engineer to Machine Learning Engineer. It's not realistic to switch someone between these specialisations and expect the same level of output, but there is a common cycle of implementation, testing and measurement.</p><p>Frontend engineers, working on user interfaces, measure marketing and interaction funnels, latencies and reliability. Backend engineers measure uptime and latency for their critical workloads. Machine learning engineers measure precision and recall. The work is detail orientated, can get objective feedback quickly and has deep domain specificity. </p><p>The output of a software engineer is sustainable improvements to how the software meets its business goals, and measurement of that. Bad engineers increase complexity, aim at the wrong level of automation, make changes that simply do not improve goals, e.g. by crashing the system, and pick uncoordinated tools that will cause problems going forward. Great engineers measure obsessively, innovate to reduce complexity, and pick tools that obviate entire classes of work going forward.</p><p><h3>Data scientists</h3> measure complex effects and estimate the size of future opportunities. While software engineers run experiments and measure simple interactions like a dollar increase in sales, data scientists measure nuanced effects like customer retention.</p><p>It is often hard for engineering managers to adapt to managing data scientists. In contrast to software engineering, where projects may last for months or years, data science explorations typically last days or weeks. Data-quality fixes are easy to describe but hard to discover, and most issues are out of the control of the data scientist, who must beg for logging fixes and improvements to data query and ETL pipeline systems.</p><p>The output of a data scientist is improved decision making. Though the decision-making process may involve other functions, the quantified measurements and opportunity sizing from data scientists guide what should be worked on. Bad data scientists run experiments that depend on broken data sources, produce misleading, inaccurate statistics, and miss major effects. Great data scientists identify the most valuable data sources, obsessively debug and cross-validate to get accurate statistics, and run experiments that indisputably identify long-term opportunities and risks.</p><p><h3>Product managers</h3> bring all the functions together and build agreements on the compromises needed to ensure good business outcomes are delivered. They set up swimlanes and parameters, so there is trust that each group, with its necessarily different priorities, will not cause trouble for the others.</p><p>Knowledge workers work better when they are bought into and understand the rationale behind their efforts.
So everybody needs to have a say in the goal setting and prioritisation of projects.</p><p>Product managers are responsible for finding and then articulating a compromise between all the competing interests and pressures. To be effective at this, and trusted, they must <em>avoid insisting on an agenda themselves</em> &mdash; inspiring end goals are good, but, crucially, pushing a vision that people disagree with is the opposite of compromising.</p><p>The person responsible for driving these compromises might hold different roles depending on the nature of the project. Business-facing projects typically have a product manager in this role. However, most tech companies do not have product managers in infrastructure teams, because high technical skill is needed to understand the compromises and people will not trust a generalist, so this role is played by tech leads, TPMs and engineering managers. The product manager has to be enough of a domain expert to be trusted by everybody, and therefore must be able to write queries and code and clearly articulate back an understanding of their partners' difficulties.</p><p>The output of a product manager is a working agreement between the other functions and shared expectations around a common plan. Bad product managers try to bury disagreements, fail to find compromises, and seed distrust so that everybody wants involvement in every decision of every other group, and each group articulates different goals for the broader effort. Good product managers broker compromises, let everybody feel heard, and foster trust so that decisions can be devolved to the right group of experts who can excel in their swimlanes, and everybody articulates the same goals for the broader effort.</p><p><h3>Line managers</h3> are responsible for balancing the needs of projects against the needs of the people available to work on them, and for advocating for scope changes and people allocations as appropriate.</p><p>They need to balance career growth, the business need for basic tasks to be completed, and likely hiring needs.</p><p>To do this, and be trusted by their reports, they must be able to understand not only the immediate tasks but also have the ability to look ahead, as moving people, hiring and firing take many months.</p><p>The output of a manager is the quality of employees doing useful work. Bad managers lead to people doing irrelevant tasks and failing to address important ones, potentially keeping people superficially happy but losing better employees and retaining the worst. Great managers inspire people to achieve business results, set the right size of team for the future, quickly manage people into places where they will be productive and happy, and retain and attract better employees.</p><p><h3>Tech leads</h3> ensure the software components work together to deliver business value. They evaluate the technical quality of contributions and contributors, helping managers understand who is performing well, who is stuck, and where there are gaps in planning that need to be addressed. The tech lead identifies and communicates a prioritisation of problems and opportunities and the plan to address them.</p><p>Tech leads need to be trusted by the engineers they are working with, and when implementing engineers are stuck the tech lead is often expected to jump in and unblock the problem.</p><p>The output of a tech lead is the quality of the whole system they are responsible for.
Bad tech leads do not balance investment against quick results, skewing to extremes that are either late or very buggy, ship uncoordinated components that contain unnecessary duplication (e.g. multiple login systems), and cannot explain clearly how major business problems will be solved. Great tech leads allow innovation while shipping reliably, help all teams use similar vocabularies, build trust and encourage re-use of the right components, and clearly communicate how problems should be addressed and why priorities were chosen.</p><p><h3>Technical program managers (TPMs)</h3> make life unsurprising. They communicate with all stakeholders, surface issues early and share timelines. Bad TPMs continually cycle through new spreadsheets and JIRA projects, are unaware of the shadow structure of decision making in the organisation, allow people to escape accountability, and ignore red flags. Great TPMs keep one clearly communicated system constantly updated for long periods, dig deep at the first sign of trouble, know exactly what's going on, and leave everybody feeling like they own their timelines but somehow they all fit together.</p><p>There are many other roles, like product specialist. And in the end, people fit themselves into different shoes touching on different aspects. </p></div>Testing systems at scale2020-04-05T00:00:00Z/testing-systems<div class='blog-entry-story'><p>Test-driven development is trendy. Interview loops and promotion ladders insist on testing. We as an industry have agreed that software engineers need to do their own testing, rather than throwing incomplete code over the wall to a QA department. People love to throw out idealistic simplifications and insist on investing in one kind of testing while ignoring others. This is wrong.</p><p>There is no easy answer. 100% code coverage is <a href="https://docs.microsoft.com/en-us/azure/devops/learn/devops-at-microsoft/100-code-coverage-worth-cost">expensive and doesn't guarantee correctness</a>. In fact, testing is a trade-off between effort and the quality of the outcome. Different kinds of bugs have very different impact, from business-ending to trivial, and effort should be dedicated appropriately.</p><p>That said, there are good answers. Medical and aviation software is very reliable. Beacons of quality and reliability exist in every industry. It is possible to make systems with few bugs &mdash; the question is how expensive it will be. And the irony is that software without bugs often ends up being much cheaper than software with bugs.</p><p>How should you spend testing effort to reduce bugs? Once this scales to a group of people, just as with any other human endeavour, clarity in the definition of goals and measurement is important. One simple measurement, in the absence of application-specific goals, is the rate at which serious issues are discovered.</p><p>Another important measurement is the effort spent adjusting test environments or modifying old tests when adding new features. Poorly set up, overly specific tests and convoluted test frameworks lead to technical debt and slow down new development. They spend the testing effort budget uselessly. A sign of this is when engineers complain about other people's tests. Even imperfect tests are better than none and should be encouraged.</p><p>A bad measurement is the amount of time spent testing.
It's hard to disentangle this from development.</p><p>An ideal test would give confidence there were no important bugs, and run in development all the way to production, where it joins the monitoring and alerting systems. In many environments some level of unreliability is an expected part of the contract, so tests should not fail for this. I've worked on many systems with engineers complaining about flaky tests when the tests fail at the same rate as production. If it's not a problem in production, it should be ok for the tests to fail at that rate.</p><p>End-to-end, integration and unit tests can all be useful, especially when there is a clear definition of proper behaviour at their corresponding levels. On the other hand, enshrining the behaviour as written can enshrine the wrong behaviour, <a href="bad-unit-tests">give false confidence and make the software unnecessarily hard to modify</a>.</p><p>Each context needs its own testing. For user interfaces, tests that produce screenshots or videos are very useful. For systems, soak tests that stress peak concurrent load can reproduce difficult bugs, and shadow testing, where new code is given production input but has its output ignored, can iron out networking and deployment issues. For machine learning systems, automated tests of cross-validation quality assert that the output is meeting its primary objective.</p><p>To offer a heuristic as a hint: my experience is that a test environment using historical snapshots of production data can often strike a good balance between discovering issues early and the cost to maintain. Dummy test environments take a long time to set up and enshrine wrong assumptions. Historical data is often a useful kind of messy.</p><p>The right testing tradeoff depends on the application. If the output is a machine learning decision, then the quality of that decision should be assessed quantitatively &mdash; unit testing all the functions involved doesn't tell you if the system is working well. Datasets, both input and output, should be tested perhaps more than the code. Each application is different and needs its testing budget spent wisely! </p></div>Shipping fast and right2020-01-19T00:00:00Z/shipping-fast<div class='blog-entry-story'><p>Software projects often get stuck in billion-dollar black holes of no results. Five years ago my boss's boss asked me how we shipped our big project on time. Since then we've shipped a lot more, I feel I've got better at it, and I've shipped projects worth hundreds of millions at least. So how?</p><p>Partly it's working with incredible people willing to go the distance and get their hands dirty. But a lot is process.</p><p>Firstly, there's no point shipping if you're not shipping something good. So clarify what that means specifically for your project, identify the risks to that, work to estimate and control those, repeat and iterate.</p><p>There is considerable jargon around project management: controlling scope, integrating feedback, identifying stakeholders, setting up a charter. These concepts are immensely useful for communicating about and addressing the typical dysfunctions that hold projects back.</p><p>However, much of it is papering over a failure to follow a simple cycle of impact-driven development. I think largely because it is really hard as human beings to stay open to feedback, not get defensive, and instead proactively keep looking for problems.</p><p>1. Identify risks. As a group gets larger it is harder and harder to do this.
It's natural human behaviour to attack and ostracise people who bring up problems &mdash; especially if they are missing context. Instead, they need to be supported. In software projects, even the most junior engineer is an expert on one component and should be treated with respect. If context is missing, that is a failure of the organisation.</p><p>Risks can be anything: that there is no plan for a needed system, that a system has a crippling bug, that a team is missing necessary expertise, that a competitor is going to launch a better product, that a director is squabbling with a VP &mdash; anything that can impair achievement of the goals.</p><p>Seek out and hear concerns wherever anybody might have a valuable perspective; this does not mean they all have to be dealt with.</p><p>2. Estimate and prioritise attacking the greatest risks. <a href="https://www.youtube.com/watch?v=m2sj-U2QSHs&feature=youtu.be">Get rid of luck</a>. The biggest risks are the most inconvenient to deal with: tough conversations with the CEO about legal issues, or potential public humiliation from putting a poor product on the market. Address these risks upfront.</p><p>Effort must be stopped on easy and comforting work that doesn't address a risk. As risks are managed, it gets easier and easier to get people to help; that's when the comfortable work can start.</p><p>Seeing the most important risks get acknowledged and addressed is incredible for morale. It's actually really positive, builds trust, and makes people feel their work is worthwhile.</p><p>3. Repeat and iterate this risk management process continually. Larger groups are more and more vulnerable to group-think. Assumptions must be tested &mdash; with tests as realistic as possible, mirroring the real-world conditions after things have shipped.</p><p>The complexity and scope of tests should scale with the work: a financial computation needs unit tests against audited outputs, a system needs integration and end-to-end tests, a bigger system needs tests with realistic loads, a product needs market tests. The tests need to be targeted to help estimate the key risks identified.</p><p>These tests will bring up new risks that need to be addressed. Having a planning cycle that does not allocate time to these issues, or treats them as failures rather than opportunities, means risks will be hidden.</p><p>Spending throw-away development effort on addressing risk is not throw-away work. It must be recognised appropriately. And testing must happen at rapid frequency, so systems are built with the expectation that they are live.</p><p>This cycle of measurement and integration of feedback is really hard to follow. Addressing uncertainty is much harder than trying to define it out of scope. It feels chaotic. An ordered plan that never changes and never integrates learning is much more comfortable. But it doesn't make sense unless you are absolutely sure you know what you're doing &mdash; admitting you don't is hard, but a <a href="https://en.wikipedia.org/wiki/Death_march_%28project_management%29">death march</a> is harder. </p></div>Toys for project management2019-08-03T00:00:00Z/toys-for-project-management<div class='blog-entry-story'><p>Project management is overhead, but communication between people is essential.
Just as team outings or testing are overhead but can accelerate the main activities of a group, project management done well is a massive productivity boost.</p><p>Unfortunately, while I've worked on projects with many good TPMs, there is always criticism of the project management. The complaints are always the same: everything should be in <q>one place</q> and somehow <q>simple</q>. These demands are contradictory. They are inevitably hurtful to anybody who put effort into project managing a complex project.</p><p>The solutions proposed are always to start from scratch with a new methodology: JIRA, spreadsheet, Trello, etc. This is destructive to morale in two ways: first, you're telling the people who put effort into the old systems that they're not recognised, and second, you're demanding they toil to move everything into the new shiny toy.</p><p>Even more destructive to morale is when the new toy system inevitably captures only part of the complex project, or gets unwieldy or out of date. And then the complaint that there are multiple systems and not just one place is made true.</p><p>People only loosely involved in the project must not be allowed to destabilise it with these criticisms, and any success in project management needs to be celebrated loudly. It's a tough job, and the hard work of bringing up mismatched expectations, cross-checking and following up is what's useful - much more important than choosing a shiny toy. </p></div>Curse of business logic2019-05-19T00:00:00Z/curse-of-business-logic<div class='blog-entry-story'><p>People complain that a piece of software is bogged down in technical debt and needs to be rewritten because there is business logic scattered all over it. If the rewrite goes ahead, this is then inevitably followed &mdash; in a few years &mdash; by a new set of people making the same complaint about the rewrite.</p><p>What is wrong with business software? If we follow this reasoning, it becomes messy due to contamination by filthy business logic.</p><p>That premise is convenient. We can use it to avoid thinking about the business. Unfortunately, it rejects the source of funding for the software. The reason the software was written in the first place was hopefully related to supporting the business.</p><p>Logically, then, if there is anything in the business software that is not <em>business logic</em>, why is it there? </p></div>