John Fremlin's blog

LISA17 conference notes

Posted 2017-11-01 23:00:00 GMT

I attended the LISA17 conference in San Francisco. Here are some very rough and opinionated notes from talks that interested me. If I made any mistakes let me know and please comment on the things I missed!

Ted Ts'o: Linux performance tuning

This tutorial was exceptional because the speaker has years of experience as one of the top Linux developers. Ted uses emacs!

Goals of performance tuning

  • improve throughput
  • manage and understand performance cliffs

Investigate bottlenecks with rigorous scientific method: change one parameter at a time. Take lots of notes and make haste slowly. Measurement tools can impact behaviour of the observed application.

Basic tools to start with

  • Memory: free -m. Adding memory might be the easiest way to speed up the machine.
  • Tasks: top. What are the CPUs doing or waiting for?
  • IO: iostat -x -k 5. This shows average queue size (avgqu-sz) for requests and merging statistics - crucial for understanding performance given larger requests are much more efficient.

Example filesystem benchmark: fs_mark -s 10240 -n 1000 -d /mnt - creates 1000 files each 10kB in /mnt and does fsync after each. Can be greatly improved by relaxing sync guarantees. Use the exact benchmark for your application!

Cache flush commands are ridiculously expensive. Google runs in production with journal disabled, as it is so much faster and there is a higher level consistency guaranteed by multi-machine replication. This cross-machine cluster file-system also means RAID is unnecessary.

Ted Ts'o made this snap script for low impact performance monitoring extraction from production systems.


In terms of selecting hardware: note seek time is complicated and should typically be reported statistically - worst case and average. Low-numbered LBA offsets, at the outer diameter of the disk, can be much faster to seek. Therefore you can put your most used partitions at the first offsets of the disk - this is called short-stroking and can be a cheap way to get more performance. Filesystem software moves slowly as it has to be reliable and hardware generally moves much faster.

HDDs at 15000rpm run at hot temperatures and use a lot of power; many applications that used those have moved to SSDs. SSDs can also use a lot of power. They tend to fail on write. Random writes can be very slow - 0.5s average, 2s worst case for 4k random writes. You can wear out an SSD by writing a lot. See Disks for Data Centers in terms of advice about selecting hardware (Ted wrote this). Think about how to use iops across the fleet (hot and cold storage). The interface SATA 1.5Gbps or 3Gbps or PCIe may not be important given that e.g. random writes are slow. RAID does not make sense generally in today's world (at Google scale) and can suffer from read/modify/write cycles.

We can think about application specific file-systems, now we have containers. For example, ReiserFS is good for small files, XFS good for big RAID arrays and large files. Ext4 is not optimized for RAID.

Consider increasing journal size for small writes. Note Google disables the journal altogether.

Recommends Brendan Gregg's perf tools using ftrace. These were introduced at LISA14.

  • iosnoop - friendly blktrace
  • bitesize - distribution of IO sizes
  • opensnoop - file opens
  • iolatency

Also more advanced versions with lower overhead due to computing aggregates in kernel using eBPF, the BPF Compiler Collection (BCC): biosnoop, bitesize, biolatency, opensnoop.

Consider the multiqueue scheduler for very fast storage devices like NVMe.

Network tuning

Immediately check for trivial basic health: ethtool, ifconfig, ping, nttcp. Check for various off-load functions and that the advanced capabilities of the card are used.

Consider whether you want latency or throughput. Optimize the bandwidth delay product. Then remember that increasing window size takes memory; this can be tuned with net.core.rmem_max and net.core.wmem_max. Use nttcp to reduce buffer sizes as much as possible to avoid bufferbloat.
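To make the bandwidth delay product concrete, here is a minimal Python sketch. The link speed and round-trip time are made-up example values, not numbers from the talk, and the function name is my own.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep a path of this speed full."""
    # bits/s * ms = millibits; divide by 1000 for bits and by 8 for bytes.
    return bandwidth_bits_per_s * rtt_ms // 8000

# Hypothetical example: a 1 Gbit/s path with an 80 ms round trip.
bdp = bandwidth_delay_product_bytes(1_000_000_000, 80)
print(bdp)  # prints 10000000, i.e. about 10 MB of window
```

A TCP window smaller than this cannot keep the path full, which is why the buffer limits matter on long fat links.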

UDP might be a better bet.

However, we can push TCP to be low latency. Disable Nagle with setsockopt TCP_NODELAY. Enable TCP_QUICKACK to disable delayed acknowledgments.
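As a rough sketch of what those two socket options look like from Python (the helper name is hypothetical; TCP_QUICKACK is Linux-specific, so it is only set when the constant is available):

```python
import socket

def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY: send small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # TCP_QUICKACK only exists on Linux. The flag is not permanent, so real
    # code usually has to set it again after later reads.
    if hasattr(socket, "TCP_QUICKACK"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
    return sock

sock = make_low_latency_socket()
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```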

NFS performance tuning

Recommends considering an NFS appliance, e.g. NetApp.

Some tricks: use no_subtree_check. Bump up nfs threads to 128. Try to separate network and disk IO to different PCI buses - no longer necessarily relevant with PCIe. Make sure to use NFSv3 and increase rsize/wsize. Mount options: rw,intr. Make sure to tune for throughput, large mtu and jumbo frames.

NFSv4: only use NFSv4.1 on Linux 4+, see ;login magazine, June 2015.

Memory tuning

If there is any swapping, first, try adding more memory. Add more and faster swap devices.

Paging will happen. majflt/s is the rate of faults that result in IO. pgsteal/s is the rate of recycling of page cache pages.

Try sar -W 3 and periodically send sysrq-m.

Note the memory hierarchy is important as closer caches are much faster.

Translation Lookaside Buffer (TLB) caches translation from virtual address to physical address. Can avoid up to six layers of lookup on 64-bit system - costing thousands of cycles. There are only 32 or 64 entries in the TLB in a modern system.

The jiffies rate can greatly affect TLB thrashing by controlling rate of task switches. Hugepages cover much more memory per entry, so they consume far fewer of these entries. Kernel modules burn TLB entries while the originally loaded kernel does not.
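A quick back-of-the-envelope calculation shows why page size matters so much with so few TLB entries. This sketch uses the 64 entries quoted above; the 4 KiB and 2 MiB page sizes are my assumption (the usual x86-64 base and huge page sizes), not figures from the talk.

```python
def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Memory addressable without a TLB miss: one page per entry."""
    return entries * page_size_bytes

KIB = 1024
MIB = 1024 * KIB

small = tlb_reach_bytes(64, 4 * KIB)
huge = tlb_reach_bytes(64, 2 * MIB)
print(small // KIB, "KiB with 4 KiB pages")  # prints: 256 KiB with 4 KiB pages
print(huge // MIB, "MiB with 2 MiB pages")   # prints: 128 MiB with 2 MiB pages
```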

The perf tool can show TLB and cache statistics.

Application tuning

Experimentation with eBPF.

For JVM consider GC and JIT. Size the heap.

Tools: strace, ltrace, valgrind, gprof, oprofile, perf (like truss, ktruss). Purify might not be as good as valgrind.

perf is the new hotness. Minimal overhead, should be safe in production from a performance perspective. However, the advanced introspection capabilities may be undesirable for security.

There are many interesting extensions - like the ext4slower program which shows all operations on ext4 that take longer than 10ms.

Userspace locking: make sure to use futex(2).

Consider CPU affinity.

Latency numbers that all programmers should know. Note this does not include random write for an SSD because that depends on a great many factors.


It's more addictive than pistachios!

It's time to shoot the engineer and put the darn thing into production.

Great way of learning about the whole stack!

Robert Ballance: Automating System Data Analysis Using R

This talk presented a valuable philosophy and attitude: that we should consider making repeatable re-usable reports. This goes against the grain of expectations around reporting which often frame reports as one-off tasks. The examples were very compelling.

Some background: R was written at Bell Labs by statisticians who were very familiar with UN*X. Data is dirty. The computations and software for data analysis should be trustworthy: they should do what they claim, and be seen to do so.

I've spent my entire career getting rid of spreadsheets.

Very rapid growth in CRAN R packages. Pipe operator %>%.

Used dplyr. Small repeatable pipelines for reports that can be reused. Very pretty code examples using dplyr and ggplot and the aforementioned pipe operators.

Renee Lung: Testing Before You Scale & Making Friends While You Do It

Your customers shouldn’t find problems before you do.

Onboarding a big new account with an expected 20k incidents per day, around 7M per year.

They wanted to test the load. The only thing that behaves like prod, is prod.

Chaos Engineering is about experiments in realistic conditions. PagerDuty has Failure Friday - where they expose systems to controlled experiments.

Balance business and engineering.

Decided to create a tool to simulate load.

Noticeable customer impact from first and second test but they still persisted which was quite brave. The talk was very honest about the interpersonal and organisational issues that the project faced.

Tried to explain why the staging environment is different from production to an idealistic questioner.

Baron Schwartz: Scalability Is Quantifiable: The Universal Scalability Law

Eben Freeman's talk on queuing is really good!

Recommends a talk by Rasmussen on failure boundaries.

Failure boundary is nonlinear.

Hard to apply queuing theory to the real world of capacity and ops, as difficult to figure out how much time is spent queuing in real systems.

Add a crosstalk (coherence) penalty with a coefficient k as a quadratic term to the denominator in Amdahl's law. The penalty represents the cost of communication.

Defines load as concurrency.

Suggests that load-tests should try to fit the crosstalk penalty and Amdahl's law parameters. Claims that this fits quite well to many real world scaling problems with some abstract examples.
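For reference, the Universal Scalability Law described above can be written down in a few lines of Python. This is only a sketch: the parameter values are invented for illustration, and I use kappa for the crosstalk coefficient that my notes call k.

```python
import math

def usl_throughput(n: float, lam: float, sigma: float, kappa: float) -> float:
    """Universal Scalability Law: throughput at concurrency n.

    lam   - throughput of a single worker (n = 1)
    sigma - contention, the serial fraction from Amdahl's law
    kappa - crosstalk (coherence) penalty, the quadratic term
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

def peak_concurrency(sigma: float, kappa: float) -> float:
    """Concurrency at which throughput is maximal (requires kappa > 0)."""
    return math.sqrt((1 - sigma) / kappa)

# Hypothetical fitted parameters, purely for illustration.
LAM, SIGMA, KAPPA = 1000.0, 0.05, 0.0005
n_star = peak_concurrency(SIGMA, KAPPA)
print(round(n_star, 1), round(usl_throughput(n_star, LAM, SIGMA, KAPPA)))
```

With kappa set to zero the formula reduces to Amdahl's law. With any positive kappa, throughput falls once concurrency goes past the peak.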

Chastity Blackwell: The 7 Deadly Sins of Documentation

Without effort, documentation will be scattered across multiple systems. The talk notes that the costs are paid in ramping up new people. We should invest in documentation.

Blake Bisset; Jonah Horowitz: Persistent SRE Antipatterns: Pitfalls on the Road to Creating a Successful SRE Program Like Netflix and Google

SRE is not a rebranded ops, should not try to build an NOC.

Sasha Goldshtein: Fast and Safe Production Monitoring of JVM Applications with BPF Magic

Beyond performance, we can trace things like system calls to find out where something is happening - for example, the stacktrace of the code that is printing a message.

The JVM can cooperate by adding more tracepoints -XX:+ExtendedDTraceProbes.

The advantage of BPF as opposed to perf, is that BPF can filter and aggregate events in the kernel, which can make things much faster than perf, which just transmits events. BPF can calculate histograms and so on.

Needs recent kernels - 4.9 kernel for the perf_events attaching.

DB performance tracing

Many performance investigations can occur now without modifying any programs. For example, there are instrumentation scripts like dbslower that can print out which queries are slow in MySQL and can be extended to other databases.

We can trace and find out the exact stacktrace where a query is printed.

GC performance tracing

Can trace GC: ustat tool and object creation with uobjnew.

Trace open file failures

Use opensnoop to find failed open syscalls. Then attach a trace for that specific error to a Java application.

Michael Jennings: Charliecloud: Unprivileged Containers for User-Defined Software Stacks in HPC

Want to make it possible for people to bring their own software stack to run on the supercomputers at Los Alamos, and decided to explore containers. Unlike virtual machines, they do not have performance impact on storage or networking (Infiniband).

Recommends this LWN article: Namespaces in operation.

Docker with OverlayFS can be slow on HPC. Therefore they built a system called Charliecloud with minimal performance impact, and native file-system performance.

Matt Cutts and Raquel Romano: Stories from the Trenches of Government Technology

Sometimes don't have source code or access to logs.

Many basic problems: 5% of veterans incorrectly denied healthcare benefits from a single simple bug.

Great value delivered by bug bounty programs.

API first!

Meaningful contribution - hugely impactful, bipartisan problems. Looking for software engineers and site reliability engineers for short tours of duty.

Jake Pittis: Resiliency Testing with Toxiproxy

CEO at Shopify once turned off a server as a manual chaos monkey.

Continued to work on resiliency, with gamedays. Then thought about automating the game days to ensure that issues remain fixed and don't regress.

Want to maintain authenticity.

ToxiProxy interposes latency, blackholing, and rejecting connections in the production systems and then is supported by automated testing in Ruby that asserts behaviour about the system.

Incident post-mortem fixes are checked and verified by injecting the faults again and checking application specific consequences. This confirms that fixes worked, and continue to work in the future.

Resiliency Matrix declares expected dependency between runtime systems. ToxiProxy tests allow one to validate that the dependency matrix truly reflects the production reality.

Brendan Gregg: Linux Container Performance Analysis

Common system performance tools like perf do not work well in containers, as the pid and filesystem namespaces are different. System wide statistics (e.g. for free memory) are published to containers which causes programs to make wrong assumptions: for example, Java does not understand how much memory is actually available in the container.

The situation is improving and there is ongoing integration of support for monitoring performance of containerized applications.

Understanding which limit is hit in the case of CPU containerization can be very confusing as there are many different limits.

PS. Brendan's talk from last year at LISA16 gives a general introduction to the advanced bpf tools: LISA16: Linux 4.X Tracing Tools: Using BPF Superpowers

Teddy Reed and Mitchell Grenier: osquery—Windows, macOS, Linux Monitoring and Intrusion Detection

Labs showing how to collect and query many system level properties like running processes from a distributed set of systems with a tool called osquery.

It can collect current state and also buffer logs of changes.

Heather Osborn: Vax to K8s: Ticketmaster's Transformation to Cloud Native Devops

Tech Maturity model.

20k on-prem VMs.

Kevin Barron: Coherent Communications—What We Can Learn from Theoretical Physics

Human communications take a lot of time and we need to be careful that we're really communicating.

Evan Gilman and Doug Barth: Clarifying Zero Trust: The Model, the Philosophy, the Ethos

Establish some strong properties: that all flows are authenticated and encrypted.

No trust in the network. Automation based policy based on a Ruby DSL and Chef that reconfigures iptables rules to add IPSec routes between application tiers.

Related to Google's BeyondCorp.

Beyond the control aspects, another value of the approach is observability. Mentioned that another way of doing this is Lyft Envoy.

Mostly build your own still.

Brian Pitts: Capacity and Stability Patterns

Very thoughtful talk with a comprehensive coverage of various techniques.

Eventbrite has 150M ticket sales per year. Very spiky traffic: it can quadruple within one minute.

Bulkheads: partition systems to prevent cascading failures.

Canary testing: gradual rollout of new applications.

Graceful degradation.

Rate limiting. Understand capacity and control amount of work you accept.

Timeouts. Even have to timeout payment processors.


Capacity planning.
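To illustrate the rate limiting pattern from the list above, here is a minimal token bucket in Python. The talk notes do not say which algorithm was used, so the token bucket, the class name and the parameters are all my own assumptions.

```python
import time

class TokenBucket:
    """Admit about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller should shed load or degrade gracefully

# Usage with a fake clock so the behaviour is reproducible.
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_now[0] = 0.5  # half a second later one token has been refilled
after_wait = [bucket.allow(), bucket.allow()]
```

Here the first two requests are accepted and the third is rejected. After the half-second wait, one more request is accepted.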

Corey Quinn: "Don't You Know Who I Am?!" The Danger of Celebrity in Tech

High energy and well presented talk.

Netflix: developers have root in production.

Should not be cargo-culted to places without same culture of trust and top quality talent.

Be careful about punching down. Recognise the weight that your words carry coming from a successful company with specific constraints.

Culture of security conferences is toxic.

Ben Hartshorne: Sample Your Traffic but Keep the Good Stuff!

Adapt sample rate as you're watching your traffic, to scale observability infrastructure logarithmically with production. Sample rate should be recorded in event, and reduced in proportion to traffic.
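The idea of recording the sample rate in each event can be sketched as follows. The function names and the event fields are hypothetical, and I keep every Nth event so the example is reproducible; a real sampler would more likely choose events at random.

```python
import math

def choose_sample_rate(events_per_s: float, target_per_s: float) -> int:
    """Keep roughly target_per_s events: sample 1 in N as traffic grows."""
    return max(1, math.ceil(events_per_s / target_per_s))

def sample(events, sample_rate: int):
    """Keep every Nth event and record the rate used inside the event."""
    kept = []
    for i, event in enumerate(events):
        if i % sample_rate == 0:
            kept.append({**event, "sample_rate": sample_rate})
    return kept

def estimated_count(kept) -> int:
    """Each kept event stands for sample_rate original events."""
    return sum(e["sample_rate"] for e in kept)

events = [{"path": "/checkout", "status": 200} for _ in range(1000)]
rate = choose_sample_rate(events_per_s=1000, target_per_s=100)
kept = sample(events, rate)
print(len(kept), estimated_count(kept))  # prints: 100 1000
```

Because the rate travels with each event, totals can still be reconstructed after the rate changes.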

Honeycomb does this with honeytail. Another alternative is Uber's opentracing: Jaeger which uses a consistent sampler.

Mohit Suley: Debugging @ Scale

Distributed tracing is the new debugger.

Use Twitter's anomaly detection R library.

Jon Kuroda: System Crash, Plane Crash: Lessons from Commercial Aviation and Other Engineering Fields

Need to be better at following checklists, sterile cockpit rule (kicking out unqualified people). Avoid normalization of deviance. Lots to learn from airline industry!

Think about telemetry.


Square CTF 2017 Grace Hopper

Posted 2017-10-18 22:00:00 GMT

Square put on a great competition this year at the Grace Hopper conference. My girlfriend was attending and had solved a lot of the challenges but some of the high pointers were left.


The 6yte challenge hooked me. The task was to exploit an x86 32-bit Linux binary to print a file to the console - in only 6 bytes of machine code. Most opcodes on x86 are at least two bytes, so this means using at most three instructions. A tight budget!

The 6yte program was pretty simple. It memory mapped an executable region then decoded its command line argument as a hex string into this region, then jumped to it. It also printed out the addresses the program was loaded in.

On one side, this is much easier than exploiting a typical application nowadays, which probably marks writable memory non-executable. On the other hand, the 6yte program calls alarm() so that if you pause in the debugger, it will just crash, and it also uses sleep() to delay, so you can't immediately just try tons of things at random. These countermeasures made the contest much more fun for me.

I spent quite a while being misled by the printing of the program addresses into thinking I should use that. I wanted to call the puts function that is used elsewhere in the program to print out the string. In the process I learnt a lot about the Procedure Linkage Table. Trying to compute the relative address and then generate the right relative jump confused me. My plan was to spend one byte pushing edi onto the stack, and then five bytes jumping to puts(), or try to push the address of puts() onto the stack then jump to it, or something along those lines, but I just couldn't squeeze it into the remaining five bytes. Time to look more closely at the problem!

The disassembly for the jump to the decoded shellcode first loaded the target string into eax and then put it in edi

0x80488d2:   8d 85 68 ff ff ff               	lea eax, dword [ ebp +0xffffff68 ]
0x80488d8:   89 c7                           	mov edi, eax

Then we were given some handy register values. Comparing to Linux system call numbers, 4 is very helpful because it means write, and STDOUT_FILENO is 1. We are all set up to make a syscall!

0x80488da:   ba 05 00 00 00                  	mov edx, 0x5
0x80488df:   bb 01 00 00 00                  	mov ebx, 0x1
0x80488e4:   b8 04 00 00 00                  	mov eax, 0x4
0x80488e9:   ff 65 e8                        	jmp dword [ ebp + 0xffffffe8 ]

To execute the system call we need just two bytes cd80, but first we need to mov edi to ecx (with 89f9). This will unfortunately only print the first 5 bytes, as defined in edx, but we have two bytes of instructions left. I tried several ideas for increasing edx, like adding it to itself and shifting it and so on, but then remembered the wonderful lea instruction on x86. This oddly named instruction doesn't actually access memory. It combines a mov, and a shift add - a limited multiply accumulate.

To find out opcodes, I grepped the output of objdump --source on /lib32/, which has a good sample of opcodes: a shortcut to avoid having to run an assembler. I discovered that ecx <- edi + xx is 8d4f05xx. This costs four bytes, and then the last two can be used to do the int 0x80 syscall. Not the neatest solution (Cong Wang has a much better one) but it let me read out the flag :)

By now I was pretty enthused with the competition - it was my first time crafting an exploit!

Needle in the Haystack

The next problem I tried was the Needle in the Haystack forensics challenge. Many years ago I implemented a vfat filesystem and fsck program, so I was very happy to see that it had a FAT12 filesystem. Inside was a Ruby on Rails project. There were some developer passwords in it which I immediately put into the submission box for the challenge - and was punished with a RickRoll. That teasing riled me and though it was late on Thursday night after my floundering around on the previous problem, my motivation redoubled.

It took me a while to realise that there was nothing in the Ruby on Rails project. I compared it against the base skeleton created by the Rails setup by default. This was my first CTF, I didn't know the conventions. I wasn't sure how much work was expected and what to focus on.

I tried a bunch of data recovery programs to dig out deleted files from the filesystem image, and found a tarball containing a git repository. I checked out all revisions in all branches, some of which were very temptingly labeled, but there didn't seem to be anything in it, so I tried a bunch more undeletion programs, and then switched to solve smaller challenges.

This gave me more idea of the conventions used in the competition. Reading the rubric for the haystack challenge I noticed it mentioned the word blob: that meant the git repo was the right track: and git fsck --lost-found gave up the answer.

This ended up being my favourite problem because it combined so many technologies (Ruby on Rails, FAT, git) and tantalized with apparent victory at multiple steps.


Other questions were also fun. I really enjoyed the SQL challenge - my first time doing a SQL injection, for which the SQLite function GROUP_CONCAT was very helpful. Then I found out that there was an easier version of 6yte without the bytes limit. Dammit!!

Now it was super late at night but team chocolates was #2 on the leaderboard. The challenges had all been so much fun and the VMs and setup for each were super well done, so I was very enthusiastic and feeling very hungry for the #1 spot. The next morning I was very excited as I thought the competition would end on that Friday (Alok eventually explained to me that we had an extra week) and I ran round to round up people to help. I didn't want to lose and was wondering about getting IDA Pro to solve the floppy challenge. Turns out Zach and Trammell were the right choices and chocolates got to #1 while I slept.


The contest was a ton of fun. It was my first time attempting shellcode and first time doing SQL injection. It makes me really appreciate the value of having a library of tools in security research. The field has developed its own culture, experts and can seem impenetrable (pun!). This CTF made it accessible to me in a way I never anticipated.

I learnt new techniques: about instruction coding, dynamic linking, and about process from Trammell. He showed me how to connect qemu in debug mode and Hopper. The way he approached the problem was very different to my unskilled initial expectations: I thought of trying to extract the logic into a sandbox and would have been tripped up on all the countermeasures in the code, whereas Trammell's confident holistic reverse engineering quickly neutralised them.

In terms of the mechanics, the contest framework, with VMs being spun up, was ambitious and worked perfectly. On a non-technical level, the jokes and teasing with RickRolls and countermeasures made the challenges personal. Solving the problems was joyous. It left me very impressed with the culture at Square, that visibly invested so much. The contest was run really well with great responsiveness on Slack, and I'd love to do more CTFs. Thanks Square!


Zero value abstractions

Posted 2017-10-06 22:00:00 GMT

The Rust and C++ communities have embraced the idea of abstractions without runtime overhead. Object orientated programming encourages the idea of dynamic dispatch - at run-time choosing what to do based on the type. This is costly: a small cost as a decision has to be made at runtime and a potentially very expensive consequence: the compiler can't inline and propagate constants. However, it allows code to be written once which works with many types. So called zero cost abstractions avoid this by having the compiler figure out the specific concrete implementation behind the abstraction and then perform its optimizations with this information.

Runtime cost is actually only part of the cost of an abstraction. Even if there is no runtime cost, abstractions must provide value as they have other costs. An abstraction introduces a new concept and imposes a mental burden on the people working with it. In the ideal case, the abstraction is perfectly aligned with the problem domain: for example, it's often very convenient to be able to show something on the screen and get its dimensions independently of whether it is a photo or a video — abstracting over the difference reduces the amount of code written, and makes it clear that the code doesn't care about those details. This may actually be good for people debugging and reading the code.

Abstractions defined in the wrong way can make it hard to modify code: by muddling together unrelated things, by hiding useful details, increasing compile times, or just by confusing people and taking up mental space. These costs are less easy to measure than the runtime cost. However, they can be much more expensive. Debugging code from a stagnant project, where the build environment isn't readily available, is vastly harder when there are layers of abstraction. Abstractions obscure the answer to the question: what does this code actually do?

Weak engineers can try to abstract away the parts of the project they don't know how to accomplish. No value is being added there. Another abuse is in wrapping existing code and interfaces belonging to another project or group: this sort of wrapping layer is very easy to write and gives an illusion of productivity, but means that the people who own the code will now have to understand a wrapper in order to help.

It's fun to reduce runtime costs. However, given the other costs are normally more significant, it's important to think of the value that the abstraction brings. An abstraction needs to be valuable even if there are no runtime costs. How much does it really help?

The worst abstractions abstract nothing, and provide no value: most commonly, a grand interface with a single implementation. They impose a cost on all readers — slogging through meaningless code, and slow people debugging production issues, who eventually have to understand that the interface is a mask for the real underlying implementation. Abstractions are costly. When reviewing or writing code, remember abstractions must provide value.


Pagination, a great software interview question

Posted 2017-05-05 22:00:00 GMT

Experienced, highly productive, senior engineers at companies like Google have told me they take weeks to prepare and practice for interviews as they don't use those skills in their jobs. Brainteasers are a complete waste of time but it's hard to find questions which are relevant to the job without being overly specific.

However, pagination is a great question: from ranked search to simply retrieving a list from a database, cross-machine APIs are very inefficient if they try to gather and send everything at once or just one thing at a time; instead they should fetch items in batches (an obvious but impractical extension of this idea is the lambda architecture). Pagination is great as an interview question because it is clearly related to the job, trips up software engineers regularly, and can be asked in calibrated stages. A huge amount of software engineering involves cross machine APIs nowadays, so the question is not specific to one area.

1. Screener question: API for fetching in pages of a given size from a fixed list from a single server. If the candidate can't do this then he or she will almost certainly struggle to implement any sort of API. Pivot to ask another simple question (like: intersect two lists of personal contact information), to assess whether the candidate was just confused by the framing.

2. Main question: API for fetching pages of a given size for a list from a single server where items can be deleted or inserted while the fetch is happening. It's helpful to give some business context: an extreme case is a list of messages in a discussion forum app that supports deletes. This forces the solution away from the obvious idea of keeping a consistent snapshot of items for each client on the server. The client has state from previous pages that it can send to the server.

Once the candidate can solve the problem even if things are deleted or inserted at inconvenient times, to assess the quality of the solution: ask how much information needs to be communicated each fetch? Trivially, the client could send back everything it knows but that destroys the benefit of batching. Ideally, the client would send back a fixed size cursor. Secondly, how expensive is it for the server to compute the next page?
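One possible shape for a fixed-size cursor answer is sketched below. It is not the only valid answer, and it assumes things the question leaves open: that each item has a unique, ordered id, and that missing items inserted before the cursor is acceptable. The class and method names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

class MessageStore:
    """Hypothetical single-server list of messages keyed by a unique, ordered id."""

    def __init__(self) -> None:
        self._ids: List[int] = []  # kept sorted

    def insert(self, message_id: int) -> None:
        bisect.insort(self._ids, message_id)

    def delete(self, message_id: int) -> None:
        i = bisect.bisect_left(self._ids, message_id)
        if i < len(self._ids) and self._ids[i] == message_id:
            del self._ids[i]

    def fetch_page(self, cursor: Optional[int], size: int) -> Tuple[List[int], Optional[int]]:
        """Return up to `size` ids strictly after `cursor`, plus the next cursor.

        The cursor is just the last id the client saw, so it is fixed size and
        stays meaningful even if that item has since been deleted.
        """
        start = 0 if cursor is None else bisect.bisect_right(self._ids, cursor)
        page = self._ids[start:start + size]
        next_cursor = page[-1] if page else cursor
        return page, next_cursor

store = MessageStore()
for message_id in (10, 20, 30, 40, 50):
    store.insert(message_id)

page1, cursor = store.fetch_page(None, 2)
store.delete(20)   # the item the cursor points at disappears
store.insert(35)   # a new item appears ahead of the cursor
page2, cursor = store.fetch_page(cursor, 2)
page3, cursor = store.fetch_page(cursor, 2)
```

The client sends back a single id each fetch, and the server finds the next page with a binary search, so neither side pays a cost that grows with the number of pages already fetched.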

Some candidates get frustrated and try to demand that the DB, IPC or API framework provide this consistent paging. That would indeed be wonderfully convenient but would imply complex integration between the datastore and the IPC layer, as well as application-specific tradeoffs around showing stale data. Concretely, consistent paging is not offered by popular frameworks for these reasons.

3. Advanced: ranked distributed results. Many systems are too big to have the list of items stored in a single server. For example, Internet search nowadays involves interrogating many distributed processes — more prosaically a hotel booking website might ask for inventory from many providers. Rather than waiting for all to respond the client should be updated with the information at hand. Items can suddenly be discovered that should be at the top of the list, how should that be handled? Larger scale examples demand a co-ordination process on the server side that aggregates and then sends the best results to the client. The extent of co-operation with knowledge of client's state depends on the context. How should resources be allocated to this coordination process and how can it be timed out?

The question provides good leveling because it is implicitly required for many straightforward tasks (like providing a scrollable list in an app) but then is a key outward facing behaviour of large systems. In terms of computer engineering it touches on a range of knowledge in a practical setting: data-structures and algorithms to coordinate state. The client effectively has a partial cache of the results, and caching is known to be hard. Finally, the extension to distributed ranking should spur a mature discussion of tradeoffs for more sophisticated engineers.

If I was your candidate I'd have a lot of clarifying questions to ask... For the process itself though I'm wondering how much time you would give for this thing and how much you expect of abstract design vs actual pseudocode to demonstrate concurrency issues are understood. Would you leave time (or make time upfront) to see what sorts of tests they can come up with that could verify the desired behavior or leave QE assessment for another stage in the interview pipeline?

Maybe it would be better as a take-home assignment? Then you could discuss with the candidate their approach and what tradeoffs are involved, or guide them away from reaching for a nice guarantee in some library call.

I'd be interested in more blogs from you about interviewing in the industry, especially if you recall and how it seems for a lot of devs coding reasonably well defined things under pressure leads to really silly errors and panic. To me it seems like the best solution is an actual work-sample test that's not too long but not too short, representing actual specific work that's being done at the particular company rather than some proxy for work, but I'm limited in my ability to try different things besides the company standard for interviewing...

Posted 2017-07-08 07:36:24 GMT by Ember


Kotlin is a better Java

Posted 2017-03-12 23:00:00 GMT

The Kotlin programming language has all the features of Java, and supports all sorts of helpful things like extension functions, stackless inlines and named parameters. If you're starting a new project for the JVM, shouldn't you just use Kotlin?

In 2010, I asked the same question about Scala — the answer was no. Scala aims for a target that is not at all the practical minimalist Java. Instead, it's a hodgepodge of half-implemented ideas from C++ and Haskell. Astonishingly, a basic for loop in Scala is translated into a bytecode litterbug that spews garbage every iteration. Kotlin, on the other hand, keeps it clean and even makes it easy to invent new efficient iteration styles with guaranteed stackless inlining of lambdas (when annotated).

The features of Kotlin improve the life of a Java programmer. The language doesn't indulge in whimsical flights of fancy, like attempting to redefine what nullness is by inventing an Option datatype. The JVM has an efficient representation for a missing value: null. The only problem is that the Java language designers decided to crash the program by throwing a NullPointerException as the default response to this common condition. Objective C is much more sensible and just ignores operations on the nil object (though it does crash on NULL). Kotlin introduces syntax to keep track of what can be null, offers Objective C like behaviour with the ?. operator and provides ?: (the Elvis operator) to turn null into a default value. All in all, an elegant series of responses to the billion dollar mistake.

There are areas where Kotlin violates the Java principle that everything should be explicit: the most significant is extension methods, i.e. syntactic sugar to allow extra methods for other people's classes, and second with var (and val) to declare variables without repeating their type — like C++'s auto, Go's var, etc. Both features have been used successfully in C# for many years. In terms of understanding a program, these violations are less severe than defaulting all functions to be virtual — which Java started with — allowing child classes to confusingly modify their parent's intentions.

The Android Kotlin extension methods remove so much boilerplate and avoid the kind of nasty surprise that Scala introduces (the Kotlin runtime is under a megabyte). In the end, they make the intention of the programmer much clearer by skipping the ceremony imposed by badly designed APIs.

Java is deliberately not programmer-orientated. That's the point of using it: it was designed to restrict the kind of trouble programmers can get themselves into. If you're stuck with the JVM (and as more than 80% of smartphones run Android, we all are), the extra help that Kotlin affords in terms of null-checking and syntactic sugar actually makes programs clearer and safer. The complicated excesses and action at a distance of C++ or Ruby are avoided. Admittedly, you can't write programs as succinctly as in Ruby or with anything close to C++'s performance, but the bar we are evaluating is Java. Is Kotlin a better Java?

Yes. Yes, you should use Kotlin despite it being new, and despite the consequent teething problems (an example, promptly bug-fixed). The pace of improvement, incredible integration with IntelliJ (Android Studio) and great pragmatic design make it a winner: Kotlin swiftly (pun intended) to the rescue!

Recruiting software engineers and their CVs

Posted 2017-03-03 23:00:00 GMT

Having conducted hundreds of software and other interviews and trained many interviewers, I've seen a ton of CVs. The one thing that will more or less guarantee a good interview performance is a strong TopCoder record.

The ability to solve algorithm puzzles under stress and time pressure is exactly what the coding interviews are about, and TopCoder tests these abilities at the highest levels. After some point in the rankings, it isn't just about people who can think fast and solve puzzles. The best players train regularly and continually improve, and in fact have to put in incredible discipline to outperform very dedicated opponents. These engineers in the end have the staying power to solve very complex system problems and the flexibility to attack them with novel approaches where necessary.

Only a small number of people can be the best at TopCoder. A wider pool of engineers can do a good job. Did they do a good job in the past? Good CVs boast quantitative production or business metric improvements. Bad ones describe techniques applied.

Experience isn't equally granted over time: for example, an engineer can work for years on an implementation that never goes to production and not really learn anything and just get set in his or her ways. The more feedback an engineer receives, and learns from, the more experience they get. People who have taken charge and launched an app or maintained a popular open source project in their spare time might have more real technical experience than a tech lead at a big company.

A single point of failure is ok

Posted 2016-10-05 01:11:00 GMT

Making big systems out of many computers, people often end up with lower reliability than with a single computer. Also amusingly they may be slower. There's a big temptation to avoid a single point of failure by introducing multiple points of failure: one computer is actually quite unlikely to fail, but with many computers, failures are common. If one assumes that the failures are uncorrelated, and there's some way to transparently switch over, then having multiple machines might make sense and it's an obvious goal. Who wants to admit that a single hard drive breaking took down a big website for a few hours?
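To put rough numbers on this, here is a minimal sketch of the arithmetic, assuming (as the paragraph above does) that failures are independent. The function names and the 1% failure figure are made up for illustration.

```python
def chance_any_machine_fails(per_machine_failure, machines):
    """Probability that at least one of `machines` independent machines fails."""
    return 1.0 - (1.0 - per_machine_failure) ** machines


def chance_all_replicas_fail(per_machine_failure, replicas):
    """Probability that every replica fails at once, if failures are uncorrelated."""
    return per_machine_failure ** replicas


if __name__ == "__main__":
    # Hypothetical figure: each machine has a 1% chance of failing in some period.
    print(chance_any_machine_fails(0.01, 1))    # one machine: 1%
    print(chance_any_machine_fails(0.01, 100))  # a hundred machines: roughly 63%
    print(chance_all_replicas_fail(0.01, 2))    # two uncorrelated replicas both down
```

The second function is the optimistic case that motivates redundancy. It only holds while failures stay uncorrelated and the switch-over itself never breaks.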

Embarrassing though it would be, in attempting to make it impossible for a single machine to take things down, engineers actually build such complex systems that the bugs in them take things down far more than a single machine ever would. The chance of failure is increased with software complexity and likely to be correlated between machines. Distributed systems are much more complex by their nature so there is a correspondingly high software engineering cost to making them reliable. With many machines, there are many failures, and working round all the complicated correlated consequences of them can keep a big team happily in work and on-call.

A typical example of adding unreliability in the name of reliability is the use of distributed consensus - often embodied by Zookeeper. Operationally, if the system is ever mis-configured or runs out of disk space the Zookeeper will stop working aggressively. It offers guarantees on avoiding inconsistency but not achieving uptime so perhaps this is the right choice. Unfortunately, the Paxos algorithm is vulnerable to never finding consensus when hosts are coming in and out of action, which makes sense given that consensus needs people to stick around. In human affairs we deputize a leader to take the lead in times where a quick decision is needed. Having a single old-school replicated SQL DB to provide consistency is not hip but typically would get more 9s of uptime and be more manageable in a crisis.

It can be hard to grasp when trying to deal with heavily virtualized environments where the connection between the services and the systems they run on is deliberately weak, but there's often actually one place where a single point of failure is fine: the device the person is using to connect to the system. And in fact it's unavoidable. After all, if the phone you're using just crashes then you can't expect to keep using a remote service without reconnecting. Other failures are less acceptable.

By an end-to-end argument the retries and recovery should therefore be concentrated in the machines the people are operating directly, and any other reliability measures should be seen purely as for performance. Simplicity isn't easy for junior engineers, eager to make their names with a heavily acronymed agglomeration of frameworks and a many tiered architecture - but it leads to really great results.
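As an illustration of keeping the recovery logic at the edge, here is a minimal sketch of a client-side retry loop with exponential backoff and jitter. The function name, the choice of ConnectionError as the retryable failure, and the delay values are assumptions for the example, not a prescription.

```python
import random
import time


def call_with_retries(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on ConnectionError wait and try again, up to `attempts` times."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller (or the person) decide
            # Double the wait each time; jitter stops clients retrying in lockstep.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

The client would wrap each remote call, for example call_with_retries(lambda: fetch_page(url)), where fetch_page is a hypothetical function that raises ConnectionError when the server is unreachable. The server side can then stay simple.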

Bad unit tests impart a false sense of security

Posted 2016-06-21 10:45:00 GMT

Testing improves software. So much so that lack of unit tests is called technical debt and blanket statements from celebrated engineers like "Any programmer not writing unit tests for their code in 2007 should be considered a pariah" are uncontroversial. When a defect is noticed in software it's easy to say it could have been found by better testing, and often it's simple to add a test that would catch its recurrence. Done well, tests can be very helpful. However, they can also be harmful: in particular when they cause people to be overly confident about their understanding of the consequences of a change.

A good test
— covers the code that runs in production
— tests behaviour that actually matters
— does not fail for spurious reasons or when code is refactored

For example, I made a change to the date parsing function in Wine. Here adding a unit test to record the externally defined behaviour is uncontroversial.

Tests do take time. The MS paper suggests that they add about 15-35% more development time. If correctness is not a priority (and it can be reasonable for it not to be) then adding automatic tests could be a bad use of resources: the chance of the project surviving might be low and depend only on a demo, so taking on technical debt is actually the right choice. More importantly, tests take time from other people: especially if some subjective and unimportant behaviour is enshrined in a test, then the poor people who come later to modify the code will suffer. This is especially true for engineers who aren't confident making sweeping refactorings, so that adding or removing a parameter from an internal function is turned into (for them) a tiresome project. The glib answer is not to accept contributions from these people, but that's really sad — it means rejecting people from diverse backgrounds with specialised skills (just not fluent coding) who would contribute meaningfully otherwise.

Unit tests in particular can enshrine a sort of circular thinking: a test is defined as the observed behaviour of a function, without thinking about whether that behaviour is the right behaviour. For example this change I made to Pandas involved more changing of test code than real code that people will use. This balance of effort causes less time to be spent on improving the behaviour.

In my experience, the worst effect of automatic tests is the shortcut they give to engineers — that a change is correct if the tests pass. Without tests, it's obvious that one must think hard about the correctness of a change and try to validate it: with tests, this validation step is easy to rationalise. In this way, bugs are shipped to production that would have been easy to catch by just running the software once in a setting closer to the production one.

It's hard to write a good test and so, so much easier to write a bad test that is tautologically correct, and avoids all behaviour relevant to production. These bad tests are easy to skip in code review as they're typically boring to read, but give a warm fuzzy feeling that things are being tested — when they're not. Rather than counting the coverage of tests as a metric, we could improve it by using test coverage of the real code that runs in production. Unfortunately, these are not the same thing. False confidence from irrelevant tests measurably reduces reliability.
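To make the distinction concrete, here is a small sketch contrasting the two kinds of test. Everything in it (parse_iso_date, format_receipt and both tests) is hypothetical and invented for this example.

```python
from datetime import date
from unittest import mock


def parse_iso_date(text):
    """Hypothetical production code: parse 'YYYY-MM-DD' into a date."""
    year, month, day = (int(part) for part in text.strip().split("-"))
    return date(year, month, day)


def format_receipt(text, parse=parse_iso_date):
    """Hypothetical caller that depends on the parser."""
    return "Paid on " + parse(text).isoformat()


def tautological_test():
    # Bad: the parser is replaced by a mock, so this only checks that the mock
    # returns what it was told to. It passes even for garbage input.
    fake_parse = mock.Mock(return_value=date(2016, 6, 21))
    assert format_receipt("not a date at all", parse=fake_parse) == "Paid on 2016-06-21"


def behaviour_test():
    # Better: exercises the real parser on input resembling production data,
    # including the failure case that matters to callers.
    assert format_receipt(" 2016-06-21\n") == "Paid on 2016-06-21"
    try:
        format_receipt("not a date at all")
    except ValueError:
        pass
    else:
        raise AssertionError("garbage input should be rejected")
```

Both tests pass and both count towards coverage of format_receipt, but only the second one would notice if the parser broke.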

Java: What a horrible virtual machine

Posted 2016-04-18 02:36:00 GMT

JVM bashing is an activity rarely supported by facts. I don't actually know the details. I mean Java I really don't care about. What a horrible language. What a horrible VM. So, I am like whatever, you are barking about all this crap, go away. I don't care. This quote from Linus Torvalds upsets people: there is a school of thought that participation medals should be handed to everybody in the race and one should never be nasty at all, so all criticism is wrong, and one should never listen to it. Given that Linus himself is an exceptional technical leader who started multiple huge billion dollar industries (Linux and Git), this attitude is extraordinarily arrogant. Is there another person whose technical opinion on these subjects one should respect more?

In its defense, an argument is advanced about the JVM: that it must be good, just because so many resources have been dedicated to it. Unfortunately, software doesn't work like that. Even experienced big software companies that are accustomed to managing big projects can pour billions of development dollars into duds and this Wikipedia list of failed custom projects is salutary reading. There are other VMs, and while the JVM makes bold promises and did have a brief competitive period, it is now effectively a monoculture around the Oracle implementation (OpenJDK).

Lack of dynamic memory allocation. When starting the Sun (Oracle) or OpenJDK JVM people pass an -Xmx flag saying how much memory it should use. This is crazy: decades ago in FORTRAN people had to predeclare the maximum size of their datastructures. Dynamic memory allocation with malloc was a big deal (FORTRAN 90 standardized dynamic allocations). And it definitely makes sense: a program should scale its memory usage according to the amount of memory it needs. Declaring the overall space usage is indeed better than going through and annotating each array but it's incredible to me that we are still discussing this in 2016. The default value is 256MB, which is crazily low given how memory hungry Java programs typically are (a text editor probably uses more), and insane running on a server with 256GB of RAM. The trouble with raising it is that then by default the JVM will not worry too much about freeing up unused heap memory if it isn't close to its limit. There is this hilarious question on Serverfault where a poor ex-JVM refugee is introduced to the concept of dynamic memory allocation (default in .NET is 60% of RAM which is so, so, so much more sensible). There are smaller VMs that have this issue (Common Lisp SBCL, for example), but other VMs that try to employ sensible heuristics (like Haskell GHC, which just has a suggested heap size). There are indeed pros and cons to different approaches and a mature VM should implement sensible heuristics by default and allow configuration. The JVM does not even attempt the former.

Lacking ABI and poor bytecode design. The virtual machine lacks basic features like generics and unsigned arithmetic, has poor support for dynamic languages, lacks value types (a huge performance issue), and so on. Some of these issues are being addressed, e.g. the invokedynamic op, but it's telling that JavaScript V8 can beat the JVM on some microbenchmarks.

Poor foreign function interface. Even small VMs like SBCL Common Lisp, or Haskell have high performance and easy interfaces to C code. JNA makes a better interface. Sharing datastructures back and forth with native code is a big deal and the CLR from Microsoft invests a huge amount of effort into PInvoke. The JVM should too!

Lack of performance isolation. Ideally, a VM would let you run untrusted code. This is what JavaScript VMs in browsers do really well. The CLR has a concept of AppDomain, sort of a .NET container, which can be configured to limit memory usage and other things. Even PHP lets you limit memory usage per request. In a multithreaded JVM application, one bad request can trigger an OutOfMemoryError in other requests on the same machine and there's no way to stop it. You can't even track the memory usage of a thread in a multithreaded program. Also, the JVM does not allow fork()ing so you can't use the OS isolation.

Another issue, unfortunately without supporting links, is that in my experience, JVM deployments get stuck on old versions. Every enterprise I've worked in that uses Java, has had some specific old version of the JVM (sometimes, incredibly specific like 1.5.0_05), that they were stuck on and could not upgrade out of, causing the usual problems with not being able to use new tools. Almost always the version used would be no longer supported and weird installers for it would be stored in odd places. Upgrades are always hard, but this is something that FreeBSD, Linux and even Microsoft Windows operating systems do better, and Intel does better with real, physical machines. Virtual machines were sold as more flexible and manageable than physical ones! In my limited experience with it, Microsoft CLR does a much better job here. This is exactly something that one would expect a mature VM with big development budgets to really care about.

It's great that the JVM ecosystem is improving. Lambdas and invokedynamic are good steps forward; but we need more! The concept of a virtual machine promised so much, it's now hard not to find the Oracle VM disappointing.

Android app shenanigans in 2016

Posted 2016-01-25 03:56:00 GMT

Discussing privacy and apps, my friend Jinyang told me about a study he'd worked on called Who Knows What About Me? A Survey of Behind the Scenes Personal Data Sharing to Third Parties by Mobile Apps. This made me curious about what my own phone was doing. Fortunately, on Android you can gain administrator access to your device (root) through semi-supported mechanisms, and then use standard Linux sysadmin tools to figure out what's going on. The excellent SSHelper by Paul Lutus allows one to login conveniently via ssh. It was snowing here in NYC so I had plenty of time over the weekend to dig in.

First, I went through my Android Google Play Store app history and tried to install all the apps I'd ever used, total around 400. I ended up with only 181 installed apps in /data/app though, and 48 in /system/app, as the Play store crashed a few times.

Then I had a look at what services were actively listening for network connections (by running netstat -l -p -W). These programs are waiting for external parties to connect to the phone in some way, great in the case of the SSHelper program that I installed, because that's exactly what I wanted it for, but other programs are doing it without my consent and it's unclear for whose benefit.
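On a device without a usable netstat, roughly the same list can be read from /proc/net/tcp. The sketch below is not the command used above, just a minimal illustration. It assumes the usual Linux column layout (local address in field 1, state in field 3, uid in field 7), a little-endian host, and IPv4 only. The function name is made up.

```python
import socket
import struct

LISTEN_STATE = "0A"  # TCP_LISTEN in the kernel's state numbering


def listening_sockets(proc_net_tcp_text):
    """Return (ip, port, uid) for each IPv4 socket in the LISTEN state."""
    found = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8 or fields[3] != LISTEN_STATE:
            continue
        hex_ip, hex_port = fields[1].split(":")
        # Assumption: little-endian host, so the address bytes are stored reversed.
        ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
        found.append((ip, int(hex_port, 16), int(fields[7])))
    return found
```

On the phone this would be called as listening_sockets(open("/proc/net/tcp").read()). A socket bound to 0.0.0.0 is reachable from outside the phone, while one bound to 127.0.0.1 is only reachable by local apps.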

Disabling information leak from Samsung SAP on port 8230. There was also a service listening on port 8230. Turns out that this service is related to my Samsung watch, and if you connect to the port it'll give the model of my phone without authentication: XT1575;motorola;Moto X Pure;SWatch;SAP_... Given that the Samsung software running on the watch is written so sloppily that you sometimes have to reboot it to see the correct time, and the watch is set to connect via Bluetooth, I don't want to let anybody on the Internet have a go at vandalising my phone through this unnecessary service. Pretty easy to disable by running su iptables -A INPUT -p tcp --dport 8230 -m state --state NEW,ESTABLISHED -j DROP on the phone. This doesn't seem to affect the behaviour of the watch.

Local Facebook HTTP servers. There are two servers running on the phone from Facebook main app and Messenger, on ports 38551 and 38194 claiming to be GenericHttpServer. These are only accessible to apps on the phone. I won't comment more on these as I used to work at Facebook.

Local Android services. There are several processes like the Android debugging daemon running locally on port 5037, and the Low Memory Killer Daemon, and the Zygote app starting daemon and so on listening on UN*X sockets.

To see traffic lists, I ran grep [0-9] /proc/uid_stat/*/* after a reboot to dump the traffic usage. The uids can be linked to apps via /data/system/packages.xml, which I did via a quick Python script. There are some uids shared between packages. Oddly enough, my LIFX light app seemed to be all over the Internet. Snapchat was using the most data but I have a fairly active account (@vii) that's open to non-friends so please message away. Another heavy app was S Health, especially annoying as I had turned off sync for it in settings. Also the id shared by,,, was very active. Looking at netstat -p -W showed in regular contact with Google IPs ( I set up traffic dumps from mitmproxy which showed polling of Google servers apparently about the location service and checking login status on
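The script itself is not reproduced here, but the uid-to-app join might look something like the sketch below. It assumes that packages.xml contains package elements with a name attribute plus either userId or sharedUserId, and that each /proc/uid_stat/<uid> directory holds tcp_rcv and tcp_snd byte counters. Both vary between Android versions, and all function names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from collections import defaultdict


def packages_by_uid(packages_xml_text):
    """Map each uid to the package names that run under it."""
    mapping = defaultdict(list)
    for package in ET.fromstring(packages_xml_text).iter("package"):
        # Assumption: a package carries either userId or sharedUserId.
        uid = package.get("userId") or package.get("sharedUserId")
        if uid is not None:
            mapping[int(uid)].append(package.get("name"))
    return dict(mapping)


def traffic_by_uid(uid_stat_dir="/proc/uid_stat"):
    """Sum the per-uid byte counters; unreadable counters count as zero."""
    usage = {}
    for entry in os.listdir(uid_stat_dir):
        if not entry.isdigit():
            continue
        total = 0
        for counter in ("tcp_rcv", "tcp_snd"):
            try:
                with open(os.path.join(uid_stat_dir, entry, counter)) as handle:
                    total += int(handle.read().strip())
            except (OSError, ValueError):
                pass
        usage[int(entry)] = total
    return usage


def heaviest_users(names, usage):
    """Return (bytes, uid, package names) rows, largest traffic first."""
    rows = [(total, uid, names.get(uid, [])) for uid, total in usage.items()]
    return sorted(rows, reverse=True)
```

A uid that maps to several package names is one of the shared ids mentioned above. The counters cannot say which of those packages generated the traffic.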

Stop apps running in the background unless they benefit you. The practice of many apps, even from fairly reputable companies, like the Amazon Shopping app, the Bloomberg app, the Etsy app, etc. to wake up and start using the Internet in the background is very damaging to battery life. These apps are communicating for their own interests, not mine, as far as I can see. The general pattern is to send up as much as can be gleaned about your phone as possible (for example, the Kindle app sends up tons of OpenGL information) — great for developers to understand their app install base. It's easy and convenient to crack down on them with the Greenify app, which unfortunately is an app and does its own tracking (quis custodiet ipsos custodes?). However, from the command line the dumpsys power command shows the apps busy in the background or holding wakelocks so you can do it by hand if you want.

The main contribution from the original paper that Jinyang co-authored was an analysis of the sorts of information that apps shared to their owners. It seems his methodology did not allow identifying which apps were responsible for the network traffic and indeed this is theoretically hard because an app can ask another app for something, but it's at least possible to figure out the app that made the network call. This can actually be done quite robustly and unintrusively with Android and iptables, by giving each app (uid) a separate IP address: use ifconfig wlan0:$uid $uid_ip to create an IP address for the uid, iptables POSTROUTING SNAT --to-source $uid_ip to mark traffic as coming from that IP. Unfortunately, this was a little fiddly because I never mirrored the setup to IPv6 (just disabled IPv6 via /proc/sys/net/ipv6/conf/all/disable_ipv6).
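The commands above are abbreviated. One way to spell them out per uid is sketched below as a small command generator. The expanded iptables form (nat table, owner match on the uid) is an assumption about what the shorthand stands for, and the starting address 192.168.1.100 and the function names are invented for the example.

```python
import ipaddress


def uid_addresses(uids, first_address="192.168.1.100"):
    """Give each uid its own address, counting up from a hypothetical start."""
    start = ipaddress.ip_address(first_address)
    return {uid: str(start + offset) for offset, uid in enumerate(sorted(uids))}


def commands_for_uid(uid, uid_ip, interface="wlan0"):
    """Build the two commands (as argv lists) that tag one uid's traffic."""
    return [
        # Alias interface carrying the per-uid address.
        ["ifconfig", f"{interface}:{uid}", uid_ip],
        # Assumed expansion: rewrite the source address of packets owned by this uid.
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-m", "owner", "--uid-owner", str(uid),
         "-j", "SNAT", "--to-source", uid_ip],
    ]


if __name__ == "__main__":
    for uid, uid_ip in uid_addresses([10123, 10057]).items():
        for argv in commands_for_uid(uid, uid_ip):
            print(" ".join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try off-device. On a rooted phone the argv lists could be passed to subprocess.run instead. The chosen addresses have to be free on the local network, or replies will not come back.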

Looking at a few games, they would eat a surprising amount of traffic. For an example, RopeFly used >50MB just starting up, asking for assets, a plethora of tracking feedback links for and then downloading a ton of video ad content from cloudfront, which it didn't show me.

My investigation was done over the snow weekend in New York, and there's obviously a lot more to dig into here: to watch more apps over a longer time with the one IP per app tracing, to use an mitmproxy like tool with support for SPDY and HTTP/2, and to disentangle some obvious shenanigans (for example, Foursquare was using some sort of obfuscation for its logs).

Despite having been involved in mobile app development for years, I was very surprised at how battery and data unfriendly popular apps are. The scheduled polling and dumping of device state might be convenient for managing the operational aspects of an app, but cost the install base battery life and mobile data; the tiny data caps even on unlimited lines in the US make the second a real issue despite the low traffic cost to the people receiving the tracking data. After installing the apps, my phone heated up and my battery drained incredibly fast (almost as bad as the old days with an iPhone 5) but the battery tracking in the Android settings menu was very slow to assign blame to any culprit and hugely underestimated the overall impact they had.

Some ideas for our friends working on the Android platform (and of course, huge thanks to them for bringing Linux to our pockets):

— more aggressively attribute the battery cost for using mobile data connections and keeping connections open (seems to be accounted under non-app headings now);

— attribute the battery cost for apps that use wifi while not charging;

— all that's difficult: why not, by default, prevent apps from waking up in the background without the user's explicit consent? This should be a big permission with an easy toggle. There are a few apps that improve the user experience from this, like podcast downloaders (and that's great). Most apps don't. Until then, I guess we can install Greenify.

Let me know your tips, tricks and Android app advice! My phone is back to a reasonable temperature now — but what have I missed?
