John Fremlin's blog: Pagination, a great software interview question

Posted 2017-05-05 22:00:00 GMT

Experienced, highly productive, senior engineers at companies like Google have told me they take weeks to prepare and practice for interviews as they don't use those skills in their jobs. Brainteasers are a complete waste of time but it's hard to find questions which are relevant to the job without being overly specific.

However, pagination is a great question: from ranked search to simply retrieving a list from a database, cross-machine APIs are very inefficient if they try to gather and send everything at once or just one thing at a time; instead they should fetch items in batches (an obvious but impractical extension of this idea is the lambda architecture). Pagination is great as an interview question because it is clearly related to the job, trips up software engineers regularly, and can be asked in calibrated stages. A huge amount of software engineering involves cross machine APIs nowadays, so the question is not specific to one area.

1. Screener question: API for fetching in pages of a given size from a fixed list from a single server. If the candidate can't do this then he or she will almost certainly struggle to implement any sort of API. Pivot to ask another simple question (like: intersect two lists of personal contact information), to assess whether the candidate was just confused by the framing.

2. Main question: API for fetching pages in a given size for a list from a single server where items can be deleted or inserted while the fetch is happening. It's helpful to give some business context: an extreme case is a list of messages in discussion forum app that supports deletes. This forces the solution away from the obvious idea of keeping a consistent snapshot of items for each client on the server. The client has state from previous pages that it can send to the server.

Once the candidate can solve the problem even if things are deleted or inserted at inconvenient times, to assess the quality of the solution: ask how much information needs to be communicated each fetch? Trivially, the client could send back everything it knows but that destroys the benefit of batching. Ideally, the client would send back a fixed size cursor. Secondly, how expensive is it for the server to compute the next page?

Some candidates get frustrated and try to demand that the DB, IPC or API framework provide this consistent paging. That would indeed be wonderfully convenient but would imply complex integration between the datastore and the IPC layer — and the applications specific tradeoffs around showing stale data. Concretely, consistent paging is not offered by popular frameworks for these reasons.

3. Advanced: ranked distributed results. Many systems are too big to have the list of items stored in a single server. For example, Internet search nowadays involves interrogating many distributed processes — more prosaically a hotel booking website might ask for inventory from many providers. Rather than waiting for all to respond the client should be updated with the information at hand. Items can suddenly be discovered that should be at the top of the list, how should that be handled? Larger scale examples demand a co-ordination process on the server side that aggregates and then sends the best results to the client. The extent of co-operation with knowledge of client's state depends on the context. How should resources be allocated to this coordination process and how can it be timed out?

The question provides good leveling because it is implicitly required for many straightforward tasks (like providing a scrollable list in an app) but then is a key outward facing behaviour of large systems. In terms of computer engineering it touches on a range of knowledge in a practical setting: data-structures and algorithms to coordinate state. The client effectively has a partial cache of the results, and caching is known to be hard. Finally, the extension to distributed ranking should spur a mature discussion of tradeoffs for more sophisticated engineers.

If I was your candidate I'd have a lot of clarifying questions to ask... For the process itself though I'm wondering how much time you would give for this thing and how much you expect of abstract design vs actual pseudocode to demonstrate concurrency issues are understood. Would you leave time (or make time upfront) to see what sorts of tests they can come up with that could verify the desired behavior or leave QE assessment for another stage in the interview pipeline?

Maybe it would be better as a take-home assignment? Then you could discuss with the candidate their approach and what tradeoffs are involved, or guide them away from reaching for a nice guarantee in some library call.

I'd be interested in more blogs from you about interviewing in the industry, especially if you recall http://john.freml.in/codejam-2009-round2 and how it seems for a lot of devs coding reasonably well defined things under pressure leads to really silly errors and panic. To me it seems like the best solution is an actual work-sample test that's not too long but not too short, representing actual specific work that's being done at the particular company rather than some proxy for work, but I'm limited in my ability to try different things besides the company standard for interviewing...

Posted 2017-07-08 07:36:24 GMT by Ember

Post a comment