Posted 2017-05-05 22:00:00 GMT
Experienced, highly productive, senior engineers at companies like Google have told me they take weeks to prepare and practice for interviews as they don't use those skills in their jobs. Brainteasers are a complete waste of time but it's hard to find questions which are relevant to the job without being overly specific.
However, pagination is a great question: from ranked search to simply retrieving a list from a database, cross-machine APIs are very inefficient if they try to gather and send everything at once or just one thing at a time; instead they should fetch items in batches (an obvious but impractical extension of this idea is the lambda architecture). Pagination is great as an interview question because it is clearly related to the job, trips up software engineers regularly, and can be asked in calibrated stages. A huge amount of software engineering involves cross machine APIs nowadays, so the question is not specific to one area.
1. Screener question: API for fetching in pages of a given size from a fixed list from a single server. If the candidate can't do this then he or she will almost certainly struggle to implement any sort of API. Pivot to ask another simple question (like: intersect two lists of personal contact information), to assess whether the candidate was just confused by the framing.
2. Main question: API for fetching pages in a given size for a list from a single server where items can be deleted or inserted while the fetch is happening. It's helpful to give some business context: an extreme case is a list of messages in discussion forum app that supports deletes. This forces the solution away from the obvious idea of keeping a consistent snapshot of items for each client on the server. The client has state from previous pages that it can send to the server.
Once the candidate can solve the problem even if things are deleted or inserted at inconvenient times, to assess the quality of the solution: ask how much information needs to be communicated each fetch? Trivially, the client could send back everything it knows but that destroys the benefit of batching. Ideally, the client would send back a fixed size cursor. Secondly, how expensive is it for the server to compute the next page?
Some candidates get frustrated and try to demand that the DB, IPC or API framework provide this consistent paging. That would indeed be wonderfully convenient but would imply complex integration between the datastore and the IPC layer — and the applications specific tradeoffs around showing stale data. Concretely, consistent paging is not offered by popular frameworks for these reasons.
3. Advanced: ranked distributed results. Many systems are too big to have the list of items stored in a single server. For example, Internet search nowadays involves interrogating many distributed processes — more prosaically a hotel booking website might ask for inventory from many providers. Rather than waiting for all to respond the client should be updated with the information at hand. Items can suddenly be discovered that should be at the top of the list, how should that be handled? Larger scale examples demand a co-ordination process on the server side that aggregates and then sends the best results to the client. The extent of co-operation with knowledge of client's state depends on the context. How should resources be allocated to this coordination process and how can it be timed out?
The question provides good leveling because it is implicitly required for many straightforward tasks (like providing a scrollable list in an app) but then is a key outward facing behaviour of large systems. In terms of computer engineering it touches on a range of knowledge in a practical setting: data-structures and algorithms to coordinate state. The client effectively has a partial cache of the results, and caching is known to be hard. Finally, the extension to distributed ranking should spur a mature discussion of tradeoffs for more sophisticated engineers.
Post a comment