John Fremlin's blog: The bonds of SQL

Posted 2015-06-17 05:44:45 GMT

Question for a SQL test: write a query to return the top five sales for each day in the database? It's easy to express this query for a given day and many databases have extensions for writing this query - but it can't be expressed portably in standard SQL. And this query falls squarely into the core use-case that SQL is touted to solve.

The No-SQL key-value store movement exemplified by databases like MongoDB is often lambasted for ignoring the lessons of history. SQL, a venerable ANSI standard, represents that history and provides a well known language and a protocol to more or less decouple the application from the database implementation. People with diverse roles and backgrounds interact with SQL and are well-versed in its peculiarities: from analysts to database administrators to web front end developers.

Despite this, for another example, there is no simple query that can 'insert this value for a key or update that key if already present' atomically. Some SQL implementations provide extensions for this elementary and very common task (like MySQL's ON DUPLICATE KEY UPDATE), and it is possible with stored procedures at risk of losing performance to exception handling.

SQL is designed for 'relational' databases: that is, each row in a table expresses a relation and so must logically be unique. The adherence to this concept is why SQL cannot answer the simple sales query, and why No-SQL databases are justifiable not just on grounds of performance and scalability: they often fit the problem domain better. When a design requirement fails to fit the use case it should be re-evaluated: relational databases are very handy for some sense of purity but as systems like Hive demonstrate, things more or less work without pure relational semantics.

SQL imposes weird design constraints on a general purpose database: people add dummy 'id' columns to give each records a relational uniqueness. As with all failed designs it's true that relational databases have real advantages in many cases, but the choice to demand these strict semantics should lie with the user, and a standardisation of the syntax for avoiding them would mean that SQL could deliver on its promise of portability across database implementations.

We all benefit from a common language, and tying SQL to one database implementation dogma inspires the proliferation of No-SQL mini-languages, each with a learning curve and lacking features. It's time to wrest the familiar syntax from the constraints of an ultimately failed design and admit that non-relational No-SQL techniques have real benefits to deliver.

Post a comment