In a conversation with Mike Hoskins, the CTO of Pervasive, we jointly agreed that relational databases, as they currently function, will eventually be replaced. There are many reasons why. Here’s a list of some that came up in the conversation and some that I’ve added for the purposes of this post.
1. CRUD is crud. The fundamental activities of Create, Read, Update and Delete are fundamentally wrong-headed. Updates to data should never be allowed because they destroy information, and Deletes should never be allowed for the same reason. When data changes, the database should just add another record and note (by any mechanism that works) the previous value for that record. If the database gets too large, then data should be archived from it. The right set of activities is thus CRA: Create, Read, Archive. All databases I’m aware of implement a CRUD model. The closest to a CRA model is/was InterBase (now called Firebird, and available under open source) in that its underlying engine was designed to work in a CRA manner. Because of CRUD, relational databases have to do truly inconvenient things and, into the bargain, the audit trail of data is lost – or has to be tacked on as an afterthought in a peripheral and highly inefficient way.
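To make the CRA idea concrete, here’s a minimal sketch in Python. The class and method names are invented for illustration; no real product works exactly this way:

```python
import time

class CraStore:
    """A toy Create-Read-Archive store: nothing is ever updated or deleted."""

    def __init__(self):
        self._versions = {}  # key -> list of (timestamp, value), oldest first
        self._archive = {}   # key -> versions moved out of the live store

    def create(self, key, value):
        # An "update" is just another create: append a new version,
        # never overwrite the old one.
        self._versions.setdefault(key, []).append((time.time(), value))

    def read(self, key):
        # The current value is simply the latest version.
        return self._versions[key][-1][1]

    def archive(self, key):
        # Instead of deleting, move the record's versions aside.
        self._archive.setdefault(key, []).extend(self._versions.pop(key, []))

    def history(self, key):
        # The audit trail falls out of the model for free: every version,
        # archived or live, oldest first.
        return self._archive.get(key, []) + self._versions.get(key, [])
```

Notice that the audit trail in `history` costs nothing extra – it is just the data, which is exactly the point being made above.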
2. The Relational Model was always a partial model. The relational database, with its table-based normalization approach to data, has always been a limited model. Certain data structures, such as recursive structures (as in a part explosion), an ordered list or even a circular route, simply could not be represented in a relational database without jerry-rigging the data, or the programs, or both, in some way. It wasn’t a show-stopper, but it wasn’t ideal. Dynamic (i.e. changing) metadata was another problem that only ever gave rise to jerry-rigged solutions. It would be nice to move to a more comprehensive model (such as, for example, is offered by object database products like InterSystems’ Caché or Progress’ ObjectStore).
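The part-explosion case is easy to see in miniature. The relational form is a flat parent/child table, but answering “what does this part contain, at any depth?” needs recursion layered on top of it – the jerry-rigging mentioned above. The table contents here are invented for illustration:

```python
# A flat relational-style table: (parent part, child part) rows.
components = [
    ("engine", "piston"),
    ("engine", "crankshaft"),
    ("piston", "piston-ring"),
]

def explode(part, table):
    """Recursively flatten everything contained in `part`, to any depth.
    The table alone cannot express this query; the program has to."""
    result = []
    for parent, child in table:
        if parent == part:
            result.append(child)
            result.extend(explode(child, table))
    return result
```

Calling `explode("engine", components)` walks the whole hierarchy, something no fixed set of joins over the flat table can do for an arbitrary depth.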
3. The Logical/Physical separation of concerns was ignored. The Relational Model of Data was always a logical model. For those “data pools” to which it could be accurately applied, it was an excellent model that has proved highly successful and has promoted a very easily understood query language: SQL. However, most relational database products implemented the relational model at the physical level as a set of B-tree files with multiple indexes. This works well enough for data held on spinning disks, but it imposes an artificial overhead on managing the database. You need to have DBAs who create and drop indexes for no reason other than getting the database to perform. There was no reason why more versatile physical engines could not have been built.
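A small sketch shows why an index is a purely physical concern. The logical query – “find the rows where city equals X” – is identical either way; the index only changes how the engine finds them. The rows and column names are invented for illustration:

```python
rows = [
    {"id": 1, "city": "London"},
    {"id": 2, "city": "Paris"},
    {"id": 3, "city": "London"},
]

def scan(rows, city):
    # Without an index: touch every row, as a full table scan does.
    return [r["id"] for r in rows if r["city"] == city]

def build_index(rows, column):
    # The DBA's "CREATE INDEX" amounts to maintaining a lookup structure
    # like this (real engines use B-trees rather than a hash map).
    index = {}
    for r in rows:
        index.setdefault(r[column], []).append(r["id"])
    return index
```

`scan(rows, "London")` and `build_index(rows, "city")["London"]` return the same answer; only the cost differs. That the DBA has to manage the second form by hand is exactly the logical/physical leak being described.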
4. It’s over for spinning disks. All “old style” databases and relational databases assume that data comes from a spinning disk. That’s the wave of the past and you can wave goodbye to it. Those databases hark back to the time when memory was a very scarce resource. Nowadays memory costs about $15 per gigabyte, so there’s no real reason not to have a memory-based architecture for databases up to tens or even hundreds of gigabytes in size. You could hold the database in memory and replicate to a second site for back-up and disaster recovery – and the database would be a hundred times faster. Companies will start building products like this soon, if they are not doing so already. (Spinning disks are necessary for much larger databases and for archiving, of course).
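The architecture described here – hold everything in memory, ship every write to a second site – is simple enough to sketch. The replica below is just another object in the same process; in practice it would sit on another machine. All names are illustrative:

```python
class InMemoryDb:
    """A toy memory-resident database that replicates writes for back-up."""

    def __init__(self, replica=None):
        self.data = {}          # the entire database lives in RAM
        self.replica = replica  # optional copy at a "second site"

    def write(self, key, value):
        self.data[key] = value
        if self.replica is not None:
            # Replication doubles as back-up and disaster recovery:
            # lose the primary's memory and the replica still has it all.
            self.replica.write(key, value)

    def read(self, key):
        # Every read is a memory access; no disk, no buffer pool.
        return self.data[key]
```

Reads never touch a disk at all, which is where the claimed hundred-fold speed-up would come from.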
5. Relational database never really did perform. Despite the vast amount of money invested in database benchmarks and the attendant publicity over the years, the simple truth is that relational database never really did perform well. All that happened was that Moore’s Law made hardware run faster and memory grew in size and ran faster and that made the same old engines appear to improve significantly. In some areas there were gains (caching strategies improved, and bit-mapped indexes – an idea that preceded relational database by many years – were added) but the underlying engines never changed much. They were always a little ponderous and they still are, it’s just that they ponder swiftly thanks to the silicon.
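The bit-mapped index mentioned above is worth a quick illustration: one bit per row for each distinct value, so a predicate over several values becomes a cheap bitwise operation. The data is invented for illustration:

```python
def bitmap_index(values):
    """Map each distinct value to an integer used as a bit vector:
    bit i is set when row i holds that value."""
    bitmaps = {}
    for row, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps

colours = ["red", "blue", "red", "green", "blue"]
bm = bitmap_index(colours)

# "colour = red OR colour = blue" is a single bitwise OR, then a bit test
# per row - no row-by-row string comparison needed.
matches = bm["red"] | bm["blue"]
matching_rows = [i for i in range(len(colours)) if matches & (1 << i)]
```

This is why bitmap indexes suit low-cardinality columns: the whole predicate collapses into a few machine-word operations.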
6. The big databases are too big. Once you get into the multi-terabyte and petabyte size databases – usually query-based data warehouse applications – you run into another problem with the old relational database engines. They are too slow for long queries that dive into big pools of data. They were never designed for big pools and they don’t perform well. You need purpose-designed parallel processing engines to hack those big databases.
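The shape of a purpose-designed parallel engine can be sketched in a few lines: partition the data pool, run the same long query over every partition concurrently, and merge the partial results. The partitioning scheme and the query itself are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def query_partition(partition):
    # Stand-in for the "long query": sum the amounts over a threshold.
    return sum(amount for amount in partition if amount > 100)

def parallel_query(data, workers=4):
    # Split the pool into one slice per worker and scan slices in parallel;
    # the final answer is the merge (here, a sum) of the partial answers.
    chunk = max(1, len(data) // workers)
    partitions = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(query_partition, partitions))
```

The serial and parallel versions give the same answer; the point is that only the partitioned form can spread a terabyte-scale scan across many processors at once.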
Now, if you’re thinking to yourself, “but those relational databases run most of the applications in the world, and this means that, in time, they are going to be replaced,” then you’re thinking what I’m thinking.