In _"Lecture 39 - Chapter 3: Databases"_ I offered to _tell a bit about why SQL came into existence, and what modern relational databases have to offer_. Here is my very short intro to what I think are important bits of real-world databases.

First a little bit on the history of relational databases. Edgar F. Codd introduced the relational model in his 1970 paper _"A Relational Model of Data for Large Shared Data Banks"_ (an earlier version appeared as an IBM research report in 1969), offering a neat formalism to describe tables of data, in contrast to the navigational systems of the time:

By the way, here is a nice obituary about Codd, likely written by one of his colleagues, now part of IBM's "history of progress" thread:

Ooh, and also notice the other famous names in the "The Builders details" menu on the right hand side of that page! But I digress...

The references in Codd's 1970 paper mention some of the most influential structured data storage systems of the era ("popular" might be a strangely unfitting term here). One notable example is Charles Bachman's "Integrated Data Store" (IDS), which he designed in the early 60's and which earned him his Turing Award in 1973:

Soon thereafter, Raymond F. Boyce and Donald D. Chamberlin, building on Codd's model, began working on the "Structured English Query Language", or "SEQUEL", the predecessor of SQL. The first commercial implementation shipped in 1979 from what would later be known as Oracle, and IBM strove to put what came out of its System R prototype to work everywhere. SQL became an ANSI standard in 1986.

But why did they need to invent SQL in the first place? What was the drive behind it?

First came data structures with O(log n) complexity for lookup and insertion (e.g. AVL or red-black trees). Then people came up with variants that aimed to maximize density, or to minimize the number of disk accesses, or seeks (e.g. radix trees, B-trees). Now that there was efficient indexing, people could start building humongous databases; you merely had to combine queries over the indices in a sensible way...
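As a toy illustration (in Python, and nothing like any real database's implementation): a sorted list plus binary search already gives you the O(log n) lookup that makes an index worthwhile. B-trees achieve the same asymptotics but with large nodes, to keep the number of disk seeks small.

```python
import bisect

# A toy "index": a sorted list of (key, row_id) pairs.
# Real databases use B-trees, which keep nodes large to minimize
# disk seeks, but the lookup cost is the same O(log n).
index = sorted([(42, "row-7"), (7, "row-1"), (99, "row-3"), (13, "row-9")])

def lookup(key):
    # Binary search: O(log n) comparisons instead of a full scan.
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

print(lookup(13))  # row-9
print(lookup(50))  # None (not indexed)
```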

To achieve this, the data operator (a person) would write a query plan, for example in the CODASYL Data Manipulation Language, which told the database management system how to combine the various indices, hopefully in an efficient way. This was rather tedious, error-prone, and sometimes very hard to debug or extend. See here for a quick impression:
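To get a feel for the burden, here is a hypothetical sketch in Python (the CODASYL DML looked nothing like this, but the division of labor is similar): the programmer, not the system, decides which index to probe and how to combine the results, for every single query.

```python
# Two hand-maintained indices over an imaginary "employees" table:
by_dept = {"sales": {1, 4, 7}, "hr": {2, 5}}      # dept -> row ids
by_city = {"berlin": {4, 5, 9}, "paris": {1, 2}}  # city -> row ids

# A hand-written "query plan" for:
#   employees with dept = 'sales' AND city = 'berlin'
# The programmer must choose which index to probe first, how to
# intersect the results, what to do on a missing key... and then
# write this all over again for the next query.
def sales_in_berlin():
    candidates = by_dept.get("sales", set())
    return candidates & by_city.get("berlin", set())

print(sales_in_berlin())  # {4}
```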

Or read this nice long interview with Donald D. Chamberlin (or just peek at page 8):

If you're as ancient as I am, you might remember dBase for CP/M, Apple II, or DOS. Or maybe FoxBase+. Or Clipper, which wrapped a full programming language around the database.

However, the solution to this mess was to keep the technology and merely put a better language on top. SQL is a declarative language: the database compiles a rather concise description of *what* you want into a query plan for *how* to get it. Modern databases like MySQL or PostgreSQL will still let you look at the generated query plans!
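A small concrete example, using SQLite (rather than MySQL or Postgres) simply because it ships with Python: you hand the engine a declarative query and can then ask it which plan it compiled the query into.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE INDEX idx_users_name ON users (name)")

# The same declarative query...
query = "SELECT id FROM users WHERE name = ?"

# ...and the plan the engine generated for it. (MySQL and Postgres
# spell this EXPLAIN; SQLite uses EXPLAIN QUERY PLAN.)
for row in con.execute("EXPLAIN QUERY PLAN " + query, ("codd",)):
    print(row)  # the plan mentions a search using idx_users_name
```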

But nowadays we want even more from a database management system. I won't get into distributed, geographical (geometric), or graph databases; I mean transactional qualities: we want to make sure that we never lose any data! The relevant keywords here are _atomicity_, _consistency_, _isolation_, and _durability_ (ACID).
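A quick illustration of the first of these, again with SQLite: within a transaction, either all statements take effect or none do, even if the program dies halfway through.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
con.commit()

try:
    with con:  # transaction: commit on success, roll back on exception
        con.execute("UPDATE accounts SET balance = balance - 60 "
                    "WHERE name = 'alice'")
        raise RuntimeError("simulated crash mid-transfer")
        # the matching credit to bob is never reached
except RuntimeError:
    pass

# Atomicity: the half-finished transfer was rolled back.
print(con.execute("SELECT balance FROM accounts "
                  "WHERE name = 'alice'").fetchone())  # (100,)
```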

Getting this right is immensely difficult, especially in light of the complexity of filesystem drivers (heck, scheduling and other properties of the operating system), as well as of the underlying hardware. All of the NoSQL systems might still be too young to excel at this (sorry for the pun), and many of the smaller SQL implementations might also come with compromises.

By the way, the whole problem area is closely related to that of distributed databases: how do you build a perfectly reliable one? Spoiler: you can't; see the CAP theorem:

This is what I have for you today. One could easily fill books with this stuff, and I'm certain some authors did. I would love to hear about good books, or more on the history of modern RDBMS!

Some more links:

Charles Bachman, who is also known for Bachman diagrams: