I don't have such a transformation handy, but I'd sure like one!
Some background: As Robert Figura described to us in his wonderful post, History of Databases, software developers/engineers/architects regularly distinguish between SQL and NoSQL databases. An SQL database (like MySQL, PostgreSQL, Oracle, SQL Server, etc.) is a relational database management system that expresses its querying interface via the Structured Query Language (SQL), a well-defined standard. A NoSQL database (like MongoDB, DynamoDB, Cassandra, Redis) does not (necessarily) use SQL to work with data; rather than saying there's no well-defined standard, I think you might say there's a multitude of "standards" available for these database engines. Redis, for example, has a very well-defined Redis Serialization Protocol (RESP) for working with data -- it's just not christened with an ISO or ANSI specification for use among multiple database vendors, although other vendors could adopt it if they wished. Suffice it to say, the SQL ecosystem consists of many vendors implementing the same (or close to the same) standardized language, while the NoSQL ecosystem consists of many vendors implementing many languages for data storage and querying.
Here's my question: Given a schema for an SQL-based RDBMS, what's a natural transformation to a schema for a document-based store? My instinct as a programmer would be to identify some kind of interchange format for records, SQL <-> NoSQL. One easy such format might be JavaScript Object Notation (JSON). A recent favorite of mine, protocol buffers, also fits the need. Dare I mention the Extensible Markup Language (XML)? But do any of these languages, used as an interchange format from SQL to NoSQL, have a precise mathematical expression in the language of category theory?
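To make the interchange idea concrete, here's a minimal sketch in Python of one common heuristic: denormalize a one-to-many relation by embedding child rows under their parent and serializing the result as JSON documents. The table names, columns, and the embed() helper are all made up for illustration; this is just one possible mapping, not the "natural transformation" I'm asking about.

```python
# A minimal sketch of one common SQL -> document heuristic: embed child rows
# under their parent via a foreign key, then serialize to JSON.
# The table names, columns, and embed() are invented for this example.
import json

# Two "tables" represented as lists of rows (dicts), related by a foreign key.
authors = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
books = [
    {"id": 10, "author_id": 1, "title": "Title A"},
    {"id": 11, "author_id": 2, "title": "Title B"},
]

def embed(parents, children, fk, key):
    """Denormalize a one-to-many relation into nested documents."""
    docs = []
    for parent in parents:
        doc = dict(parent)
        doc[key] = [c for c in children if c[fk] == parent["id"]]
        docs.append(doc)
    return docs

documents = embed(authors, books, fk="author_id", key="books")
print(json.dumps(documents, indent=2))  # ready to insert into a document store
```

A functor-like reading of this would send each table to a set of documents and each foreign key to an embedding; whether such an assignment is natural in the categorical sense is exactly what I'm asking.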
Comments
I do not know of any. But I also do not think it can be done 'declaratively' any time soon. One would need a library of higher-order functions (functions that produce functions) as well as domain/business-specific features; a small sketch of that idea follows at the end of this comment. Much as there is not one 'statistical model' that works for all cases, I think the same problem exists here.
That's because a single schema expresses several 'areas' or 'dimensions' of data-model design considerations: read access optimization, write/update optimization, access control, allowances for new data attributes and their relations, enforceable constraints, logical constraints, and the time dimension.
So to create an 'equivalent', formally provable transformation, one would have to come up with a way to preserve capabilities in all of the above-mentioned dimensions.
For that to happen, we would need a formal definition for each of the above dimensions, describing in some detail an 'algebra' applicable to each area. Then the same would be needed for the target data model, and then for the transformations themselves.
Each of those things is not, as far as I know, well defined formally. Instead they are 'crafted', so to speak, by data modelers/architects/etc. (and of varying quality, I might add).
Probably too pessimistic, but to me it feels like, in this area, we in the general industry are still using an abacus, while to do what you are asking (and many other similar industrial-scale data transformations) we need at least a 486 CPU.
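As a rough illustration of the "functions that produce functions" point above, here is a small Python sketch in which domain-specific rules, chosen by a data modeler, are used to produce a row-to-document transformer. The rule format and all names are invented for the example; the point is only that the business knowledge lives in the rules, not in a single universal declarative mapping.

```python
# Rough sketch of the "higher-order functions" point: domain-specific rules
# (invented here) are used to *produce* a row -> document transformer,
# rather than relying on one fixed declarative mapping.
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Doc = Dict[str, Any]

def make_transformer(rules: Dict[str, Callable[[Row], Any]]) -> Callable[[Row], Doc]:
    """Given per-field rules chosen by a data modeler, return a row->document function."""
    def transform(row: Row) -> Doc:
        return {field: rule(row) for field, rule in rules.items()}
    return transform

# The domain/business knowledge lives in the rules, not in a universal algorithm.
to_doc = make_transformer({
    "full_name": lambda r: f"{r['first_name']} {r['last_name']}",
    "created":   lambda r: r["created_at"],
})

print(to_doc({"first_name": "Ada", "last_name": "Lovelace", "created_at": "1843-01-01"}))
```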