Thursday, October 18, 2007

Persistence of Semi-Structural Data

Earl - "Look at the snowflakes, someone once told me, no two snowflakes are the same"
Mooch - "Earl, no two anythings are the same!"
-Patrick McDonnell (Mutts Comic)

It amuses me how we set our own limits around our abilities, and how often we place arbitrary or imaginary boundaries on what can and cannot be accomplished. Once something that was assumed impossible is proven possible, it suddenly becomes easy to do and simple to understand.

One of the boundaries I've been witnessing is the constrained form of the relational database. For reasons of performance and tradition, the schemas we work with are static and rigid. Yet we use databases and software to track and model real-life phenomena. The problem is that, unlike our databases, real life is rarely static and rigid; it is dynamic, it is diverse, it is in flux, never the same. It's a shame when people work in a fashion that is dictated simply by the limits of their software, and not vice versa, isn't it? We as computer scientists are forced to be creative, finding ways to represent dynamic phenomena in a static fashion by exploiting as many commonalities as we can. In a sense, we try to make the static representation as dynamic as possible. That is our art, as I see it anyway.

Why this limit on the database's ability? Why can't I find some sort of persistence medium that works with dynamic (semi-structural) data? I read this article from a very bright developer, Ayende Rahien, who explores the use of Lucene as a semi-structural persistence layer. I explored alternative approaches using CouchDB, but all of these feel like trying to pound a square peg into a round hole. Dynamic indexing seems to be a requirement for flexible querying, but these systems are designed for document persistence, not as an object persistence layer. I would assume the cost of dynamic indexing puts a ding in scalability.
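To make "dynamic indexing" a little more concrete, here is a minimal sketch in Python. It is not how Lucene or CouchDB work internally; the `DynamicIndex` class and its `put`, `delete`, and `find` methods are hypothetical names invented for illustration, and the sketch assumes records are flat dictionaries with hashable values. Every field of every record is indexed as it arrives, with no schema declared up front.

```python
from collections import defaultdict


class DynamicIndex:
    """Toy in-memory store (hypothetical) that indexes every field it sees."""

    def __init__(self):
        self._records = {}              # record id -> dict of fields
        self._index = defaultdict(set)  # (field, value) -> set of record ids

    def put(self, record_id, record):
        # Remove stale index entries if this id is being overwritten.
        self.delete(record_id)
        self._records[record_id] = dict(record)
        # One index update per field: this is the per-write indexing cost.
        for field, value in record.items():
            self._index[(field, value)].add(record_id)

    def delete(self, record_id):
        old = self._records.pop(record_id, None)
        if old is None:
            return
        for field, value in old.items():
            self._index[(field, value)].discard(record_id)

    def find(self, **criteria):
        """Return sorted ids of records matching all field=value pairs."""
        if not criteria:
            return sorted(self._records)
        id_sets = [self._index.get((f, v), set()) for f, v in criteria.items()]
        return sorted(set.intersection(*id_sets))


store = DynamicIndex()
store.put(1, {"kind": "snowflake", "arms": 6})
store.put(2, {"kind": "snowflake", "arms": 6, "color": "white"})
store.put(3, {"kind": "cat", "name": "Mooch"})

print(store.find(kind="snowflake", arms=6))  # [1, 2]
print(store.find(color="white"))             # [2]
```

No two records need to share the same fields, and any field can be queried. The price shows up in `put`: each write touches one index entry per field, which is the kind of cost I would expect to grow with the variety of the data.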

Still, when the structure of our data becomes dynamic, do we fundamentally lose flexible queries and indexed performance, or is this notion of a rigid schema an imaginary boundary?
