Friday, October 19, 2007

Semi-structure ideas: some better definitions

I looked back at yesterday's post and realized that my subject really needs some context to give it substance.

What scenarios define semi-structured data?
Perhaps different types of semi-structured data call for different approaches to persistence. I'll try to define the range of types I've observed, framed by one question: how does the structure of the data change over time?

Dynamic Attributes - This is far and away the most common phenomenon of semi-structured data. The classic scenario: the client needs you to add or remove attributes on an existing data structure to satisfy their particular needs. Many modern best practices for maintainable software are built around this scenario. The key to dynamic attributes is that every entity adheres to, and is defined by, a central structural definition, just as objects adhere to their classes. When the structure changes, the change is usually an expansion rather than a removal, and to keep the definition uniform, all legacy data is filled with default values for the new attributes. If the structure changes frequently, developers tend to use the relational database in creative ways.
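
To make the "expand and backfill" pattern concrete, here is a minimal in-memory sketch. The names `CentralDefinition`, `Store`, `insert` and `add_attribute` are hypothetical illustrations, not any particular library: one shared definition maps attribute names to defaults, every record must conform to it, and adding an attribute backfills legacy records with the default.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CentralDefinition:
    # Hypothetical central definition: attribute name -> default value,
    # shared by every entity the way a class is shared by its objects.
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class Store:
    definition: CentralDefinition
    records: list[dict[str, Any]] = field(default_factory=list)

    def insert(self, **values: Any) -> dict[str, Any]:
        # Every new record must conform to the central definition.
        unknown = set(values) - set(self.definition.attributes)
        if unknown:
            raise KeyError(f"not in definition: {sorted(unknown)}")
        # Start from the defaults, then overlay the supplied values.
        record = {**self.definition.attributes, **values}
        self.records.append(record)
        return record

    def add_attribute(self, name: str, default: Any) -> None:
        # Expansion: extend the definition, then backfill legacy records
        # with the default so every record still matches the definition.
        self.definition.attributes[name] = default
        for record in self.records:
            record.setdefault(name, default)
```

For example, after `store = Store(CentralDefinition({"name": "", "price": 0.0}))`, `store.insert(name="widget", price=9.99)` and `store.add_attribute("color", "unspecified")`, the first record gains `"color": "unspecified"`. In many relational databases this is roughly what `ALTER TABLE ... ADD COLUMN ... DEFAULT ...` does to existing rows.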

Dynamic Entities - Beyond changing the definition of an existing structure, we can also have entire structures that are created dynamically. Examples include user-defined object types and dynamically created tables. If the structures are dynamic, the relationships between them must be dynamic as well. One thing dynamic entities and dynamic attributes have in common is that every entity is defined by its schema, much like the object/class relationship.
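
A minimal sketch of dynamic entities, assuming a hypothetical `Registry` class (not an existing library): entity types are defined at runtime, loosely like dynamically created tables, relationships between types are themselves runtime data, and each entity is still validated against its type's schema.

```python
from typing import Any


class Registry:
    """Hypothetical registry of entity types defined at runtime."""

    def __init__(self) -> None:
        self.schemas: dict[str, set[str]] = {}
        self.rows: dict[str, list[dict[str, Any]]] = {}
        # Relationships are data too: (from_type, relation_name) -> to_type.
        self.relations: dict[tuple[str, str], str] = {}

    def define_type(self, type_name: str, attributes: set[str]) -> None:
        # Roughly analogous to creating a table at runtime.
        self.schemas[type_name] = set(attributes)
        self.rows[type_name] = []

    def define_relation(self, from_type: str, name: str, to_type: str) -> None:
        if from_type not in self.schemas or to_type not in self.schemas:
            raise KeyError("both types must be defined first")
        # The relation becomes a reference attribute on from_type.
        # (Sketch assumption: relations are defined before rows are inserted.)
        self.relations[(from_type, name)] = to_type
        self.schemas[from_type].add(name)

    def insert(self, type_name: str, **values: Any) -> int:
        # Each entity is still defined by its (runtime) schema.
        expected = self.schemas[type_name]
        if set(values) != expected:
            raise ValueError(f"{type_name} expects {sorted(expected)}")
        self.rows[type_name].append(values)
        return len(self.rows[type_name]) - 1  # row id

    def follow(self, type_name: str, row_id: int, relation: str) -> dict[str, Any]:
        # Resolve a reference attribute to the row it points at.
        target_type = self.relations[(type_name, relation)]
        target_id = self.rows[type_name][row_id][relation]
        return self.rows[target_type][target_id]
```

With `Author` and `Book` types defined at runtime and a `Book.author` relation added afterwards, `follow("Book", book_id, "author")` returns the author row. The point of the sketch is that types and relationships live in the same mutable data as the entities themselves.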

Diverse Entities - This scenario is truly semi-structured data: entities are defined dynamically and have little or no relation to a central schema definition. A common place you will see this is full-text indexing of documents; each entity is a document, documents have JSON or URL-based relationships to one another, but each document is unique. The only thing all documents are guaranteed to have in common is that they are documents. What I find interesting about this scenario is the limited adherence to a rigid central definition. What if entities had limited adherence to multiple central definitions at the same time? For example, an entity might be slightly like the definition of a sales item and slightly like the definition of a physical location, such that it can rightly be used in either scenario.
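
One way to make "limited adherence to multiple definitions" concrete is to score an entity against each definition by attribute overlap. This is only a sketch under stated assumptions: the definitions in `DEFINITIONS`, the functions `adherence` and `usable_as`, and the 0.5 threshold are all hypothetical, and attribute-name overlap is just one simple way to measure adherence.

```python
from typing import Any

# Hypothetical central definitions, each reduced to a set of attribute names.
DEFINITIONS: dict[str, set[str]] = {
    "sales_item": {"name", "price", "sku", "description"},
    "physical_location": {"name", "address", "latitude", "longitude"},
}


def adherence(entity: dict[str, Any], definition: set[str]) -> float:
    """Fraction of the definition's attributes that the entity actually has."""
    return len(definition & set(entity)) / len(definition)


def usable_as(entity: dict[str, Any], threshold: float = 0.5) -> list[str]:
    """Definitions the entity adheres to at least `threshold`, best match first."""
    scores = {name: adherence(entity, attrs) for name, attrs in DEFINITIONS.items()}
    matches = [name for name, score in scores.items() if score >= threshold]
    return sorted(matches, key=lambda name: -scores[name])
```

A document such as `{"name": "Harbor kiosk", "price": 2.5, "address": "Pier 3", "latitude": 47.6}` scores 0.5 against `sales_item` and 0.75 against `physical_location`, so `usable_as` would let it be used in either role, with the location reading ranked first.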

This post is unfinished for now; I plan to discuss common and creative ways people currently handle these scenarios in the relational database (even though I suspect there is a better way).
