Wednesday, December 19, 2007

Generic Custom NHibernate Collections - A Second Swing

I talked about custom collections for WPF and NHibernate in an earlier post, but I wanted to mention an alternative solution I made that takes fewer lines of code and is apparently easier for other people to understand.

A quick recap: we want to harness the powerful data-binding features of WPF. For two-way binding to work well, our objects need to implement INotifyPropertyChanged. No problem there, but our collections also need to implement INotifyCollectionChanged, and that is problematic, because our NHibernate-mapped collections are commonly IList(T)s.

Why does NHibernate use an IList? When declaring a collection on a transient (new) object, we always write something like the following:

private IList<InnerType> m_InnerItems = new List<InnerType>();

A "transient" collection is a new or unsaved collection that was created in your code, and its concrete implementation of IList(T) is a List(T). A persistent (saved) object is not built by your code; it is built by NHibernate. Its collection is still an IList(T), but the concrete implementation is a PersistentGenericBag(T). That is why the field has to be declared as the interface.

The PersistentGenericBag class has no default constructor; it requires an ISessionImplementor (the session) as a constructor parameter to support the lazy-loading magic. Since it has no default constructor, it wasn't designed for us to use in transient collections. Besides, why would we want to use an NHibernate-specific type inside our domain objects? In my opinion, that would couple our domain objects too tightly to NHibernate's implementation.

What to do? We need to make a new interface (I defined mine to extend INotifyCollectionChanged for my purposes, but it could extend anything you need):

using System.Collections.Generic;
using System.Collections.Specialized;

namespace NotifyingCollectionDemo.Library.Collections
{
public interface IDomainCollection<T>:INotifyCollectionChanged, IList<T>
{
}
}

We need to define a new "Transient" collection type for our interface:

using System.Collections.Generic;
using System.Collections.Specialized;

namespace NotifyingCollectionDemo.Library.Collections
{
public class TransientDomainCollection<T>:List<T>, IDomainCollection<T>
{
#region INotifyCollectionChanged Members
public event NotifyCollectionChangedEventHandler CollectionChanged;
/// <summary>
/// Fires the <see cref="CollectionChanged"/> event to indicate an item has been
/// added to the end of the collection.
/// </summary>
/// <param name="item">Item added to the collection.</param>
protected void OnItemAdded(T item)
{
if (this.CollectionChanged != null)
{
this.CollectionChanged(this, new NotifyCollectionChangedEventArgs(
NotifyCollectionChangedAction.Add, item, this.Count - 1));
}
}
/// <summary>
/// Fires the <see cref="CollectionChanged"/> event to indicate the collection
/// has been reset. This is used when the collection has been cleared or
/// entirely replaced.
/// </summary>
protected void OnCollectionReset()
{
if (this.CollectionChanged != null)
{
this.CollectionChanged(this, new NotifyCollectionChangedEventArgs(
NotifyCollectionChangedAction.Reset));
}
}
/// <summary>
/// Fires the <see cref="CollectionChanged"/> event to indicate an item has
/// been inserted into the collection at the specified index.
/// </summary>
/// <param name="index">Index the item has been inserted at.</param>
/// <param name="item">Item inserted into the collection.</param>
protected void OnItemInserted(int index, T item)
{
if (this.CollectionChanged != null)
{
this.CollectionChanged(this, new NotifyCollectionChangedEventArgs(
NotifyCollectionChangedAction.Add, item, index));
}
}
/// <summary>
/// Fires the <see cref="CollectionChanged"/> event to indicate an item has
/// been removed from the collection at the specified index.
/// </summary>
/// <param name="item">Item removed from the collection.</param>
/// <param name="index">Index the item has been removed from.</param>
protected void OnItemRemoved(T item, int index)
{
if (this.CollectionChanged != null)
{
this.CollectionChanged(this, new NotifyCollectionChangedEventArgs(
NotifyCollectionChangedAction.Remove, item, index));
}
}
#endregion

// We re-declare the mutating IList<T> methods with "new" so that each one
// delegates to List<T> and then raises CollectionChanged.
#region IList<T> members
public new void Add(T item)
{
base.Add(item);
this.OnItemAdded(item);
}
public new void Clear()
{
base.Clear();
this.OnCollectionReset();
}
public new void Insert(int index, T item)
{
base.Insert(index, item);
this.OnItemInserted(index, item);
}
public new bool Remove(T item)
{
int index = this.IndexOf(item);
bool result = base.Remove(item);
// only raise the Remove notification if an item was actually removed
if (result)
{
this.OnItemRemoved(item, index);
}
return result;
}
public new void RemoveAt(int index)
{
T item = this[index];
base.RemoveAt(index);
this.OnItemRemoved(item, index);
}
#endregion
}
}
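One subtlety this design relies on: the "new" methods only run when callers go through the interface. Re-declaring IDomainCollection<T> (and therefore IList<T>) in the class's base list makes the compiler map the interface members to the hiding methods, but a caller holding a plain List<T> reference still gets List<T>.Add and no event. The self-contained sketch below demonstrates this; INotifyingList<T> and NotifyingList<T> are hypothetical, trimmed stand-ins for IDomainCollection<T> and TransientDomainCollection<T>, and only Add is re-declared.

```csharp
using System.Collections.Generic;
using System.Collections.Specialized;

// Hypothetical stand-in for IDomainCollection<T>.
public interface INotifyingList<T> : INotifyCollectionChanged, IList<T>
{
}

// Hypothetical stand-in for TransientDomainCollection<T>, reduced to Add.
public class NotifyingList<T> : List<T>, INotifyingList<T>
{
    public event NotifyCollectionChangedEventHandler CollectionChanged;

    // "new" hides List<T>.Add. Because INotifyingList<T> (and with it IList<T>
    // and ICollection<T>) is listed in the base list, interface calls map here.
    public new void Add(T item)
    {
        base.Add(item);
        CollectionChanged?.Invoke(this, new NotifyCollectionChangedEventArgs(
            NotifyCollectionChangedAction.Add, item, Count - 1));
    }
}
```

In practice this means domain code should expose the collection only as IDomainCollection<T>, never as List<T>, or some changes will silently skip the notification.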

We need to define a new "Persistent" collection type for our interface:

using System.Collections.Generic;
using System.Collections.Specialized;
using NHibernate.Collection.Generic;
using NHibernate.Engine;

namespace NotifyingCollectionDemo.Library.Collections
{
public class PersistentDomainCollection<T>:PersistentGenericBag<T>, IDomainCollection<T>
{
#region constructors
public PersistentDomainCollection(ISessionImplementor session, IList<T> coll) : base(session, coll)
{
}
public PersistentDomainCollection(ISessionImplementor session) : base(session)
{
}
#endregion

#region INotifyCollectionChanged Members
public event NotifyCollectionChangedEventHandler CollectionChanged;
/// <summary>
/// Fires the <see cref="CollectionChanged"/> event to indicate an item has been
/// added to the end of the collection.
/// </summary>
/// <param name="item">Item added to the collection.</param>
protected void OnItemAdded(T item)
{
if (this.CollectionChanged != null)
{
this.CollectionChanged(this, new NotifyCollectionChangedEventArgs(
NotifyCollectionChangedAction.Add, item, this.Count - 1));
}
}
/// <summary>
/// Fires the <see cref="CollectionChanged"/> event to indicate the collection
/// has been reset. This is used when the collection has been cleared or
/// entirely replaced.
/// </summary>
protected void OnCollectionReset()
{
if (this.CollectionChanged != null)
{
this.CollectionChanged(this, new NotifyCollectionChangedEventArgs(
NotifyCollectionChangedAction.Reset));
}
}
/// <summary>
/// Fires the <see cref="CollectionChanged"/> event to indicate an item has
/// been inserted into the collection at the specified index.
/// </summary>
/// <param name="index">Index the item has been inserted at.</param>
/// <param name="item">Item inserted into the collection.</param>
protected void OnItemInserted(int index, T item)
{
if (this.CollectionChanged != null)
{
this.CollectionChanged(this, new NotifyCollectionChangedEventArgs(
NotifyCollectionChangedAction.Add, item, index));
}
}
/// <summary>
/// Fires the <see cref="CollectionChanged"/> event to indicate an item has
/// been removed from the collection at the specified index.
/// </summary>
/// <param name="item">Item removed from the collection.</param>
/// <param name="index">Index the item has been removed from.</param>
protected void OnItemRemoved(T item, int index)
{
if (this.CollectionChanged != null)
{
this.CollectionChanged(this, new NotifyCollectionChangedEventArgs(
NotifyCollectionChangedAction.Remove, item, index));
}
}
#endregion

// We re-declare the mutating IList<T> methods with "new" so that each one
// delegates to PersistentGenericBag<T> and then raises CollectionChanged.
#region IList<T> members
public new void Add(T item)
{
base.Add(item);
this.OnItemAdded(item);
}
public new void Clear()
{
base.Clear();
this.OnCollectionReset();
}
public new void Insert(int index, T item)
{
base.Insert(index, item);
this.OnItemInserted(index, item);
}
public new bool Remove(T item)
{
int index = this.IndexOf(item);
bool result = base.Remove(item);
// only raise the Remove notification if an item was actually removed
if (result)
{
this.OnItemRemoved(item, index);
}
return result;
}
public new void RemoveAt(int index)
{
T item = this[index];
base.RemoveAt(index);
this.OnItemRemoved(item, index);
}
#endregion
}
}

Finally, we need an implementation of IUserCollectionType to tie this all together and use it in the mapping files. Notice how I treat this as a factory class:

using System.Collections;
using System.Collections.Generic;
using NHibernate.Collection;
using NHibernate.Engine;
using NHibernate.Persister.Collection;
using NHibernate.UserTypes;

namespace NotifyingCollectionDemo.Library.Collections
{
public class DomainCollectionFactory<T> :IUserCollectionType
{
#region IUserCollectionType Members
public IPersistentCollection Instantiate(ISessionImplementor session, ICollectionPersister persister)
{
return new PersistentDomainCollection<T>(session);
}
public IPersistentCollection Wrap(ISessionImplementor session, object collection)
{
return new PersistentDomainCollection<T>(session,collection as IList<T>);
}
public object Instantiate()
{
return new TransientDomainCollection<T>();
}
public IEnumerable GetElements(object collection)
{
return (IEnumerable) collection;
}
public bool Contains(object collection, object entity)
{
return ((IList) collection).Contains(entity);
}
public object IndexOf(object collection, object entity)
{
return ((IList) collection).IndexOf(entity);
}
public object ReplaceElements(object original, object target, ICollectionPersister persister,
object owner, IDictionary copyCache, ISessionImplementor session)
{
IList result = (IList) target;
result.Clear();
foreach (object o in ((IEnumerable) original))
{
result.Add(o);
}
return result;
}
#endregion
}
}

How do you use this? In your mapping file, write something like:

<bag name="Items" inverse="true" cascade="all-delete-orphan" generic="true" lazy="true"
collection-type=
"NotifyingCollectionDemo.Library.Collections.DomainCollectionFactory`1[[NotifyingCollectionDemo.Library.DomainModel.ListItem, NotifyingCollectionDemo.Library]], NotifyingCollectionDemo.Library">
<key column="ListContainerID" />
<one-to-many class="ListItem" />
</bag>

In the code:

private IDomainCollection<ListItem> _items = new TransientDomainCollection<ListItem>();

public IDomainCollection<ListItem> Items
{
get { return this._items; }
set { this._items = value; }
}

And you should be in business.
I like this code because the NHibernate-specific stuff is only accessible from the NHibernate-specific factory. The user code never references a PersistentDomainCollection, which makes for a clean cut. Thanks again to Billy McCafferty and Damon Carr, since my solutions are "cannibalizations" of their more original work. Any thoughts?


Sunday, December 09, 2007

Linq to SQL vs NHibernate Part 1: What do they have in common?

Choosing an object persistence technology is one of the first steps in any major project, and it's a tough call to make. We spent some time at my company trying to figure out whether Linq to SQL was a better ORM than NHibernate. After some experiments, I came to the engineer's conclusion: it depends. As Linq gains popularity, people will be asking the same questions, so I'm writing a few unbiased posts to sort out their differences. (As a warning in advance, my expertise is with NHibernate.)

Crash Course:
Linq is the query syntax added to the C# language in version 3.0. Trees, relational data, objects, XML, etc. can all be queried using the common Linq syntax, which is reminiscent of SQL. It is easy, flexible, strongly typed, and compiled. Linq to SQL is a natural extension of Linq into an ORM; it is touted as a "lightweight" data mapper and was heavily hyped by Microsoft in earlier beta versions. Linq to SQL is built specifically for Microsoft SQL Server.

NHibernate is a mature open-source project designed specifically to solve ORM problems. It is an extremely flexible and configurable ORM, and it's been battle-proven on many enterprise projects. It is database agnostic and supports a wide array of database brands. Like many active open-source projects, it is constantly evolving, which makes good, current documentation hard to find.

What do they have in common?

Mapping syntax
Any object/data mapping system is going to need object definitions and a corresponding DDL. Both libraries are extremely flexible in their initial configuration, but there is a way to use both of them in a similar fashion.

Just like with any good ORM, your objects are simply plain old objects that happen to be persistable as an afterthought. The object definition is any class file. The XML mapping gives the ORM library the links between objects and their tables. It defines the mappings between an object's properties and the columns in the database, and it defines the relationships between objects (collections and encapsulation) and the corresponding data relations (many-to-many, many-to-one, one-to-one).

Both NHibernate and Linq to SQL accept these mapping files as input when they are initialized.

Persistent Object Lifecycles
Scoping - Within a common scope, the loading of two instances of the same database row should yield two references to the same object. This scope in NHibernate is called a "Session", in Linq, it is called a "DataContext", and both guarantee reference equality between two instances of the same data under the same scope.
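The scoping guarantee is essentially an identity map. Here is a minimal sketch of the idea in C#; it is not how either library implements it internally, and Customer, IdentityMapScope and the loader delegate (standing in for a database query) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity used only for this sketch.
public class Customer
{
    public int Id { get; }
    public string Name { get; set; }
    public Customer(int id, string name) { Id = id; Name = name; }
}

// One instance plays the role of a Session / DataContext.
public class IdentityMapScope
{
    private readonly Dictionary<int, Customer> _loaded = new Dictionary<int, Customer>();
    private readonly Func<int, Customer> _loadFromDatabase;

    public int DatabaseHits { get; private set; }

    public IdentityMapScope(Func<int, Customer> loadFromDatabase)
    {
        _loadFromDatabase = loadFromDatabase;
    }

    // The first request for a row hits the "database"; later requests in the
    // same scope return the very same object reference.
    public Customer Get(int id)
    {
        if (!_loaded.TryGetValue(id, out Customer customer))
        {
            customer = _loadFromDatabase(id);
            DatabaseHits++;
            _loaded.Add(id, customer);
        }
        return customer;
    }
}
```

Two separate scopes load two separate objects for the same row, which is why comparisons across sessions have to rely on IDs rather than reference equality.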

Version Management - Once you have loaded objects under a scope, you should be able to efficiently synchronize the database with the updated state of your objects. Linq's DataContext exposes a SubmitChanges() method for this very purpose, and NHibernate has a Flush() method.

Adjustable Fetching Schemes - Loading objects from the database is a bit of a catch-22: you don't want to load the entire database into an object graph, but you don't want a round trip to the database every time you need a new object. Both libraries support highly configurable lazy and eager fetching schemes. Both use lazy loading by default, and both use a left outer join by default when eagerly fetching peripheral objects.

Concurrency Concerns - Enterprise data is volatile, and we need to be able to recognize and manage the scenarios where data is changed by external forces. By default, NHibernate and Linq to SQL behave optimistically: they load rows without locking them, and throw exceptions if the objects you are saving have changed since you loaded them (NHibernate needs a version or timestamp mapping for this check). Both libraries offer several ways to customize concurrency behavior.
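A version-column check is the usual way this kind of optimistic conflict gets detected. This in-memory sketch mimics an UPDATE whose WHERE clause includes the version the caller originally read; VersionedTable, VersionedRow and StaleDataException are hypothetical, and the real libraries raise their own exception types.

```csharp
using System;
using System.Collections.Generic;

public class VersionedRow
{
    public string Value { get; set; }
    public int Version { get; set; }
}

// Hypothetical exception type for this sketch only.
public class StaleDataException : Exception
{
    public StaleDataException(string message) : base(message) { }
}

// Hypothetical in-memory "table" keyed by primary key.
public class VersionedTable
{
    private readonly Dictionary<int, VersionedRow> _rows = new Dictionary<int, VersionedRow>();

    public void Insert(int id, string value) =>
        _rows[id] = new VersionedRow { Value = value, Version = 1 };

    // Reading takes no lock; the caller just keeps a copy, including the version it saw.
    public VersionedRow Read(int id) =>
        new VersionedRow { Value = _rows[id].Value, Version = _rows[id].Version };

    // Same idea as: UPDATE ... SET Value = @v, Version = Version + 1
    //               WHERE Id = @id AND Version = @expectedVersion
    // If the versions differ, someone else changed the row after we read it.
    public void Update(int id, string newValue, int expectedVersion)
    {
        VersionedRow current = _rows[id];
        if (current.Version != expectedVersion)
            throw new StaleDataException($"Row {id} was modified by another user.");
        current.Value = newValue;
        current.Version++;
    }
}
```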

Custom Database Objects - Some operations are simply better left to the database, such as large-scale "en masse" updates and reporting. These operations are easily implemented as stored procedures or indexed views. Both libraries can interface with custom database objects. Linq to SQL has very strong integration with stored procedures and views, but it only works with SQL Server. NHibernate is database agnostic, but its code that references database objects is string-based, which makes the connection brittle in comparison.

Code Generators
Personally, I am not a fan of any code generator related to something as important as your DDL, but there seems to be a very big demand for code generation, and there are convenience tools for both ORMs.
Linq Visual Designer - Linq comes with a built-in Visual Studio designer for Linq objects. It looks just like the visual DataSet designer, because it was built by the same guy who made the visual DataSet designer. IMO, this designer is a great way to get nowhere in 30 seconds. It is only useful for the most trivial object graphs, it uses partial classes to separate the mapping code from your user code, and it produces brittle code at best.
Linq SQLMetal - In terms of codegens, this is Linq's saving grace right here. Given a database connection, sqlmetal can generate clean code for the objects, mappings, or both, with an array of options for fine-tuning the output code.
MyGeneration - A free third-party code generator that has DDL "templates" for both NHibernate and Linq (among many others). This is a great way to generate code if you have an existing database schema.
NHibernate SchemaExport - All of the code generators above convert an existing database schema into object and mapping code. SchemaExport goes in the other direction, building a database schema from the mappings. I've written about SchemaExport before, and I am a very big fan of this one.

Integration -
Since the two technologies are not necessarily in direct competition, there is currently a push to bring the power of Linq-style querying into NHibernate. More about Linq for NHibernate can be found here, here and here.

This post covers some of the commonalities, and in the next few days, I'll be comparing some of the more important factors such as performance, flexibility, and usability.


UPDATE:
PerpetuumSoft, a third-party company, has filled the dire need for a database synchronization tool for Linq to SQL. Given your object model definitions in Linq to SQL, their Database Restyle application is a royalty-free component that gives you the essential ability to synchronize a schema with a changing object design, so you can design from a truly object-centric point of view.


Thursday, December 06, 2007

Unit testing Persistent objects in ORMs such as NHibernate

Even though I am writing this with NHibernate-specific examples, these concepts apply to any ORM technology, so use your imagination a little on this one.

So you're using persistent classes, and you need to make unit tests. Here is a fundamental concern to test with any persistent class:

I want to create an object, then "persist" it. I want to test that the object loaded from persistence matches the object I originally created. Then I have tested the correctness of the mappings.

How do you test to make sure that two objects match each other? I see two concerns:
  1. The immediate values inside of the object are matching (primitive properties, for example)
  2. The "neighbors" that my class has references to also match (many-to-one or one-to-many associated classes).
Consider this scenario: You have a target class X, and you want to test its persistence. Imagine that X inherits from an abstract class such as PersistentObject:
    public abstract class PersistentObject<T>
{
#region members
private int id = 0;
#endregion

#region properties
public int Id
{
get { return id; }
}
public bool IsSaved
{
get { return id != 0; }
}
#endregion

#region methods
public abstract bool Matches(T t);
#endregion
}
This is a very simplistic example, but in this case every PersistentObject has an integer identity primary key. Anything that inherits from PersistentObject must also implement a "Matches" method, which compares objects of a common type to see if the properties match.
I could have overridden the object.Equals method here, but I feel this would obfuscate the meaning of "Equals", so I made a new method.

This "Matches" method is the perfect hook to test the equality of immediate properties (persistent object testing concern #1).
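For example, a concrete class might implement Matches by comparing only its immediate properties. Widget and its properties are hypothetical; PersistentObject<T> is repeated from above so the sketch compiles on its own.

```csharp
using System;

public abstract class PersistentObject<T>
{
    private int id = 0;   // set by the ORM through field access
    public int Id { get { return id; } }
    public bool IsSaved { get { return id != 0; } }
    public abstract bool Matches(T t);
}

// Hypothetical entity for illustration.
public class Widget : PersistentObject<Widget>
{
    public string WidgetName { get; set; }
    public decimal Price { get; set; }

    // Concern #1 only: compare the immediate (primitive) properties.
    // Neighbors are deliberately not compared here.
    public override bool Matches(Widget other)
    {
        if (other == null) return false;
        return Id == other.Id
            && string.Equals(WidgetName, other.WidgetName, StringComparison.Ordinal)
            && Price == other.Price;
    }
}
```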

Just because the immediate properties of an object match, does this mean that the objects are the same? No, we must make sure that the "neighboring" objects also match.

For concern #2, we could take a recursive approach and call the "Matches" method on all of the parents and all of the children. I argue this is overkill.

If the target object's parent IDs match, and the target object's children's ID collections match, then we have effectively proven that the objects match.

If you have a test for every type of persistent object, then there is no need to call a "Matches" method recursively, because you will be testing the same thing many times over, and the code will be unnecessarily complex.

One important quality of NHibernate to remember: NHibernate guarantees reference equality between two objects representing the same row under the same session. This is a very simple but powerful quality.

Here is the approach I've adopted for unit testing persistence:
  1. Create your database schema from the mappings using the schema export tool.
  2. Use a factory to create some bulk data. If you can, create a rich graph of all of your persistent objects in a realistic scenario.
  3. Using a new session, persist this object graph to the database.
  4. For each type of persistent object,
  • try to load a new copy of the object under a brand new session using the ID of one of your originally created objects.
  • Call the "Matches" method to compare the two objects. (concern #1)
  • Verify that the IDs of the parents/children of the original object equal the IDs of the parents/children of the loaded objects (concern #2)
Finally, tear down the database at the end.
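The ID comparison for concern #2 might look something like this. Order, Customer and OrderLine are hypothetical entities; in a real test, original would be the object graph you created and persisted, and loaded would come from a brand new session. Child IDs are compared as sorted sequences because a bag has no guaranteed order.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities: an Order has one parent Customer and many OrderLine children.
public class Customer { public int Id { get; set; } }
public class OrderLine { public int Id { get; set; } }
public class Order
{
    public int Id { get; set; }
    public Customer Customer { get; set; }
    public IList<OrderLine> Lines { get; } = new List<OrderLine>();
}

public static class NeighborAssertions
{
    // Concern #2: compare neighbors by ID only, without recursing into Matches.
    public static bool NeighborIdsMatch(Order original, Order loaded)
    {
        int? originalParent = original.Customer?.Id;
        int? loadedParent = loaded.Customer?.Id;
        if (originalParent != loadedParent) return false;

        var originalChildren = original.Lines.Select(l => l.Id).OrderBy(id => id);
        var loadedChildren = loaded.Lines.Select(l => l.Id).OrderBy(id => id);
        return originalChildren.SequenceEqual(loadedChildren);
    }
}
```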


Wednesday, November 28, 2007

Open source enterprise sample using ActiveRecord

On the 29th, I'll be giving an ACM lecture at UNH on the life of a computer consultant.

Something that is important to me: code speaks louder than words. It's cool to hear new ideas and new philosophies, but coded examples of these ideas back them up with a little reality. The problem is, it's hard to give code examples without lots and lots of surrounding context to make the code real.

I put together a new open source project on Google Code. It is a "zen-garden" style project for enterprise-level data applications using Castle ActiveRecord under the hood. I kick-started this project by giving it a foundation, a domain, some controllers, and a web-based user interface. Now it has some critical mass. The application itself is not important so much as the ability to practice and learn new skills by playing with the code.

This is the link to the code project homepage. Be sure to view the project wiki for the nitty gritty details. This is the link to the subversion trunk. To download and run the application you will need:
Downloading the Project Code
If you don't want to join the open source project, you can simply download the app from the Downloads tab. Make sure you have WinRAR or an equivalent RAR extractor.

Joining and Contributing to the Project
  1. Contact me and I will add you to the project as a user, so you can work under the Subversion source control system. Don't be shy; I promise I will get you everything you need.
  2. You need a Google account. If you have a Gmail address, you have a Google account. If you don't, go and register. (Google accounts are very worthwhile to any internet junkie like myself.)
  3. Select a folder where you want to download the code.
  4. Using TortoiseSVN, right-click on the selected folder and select SVN Checkout.

Use this URL for the repository when prompted:
https://activerecordenterprisesample.googlecode.com/svn/trunk/

You will be prompted for a username. Use your Google account username.
You will be prompted for a password. Use the Google Code password I gave you.
With that, you will become an active member of the project.

Getting the code to run
  1. Look at the EntityLayer project. Modify the App.config file: set the SQL dialect so it matches your database, and modify the connection string so it can reach your database.
  2. Open the DatabaseSetup.cs class. See the test that creates an empty database? Execute it with TestDriven.NET by right-clicking on the [Test] attribute. This will create your database.
  3. Once this works, modify all of the *.config files in the project in the same manner.
  4. Set the WebLayer as the startup project by right-clicking on the project in Solution Explorer. Hit F5, and you should have the website up and running.


Tuesday, November 27, 2007

NHibernate:How to Build Great Mappings and use SchemaExport

NHibernate.Tool.hbm2ddl.SchemaExport: this thing is so cool it's not even funny.

Consider the following code:

SchemaExport _schemaExport = new SchemaExport(new Configuration().Configure());
_schemaExport.Create(false,true);

Do you know what that does? It builds a database schema from the mapping files! The first parameter is an option to output the script to the console. The second parameter is an option to execute the script in the database.

All you need is an App.config such as:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>

<configSections>
<section name="hibernate-configuration" type="NHibernate.Cfg.ConfigurationSectionHandler, NHibernate" />
</configSections>

<!--nhibernate configuration-->
<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
<session-factory>
<!--the brand of dbms-->
<property name="dialect">NHibernate.Dialect.MsSql2005Dialect</property>
<!--connection provider-->
<property name="connection.provider">NHibernate.Connection.DriverConnectionProvider</property>
<!--sql client driver-->
<property name="connection.driver_class">NHibernate.Driver.SqlClientDriver</property>
<!--connection string-->
<property name="connection.connection_string">Server=localhost; initial catalog = YOURDBNAME; Integrated Security=SSPI</property>
<!--output generated sql to console window-->
<property name="hibernate.show_sql">true</property>
<!--the assembly with the embedded mapping files-->
<mapping assembly="[YOURASSEMBLY without the .dll extension]" />
</session-factory>
</hibernate-configuration>
</configuration>

If you don't have SQL Server 2005, change the dialect; NHibernate ships dialects for most popular modern databases. These are the steps you need:
  • Create an empty database using your DB server
  • Create NHibernate entities, create the mappings
  • Add the app.config listed above
  • Execute the code listed above
And there is your database. If you want to drop it at the end, simply call _schemaExport.Drop(false, true).

Do you know what this means? It means that my XML mapping IS my database schema.
You don't need to synchronize the database schema with the mappings if your database schema is CREATED FROM THE MAPPINGS!

If your mappings are the de-facto definition of your database, then you better have a very high degree of control over the database being built. Let's dissect a mapping file that is very conscious to database concerns:
<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" default-lazy="true" default-cascade="all-delete-orphan">
<class name="acme.Widget, acme" table="Widgets" >

Notice how you can set the default-lazy and default-cascade behavior of NHibernate on the class declaration? They define the general behavior of NHibernate in terms of loading objects and saving them, and changing these settings changes the behavior of your program considerably. Learn more about these settings here and here.

<!--primary key mapping-->
<id name="m_id" column="WidgetID" type="int" access="field" unsaved-value="0">
<generator class="identity" />
</id>
See the unsaved-value=0? This means any Widget object with an ID of 0 warrants an insert call when using the SaveOrUpdate method. Any Widget object with a nonzero ID warrants an update call. See the access="field"? This gives NHibernate the permission to manipulate the ID field through a private variable. This way, you don't need to expose the ID of an object as a public writable property.
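To make the unsaved-value rule concrete, here is a sketch of the decision SaveOrUpdate makes. FakeSession and AssignId are hypothetical stand-ins, not NHibernate code: AssignId plays the role of NHibernate writing the generated identity into the private m_id field (the access="field" part), and the strings only label which statement would be issued.

```csharp
using System.Collections.Generic;

// Hypothetical entity mirroring the mapping: the identifier lives in a private field.
public class Widget
{
    private int m_id;   // 0 until an identity value is assigned
    public int Id => m_id;
    public string WidgetName { get; set; }

    // Stand-in for NHibernate setting the field after the INSERT.
    internal void AssignId(int id) => m_id = id;
}

// Hypothetical session that applies the unsaved-value rule.
public class FakeSession
{
    private const int UnsavedValue = 0;   // mirrors unsaved-value="0"
    private int _nextIdentity = 1;        // mirrors <generator class="identity" />
    public List<string> IssuedStatements { get; } = new List<string>();

    public void SaveOrUpdate(Widget widget)
    {
        if (widget.Id == UnsavedValue)
        {
            widget.AssignId(_nextIdentity++);
            IssuedStatements.Add("INSERT Widget " + widget.Id);
        }
        else
        {
            IssuedStatements.Add("UPDATE Widget " + widget.Id);
        }
    }
}
```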
<!--property-to-column mappings-->
<property name="WidgetName" not-null="true" type="string" length="30" unique="true" unique-key="myCustomUniqueKeyName" >
<column name="WidgetName" sql-type="varchar(30)" not-null="true" index="myCustomIndex" unique="true"/>
</property>
See how I add a <column> tag? The column tag lets me specify the SQL data type of the column and the index, and lets me set a named unique constraint on the column (enforced by the database).
<!-- Many To One Relationships -->
<many-to-one name="Factory" column="FactoryID" not-null="true" not-found="exception" class="acme.Factory, acme" index="myIndex" foreign-key="myCustomFKName" />
</class>
</hibernate-mapping>
In here I explicitly named an index on this column called "myIndex". SchemaExport automatically builds the foreign key constraints for me, but I can give it my own name 'myCustomFKName'. I forced the property not to allow nulls (rarely do you need nulls) and to throw an exception if, for some reason, the relationship is broken. Redundant, perhaps, but there is nothing wrong with some redundancy to ensure the integrity of the model.

Using this kind of database-conscious mapping technique allows you to build a quality schema automatically with SchemaExport. I have challenged SchemaExport with difficult schemas (single-table inheritance mapping, multi-table inheritance mapping, many-to-many mappings), and it never falls short.

Almost never. Here are a few examples where this is not enough:
  • Views, UDFs, stored procs and all of those other non-NHibernate related database objects
  • I want a clustered index on a foreign key, so all of the "child" rows are arranged physically adjacent with respect to the parent rows for faster disk access
Fortunately, SchemaExport gives us a great all-around solution for these kinds of custom database concerns using the database-object tag...
<database-object>
<create>create view ....</create>
<drop>drop view....</drop>
</database-object>
You can also write a custom class to create the database object by implementing the NHibernate.Mapping.IAuxiliaryDatabaseObject interface.

<database-object>
<definition class="MyViewDefinition, acme"/>
</database-object>


This also works for different dialect scopes, using the dialect-scope element, if you need to support different DBMSs. All of this information can be found at the bottom of this page.

Bottom line: the SchemaExport tool is a great shortcut to getting a database up and running from nothing but the object code and some mappings. Upgrading the schema of an existing database without data loss is still a tricky procedure, but this is great for things like:
  • Unit tests (destroy and rebuild a fresh data population with every test)
  • Early application development lifecycle (where the schema definition is in a high state of flux)
  • setup/deployment - creating a database schema within a setup deployment installation program is a snap with this tool


Thursday, November 08, 2007

Architecture confusion with ORM

I read a very interesting post this morning regarding ActiveRecord vs. objects, written by Bob Martin. It was linked on InfoQ by Sadek Drobi. I personally think Bob was pretty far off the mark with his article, but I think I understand how to explain the errors in his conclusion.
First, some context. ActiveRecord is a design pattern by Martin Fowler which suggests how to correctly leverage ORM. Every object knows how to save, update, delete itself, etc. In the .NET realm, Castle ActiveRecord is a very compelling option to consider for persistence technology. As of the other week, it has been built on top of NHibernate 1.2, which means it is very fast, flexible, and supports generics. I'm currently playing around with ActiveRecord to build a sample app for an upcoming ACM lecture at UNH.

Anyway, I think I understand where Bob Martin is coming from; I once had a similar confusion, and I would like to clarify the incorrect points of this article. For starters, let's get to the heart of the problem:

The Active Record pattern is a way to map database rows to objects.

This statement is true. There is a dichotomy between the relational structures and the object oriented structures. Relational structures are stacks upon stacks of data, while objects can have intelligence and encapsulate the data. This is where the confusion begins:

From the beginning of OO we learned that the data in an object should be hidden, and the public interface should be methods.
In other words: objects export behavior, not data.
An object has hidden data and exposed behavior.

This is bizarre. Since when can't objects expose data AND methods?

In languages like C++ and C# the struct keyword is used to describe a data structure
with public fields.
If there are any methods, they are typically navigational.
They don’t contain business rules.

So, according to Bob, none of your data structures can have any business-logic intelligence.

Thus, data structures and objects are diametrically opposed.
They are virtual opposites.
One exposes behavior and hides data, the other exposes data and has no behavior.

OK, now this sounds strange to me, but I think I understand where Bob was coming from when he came to this conclusion. In any data-oriented architecture you have the following inevitable concerns that need to be addressed:
  1. Domain/Entity definition (Bob would call this the definition of the data structures... you need to define the persistent objects)
  2. Data Access definition (Where do we actually implement "Save", "Delete", etc.?)
  3. Business Intelligence (Where do we define the procedures that manipulate these entities/domain objects?)
Let's try this architecture with some more tangible examples: it would be nice if we could simply separate each of these concerns into its own assembly. Why? Separation is good; loose coupling makes your application more flexible, right?

In this UML diagram, the folders represent assemblies, and the arrows represent dependent references. Remember we cannot have two assemblies depend on each other, right?

The first assembly defines the persistent objects. The second assembly encapsulates the data access logistics. Since it depends on the object definition, it references the first assembly. The third assembly handles the business logic. It will need object definition and persistence functionality, so it depends on the other two assemblies.

Using this design, your objects truly become nothing but dumb data structures. This is what I think Bob was talking about. This should smell like an antipattern to any designer.

OK, we want our objects to have intelligence. Let's take a second swing. Imagine the business logic is inside the persistent objects, but the data layer is in its own assembly.

Since the data layer depends on the object layer and vice versa, we will have to use some fancier tricks here. Consider defining the data layer interface inside the entity layer, so that the entity layer can rely on an interface. Then, the data layer will implement the interfaces defined in the entity layer (hence the solid upward arrow). At runtime, the dependency on the data layer can be dynamically bound using dependency injection (hence the dotted arrow). The implementation specifics can be found here in Billy McCafferty's article on NHibernate best practices.
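A compact sketch of this second layout in C#. All names are hypothetical: IWidgetDao lives with the entities, WidgetCatalog is business logic that depends only on that interface, and InMemoryWidgetDao stands in for an NHibernate-backed data-layer class. The constructor parameter is where a DI container (or a hand-written composition root) would supply the real implementation at runtime.

```csharp
using System.Collections.Generic;

// ---- Entity layer ----
// The entity layer owns the data-access interface, so it never references the data layer.
public interface IWidgetDao
{
    void Save(Widget widget);
    IList<Widget> FindByFactory(string factoryName);
}

public class Widget
{
    public string Name { get; set; }
    public string FactoryName { get; set; }
    public bool Discontinued { get; set; }
}

// Business logic sits next to the entities and talks only to the interface.
public class WidgetCatalog
{
    private readonly IWidgetDao _dao;

    public WidgetCatalog(IWidgetDao dao) { _dao = dao; }   // injected at runtime

    public int DiscontinueFactory(string factoryName)
    {
        IList<Widget> widgets = _dao.FindByFactory(factoryName);
        foreach (Widget w in widgets)
        {
            w.Discontinued = true;
            _dao.Save(w);
        }
        return widgets.Count;
    }
}

// ---- Data layer ----
// Implements the entity layer's interface (the solid upward arrow).
// A real version would wrap an NHibernate session; this one is in-memory.
public class InMemoryWidgetDao : IWidgetDao
{
    public List<Widget> Rows { get; } = new List<Widget>();
    public int SaveCount { get; private set; }

    public void Save(Widget widget)
    {
        if (!Rows.Contains(widget)) Rows.Add(widget);
        SaveCount++;
    }

    public IList<Widget> FindByFactory(string factoryName) =>
        Rows.FindAll(w => w.FactoryName == factoryName);
}
```

The extra interface is exactly the alignment cost discussed next: every new query means touching both IWidgetDao and its implementation.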

Now you have objects and their intelligence in the same place, and the data persistence implementation is cleanly separated, so persistence-related concerns don't start to encroach into your business logic. Still, this is not good enough for me. These layers look nice on paper, but in practice, there is a lot of work to keep the interfaces and the implementation correctly aligned. I think that unless your application is monolithic or database agnostic, the cost of this layering outweighs the benefits.

The third approach is more of a free-for-all. Everything is defined within the same assembly so there are no dependencies. It then becomes your responsibility as a programmer to correctly encapsulate and abstract away persistence-oriented specifics from your business logic. Are you up to the challenge? Perhaps this scenario will not work for everybody, but it seems to be a fair compromise between risk mitigation and simplicity to me.

Architecture can be confusing with ORM, since objects are closely related to their persistence concerns, but it CAN work! Don't decouple like a drunken sailor. Ask yourself: layers and abstraction are cool, but is this extra layer really helping me more than it is hurting me?
