Wednesday, November 28, 2007

Open source enterprise sample using ActiveRecord

On the 29th, I'll be giving an ACM lecture at UNH on the life of a computer consultant.

Something that is important to me: code speaks louder than words. It's cool to hear new ideas and new philosophies, but coded examples of these ideas back them up with a little reality. The problem is, it's hard to give code examples without lots and lots of surrounding context to make the code real.

I put together a new open source project on Google Code. It is a "zen-garden" style project for enterprise-level data applications using Castle ActiveRecord under the hood. I kick-started this project by giving it a foundation, a domain, some controllers, and a web-based user interface. Now it has some critical mass. The application itself is not as important as the ability to practice and learn new skills by playing with the code.

This is the link to the code project homepage. Be sure to view the project wiki for the nitty gritty details. This is the link to the subversion trunk. To download and run the application you will need:
Downloading the Project Code
If you don't want to join the open source project, you can simply download the app from the download tab. Make sure you have WinRAR or an equivalent RAR extractor.

Joining and Contributing to the Project
  1. Contact me and I will add you to the project as a user, so you have access to the Subversion source control system. Don't be shy, I promise I will get you everything you need.
  2. You need to get a Google account. If you have a Gmail address, you have a Google account. If you don't, then go and register. (Google accounts are very worthwhile to any internet junkie like myself.)
  3. Select a folder where you want to download the code.
  4. Using TortoiseSVN, right-click on the selected folder and select TortoiseSVN--Checkout.


Use this URL for the project when prompted:
https://activerecordenterprisesample.googlecode.com/svn/trunk/



You will be prompted for a username. Use your google account username.
You will be prompted for a password. Use the google code password I gave you.
With that, you will become an active member of the project.

Getting the code to run
  1. Look at the EntityLayer project. Modify the App.config file: set the SQL dialect so it matches your database, and modify the connection string so it can reach your database.
  2. Open the DatabaseSetup.cs class. See the test that creates an empty database? Execute it using TestDriven.NET by right-clicking on the [Test] attribute. This will create your database.
  3. Once this works, modify all of the *.config files in the project in the same manner.
  4. Set the WebLayer as the startup project by right-clicking on the project in the Solution Explorer. Hit F5, and you should have the website up and running.


Tuesday, November 27, 2007

NHibernate: How to Build Great Mappings and Use SchemaExport

NHibernate.Tool.hbm2ddl.SchemaExport: this thing is so cool it's not even funny.

Consider the following code:

// requires: using NHibernate.Cfg; using NHibernate.Tool.hbm2ddl;
SchemaExport _schemaExport = new SchemaExport(new Configuration().Configure());
_schemaExport.Create(false, true);

Do you know what that does? It builds a database schema from the mapping files! The first parameter is an option to output the script to the console. The second parameter is an option to execute the script in the database.

All you need is an App.config such as:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>

<configSections>
<section name="hibernate-configuration" type="NHibernate.Cfg.ConfigurationSectionHandler, NHibernate" />
</configSections>

<!--nhibernate configuration-->
<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
<session-factory>
<!--the brand of dbms-->
<property name="dialect">NHibernate.Dialect.MsSql2005Dialect</property>
<!--connection provider-->
<property name="connection.provider">NHibernate.Connection.DriverConnectionProvider</property>
<!--sql client driver-->
<property name="connection.driver_class">NHibernate.Driver.SqlClientDriver</property>
<!--connection string-->
<property name="connection.connection_string">Server=localhost; initial catalog = YOURDBNAME; Integrated Security=SSPI</property>
<!--output generated sql to console window-->
<property name="show_sql">true</property>
<!--the assembly with the embedded mapping files-->
<mapping assembly="[YOURASSEMBLY without the .dll extension]" />
</session-factory>
</hibernate-configuration>
</configuration>

If you don't have SQL Server 2005, change the dialect; NHibernate ships dialects for most of the popular modern databases. These are the steps you need:
  • Create an empty database using your DB server
  • Create NHibernate entities, create the mappings
  • Add the app.config listed above
  • Execute the code listed above
And there is your database. If you want to drop it at the end, simply call _schemaExport.Drop.

Do you know what this means? It means that my XML mapping IS my database schema.
You don't need to synchronize the database schema with the mappings if your database schema is CREATED FROM THE MAPPINGS!

If your mappings are the de facto definition of your database, then you had better have a very high degree of control over the database being built. Let's dissect a mapping file that is very conscious of database concerns:
<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" default-lazy="true" default-cascade="all-delete-orphan">
<class name="acme.widget, acme" table="Widgets" >

Notice how you can set the default-lazy and default-cascade behavior of NHibernate on the hibernate-mapping element? They define the general behavior of NHibernate in terms of loading objects and saving them, and changing these settings changes the behavior of your program considerably. Learn more about these settings here and here.

<!--primary key mapping-->
<id name="m_id" column="WidgetID" type="int" access="field" unsaved-value="0">
<generator class="identity" />
</id>
See the unsaved-value=0? This means any Widget object with an ID of 0 warrants an insert call when using the SaveOrUpdate method. Any Widget object with a nonzero ID warrants an update call. See the access="field"? This gives NHibernate the permission to manipulate the ID field through a private variable. This way, you don't need to expose the ID of an object as a public writable property.
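The post does not show the Widget class itself, so here is a minimal sketch of what an entity matching this id mapping might look like. Only the m_id field name comes from the mapping; the ID and IsTransient members are assumptions added for illustration.

```csharp
// Hypothetical entity for the <id> mapping above.
// Only the field name m_id is taken from the mapping; the rest is assumed.
public class Widget
{
    // With access="field", NHibernate writes the generated key straight
    // into this private field. The initial value 0 matches unsaved-value="0".
    private int m_id = 0;

    // Callers can read the key but cannot change it: there is no setter.
    public int ID
    {
        get { return m_id; }
    }

    // True while the object still carries the unsaved value,
    // which is the case SaveOrUpdate treats as an insert.
    public bool IsTransient
    {
        get { return m_id == 0; }
    }
}
```

A freshly constructed Widget reports an ID of 0, so SaveOrUpdate would treat it as a new row.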
<!--property-to-column mappings-->
<property name="WidgetName" not-null="true" type="string" length="30" unique="true" unique-key="myCustomUniqueKeyName" >
<column name="WidgetName" sql-type="varchar(30)" not-null="true" index="myCustomIndex" unique="true"/>
</property>
See how I add a <column> tag? The column tag lets me specify the SQL data type of the column and its index, and allows me to set a named unique constraint on the column (enforced by the database).
<!-- Many To One Relationships -->
<many-to-one name="Factory" column="FactoryID" not-null="true" not-found="exception" class="acme.Factory, acme" index="myIndex" foreign-key="myCustomFKName" />
</class>
</hibernate-mapping>
In here I explicitly named an index on this column called "myIndex". SchemaExport automatically builds the foreign key constraints for me, but I can give it my own name 'myCustomFKName'. I forced the property not to allow nulls (rarely do you need nulls) and to throw an exception if, for some reason, the relationship is broken. Redundant, perhaps, but there is nothing wrong with some redundancy to ensure the integrity of the model.

Using this kind of database-conscious mapping technique allows you to build a quality schema automatically from SchemaExport. I have challenged this SchemaExport with difficult schemas, single-table inheritance mapping, multi-table inheritance mapping, many-to-many mappings, and it never falls short.

Almost never. Here are a few examples where this is not enough:
  • Views, UDFs, stored procs and all of those other non-NHibernate related database objects
  • I want a clustered index on a foreign key, so all of the "child" rows are arranged physically adjacent with respect to the parent rows for faster disk access
Fortunately, SchemaExport gives us a great all-around solution for these kinds of custom database concerns using the database-object tag...
<database-object>
<create>create view ....</create>
<drop>drop view....</drop>
</database-object>
You can also write a custom class to create the database object by implementing the NHibernate.Mapping.IAuxiliaryDatabaseObject interface.

<database-object>
<definition class="MyViewDefinition, acme"/>
</database-object>


This can also be limited to specific dialects using the dialect-scope element if you need to support different DBMSs. All of this information can be found at the bottom of this page.

Bottom line is, the SchemaExport tool is a great shortcut to getting a database up and running from nothing but the object code and some mappings. Upgrading the schema of an existing database without data loss is still a tricky procedure, but this is great for things like:
  • Unit tests (destroy and rebuild a fresh data population with every test)
  • Early application development lifecycle (where the schema definition is in a high state of flux)
  • setup/deployment - creating a database schema within a setup deployment installation program is a snap with this tool


Wednesday, November 21, 2007

Project Workflow

Lately, I've been working on multiple projects, and I've been able to see fundamental differences in workflow between companies.

Some companies will give you stacks of concise requirements and depend on you to submit detailed design documents for peer review, test the documents, write unit tests, and finally write code, and then write documents about the code (maybe even write documents about the documents!).

Some companies will give you a brief verbal description of what they need and expect you to turn it in to code in a matter of days, not weeks.

No doubt when you can afford the luxury of a slow, detailed and careful process, it's hard to argue against it. When you're in a hurry, you need to accept the risks and make things happen quickly.

So I got to thinking about this chart, which helps classify a project workflow. We have two axes, ceremony and frequency. Frequency refers to your rate of production, or, "how fast can you make it?", while ceremony refers to anything NOT related to writing production code (documentation, peer review, unit tests, branch labeling etc).

Normally, these are two opposing forces. I can write my production code much faster if I don't have to document or test it. Or I can unit test and document my work to no end, and get grey hair before the project is done.

Waterfall models (popular in big companies and academia) are in the lower right (A), while agile practices lean more to the upper left (B). The blue gradient represents a scatter on where most projects fall.

With a waterfall model it is very easy to stage out a plan and report incremental progress. With an agile model, it is very hard to set estimates and report progress.

Both of these factors are important, but they're a bit of a catch-22. Regardless of your personal philosophy, there are some automated tools that give you the best of both worlds in terms of speed and safety:
  • SandCastle help file builder GUI (compiles the nDoc-style comments in your code into .chm help files. There's your documentation!)
  • CruiseControl.NET (the concept of continuous integration implemented by ThoughtWorks, some of the best programmers on the planet)
  • Visual Studio class diagram tool - diagrams, directly connected to the code, the diagrams change as the code changes.
  • FogBugz - simple and no-nonsense task tracking tool managed by Joel Spolsky, one of my favorite writers
  • Subversion - It's true you can shell out $10,000 or more for a flashy source control system such as ClearCase or Team Foundation Server, but on a distilled functional level, this program matches their feature set and it costs $0. Thousands of SourceForge developers can't be wrong. Thousands of Google Code developers that use this as a standard source control option can't be wrong.

C# Short-Hand Tricks

Nothing earth-shattering, but this is a list of some of the lesser-known wrist-friendly shortcut features in the C# 3.0 compiler (and one that has been around for a while).
These features are cool, because they all make the code smaller and easier to read, without adding perl-style cryptic confusion and making us a contender in any code obfuscation contest.

Coalesce
How often do we write something like this:

object outerObject = null;
// ...time goes by...
object innerScopedObject;
if (outerObject == null)
innerScopedObject = new object();
else
innerScopedObject = outerObject;

or

object outerObject = null;
// ...time goes by...
object innerScopedObject = outerObject == null ? new object() : outerObject;

With the coalesce operator (double question mark) '??' the following statement is equivalent:

object innerScopedObject = outerObject ?? new object();

It's nice, because the coalesce operator is specifically for handling null cases.

Automatic Properties
How often do we write something like this:

private int foo;
public int Foo
{
get { return foo; }
set { foo = value; }
}
The point is, you are encapsulating a private variable just to expose it via a property without any accessor logic. In C# 3.0, the following statement is equivalent:

public int Foo
{
get;
set;
}

The compiler will automatically generate the private variable for you. One less variable to worry about; less code is better code.
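As a small extension of the same idea, the two accessors of an automatic property can have different visibility. The sketch below uses a hypothetical Counter class (not from any project mentioned here) to show a property that anyone can read but only the class itself can change.

```csharp
// Hypothetical class for illustration only.
public class Counter
{
    // Automatic property: public getter, private setter.
    // The compiler still generates the hidden backing field.
    public int Count { get; private set; }

    // The only way for outside code to change Count.
    public void Increment()
    {
        Count = Count + 1;
    }
}
```

Outside code can call Increment and read Count, but an assignment such as counter.Count = 5 will not compile.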

Nullable Primitives
(This trick has been around for a while now.) Ever needed to do this:

DateTime dt = null;

Just to find out it won't compile? Why is it they give us a database that allows for nullable integers and DateTimes, but a C# language that forbids it?
The 'nullable' type modifier (the question mark) '?' lets this work; DateTime? is shorthand for Nullable<DateTime>.

DateTime? dt = null;

That will compile. Don't get carried away with that trick; null can mean so many different things. Truth is, it is VERY rare when null values are handy in terms of database persistence.
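The nullable modifier and the coalesce operator work well together. Here is a minimal sketch, with a hypothetical NullableDemo helper class, of reading a nullable DateTime safely.

```csharp
using System;

// Hypothetical helper class for illustration only.
public static class NullableDemo
{
    // Returns the date if one is present, otherwise the supplied fallback.
    public static DateTime OrFallback(DateTime? maybeDate, DateTime fallback)
    {
        // ?? unwraps the Nullable<DateTime> when it has a value,
        // and evaluates to the right-hand side when it is null.
        return maybeDate ?? fallback;
    }

    // HasValue is the explicit way to ask "is this null?".
    public static bool IsMissing(DateTime? maybeDate)
    {
        return !maybeDate.HasValue;
    }
}
```

This keeps the "what does null mean here?" decision in one place instead of scattering null checks through the code.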


Friday, November 16, 2007

C# Generic conversion/grouping tricks

Note: none of this is new, it is just new to me.

I used to love C for its speed and flexibility. I learned to love C# for its exception handling features. To me, the real killer feature of C# 2+ is generics. They are undeniably easy to understand and make my work profoundly easier.

One of the problems I have been having with generics is when I have a List<K> but really need a List<T>. I stumbled upon the generic Converter delegate (finally), which makes life much easier for these scenarios.

This code builds a collection of floats and uses a delegate to convert it into a collection of ints:
        
public void Testx()
{
List<float> _floats = new List<float>();
_floats.Add(1f);
_floats.Add(11.5f);
_floats.Add(12f);
_floats.Add(1.1f);
_floats.Add(55f);
_floats.Add(124f);
_floats.Add(127.2f);
_floats.Add(6222f);

List<int> _ints = _floats.ConvertAll(ConvertFloatToInt);
}
public static int ConvertFloatToInt(float x)
{
return Convert.ToInt32(x);
}
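
One detail worth knowing about the converter above: Convert.ToInt32 rounds to the nearest integer (with midpoints going to the even neighbor), while a plain cast drops the fraction. The sketch below, using a hypothetical ConversionDemo class, shows that swapping the delegate changes the result without touching the ConvertAll call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class for illustration only.
public static class ConversionDemo
{
    // Rounds to the nearest integer; exact midpoints go to the even neighbor.
    public static int RoundToInt(float x)
    {
        return Convert.ToInt32(x);
    }

    // Drops the fractional part.
    public static int TruncateToInt(float x)
    {
        return (int)x;
    }

    // The conversion rule is passed in as a delegate.
    public static List<int> ConvertWith(List<float> source, Converter<float, int> converter)
    {
        return source.ConvertAll(converter);
    }
}
```

For example, 11.5f becomes 12 with RoundToInt but 11 with TruncateToInt.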


Cool, but what if it is not a true one-to-one conversion between the source type and the target type? I mean, what if a bunch of Ks convert into a common T? How about a grouping method?
This code takes in a collection of the source type and a converter, and returns a dictionary with the target type as keys and collections of the source type as grouped values.

public static IDictionary<T, ICollection<K>> GroupBy<K,T >(ICollection<K> collection, Converter<K, T> converter)
{
Dictionary<T, ICollection<K>> _dictionary = new Dictionary<T, ICollection<K>>();
foreach (K k in collection)
{
T key = converter(k);
if (_dictionary.ContainsKey(key) == false)
{
_dictionary[key] = new List<K>();
}
_dictionary[key].Add(k);
}
return _dictionary;
}
public static int ConvertFloatToMod10Int(float x)
{
return Convert.ToInt32(x % 10);
}
[Test]
public void Testx()
{
List<float> _floats = new List<float>();
_floats.Add(1f);
_floats.Add(11.5f);
_floats.Add(12f);
_floats.Add(1.1f);
_floats.Add(55f);
_floats.Add(124f);
_floats.Add(127.2f);
_floats.Add(6222f);
Converter<float, int> _converter = new Converter<float, int>(ConvertFloatToMod10Int);
IDictionary<int, ICollection<float>> _dictionary = GroupBy(_floats, _converter);
}

Now this generic GroupBy method is going to be very handy in the future, especially since the delegate will dictate the behavior of the converting/grouping functionality.
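
To show how the delegate alone changes the grouping, here is a sketch that reuses the same GroupBy shape with a different converter. The GroupingDemo class and the GroupByLength helper are hypothetical; GroupBy is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper class for illustration only.
public static class GroupingDemo
{
    // Same logic as the GroupBy method above.
    public static IDictionary<T, ICollection<K>> GroupBy<K, T>(ICollection<K> collection, Converter<K, T> converter)
    {
        Dictionary<T, ICollection<K>> dictionary = new Dictionary<T, ICollection<K>>();
        foreach (K k in collection)
        {
            T key = converter(k);
            if (!dictionary.ContainsKey(key))
            {
                dictionary[key] = new List<K>();
            }
            dictionary[key].Add(k);
        }
        return dictionary;
    }

    // Groups words by length; the anonymous delegate is the only part that
    // would change for a different grouping rule.
    public static IDictionary<int, ICollection<string>> GroupByLength(ICollection<string> words)
    {
        return GroupBy<string, int>(words, delegate(string w) { return w.Length; });
    }
}
```

Calling GroupByLength on the words "a", "bb", "cc", "d" yields two groups: key 1 holding "a" and "d", and key 2 holding "bb" and "cc".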

Let's take this one step further: what if every K converts into a collection of different Ts? Then we will need a "Fan Out" function (very handy for genetic algorithm processing, it turns out):

public static IDictionary<K, ICollection<T>> FanOut<K, T>(ICollection<K> collection, Converter<K, ICollection<T>> converter)
{
IDictionary<K, ICollection<T>> _dictionary = new Dictionary<K, ICollection<T>>();
foreach (K _k in collection)
{
ICollection<T> fan = converter(_k);
_dictionary.Add(_k,fan);
}
return _dictionary;
}
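
A usage sketch for FanOut follows, under the assumption of a toy converter that fans each integer out into its positive divisors. The FanOutDemo class, Divisors, and DivisorTable are hypothetical names; FanOut is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper class for illustration only.
public static class FanOutDemo
{
    // Same logic as the FanOut method above.
    public static IDictionary<K, ICollection<T>> FanOut<K, T>(ICollection<K> collection, Converter<K, ICollection<T>> converter)
    {
        IDictionary<K, ICollection<T>> dictionary = new Dictionary<K, ICollection<T>>();
        foreach (K k in collection)
        {
            dictionary.Add(k, converter(k));
        }
        return dictionary;
    }

    // Toy converter: one integer fans out into all of its positive divisors.
    public static ICollection<int> Divisors(int n)
    {
        List<int> result = new List<int>();
        for (int d = 1; d <= n; d++)
        {
            if (n % d == 0)
            {
                result.Add(d);
            }
        }
        return result;
    }

    public static IDictionary<int, ICollection<int>> DivisorTable(ICollection<int> numbers)
    {
        return FanOut<int, int>(numbers, Divisors);
    }
}
```

Because FanOut uses Dictionary.Add, a source collection containing the same item twice will throw an ArgumentException; deduplicate the input first if that can happen.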


Thursday, November 08, 2007

Architecture confusion with ORM

I read the most interesting post this morning regarding ActiveRecord vs Objects, written by Bob Martin. It was linked on InfoQ by Sadek Drobi. I personally think Bob was pretty far off the mark with his article, and I think I can explain the errors in his conclusion.
First, some context. ActiveRecord is a design pattern described by Martin Fowler which suggests how to correctly leverage ORM. Every object knows how to save, update, and delete itself, etc. In the .NET realm, Castle ActiveRecord is a very compelling option to consider for persistence technology. As of the other week, it has been built on top of NHibernate 1.2, which means it is very fast, flexible, and supports generics. I'm currently playing around with ActiveRecord to build a sample app for an upcoming ACM lecture at UNH.

Anyways, I think I understand where Bob Martin is coming from. I once had a similar confusion, and I would like to clarify the incorrect points of this article. For starters, let's get to the heart of the problem:

The Active Record pattern is a way to map database rows to objects.

This statement is true. There is a dichotomy between the relational structures and the object oriented structures. Relational structures are stacks upon stacks of data, while objects can have intelligence and encapsulate the data. This is where the confusion begins:

From the beginning of OO we learned that the data in an object should be hidden, and the public interface should be methods.
In other words: objects export behavior, not data.
An object has hidden data and exposed behavior.

This is bizarre. Since when can't objects expose data AND methods?

In languages like C++ and C# the struct keyword is used to describe a data structure
with public fields.
If there are any methods, they are typically navigational.
They don’t contain business rules.

So, according to Bob, none of your data structures can have any business-logic intelligence.

Thus, data structures and objects are diametrically opposed.
They are virtual opposites.
One exposes behavior and hides data, the other exposes data and has no behavior.

OK now this sounds strange to me, but I think I understand where Bob was coming from when he came to this conclusion. In any data-oriented architecture you have the following inevitable concerns that need to be addressed:
  1. Domain/Entity definition (Bob would call this the definition of the data structures... you need to define the persistent objects)
  2. Data Access definition (Where do we actually implement "Save", "Delete", etc...)
  3. Business Intelligence (Where do we define the procedures that manipulate these entities/domain objects?)
Let's try this architecture with some more tangible examples: it would be nice if we could simply separate each of these concerns into separate assemblies. Why? Separation is good, loose coupling makes your application more flexible, right?

In this UML diagram, the folders represent assemblies, and the arrows represent dependent references. Remember we cannot have two assemblies depend on each other, right?

The first assembly defines the persistent objects. The second assembly encapsulates the data access logistics. Since it depends on the object definition, it references the first assembly. The third assembly handles the business logic. It will need object definition and persistence functionality, so it depends on the other two assemblies.

Using this design, your objects truly become nothing but dumb data structures. This is what I think Bob was talking about. This should smell like an antipattern to any designer.

OK, we want our objects to have intelligence. Let's take a second swing. Imagine the business logic is inside of the persistent objects, but the data layer is in its own assembly.

Since the data layer depends on the object layer and vice versa, we will have to use some fancier tricks here. Consider defining the data layer interface inside of the entity layer, such that the entity layer can rely on an interface. Then, the data layer will implement the interfaces defined in the entity layer (hence the solid upward arrow). At runtime, the dependency on the data layer can be dynamically bound using dependency injection (hence the dotted arrow). The implementation specifics can be found here in Billy McCafferty's article on NHibernate best practices.
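
A minimal sketch of this second design, under stated assumptions: Product, IProductStore and InMemoryProductStore are hypothetical names, the in-memory list stands in for a real NHibernate-backed data layer, and the injection is done by hand through the constructor rather than by a container. In a real solution the two halves would live in separate assemblies.

```csharp
using System;
using System.Collections.Generic;

// Entity layer: owns the entity, its business rule, and the persistence contract.
public interface IProductStore
{
    void Save(Product product);
    int Count { get; }
}

public class Product
{
    private readonly IProductStore store;
    private string name;

    // The store is injected, so the entity layer never references the data layer.
    public Product(string name, IProductStore store)
    {
        this.name = name;
        this.store = store;
    }

    public string Name
    {
        get { return name; }
    }

    // Business rule lives with the entity; persistence goes through the interface.
    public void Rename(string newName)
    {
        if (string.IsNullOrEmpty(newName))
        {
            throw new ArgumentException("A product needs a name.", "newName");
        }
        name = newName;
        store.Save(this);
    }
}

// Data layer: references the entity layer and implements its interface.
public class InMemoryProductStore : IProductStore
{
    private readonly List<Product> saved = new List<Product>();

    public void Save(Product product)
    {
        if (!saved.Contains(product))
        {
            saved.Add(product);
        }
    }

    public int Count
    {
        get { return saved.Count; }
    }
}
```

The entity keeps its intelligence (the naming rule) while knowing nothing about how Save is carried out; the cost is that every persistence operation needs a matching interface member.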

Now you have objects and their intelligence in the same place, and the data persistence implementation is cleanly separated, so persistence-related concerns don't start to encroach into your business logic. Still, this is not good enough for me. These layers look nice on paper, but in practice, there is a lot of work to keep the interfaces and the implementation correctly aligned. I think that unless your application is monolithic or database agnostic, the cost of this layering outweighs the benefits.

The third approach is more of a free-for-all. Everything is defined within the same assembly so there are no dependencies. It then becomes your responsibility as a programmer to correctly encapsulate and abstract away persistence-oriented specifics from your business logic. Are you up to the challenge? Perhaps this scenario will not work for everybody, but it seems to be a fair compromise between risk mitigation and simplicity to me.

Architecture can be confusing with ORM, since objects are closely related to their persistence concerns, but it CAN work! Don't decouple like a drunken sailor. Ask yourself: layers and abstraction are cool, but is this extra vestige really helping me more than it is hurting me?


Monday, November 05, 2007

Completely unrelated- my first portrait


I'm hoping to give away paintings for my Christmas presents this year. Canvas and paint are relatively cheap materials, and it's rather obvious I could afford the practice ;).

This is my first attempt at a portrait, so what better subject to choose than my girlfriend, right? I haven't heard any major complaints from her, so I'll consider it a keeper.

It can be very frustrating when you have a specific aura or feeling you are trying to portray, but sloppy brush skills and rough edges can get in the way.

I was attempting to evoke the qualities of a person that is strong willed, yet friendly and outgoing.