Monday, July 10, 2006

Effective Database Version Control Techniques

Effective Database Versioning Methodology


The challenge

One of the inevitable setbacks of a growing development team is the overhead cost of versioning maintenance. Every professional development experience I've handled involves a realm of ever changing business requirements, hence, schemas in a relative state of flux. One of the biggest frustrations of an enthusiastic developer is the constant need of documentation, regression testing, and version management within this flux to maintain stability. As evolving applications change build versions, there are many "application hosts" that need to be synchronized, inevitably leading to a large amount of unnecessary bugs, and a considerable percentage of time dedicated solely to the synchronization of versions of the "latest build" among various test/staging/and production platforms.

Your options

In the past, code versioning applications such as CVS and SourceSafe have proven effective (and widely known) solutions to this challenge. For us in the data-driven realm, the problem with version control of multiple databases remains. How do we seamlessly upgrade the schema/data of our production database with minimal or no downtime? How do multiple developers with individual copies of a database "get the latest" version of the schema and data? At this point I will mention that intelligently designed database "diff" tools exist such as RedGate SQL Compare to offer partial but conditional (and expensive) solutions for those less inclined to get their hands dirty and have full control. As elegant as these tools may be, trusting crucial schema changes to an automated third party program can be a perilous decision.

A proposed solution

As each database presents its own unique challenge, I don't offer you a one-size-fits-all downloadable application. I offer a time-tested and provably effective methodology that can be adapted to the workflow of your development cycle, provided you're ready to get your hands a little dirty with sql scripting code outside of the comfy GUI most developers use.

What you will need:



  • An existing source control system


    • You can use the same repository as your code base, or simply a single text file on an accessible network share as a last resort.


  • Database permissions to run powerful scripts


    • (developers will need something close to DBOwner access on their relevant databases)


  • A competent knowledge of SQL.


    • You will need to know how to write script to update the schema and data of your specific brand of database.



Known limitations to this methodology:



  • There are some data types that can't be manipulated through the use of scripts. These include:


    • Binary (image, bitmap, blob)

    • Variable-length text (Text-16)


  • For seamless transitions, version changes should be "downgrade-compatible".


    • This means that the "older build" of your code should be able to run normally on your upgraded database, for example, the addition of new columns and tables. This is not always possible (a column or table is removed, or a procedure changes method signatures), but careful planning reduces the frequency of these issues.


  • Data changes that rely on temporal data (consistent datetimes, automatically generated IDs) can't be guaranteed consistent throughout multiple databases.


    • If you are relying on this kind of consistency, you might have design flaws in your business logic to begin with!



The Strategy

The Versioning Table









For the sake of flexibility and verbosity, declare a table within the database (we'll call it DBVersion) serving as a version indicator. Each row should be able to store the of the version ID, a brief description, when the change happened, who initiated the change, and any extra information you may find convenient.

This table will take the guesswork out of determining the last update to the database.

schema


The Script

In conjunction with the table, we need a script that performs actions driven by the values in the table. The following is the script algorithm:


Start transaction [ALL]

For each version change I … N Do:

If(not exists (select (1) from DBVersion where DBVersionID=I))

Begin

Start transaction [i]

<<perform update to version i>>

If failure, rollback and exit

Else Insert into DBVersion values(I,<>,getDate(),User_ID())

Commit transaction [i]

End

Next [i]

Commit transaction [ALL]


Now observe this closely. This script is designed to update only as much as it needs to, without performing the same update twice. If the database already has record of a certain version change, the script will simply move to the next change. The script will run on databases under different versions with a consistent outcome, and a clear audit trail of how and when the changes occurred. If, for some reason, the script fails, the nested transaction structure enforces the database to roll back to its state before execution.

If a script if built properly in this manner, synchronizing the database can be as easy as "getting the latest" version of the code, and running the script before testing.

If there is one fundamental rule to the script, it is NEVER DELETE FROM THE SCRIPT. Occasionally, it seems tempting to delete past mistakes in the script, but this ruins the serial record inside DBVersioning table! Instead of deleting, make an additional version change that corrects previous mistakes. After all, most scripts will (hopefully) not require a critically long execution time.

Even though its not necessary that the DBVersioning methodology is implemented from the inception of the database, there are clear advantages of having a build script from the ground up.


Real-Life Example Using SQL Server

Every major project begins with a single step. Getting into the habit of proper versioning is the hardest part, so I will supply some code for our beloved SQL Server Northwind database as a starting example.


--Enforce this is Northwind

USE Northwind

--To ensure ACID, begin the main transaction

BEGIN TRANSACTION main


--step one, whether this is a brand new database, or a mature database with a new versioning method,

--make sure the DBVersion table exists

if(not exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[DBVersion]') and OBJECTPROPERTY(id, N'IsUserTable') = 1))

begin

print N'Building the foundation DBVersion Table'

BEGIN TRANSACTION one

--build the table

CREATE TABLE DBVersion (

[DBVersionID] int NOT NULL ,

[Description] varchar (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[ExecutionDate] datetime NOT NULL ,

[UserID] int NOT NULL

)

--any potential errors get reported, and the script is rolled back and terminated

if(@@error <> 0)

begin

ROLLBACK TRANSACTION

RETURN

end

--insert the first version marker

INSERT INTO DBVersion values (1,'Build the DBVersion Table',getDate(),User_ID())

--any potential errors get reported, and the script is rolled back and terminated

if(@@error <> 0)

begin

ROLLBACK TRANSACTION

RETURN

end

COMMIT TRANSACTION one

end

--Continuing, adding to this script is relatively simple! Observe...

if(not(exists(select * from DBVersion where DBVersionID = 2)))

begin

print N'Adding a ''DateCreated'' field onto the Customers table for auditing purposes'

BEGIN TRANSACTION two

--add the column, or whatever else you may need to do

ALTER TABLE Customers add DateCreated DateTime not null default getDate()

--any potential errors get reported, and the script is rolled back and terminated

if(@@error <> 0)

begin

ROLLBACK TRANSACTION

RETURN

end

--insert the second version marker

INSERT INTO DBVersion values

(2,'Adding the DateCreated field to customers',getDate(),User_ID())

--any potential errors get reported, and the script is rolled back and terminated

if(@@error <> 0)

begin

ROLLBACK TRANSACTION

RETURN

end

COMMIT TRANSACTION two

end

--from here on, continually add steps in the previous manner as needed.

COMMIT TRANSACTION main


Note that as the above code becomes more verbose, these scripts can reach a daunting length in a short amount of time, so it may be to your benefit to store multiple "chapters" of the conceptual evolution of your database.

Since each project presents its own unique challenges, be sure to help the development community and share your experiences and additions to this concept. Thanks for reading and best of wishes!

-Pete