Sunday, June 01, 2008

Stored procedures are not always the best tool for selecting

Working as a consultant in the past few years has given me an opportunity to view a diverse set of code bases, and see come of the common trends and habits of developers all over the map.

Literally every single codebase including (some frameworks) I have seen in the past few years bear a common design that bugs me:

READ-ONLY SQL STATEMENTS DONT NECESSARILY BELONG IN STORED PROCEDURES, THATS WHAT FUNCTIONS ARE FOR.

If you dont know what a database function is, just think of it as a parametric view. Now let me back this up some facts, and a code example that I feel justifies this argument once and for all.
  1. There is NO performance difference between a user-defined function and a stored procedure. None at all. Since the introduction of UDFs in SQL Server 2000, functions and stored procedures leverage the same compiled/cached query optimizations. In sql 2005, both stored procedures and functions can be written in the CLR environment. Since a function is guaranteed to be read-only and deterministic, in some cases a function will have a more optimal query plan than the equivalent stored procedure.
  2. A function GUARANTEES there are no side effects to the data. Since UDFs are parametric views, they are by definition deterministic and read-only opterations, so when you call them you never have to worry about any surprise side effects to your data (stored procedures cannot guarantee this).
  3. Stored procedures are not as modular and flexible as functions. In many cases, you may need to continue ETL operations on the result set from a compiled query. Every wished you could select or join from a stored procedure? Consider this scenario: I need a northwind query to retrieve employees with a HiredDate between a given date range;
    Stored procedure:
    CREATE PROCEDURE sp_GetEmployeesHiredWithindateRange
    @startDate datetime,
    @endDate datetime
    AS
    BEGIN
    select * from employees where hireDate between @startDate and @endDate
    END
    Function equivalent:
    CREATE FUNCTION fn_GetEmployeesHiredWithindateRange
    (
    @startDate datetime,
    @endDate datetime
    )
    RETURNS TABLE
    AS
    RETURN
    (
    select * from employees where hireDate between @startDate and @endDate
    )
    I call them like this:
    select * from fn_GetEmployeesHiredWithindateRange('5/25/1990','5/25/2005')
    exec sp_GetEmployeesHiredWithindateRange '5/25/1990','5/25/2005'
    and they will return the same result set. Now, imagine I have a special case where I want to combine this query with whom each employee in the original results reports to.
    With a function, I just modify the query such as:
    select * from fn_GetEmployeesHiredWithindateRange('5/25/1990','5/25/2005') e1
    inner join employees e2 on e2.employeeid = e1.reportsto
    and I am done. If I chose the stored procedure route, there is no flexible way to do this, I would need to write another stored procedure just for this case. Because you can select from the result set of functions, you can do some GREAT code reuse design on the database side that would have been a nightmare in stored procedures.
  4. Stored procedures ARE NOT the only way to protect against sql injection. I see quite a few people defending their choice of stored procedures because the rigid syntax of a sproc call protects the system against query injections. There are easier ways! Let me show you an example of how to write a UDF call that has the same level of protection:
           string sqlQuery = "select * from fn_GetEmployeesHiredWithindateRange(@startDate,@enddate)";
    DataTable dataTable = new DataTable();
    SqlCommand sqlCommand = new SqlCommand(query, sqlConnection);
    sqlCommand.CommandType = CommandType.Text;
    sqlCommand.CommandTimeout = 300;
    sqlCommand.Parameters.AddWithValue("@startDate",new DateTime(1990,5,25));
    sqlCommand.Parameters.AddWithValue("@endDate",new DateTime(2005,5,25));
    SqlDataAdapter sqlDataAdapter = new SqlDataAdapter(sqlCommand);
    sqlDataAdapter.Fill(dataTable);
    return dataTable;
    I am writing ad-hoc sql inside of my C#, and parameterizing it out like a stored procedure. You know what that ends up looking like on the sql profiler?
    exec sp_executesql N'select * from fn_GetEmployeesHiredWithindateRange(@startDate,@endDate)',N'@startDate datetime, @endDate datetime',@startDate='5/25/1990',@endDate='5/25/2005'
    sp_executesql is a call from ADO.NET for running generated queries in a controlled stored procedure fashion. Subsequent calls of a similar structure enjoy the speed boost of a compiled query. This is the magic behind NHibernate's query-generation logic. In terms of SQL injection, this is every bit as safe as a stored procedure call, but with more flexibility.
To sum this all up:
  • Stored procedures are the right choice for compiled database functions with side effects (eg: inserts and updates)
  • Functions are the right choice for compiled parametric queries (eg selects)
  • When you think of functions, think of them as views with arguments.

Some other notes:
  • You cannot "order by" inside of a function, but you can in a stored procedure. You cannot order the result set from a stored procedure, but you can order the result set from a function.
  • Stored procedures have output parameters, functions don't.
  • Functions can return tables or single values, for example:
    Here is the calling syntax for a single-valued output function:
    select fn_getStartDateForEmployee(employeeID) as startDate from employees
    and here is the calling syntax for a table-valued output function:
    select * from fn_GetEmployeesHiredWithindateRange('1/1/2000','1/1/2005')

Labels:

0 Comments:

Post a Comment

<< Home