Monday, October 22, 2007

SQL String Filters

I had this recent chore with reporting that had me thinking about cool ways to filter substring patterns from text.

The following is a common snippet of udf code that removes, or filters out non-alphanumeric characters from a sql field:

CREATE FUNCTION dbo.FilterNonAlphaNumericCharacters
(
@input VARCHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @i SMALLINT
SET @i = PATINDEX('%[^0-9A-Za-z ]%', @input)
WHILE @i > 0
BEGIN
SET @input = STUFF(@input, @i, 1, '')
SET @i = PATINDEX('%[^0-9A-Za-z]%', @input)
END
RETURN @input
END
GO
It would be executed like:
select dbo.FilterNonAlphaNumericCharacters(PubSectionName) from pubsections
Here's another one: Sometimes a sql value will have more than one space between words. Heres a udf for removing multiple spaces (again, very common across the internet)
CREATE FUNCTION dbo.FilterMultipleSpaces
(
@input varchar(8000)
)
RETURNS varchar(8000)
AS
BEGIN
WHILE CHARINDEX(' ', @input) > 0
BEGIN
SET @input = REPLACE(@input, ' ', ' ')
END
RETURN @input
END
GO
It would be executed like:
select dbo.FilterMultipleSpaces(PubSectionName) from pubsections
What are the common elements from these functions? We have a target pattern that we are trying to filter out of a string. Instead writing specialized filter after filter, I decided to write something a little more universal.

CREATE FUNCTION dbo.MyFilter
(
@input VARCHAR(8000),
@filterPattern VARCHAR(100),
@replacementVal varchar(100)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @i SMALLINT
declare @paddingval int

if( len(@replacementVal) = 0)
set @paddingVal = 1
else
set @paddingVal = len(@replacementVal)

SET @i = PATINDEX(@filterPattern, @input)
WHILE @i > 0
BEGIN
SET @input = STUFF(@input, @i, @paddingVal, @replacementVal)
SET @i = PATINDEX(@filterPattern, @input)
END
RETURN @input
END
GO

Now, to filter out non-alphanumeric characters, the following two statements are equivalent

select dbo.MyFilter(PubSectionName,'%[^0-9A-Za-z ]%','') from pubsections
select dbo.FilterNonAlphaNumericCharacters(PubSectionName) from pubsections

and these two statements are also equivalent for removing multiple spaces

select dbo.MyFilter(PubSectionName,'% %',' ') from pubsections
select dbo.FilterMultipleSpaces(PubSectionName) from pubsections

I think its cool, because it will replace any pattern with a specified string such as

select PubSectionName from pubsections
select dbo.MyFilter2(PubSectionName,'% %','-=-') from pubsections

Still the code above is getting more complex and inefficient I have hesitations about extending this functionality to something too generic.

Labels:

0 Comments:

Post a Comment

<< Home