SQL String Filters
I had this recent chore with reporting that had me thinking about cool ways to filter substring patterns from text.
The following is a common snippet of udf code that removes, or filters out non-alphanumeric characters from a sql field:
Now, to filter out non-alphanumeric characters, the following two statements are equivalent
and these two statements are also equivalent for removing multiple spaces
I think its cool, because it will replace any pattern with a specified string such as
Still the code above is getting more complex and inefficient I have hesitations about extending this functionality to something too generic.
The following is a common snippet of udf code that removes, or filters out non-alphanumeric characters from a sql field:
It would be executed like:
CREATE FUNCTION dbo.FilterNonAlphaNumericCharacters
(
@input VARCHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @i SMALLINT
SET @i = PATINDEX('%[^0-9A-Za-z ]%', @input)
WHILE @i > 0
BEGIN
SET @input = STUFF(@input, @i, 1, '')
SET @i = PATINDEX('%[^0-9A-Za-z]%', @input)
END
RETURN @input
END
GO
select dbo.FilterNonAlphaNumericCharacters(PubSectionName) from pubsectionsHere's another one: Sometimes a sql value will have more than one space between words. Heres a udf for removing multiple spaces (again, very common across the internet)
CREATE FUNCTION dbo.FilterMultipleSpacesIt would be executed like:
(
@input varchar(8000)
)
RETURNS varchar(8000)
AS
BEGIN
WHILE CHARINDEX(' ', @input) > 0
BEGIN
SET @input = REPLACE(@input, ' ', ' ')
END
RETURN @input
END
GO
select dbo.FilterMultipleSpaces(PubSectionName) from pubsectionsWhat are the common elements from these functions? We have a target pattern that we are trying to filter out of a string. Instead writing specialized filter after filter, I decided to write something a little more universal.
CREATE FUNCTION dbo.MyFilter
(
@input VARCHAR(8000),
@filterPattern VARCHAR(100),
@replacementVal varchar(100)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @i SMALLINT
declare @paddingval int
if( len(@replacementVal) = 0)
set @paddingVal = 1
else
set @paddingVal = len(@replacementVal)
SET @i = PATINDEX(@filterPattern, @input)
WHILE @i > 0
BEGIN
SET @input = STUFF(@input, @i, @paddingVal, @replacementVal)
SET @i = PATINDEX(@filterPattern, @input)
END
RETURN @input
END
GO
Now, to filter out non-alphanumeric characters, the following two statements are equivalent
select dbo.MyFilter(PubSectionName,'%[^0-9A-Za-z ]%','') from pubsections
select dbo.FilterNonAlphaNumericCharacters(PubSectionName) from pubsections
and these two statements are also equivalent for removing multiple spaces
select dbo.MyFilter(PubSectionName,'% %',' ') from pubsections
select dbo.FilterMultipleSpaces(PubSectionName) from pubsections
I think its cool, because it will replace any pattern with a specified string such as
select PubSectionName from pubsections
select dbo.MyFilter2(PubSectionName,'% %','-=-') from pubsections
Still the code above is getting more complex and inefficient I have hesitations about extending this functionality to something too generic.
Labels: sql


0 Comments:
Post a Comment
<< Home