With this TSQL script, we can remove HTML content as long as
there are valid start and ending HTML tags.
Script:
Create FUNCTION [dbo].[udf_StripHTMLContent]
(@content VARCHAR(MAX),@tagStart varchar(55), @tagEnd varchar(55))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @start INT=1
DECLARE @end INT=1
DECLARE @len INT=1
WHILE @start > 0 AND @end > 0 AND @len > 0
BEGIN
SET @start = CHARINDEX(@tagStart,@content)
SET @end = (len(@tagEnd)-1) + CHARINDEX(@tagEnd,@content,CHARINDEX(@tagStart,@content))
SET @len = (@end - @start) + 1
IF @start > 0 AND @end > 0 AND @len > 0 and (@len > 1 and @end > len(@tagEnd))
BEGIN
SET @content = STUFF(@content,@start,@len,'')
END
ELSE
BEGIN
break;
END
END
RETURN REPLACE(@content,' ',' ')
This script parses the content looking for the start and end
tags. It replaces all the content that is contained by those tags. It continues
to parse the string until no more matching tags are found or the end of the
string is reached.
Example:
Use the following examples to see how this function can be
used. The first couple of examples just parse a string. The last example does
an inline SQL update to remove HTML tags from some inventory records.
declare @html varchar(max), @start varchar(55), @end varchar(55)
set @start = '<div><a
href="bad.com">'
set @end = '</a></div>'
set @html = 'This item has many
features<div><a href="bad.com">Unwanted
content</a></div>'
select @html as before
set @html =dbo.udf_StripHTMLContent(@html,@start, @end)
select @html as after
set @start = '<p
style="display:none">'
set @end = '</p>'
set @html = 'This item has many
features<p style="display:none"><a
href="bad.com">Unwanted content</a></p>'
select @html as before
set @html =dbo.udf_StripHTMLContent(@html,@start, @end)
select @html as after
--TABLE UPDATE
CALL
update a set ItemDetail = dbo.udf_StripHTMLContent(ItemDetail,@start, @end)
from Inventory..Item a
...
This function can also be used to strip out content that
matches a particular start and ending pattern.
I hope this is useful.