Monday, 10 October 2016

T-SQL: Casting Varchar to XML removing illegal characters

As I mentioned in an earlier post, we deal a lot with XML messages on ActiveMQ JMS queues going through our ESB.  For audit purposes most data that goes through the ESB is logged at some point, including these XML messages.

These XML messages have been stored, historically, in fields of type varchar, so if we want to use XPath to query the XML, we need to cast the varchar values to XML first.

Unfortunately, there are illegal characters in our XML, so we can't just CAST(... AS XML), we have to replace those characters first.

As I am a sharing kind of guy, here's a snippet to help others...



Declare @cr Char(1) = char(13)
Declare @lf Char(1) = char(10)
Declare @tab Char(1) = char(9)
Declare @pound Char(1) = char(163)
Declare @uScore Char(1) = char(150)
Declare @regSymbol Char(1) = char(174)
Declare @tmSymbol Char(1) = char(153)
Declare @f1Quote Char(1) = char(146)
Declare @b1Quote Char(1) = char(145)

SELECT
CAST(
   REPLACE(
      REPLACE(
         REPLACE(
            REPLACE(
               REPLACE(
                  REPLACE(
                     REPLACE(
                        REPLACE(
                           REPLACE([MyXMLField], @cr, ''),
                        @lf, ''),
                     @tab, ''),
                  @pound, ''),
               @uScore, ''),
            @regSymbol,''),
         @tmSymbol,''),
      @f1Quote,''),
   @b1Quote,'') AS XML)
FROM ...

Please let me know in the comments if you have come across a cleaner solution to this problem, or if it needs updating with extra characters.

No comments:

Post a Comment