Solved

xml formatting or encoding



Hi all, I am reading data from a web service (Creator - XMLTemplater - HTTPCaller), and the data I receive back is not in the format I expected. Do I need to play with the encoding, or is it something else?

I have truncated the XML to the small section included below.

I receive:

  <DocumentElement>
  <Results_x0020_for_x0020__x0027_Index_x003D_survey_x0026_StreetName_x003D_KASHMIR_x0020_RD_x0026_StreetNo_x003D_31_x0027_>
  <Global_x0020_Council_x0020_ID>20060926T154516_3_Rodney</Global_x0020_Council_x0020_ID>
  <Asset_x0020_No.>36848</Asset_x0020_No.>
  <Contract_x0020_No.>3853</Contract_x0020_No.>
  <Contractor>Project Max</Contractor>
  <Date_x0020_Inspected>26/09/2006</Date_x0020_Inspected>
  <Completion_x0020_Status>IC</Completion_x0020_Status>

I was expecting something more like this, i.e. without the "_x0020_" etc. values:

  <DocumentElement>
  <Results for 'Index=survey&StreetName=KASHMIR RD&StreetNo=31'>
  <Global Council ID>20060926T154516_3_Rodney</Global Council ID>
  <Asset No.>36848</Asset No.>
  <Contract No.>3853</Contract No.>
  <Contractor>Project Max</Contractor>
  <Date Inspected>26/09/2006</Date Inspected>
  <Completion Status>IC</Completion Status>

Any hints or suggestions on how to remedy this?

Thanks Steve

Best answer by jdh


8 replies

takashi
Contributor
  • May 25, 2016

Hi @goatboy, how about using the StringPairReplacer?

Replacement Pairs: _x0020_ " " _x0026_ & _x0027_ ' _x003D_ =

Note: the resulting text will no longer be a valid XML document, i.e. XML transformers cannot be used to parse it.
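A rough Python equivalent of the pair replacement, offered only as an illustration (the StringPairReplacer itself is configured in Workbench, not written in code). It applies the four pairs above to one element from the sample and then checks the result with the standard-library XML parser, which makes the caveat in the note concrete: the escaped names parse, the replaced ones do not. The helper names `replace_pairs` and `is_well_formed` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The four replacement pairs from the reply above: escape sequence -> literal character
PAIRS = {
    "_x0020_": " ",
    "_x0026_": "&",
    "_x0027_": "'",
    "_x003D_": "=",
}


def replace_pairs(text: str) -> str:
    """Replace every listed escape sequence with its literal character."""
    for escaped, literal in PAIRS.items():
        text = text.replace(escaped, literal)
    return text


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


snippet = "<Global_x0020_Council_x0020_ID>20060926T154516_3_Rodney</Global_x0020_Council_x0020_ID>"
plain = replace_pairs(snippet)
print(plain)  # <Global Council ID>20060926T154516_3_Rodney</Global Council ID>

print(is_well_formed(snippet))  # True: the escaped names are legal XML names
print(is_well_formed(plain))    # False: a space cannot appear inside an element name
```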


  • Author
  • May 25, 2016
takashi wrote:

Hi @goatboy, how about using the StringPairReplacer?

Replacement Pairs: _x0020_ " " _x0026_ & _x0027_ ' _x003D_ =

Note: the resulting text will no longer be a valid XML document, i.e. XML transformers cannot be used to parse it.

Thanks @takashi. Any idea why the feed would be coming through with these codes? I am hoping to avoid replacing strings. As you mentioned, I am still keen to use the XML tools to parse the data later in the workbench.
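If the goal is to keep using XML tools, one option is to parse the document as received (the escaped names are legal XML) and decode only the element names afterwards, using them as field names instead of writing them back into XML. A minimal Python sketch, assuming every escape has the form `_x` + four hex digits + `_`. `SAMPLE` is a hypothetical, hand-completed version of the truncated snippet in the question (closing tags added so it parses), and `decode_name` / `read_records` are illustrative names.

```python
import re
import xml.etree.ElementTree as ET

# Assumed escape form: "_x" + four hex digits + "_"
ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")

# Hypothetical, completed version of the truncated sample from the question
SAMPLE = """<DocumentElement>
  <Results_x0020_for_x0020__x0027_Index_x003D_survey_x0026_StreetName_x003D_KASHMIR_x0020_RD_x0026_StreetNo_x003D_31_x0027_>
    <Global_x0020_Council_x0020_ID>20060926T154516_3_Rodney</Global_x0020_Council_x0020_ID>
    <Asset_x0020_No.>36848</Asset_x0020_No.>
    <Contractor>Project Max</Contractor>
  </Results_x0020_for_x0020__x0027_Index_x003D_survey_x0026_StreetName_x003D_KASHMIR_x0020_RD_x0026_StreetNo_x003D_31_x0027_>
</DocumentElement>"""


def decode_name(name: str) -> str:
    """Turn each _xHHHH_ escape in an element name back into its character."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def read_records(xml_text: str) -> list:
    """Parse first, then decode names: one dict per child of the root element."""
    root = ET.fromstring(xml_text)
    return [
        {decode_name(field.tag): (field.text or "") for field in record}
        for record in root
    ]


for record in read_records(SAMPLE):
    print(record)
```

The XML parser never sees the decoded names, so the document stays well-formed throughout; decoding happens only where the names become plain strings.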


jdh
Contributor
  • May 25, 2016
goatboy wrote:

Thanks @takashi. Any idea why the feed would be coming through with these codes? I am hoping to avoid replacing strings. As you mentioned, I am still keen to use the XML tools to parse the data later in the workbench.

That looks like a variation of hex encoding.

Does your original data come from SharePoint?
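To make the hex-encoding reading concrete: in each sequence, the four characters between `_x` and the closing underscore appear to be a hexadecimal character code, so `_x0020_` is 0x20 (decimal 32), a space. A small Python check over the four codes that occur in the sample; `decode_code` is a hypothetical helper name.

```python
# Escape sequences taken from the sample output in the question
CODES = ["_x0020_", "_x0026_", "_x0027_", "_x003D_"]


def decode_code(code: str) -> str:
    """Decode a single _xHHHH_ sequence into its character."""
    hex_digits = code[2:-1]            # drop the leading "_x" and the trailing "_"
    return chr(int(hex_digits, 16))    # hexadecimal -> integer code point -> character


for code in CODES:
    print(code, "->", repr(decode_code(code)))
```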

 


  • Author
  • May 25, 2016
jdh wrote:

That looks like a variation of hex encoding.

Does your original data come from SharePoint?

It comes from an Amazon AWS service, I believe:

http://ec2-54-252-37-255.ap-southeast-2.compute.amazonaws.com/ImportService.asmx

Steve
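On the "why": an `.asmx` endpoint is an ASP.NET web service, and a `DocumentElement` root is what .NET typically writes when it serializes a DataTable. So one plausible explanation (an assumption, not confirmed in this thread) is that the service turns table and column names into element names with .NET's `XmlConvert.EncodeName`. Spaces, quotes, `=` and `&` are not allowed in XML names, so they are escaped as `_xHHHH_` with the character's hex code. The Python below is a hypothetical, simplified stand-in for that behaviour, not .NET's actual algorithm (for instance, it ignores the escaping of literal `_x` sequences); it only reproduces the names seen in the sample.

```python
def is_name_char(ch: str, first: bool) -> bool:
    """Simplified XML name rule: letters and '_' anywhere; digits, '-' and '.' only after the first character."""
    if ch.isalpha() or ch == "_":
        return True
    return not first and (ch.isdigit() or ch in "-.")


def encode_name(name: str) -> str:
    """Hypothetical, simplified stand-in for .NET's XmlConvert.EncodeName."""
    out = []
    for i, ch in enumerate(name):
        if is_name_char(ch, first=(i == 0)):
            out.append(ch)
        else:
            out.append(f"_x{ord(ch):04X}_")  # e.g. ' ' -> _x0020_, '=' -> _x003D_
    return "".join(out)


for column in ["Global Council ID", "Asset No.", "Date Inspected"]:
    print(encode_name(column))
```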


jdh
Contributor
  • Best Answer
  • May 25, 2016

Try a StringReplacer

Text to Match: _x00([0-9a-z]*)_

Replacement Text: %\\1

Followed by a TextDecoder with the Encoding Type set to URL (Percent Encoding)
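For readers outside FME, the same two steps can be sketched in Python; this is an illustrative equivalent, not what the transformers do internally, and `percent_decode_names` is a hypothetical name. One deliberate difference: the sample contains uppercase hex (`_x003D_`), so the sketch matches `[0-9A-Fa-f]` explicitly instead of depending on whether the match is case-sensitive.

```python
import re
from urllib.parse import unquote


def percent_decode_names(text: str) -> str:
    """Two-step decode mirroring the StringReplacer + TextDecoder (URL) suggestion."""
    # Step 1: _x00HH_ -> %HH (only codes U+0000..U+00FF fit this pattern)
    percent = re.sub(r"_x00([0-9A-Fa-f]{2})_", r"%\1", text)
    # Step 2: percent-decode; unquote() reads the %HH bytes as UTF-8 by default
    return unquote(percent)


tag = "Results_x0020_for_x0020__x0027_Index_x003D_survey_x0026_StreetName_x003D_KASHMIR_x0020_RD_x0026_StreetNo_x003D_31_x0027_"
print(percent_decode_names(tag))
```

Two limits show up quickly: `_x00E9_` (é) becomes `%E9`, which is not valid UTF-8 on its own, so `unquote()` substitutes U+FFFD; and a code such as `_x4E2D_` does not match the pattern at all. A literal `%` already present in the text would also be decoded. This fits jdh's later point that the code point variant handles non-Latin characters better.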


  • Author
  • May 25, 2016
jdh wrote:

Try a StringReplacer

Text to Match: _x00([0-9a-z]*)_

Replacement Text: %\\1

Followed by a TextDecoder with the Encoding Type set to URL (Percent Encoding)

Many thanks, JDH, I will give that a try and report back.

Steve


  • Author
  • May 25, 2016
jdh wrote:

Try a StringReplacer

Text to Match: _x00([0-9a-z]*)_

Replacement Text: %\\1

Followed by a TextDecoder with the Encoding Type set to URL (Percent Encoding)

Many thanks, JDH, that seemed to work. I am still wondering why it came through like that, but that's a question for another day.

Thanks again

Steve


jdh
Contributor
  • May 25, 2016
goatboy wrote:

Many thanks, JDH, I will give that a try and report back.

Steve

I actually changed my mind.

I prefer:

Text to Match: _x([0-9a-z]*)_

Replacement Text: U+\\1

with the TextDecoder set to Unicode Code Point.

It covers the non-Latin characters better.
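The code point route can be sketched in Python as a single substitution that converts each hex code straight to its character, which is why it is not limited to the U+0000..U+00FF range. This is an illustrative equivalent rather than the FME transformers themselves; `decode_code_points` is a hypothetical name, and the sketch assumes exactly four hex digits per escape, so characters beyond U+FFFF are not handled.

```python
import re

# Assumed form: "_x" + four hex digits + "_"
CODE_POINT = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_code_points(text: str) -> str:
    """Replace each _xHHHH_ escape with the character at Unicode code point U+HHHH."""
    return CODE_POINT.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_code_points("Date_x0020_Inspected"))
print(decode_code_points("Caf_x00E9_"))
print(decode_code_points("_x4E2D__x6587_"))
```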
