Solved

Extracting a link from HTML: XQuery issue


johnglick
Contributor

I'm attempting to extract a link from HTML that looks something like this: <a class="email-link" href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>, using the XMLXQueryExtractor. The XQuery expression //a/@href seems to work in the online testers but not in FME. Any ideas?


5 replies

hkingsbury
Celebrity
  • Celebrity
  • August 30, 2022

Another option would be to use the StringSearcher with a regular expression:

https://rubular.com/r/EmlvM3VidvKFl3
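
For readers outside FME, the regular-expression idea can be sketched in Python. The pattern below is an assumption written for illustration; it is not the expression behind the Rubular link, and the function name is hypothetical. It only handles double-quoted href values and is not a general HTML parser.

```python
import re

# Sample input taken from the question in this thread.
html = '<a class="email-link" href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>'

# Hypothetical pattern: capture the double-quoted href value of an <a> tag.
# "\b" after "<a" keeps tags such as <abbr> from matching.
HREF_PATTERN = re.compile(r'<a\b[^>]*?\bhref="([^"]*)"', re.IGNORECASE)


def extract_links(text):
    """Return every double-quoted href value found on an <a> tag."""
    return HREF_PATTERN.findall(text)


print(extract_links(html))
```

A regular expression like this is fine for a single, predictable link format, but it will miss single-quoted or unquoted attributes, which is one reason a parser-based approach may be preferable.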



johnglick
Contributor
  • Author
  • Contributor
  • August 30, 2022

Certainly an acceptable workaround. I'm curious as to why the XQuery doesn't work. Any ideas?


hkingsbury
Celebrity
  • Celebrity
  • August 30, 2022

I'll be honest, I've never used XQuery before, so I'm not much help there!


debbiatsafe
Safer
  • Safer
  • Best Answer
  • August 31, 2022

Hi @johnglick​ 

The rejection message on the rejected feature from XMLXQueryExtractor with the XQuery expression //a/@href is "...can not serialize attribute node".

 

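Searching for this error led to this StackOverflow answer. The error appears to arise because //a/@href returns a standalone attribute node, which cannot be serialized as output on its own; an attribute only serializes as part of its parent element. Wrapping the attribute in either of the two functions mentioned in that answer, data() or string(), returns its value instead and allows the XMLXQueryExtractor to complete successfully (e.g. //a/data(@href) or //a/string(@href)).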
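
As a rough analogue outside FME, the same idea (select the element, then return the attribute's string value rather than the attribute itself) can be sketched with Python's standard library. This is only an illustration under assumptions: the wrapping <div>, the variable names, and the helper function are hypothetical, and xml.etree.ElementTree requires the input to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed fragment wrapping the link from the question.
fragment = (
    '<div><a class="email-link" '
    'href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">'
    'here</a></div>'
)


def href_values(xml_text):
    """Return the href string of every <a> element that has one."""
    root = ET.fromstring(xml_text)
    # iter("a") walks the whole tree, including the root element itself.
    # .get("href") yields the attribute's string value, much as data(@href)
    # or string(@href) does in the XQuery expressions above.
    return [a.get("href") for a in root.iter("a") if a.get("href") is not None]


print(href_values(fragment))
```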

 

I'll note it is also possible to use the HTMLExtractor to extract URLs as an alternative to the XMLXQueryExtractor.

In short: use the XQuery expression //a/data(@href) in the XMLXQueryExtractor, or the CSS selector a[href] in the HTMLExtractor.
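
To illustrate what the a[href] selector picks out, here is a minimal Python sketch using the standard-library HTML parser. It is not how the HTMLExtractor is implemented; the class and function names are hypothetical. Unlike the CSS selector, this sketch skips href attributes that have no value at all.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags, roughly like the selector a[href]."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def collect_hrefs(html_text):
    """Parse html_text and return the href values of its <a> tags."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


sample = (
    '<p>Results are <a class="email-link" '
    'href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">'
    'here</a>, see also <a name="top">this anchor</a>.</p>'
)
print(collect_hrefs(sample))
```

Because a real HTML parser tolerates markup that is not well-formed XML, this kind of approach tends to be more forgiving than an XML-based query on typical email or web content.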


johnglick
Contributor
  • Author
  • Contributor
  • September 1, 2022

Thanks! I never realized it was possible to put a parameter other than "Whole" or "Part" in the Tag Part/HTML Attribute area of the HTMLExtractor. Thanks to your input, I can now eliminate a transformer from my workflow!

