Solved

extracting link from html xquery issue.

  • August 30, 2022
  • 5 replies
  • 46 views

johnglick
Contributor

I'm attempting to extract a link from HTML that looks something like this: <a class="email-link" href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>, using the XMLXQueryExtractor. The XQuery expression //a/@href seems to work in the online testers but not in FME. Any ideas?



hkingsbury
Celebrity
  • August 30, 2022

Another option would be to use the StringSearcher with a regular expression:

https://rubular.com/r/EmlvM3VidvKFl3
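
For anyone who wants to try the regex idea outside FME first, here is a minimal Python sketch. The pattern is an assumption written for illustration (it is not necessarily the one behind the Rubular link), and it only handles double-quoted href values on <a> tags.

```python
import re

# Sample HTML taken from the question.
sample_html = (
    '<a class="email-link" '
    'href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>'
)

# Hypothetical pattern: an <a> start tag, any other attributes,
# then href="..." with the URL captured in group 1.
HREF_PATTERN = re.compile(r'<a\b[^>]*\bhref="([^"]+)"')

match = HREF_PATTERN.search(sample_html)
url = match.group(1) if match else None
print(url)
```

A regex like this is fine for a known, simple snippet, but it is brittle for arbitrary HTML (single quotes, unquoted values, odd whitespace), so an HTML-aware approach is usually safer.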



johnglick
Contributor
  • Author
  • August 30, 2022

Certainly an acceptable workaround. I'm curious as to why the XQuery doesn't work, though. Any ideas?


hkingsbury
Celebrity
  • August 30, 2022

I'll be honest, I've never used XQuery before, so I'm not much help there!


debbiatsafe
Safer
  • Best Answer
  • August 31, 2022

Hi @johnglick​ 

The rejection message on the feature rejected by the XMLXQueryExtractor with the XQuery expression //a/@href is "...can not serialize attribute node".

Searching for this error led to this StackOverflow answer. It seems the error occurs because the expression returns a bare attribute node, which cannot be serialized on its own. Applying either of the two functions mentioned in the answer, data() or string(), to the attribute returns its value instead and allows the XMLXQueryExtractor to complete successfully (e.g. //a/data(@href) or //a/string(@href)).

I'll note it is also possible to use the HTMLExtractor to extract URLs as an alternative to the XMLXQueryExtractor.

In short: use the XQuery expression //a/data(@href) in the XMLXQueryExtractor, or the CSS selector a[href] in the HTMLExtractor.
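
To sanity-check the value you should expect back, here is a small Python sketch using only the standard library. It is not how FME evaluates either transformer; it just mimics the idea of selecting a[href] and returning the attribute's string value rather than the attribute node. The HrefCollector class name and the wrapping <p> element are assumptions made for the example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Link from the question, wrapped in a paragraph for illustration.
sample_html = (
    '<p>Results are <a class="email-link" '
    'href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>.</p>'
)

collector = HrefCollector()
collector.feed(sample_html)
collector.close()
print(collector.hrefs)
```

The list holds plain strings, which is the same kind of result data(@href) or string(@href) hands back to the XMLXQueryExtractor.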


johnglick
Contributor
  • Author
  • September 1, 2022

Thanks! I never realized it was possible to put a value other than "Whole" or "Part" in the Tag Part/HTML Attribute field of the HTMLExtractor. Thanks to your input, I can now eliminate a transformer from my workflow!