Solved

Extracting a link from HTML: XQuery issue

  • 30 August 2022
  • 5 replies
  • 2 views

I'm attempting to extract a link from HTML that looks something like this: <a class="email-link" href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>, using the XMLXQueryExtractor. The XQuery expression //a/@href seems to work in the online testers but not in FME. Any ideas?

Best answer by debbiatsafe 31 August 2022, 02:17

Another option would be to use the StringSearcher with a regular expression:

https://rubular.com/r/EmlvM3VidvKFl3
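
For anyone who wants to try the regular-expression idea outside FME first, here is a minimal Python sketch. The pattern below is my own assumption, not necessarily the one behind the rubular link, and it only handles double-quoted href values on <a> tags.

```python
import re

# Sample markup taken from the question.
html = '<a class="email-link" href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>'

# Hypothetical pattern (an assumption, not the one from the rubular link):
# capture the value of a double-quoted href attribute on an <a> tag.
HREF_PATTERN = re.compile(r'<a\b[^>]*\bhref="([^"]*)"')

match = HREF_PATTERN.search(html)
link = match.group(1) if match else None
print(link)
```

A regex like this is fine for a single, predictable snippet, but it is likely to break on single-quoted or unquoted attributes, which is one reason a real parser is usually the safer choice.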

Certainly an acceptable workaround. I'm curious why the XQuery doesn't work, though. Any ideas?

I'll be honest, I've never used XQuery before, so I'm not much help there!

Hi @johnglick

The rejection message on the rejected feature from XMLXQueryExtractor with the XQuery expression //a/@href is "...can not serialize attribute node".

Searching this error led to this StackOverflow answer. It seems the error occurs because the query returns a bare attribute node, which cannot be serialized on its own. Using either of the two functions mentioned in the answer, data() or string(), on the attribute returns its value instead and allows the XMLXQueryExtractor to complete successfully (e.g. //a/data(@href) or //a/string(@href)).
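
To sanity-check what //a/data(@href) should return before running it in FME, a rough Python equivalent can help. This is only a sketch using the standard library's ElementTree, not FME's XQuery engine. It assumes the snippet is well-formed XML. ElementTree has no standalone attribute nodes, so reading the attribute gives a plain string, which is roughly what data() or string() produce.

```python
import xml.etree.ElementTree as ET

# Sample markup taken from the question; it must be well-formed XML to parse.
fragment = '<a class="email-link" href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>'

root = ET.fromstring(fragment)

# iter("a") visits the root and all of its descendants named "a".
# .get("href") returns the attribute's string value, or None if it is absent.
hrefs = [a.get("href") for a in root.iter("a") if a.get("href") is not None]
print(hrefs)
```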

 

I'll note it is also possible to use the HTMLExtractor to extract URLs as an alternative to the XMLXQueryExtractor.

Use the XQuery expression //a/data(@href) in the XMLXQueryExtractor, or the CSS selector a[href] in the HTMLExtractor.
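
To illustrate what the a[href] selector approach does, here is a small Python sketch that collects the href value of every <a> tag that has one. It uses the standard library's html.parser rather than anything from FME, and the names HrefCollector and extract_hrefs are hypothetical helpers made up for this example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags, similar in spirit to a[href]."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def extract_hrefs(html):
    """Return the href values of all <a> tags in the given HTML string."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


sample = '<a class="email-link" href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>'
print(extract_hrefs(sample))
```

Unlike an XML parser, this one tolerates markup that is not well-formed, which is often the case with HTML pulled from emails or web pages.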

Thanks! I never realized it was possible to put a value other than "Whole" or "Part" in the Tag Part/HTML Attribute area of the HTMLExtractor. Thanks to your input I can now eliminate a transformer from my workflow!
