I'm trying to extract a link from HTML that looks something like this: <a class="email-link" href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>, using the XQuery extractor. The XQuery expression //a/@href seems to work in the online testers but not in FME. Any ideas?

extracting link from html xquery issue.

hkingsbury
30 August 2022

Another option would be to use the StringSearcher with a regular expression:

https://rubular.com/r/EmlvM3VidvKFl3
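
For readers who want to try the regular-expression idea outside FME, here is a minimal Python sketch. The pattern is an illustrative assumption, not necessarily the one behind the Rubular link, and it only handles double-quoted href values. The names `HREF_PATTERN` and `extract_hrefs` are hypothetical.

```python
import re

# Sample input copied from the question above.
html = '<a class="email-link" href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>'

# Illustrative pattern (an assumption, not the one from the Rubular link):
# capture whatever sits between the double quotes of href="..." inside an <a> tag.
HREF_PATTERN = re.compile(r'<a\b[^>]*?\bhref="([^"]*)"', re.IGNORECASE)


def extract_hrefs(text: str) -> list[str]:
    """Return every double-quoted href value found in <a> tags."""
    return HREF_PATTERN.findall(text)


print(extract_hrefs(html))
```

A similar pattern with one capture group should be usable in the StringSearcher, though regex flavors differ, so it is worth testing against real input first.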

johnglick

Certainly an acceptable workaround. I'm curious as to why the XQuery doesn't work, though. Any ideas?

hkingsbury
30 August 2022

I'll be honest, I've never used XQuery before, so I'm not much help there!

debbiatsafe
31 August 2022
Best Answer

Hi @johnglick

The rejection message on the rejected feature from XMLXQueryExtractor with the XQuery expression //a/@href is "...can not serialize attribute node".

Searching for this error led to this StackOverflow answer. It seems the error is caused by serializing certain result types: //a/@href returns an attribute node, which cannot be written out on its own. Using either of the two functions mentioned in the answer, data() or string(), on the attribute returns its value instead and allows the XMLXQueryExtractor to complete successfully (e.g. //a/data(@href) or //a/string(@href)).
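
To make the node-versus-value distinction concrete, here is a small sketch using Python's standard xml.dom.minidom. It is only an analogy and does not reproduce FME's XQuery engine; the variable names are hypothetical.

```python
from xml.dom.minidom import parseString

# Sample input copied from the question; it happens to be well-formed XML.
doc = parseString(
    '<a class="email-link" '
    'href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>'
)
anchor = doc.getElementsByTagName("a")[0]

# Rough counterpart of //a/@href: an attribute *node* (a name plus a value).
href_node = anchor.getAttributeNode("href")

# Rough counterpart of //a/data(@href) or //a/string(@href): the plain value.
href_value = href_node.value

print(href_node.name)  # the attribute's name
print(href_value)      # the URL as an ordinary string
```

The node carries both a name and a value and only makes sense attached to an element, whereas the value is a plain string that can be written out directly.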

I'll note it is also possible to use the HTMLExtractor to extract URLs as an alternative to the XMLXQueryExtractor.

Use the XQuery expression //a/data(@href) in the XMLXQueryExtractor or CSS selector a[href] in HTMLExtractor
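
As a rough illustration of what selecting a[href] and reading the href attribute amounts to, here is a sketch using Python's standard html.parser. It is an assumption-laden stand-in, not how the HTMLExtractor is implemented, and `HrefCollector` and `collect_hrefs` are hypothetical names.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one (similar to a[href])."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag != "a":
            return
        for name, value in attrs:
            # Skip value-less attributes such as a bare <a href>.
            if name == "href" and value is not None:
                self.hrefs.append(value)


def collect_hrefs(html: str) -> list[str]:
    """Parse an HTML string and return the href values of its <a> tags."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = '<a class="email-link" href="https://app.myapp.com/results/a7696ced77454dd39d8241e79f981b3d">here</a>'
print(collect_hrefs(sample))
```

Unlike the XML-based approach, an HTML parser tolerates markup that is not well-formed XML, which is common in email or web content.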

johnglick

Thanks! I never realized it was possible to put a value other than "Whole" or "Part" in the Tag Part/HTML Attribute area of the HTMLExtractor. Thanks to your input, I can now eliminate a transformer from my workflow!
