Solved

Extract an href from a website

  • September 14, 2018
  • 3 replies
  • 79 views

john_esk
Contributor

I'm trying to extract a download URL from an href. I have tried the HTMLExtractor, ListExploder and XMLXQueryExtractor to isolate that specific URL into an attribute, but with no luck.

I need to isolate the URL ending in .zip (http://gpt.vic-metria.nu/data/land/NM.zip) from this webpage: http://mdp01.vic-metria.nu/geonetwork/srv/en/csw?request=GetRecordById&service=CSW&version=2.0.2&elementSetName=full&id=c6b02e88-8084-4b3f-8a7d-33e5d45349c4&outputSchema=csw:IsoRecord. Is it possible, and if so, how?
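Outside FME, the same idea can be sketched in Python using only the standard library: parse the markup, read the href of every <a> element, and keep the ones ending in .zip. This is a minimal sketch, not the FME approach discussed in this thread. The sample markup is hypothetical because the post does not show the page's actual HTML, and fetching the live page (for example with urllib.request) is left out.

```python
from html.parser import HTMLParser


class ZipLinkParser(HTMLParser):
    """Collect href values of <a> tags that point to .zip files."""

    def __init__(self):
        super().__init__()
        self.zip_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs and value may be None.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".zip"):
                self.zip_urls.append(value)


def extract_zip_links(html_text):
    """Return every <a href> in html_text that ends with .zip, in document order."""
    parser = ZipLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.zip_urls


if __name__ == "__main__":
    # Hypothetical fragment: the real page markup is not shown in the post.
    sample = (
        '<p>Download: <a href="http://gpt.vic-metria.nu/data/land/NM.zip" '
        'target="_blank">NM.zip</a> or '
        '<a href="http://example.com/info.html">info</a></p>'
    )
    print(extract_zip_links(sample))
```

Filtering on the suffix, rather than on the link's position in the page, keeps the sketch working if other links are added around the download link.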

Best answer by takashi

3 replies

takashi
Celebrity
  • 7843 replies
  • Best Answer
  • September 14, 2018

Hi @john_eskilstuna, I think the HTMLExtractor works effectively here.


john_esk
Contributor
  • Author
  • 8 replies
  • September 17, 2018

My translation fails with an "Invalid xquery" error. What could be the problem?


takashi
Celebrity
  • 7843 replies
  • September 17, 2018

Your CSS selector may be wrong: the closing double quotation mark is missing. Fix the selector, then use a Logger to check whether the _url{} list contains the expected elements.
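An unclosed quote inside an attribute selector is easy to miss by eye. The Python sketch below flags a selector whose quoted value is never closed. The selector `a[href$=".zip]` is a hypothetical example, since the thread does not show the selector that was actually used. It also assumes the HTMLExtractor accepts standard CSS attribute selectors such as `$=` ("attribute value ends with"). The check is deliberately simple and does not handle backslash escapes.

```python
def unbalanced_quote(selector):
    """Return the quote character left open in a CSS selector, or None.

    Simplified check: tracks only ' and " and does not handle backslash escapes.
    """
    open_quote = None
    for ch in selector:
        if open_quote is None:
            if ch in ("'", '"'):
                open_quote = ch  # a quoted value starts here
        elif ch == open_quote:
            open_quote = None  # the matching quote closes the value
    return open_quote


if __name__ == "__main__":
    # Hypothetical selectors: the second one is missing its closing double quote.
    for sel in ['a[href$=".zip"]', 'a[href$=".zip]']:
        print(sel, "->", unbalanced_quote(sel))
```

A different quote character inside a quoted value (for example `"` inside `'...'`) is treated as ordinary text, which matches how CSS reads quoted strings.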