Solved

Extract href a from website

  • September 14, 2018
  • 3 replies
  • 42 views

john_esk
Contributor

I'm trying to extract a download URL from an href. I have tried the HTMLExtractor, ListExploder and XQueryExtractor to isolate that specific URL into an attribute, but without success.

I need to isolate the URL ending in .zip (http://gpt.vic-metria.nu/data/land/NM.zip) from this webpage: http://mdp01.vic-metria.nu/geonetwork/srv/en/csw?request=GetRecordById&service=CSW&version=2.0.2&elementSetName=full&id=c6b02e88-8084-4b3f-8a7d-33e5d45349c4&outputSchema=csw:IsoRecord. Is this possible, and if so, how?

Best answer by takashi



3 replies

takashi
Influencer
  • Best Answer
  • September 14, 2018

Hi @john_eskilstuna, I think the HTMLExtractor works effectively here.
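For readers outside FME, the idea behind this suggestion can be sketched in plain Python: collect the `href` of every `<a>` element whose value ends in `.zip`, which is roughly what a CSS selector such as `a[href$=".zip"]` matches. This is a minimal sketch under stated assumptions, not the HTMLExtractor itself: the selector, the `ZipLinkParser`/`extract_zip_links` names and the sample markup are hypothetical (only the .zip URL comes from the question), it assumes the link appears as an `<a href>` in the page as the question describes, and fetching the page is left out.

```python
from html.parser import HTMLParser


class ZipLinkParser(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags ending in .zip.

    Approximates what a CSS selector like a[href$=".zip"] would select
    (case-sensitive suffix match, as in CSS by default).
    """

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.endswith(".zip"):
                self.urls.append(value)


def extract_zip_links(html_text):
    """Return every .zip href found in html_text, in document order."""
    parser = ZipLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.urls


# Made-up markup; only the .zip URL is taken from the question.
sample = """
<p>Metadata</p>
<a href="http://gpt.vic-metria.nu/data/land/NM.zip">Download</a>
<a href="http://example.com/info.html">Info</a>
"""
print(extract_zip_links(sample))
```

Run as-is, this prints a one-element list containing the NM.zip URL; the `.html` link is skipped because its href does not end in `.zip`.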


john_esk
Contributor
  • Author
  • September 17, 2018
takashi wrote:

Hi @john_eskilstuna, I think the HTMLExtractor works effectively here.

My translation fails with an "Invalid xquery" error. What could be causing it?


takashi
Influencer
  • September 17, 2018

Your CSS selector may be wrong: the closing double quotation mark is missing. Fix the selector, then use the Logger to check whether the _url{} list contains the expected elements.
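The selector actually used is not shown in the thread, so the sketch below uses a hypothetical one, `a[href$=".zip]`, to illustrate the kind of mistake described: an attribute value opened with `"` but never closed. The `has_balanced_quotes` function is a hypothetical pre-check, not part of FME, and it is deliberately simplified: it ignores backslash escapes, which CSS permits inside strings.

```python
def has_balanced_quotes(selector):
    """Hypothetical check: True if every ' or " opened in selector is closed.

    Simplified: a quote of the other kind inside an open string is ignored,
    and backslash escapes are not handled.
    """
    open_quote = None
    for ch in selector:
        if open_quote is None:
            if ch in ("'", '"'):
                open_quote = ch  # a string starts here
        elif ch == open_quote:
            open_quote = None  # the string is closed
    return open_quote is None


print(has_balanced_quotes('a[href$=".zip]'))   # False: closing quote missing
print(has_balanced_quotes('a[href$=".zip"]'))  # True
```

A check like this only catches unbalanced quotes; a selector that passes can still match nothing, which is why inspecting the _url{} list with the Logger remains the real test.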

 

 

