Solved

HTMLExtractor help appreciated

  • 2 November 2021
  • 3 replies
  • 6 views

Hi, I'm not very proficient with CSS selectors, having no HTML authoring background. I want the CSV download links on this site, and only the CSV download links:

https://data.sandiego.gov/datasets/parking-citations/

I can get all 'a' tags and their href attributes without a problem, but then I have to test whether each result ends with '.csv'. Can anyone give me the right selector syntax for the elements I'm after? Thanks all.

Best answer by ebygomm 2 November 2021, 15:32

3 replies

Rather than tinker with CSS selectors, I decided to go the easy way:

Extract all links from the entire web page, save them as a list, explode the list, and select the ones ending in .csv.

[Screenshot 2021-11-02 at 15.25.50]

In addition, the filenames are actually predictable, so you don't really have to go through the HTMLExtractor at all...
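For readers outside FME, the extract-everything-then-filter approach can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not the workspace from the screenshot: the names LinkCollector, csv_links and SAMPLE_HTML are made up for this example, and the sample markup is hypothetical rather than copied from the San Diego page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def csv_links(html):
    """Return only the links whose target ends in .csv."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if link.endswith(".csv")]


# Hypothetical sample markup, not taken from the real page.
SAMPLE_HTML = """
<ul>
  <li><a href="/datasets/example/">Dataset page</a></li>
  <li><a href="https://example.com/files/citations_2020.csv">2020 (CSV)</a></li>
  <li><a href="https://example.com/files/dictionary.pdf">Dictionary</a></li>
  <li><a href="https://example.com/files/citations_2021.csv">2021 (CSV)</a></li>
</ul>
"""

print(csv_links(SAMPLE_HTML))
```

In CSS terms, the same filter corresponds to the standard attribute suffix selector a[href$=".csv"], which matches 'a' elements whose href ends in .csv. Whether HTMLExtractor accepts that selector directly is an assumption worth testing on the actual page.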

Thanks Hans, you'll see in the attached workspace that I took a similar approach, but it frustrates me that I couldn't quickly figure out the CSS selectors.

Ideally HTMLExtractor would be data-aware (Annabelle?) and you could get a picker!

Try this setup in the HTMLExtractor:

[Image: screenshot of the HTMLExtractor setup]
