Solved

Is it possible to trigger a download when data (zip file) on a website is updated?

7 years ago
June 20, 2018
9 replies
121 views

john_esk
Contributor
8 replies

I'm a beginner with FME and have only used it for a couple of weeks, doing simple tasks in Workbench, so my knowledge is limited. I'm now trying to set up an automatic process for downloading data (zip files) from a Swedish authority's website:

http://extra.lansstyrelsen.se/gis/Sv/lansvisa-geodata/sodermanlands-lan/Pages/default.aspx

I have created an FME workspace that downloads the specific zip (LstD Ädellövinventering) using an HTTPCaller. I only want the workspace to run when that zip file has been updated. Is that possible, and how does it work? I also want to be notified by email if the download fails. And maybe let the workspace keep running until the download is successful, if that's a good idea?

Best answer by takashi

Hi @john_eskilstuna, you can use the HTMLExtractor to populate every <tr> (row of the table) element into a list attribute.

Next, use the ListExploder to create a feature for each row. Each row is a valid XML fragment like this:

<tr class="telerik-reTableOddRow-4">
    <td class="telerik-reTableEvenCol-4">
        <p>
            <a href="http://ext-dokument.lansstyrelsen.se/sodermanland/geodata/Alleinv_dlan.zip">
                <img alt="" class="ms-asset-icon" src="/_layouts/IMAGES/iczip.gif" style="border:0px solid;"/>LstD Alléinventering</a>
        </p>
    </td>
    <td class="telerik-reTableOddCol-4">
        <p> Alléinventering i Södermanlands län</p>
    </td>
    <td class="telerik-reTableEvenCol-4">
        <p> 2014-04-30</p>
    </td>
</tr>

You can then use the XMLXQueryExtractor to extract the URL of a zip file, data name, description and date as feature attributes.

XQuery Expression (Example)

fme:set-attribute('Zip URL', data(//td[1]/p/a/@href)),
fme:set-attribute('Data Name', normalize-space(data(//td[1]/p/a))),
fme:set-attribute('Description', normalize-space(data(//td[2]/p))),
fme:set-attribute('Update', normalize-space(data(//td[3]/p)))

Note: Some <tr> elements are missing the internal <p> tags. The XQuery expression above cannot extract valid values from such irregular <tr> elements. If you need to extract values from them, a slightly more complex expression will be required.
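For readers who want to check the logic outside FME, here is a minimal Python sketch of what the XQuery expression above does, run against the sample row from this post. It is not FME code: the helper names `text_of` and `parse_row` are made up for illustration, and `text_of` only roughly imitates `normalize-space`.

```python
import xml.etree.ElementTree as ET

# Sample row copied from the post above.
ROW = """<tr class="telerik-reTableOddRow-4">
    <td class="telerik-reTableEvenCol-4">
        <p>
            <a href="http://ext-dokument.lansstyrelsen.se/sodermanland/geodata/Alleinv_dlan.zip">
                <img alt="" class="ms-asset-icon" src="/_layouts/IMAGES/iczip.gif" style="border:0px solid;"/>LstD Alléinventering</a>
        </p>
    </td>
    <td class="telerik-reTableOddCol-4">
        <p> Alléinventering i Södermanlands län</p>
    </td>
    <td class="telerik-reTableEvenCol-4">
        <p> 2014-04-30</p>
    </td>
</tr>"""


def text_of(element):
    """Join all descendant text and collapse whitespace (roughly normalize-space)."""
    return " ".join("".join(element.itertext()).split())


def parse_row(row_xml):
    """Return the four values the XQuery expression stores as attributes."""
    tr = ET.fromstring(row_xml)
    cells = tr.findall("td")
    link = cells[0].find("p/a")  # mirrors //td[1]/p/a
    return {
        "Zip URL": link.get("href"),
        "Data Name": text_of(link),
        "Description": text_of(cells[1].find("p")),
        "Update": text_of(cells[2].find("p")),
    }


row = parse_row(ROW)
print(row["Data Name"], row["Update"])
```

Like the expression it mirrors, this sketch assumes every cell contains a <p>; a row without one would make `find("p")` return None and the lookup fail.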

danilo_fme
Evangelist
2057 replies
7 years ago
June 22, 2018

Hi @john_eskilstuna

I believe that you can use transformers to extract HTML elements before your HTTPCaller.

The idea is this: when a new file is published on the website, your workspace detects it by comparing the update date shown there with the date of the file you downloaded previously.

The HTMLExtractor transformer could be used to extract that information from the website.

This is a suggestion.

Thanks,

Danilo

jlutherthomas
364 replies
7 years ago
June 25, 2018

Hi @john_eskilstuna

Is there any indicator to say that the data has been modified/updated?

If so, then you could build that logic into a workspace that looks at date or timestamps to work out if data has been changed.

Then, using FME Server, you could schedule this workspace to run every day. When the workspace runs, it checks whether the update date equals yesterday's date: if yes, download; if no, do nothing.
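The daily check described here can be sketched in plain Python (not FME code). The function names are hypothetical, and the date format is assumed to be the `YYYY-MM-DD` form shown on the page. One caveat worth noting: a strict "equals yesterday" test would miss an update if a scheduled run is skipped, so a looser variant that compares against the date of the last successful check is included as well.

```python
from datetime import date, datetime, timedelta

DATE_FORMAT = "%Y-%m-%d"  # assumed format of the dates on the page, e.g. 2014-04-30


def parse_update(update_text):
    """Turn the date text scraped from the page into a date object."""
    return datetime.strptime(update_text.strip(), DATE_FORMAT).date()


def updated_yesterday(update_text, today):
    """Strict check: was the data updated exactly one day before this run?"""
    return parse_update(update_text) == today - timedelta(days=1)


def updated_since(update_text, last_check):
    """Looser check: was the data updated on or after the last successful check?"""
    return parse_update(update_text) >= last_check


run_day = date(2018, 6, 27)  # example run date, chosen for illustration
print(updated_yesterday(" 2018-06-26", run_day))
print(updated_yesterday("2014-04-30", run_day))
print(updated_since("2018-06-20", date(2018, 6, 18)))
```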

Then you can also set up FME Server to email you if the workspace fails (or succeeds, but you maybe don't want a lot of emails if nothing's changed). There are also email transformers you can build into the workspace to send an email at any point in the workspace to let you know what's going on. (FMEServerEmailGenerator + FMEServerNotifier, or Emailer)
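If the check ever runs outside FME, a failure notification can be put together with Python's standard library. This is only a sketch under assumptions: the addresses, the SMTP host and the function names are placeholders, and inside FME the transformers mentioned above would normally do this job.

```python
import smtplib
from email.message import EmailMessage


def build_failure_email(sender, recipient, url, error):
    """Create a plain-text message describing a failed download."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Download failed: " + url
    msg.set_content(f"The download of {url} failed.\n\nError: {error}\n")
    return msg


def send_email(msg, host, port=25):
    """Send the message through an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


# Build (but do not send) an example message; all values are placeholders.
message = build_failure_email(
    "fme@example.com",
    "gis-team@example.com",
    "http://example.com/data.zip",
    "connection timed out",
)
print(message["Subject"])
```

Calling `send_email(message, "smtp.example.com")` would deliver it, provided such a server exists and accepts the connection.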

john_esk
Author
Contributor
8 replies
7 years ago
June 27, 2018

Thanks for the replies @jlutherthomas @danilo_fme

This is what I have come up with so far. The only way to know whether the data has been updated is the date displayed on the website, so I have used a .fmw I found that extracts information from a website. The problem is that several dates are displayed, and at the moment I'm using the last Tester (Tester_2) to pick out the specific row, which only works until they add to or change the HTML. My second picture shows the specific date I want to extract. Is it possible to extract the specific element that contains " Inventering av ädellövskogar i Södermanlands län" (<tr class......</tr>)? Then I could extract the date easily without depending on a specific row position.

takashi
7715 replies
7 years ago
June 27, 2018

This expression is simpler and can also be applied to the irregular <tr> elements:

fme:set-attribute('Zip URL', data(//td[1]/p/a/@href)),
fme:set-attribute('Data Name', normalize-space(data(//td[1]))),
fme:set-attribute('Description', normalize-space(data(//td[2]))),
fme:set-attribute('Update', normalize-space(data(//td[3])))
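Why this works for the irregular rows can be illustrated in plain Python: taking the text of the whole <td> gathers all descendant text, whether or not a <p> wraps it. The row below is a made-up example of an irregular row, not one copied from the site, and the helper names are hypothetical. Looking up the link with `.//a` is a small extra tolerance that the expression above does not have.

```python
import xml.etree.ElementTree as ET

# Hypothetical irregular row: the second and third cells have no inner <p>.
IRREGULAR_ROW = """<tr>
    <td><p><a href="http://example.com/data.zip">Example data</a></p></td>
    <td> Example description</td>
    <td> 2018-06-01</td>
</tr>"""


def cell_text(cell):
    """All text inside the cell, whitespace collapsed, with or without <p>."""
    return " ".join("".join(cell.itertext()).split())


def parse_row_tolerant(row_xml):
    """Extract the same four values without assuming a <p> in every cell."""
    tr = ET.fromstring(row_xml)
    cells = tr.findall("td")
    link = cells[0].find(".//a")  # any <a> below the first cell
    return {
        "Zip URL": link.get("href") if link is not None else None,
        "Data Name": cell_text(cells[0]),
        "Description": cell_text(cells[1]),
        "Update": cell_text(cells[2]),
    }


row = parse_row_tolerant(IRREGULAR_ROW)
print(row["Description"], row["Update"])
```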

john_esk
Author
Contributor
8 replies
7 years ago
June 27, 2018

Thanks! That helped a lot. My idea is to write the "Update" date to a text file whenever a download occurs and use that file for comparison. When the date in the text file and the date on the website don't match, the download will take place. What is the best method for the comparison? I have tried connecting both to a Tester (text_line_data = Update) and attaching an Inspector to "Failed", but even when both dates match it still lets them through.
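The compare-and-record idea described here can be sketched in plain Python (not FME code). The file name and function names are hypothetical, and `2018-06-01` is an invented later date. Both values are stripped before comparing, because the scraped date carries a leading space and a line read from a text file may carry a trailing newline.

```python
import tempfile
from pathlib import Path


def read_stored_date(path):
    """Return the date saved by the previous run, or None on the first run."""
    p = Path(path)
    if not p.exists():
        return None
    return p.read_text(encoding="utf-8").strip()


def needs_download(stored, website):
    """Download when nothing is stored yet or the two dates differ."""
    return stored is None or stored != website.strip()


def record_date(path, website):
    """Save the date of the version that was just downloaded."""
    Path(path).write_text(website.strip() + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "last_update.txt"  # hypothetical file name
    first_run = needs_download(read_stored_date(state_file), " 2014-04-30")
    record_date(state_file, " 2014-04-30")
    same_date = needs_download(read_stored_date(state_file), "2014-04-30")
    new_date = needs_download(read_stored_date(state_file), "2018-06-01")

print(first_run, same_date, new_date)
```

The key point is that both dates end up in one place before they are compared, and `record_date` should only be called after the download has succeeded, so a failed run is retried on the next check.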

john_esk
Author
Contributor
8 replies
7 years ago
June 27, 2018

I managed to solve this by using the FeatureMerger.

dbaldacchino1
Enthusiast
136 replies
5 years ago
April 7, 2020

Hi @john_eskilstuna, I wrote a quick article about this almost 2 years ago that might interest you... same goals :) https://www.linkedin.com/pulse/automating-software-downloads-david-baldacchino/

dbaldacchino1
Enthusiast
136 replies
5 years ago
April 7, 2020

Since then I have replicated this workspace for more software packages, either by making REST API calls (where that was an option) or by web scraping/sleuthing through Chrome dev tools to infer the latest version. Once you get that data and compare it to your text files, you decide whether to download and, if so, write a new text file with the updated version information to compare against next time. I also launch a VBS script via a SystemCaller to distribute the downloaded file to all offices, and email a group of people about it. I took it a little further and used Microsoft Flow (now Power Automate) to watch my inbox and re-post those emails as posts to certain MS Teams channels. The difficulty I have come across is with software that can only be downloaded via an account (not a public link), where you have to supply credentials to gain access; I have not been able to find a solution for those cases.
