I'm a beginner with FME and have only used it for a couple of weeks, doing simple tasks in Workbench, so my knowledge is limited. I'm now trying to set up an automatic process for downloading data (zip files) from a Swedish authority's website:

http://extra.lansstyrelsen.se/gis/Sv/lansvisa-geodata/sodermanlands-lan/Pages/default.aspx

I have created an FME script for downloading the specific zip (LstD Ädellövinventering) using an HTTPCaller. I only want the script to run when that zip file has been updated. Is that possible, and how would it work? I also want to be notified by email if the data fails to download. And maybe let the script keep running until the download is successful, if that's a good idea?

Hi @john_eskilstuna

I believe you can use transformers to extract HTML elements before your HTTPCaller.

The idea is: your workspace compares the update date shown on the website with the date of the file from a previous run, and downloads only when there is a new file.

The transformer HTMLExtractor could be used to extract information from the Website.

This is a suggestion.

Thanks,

Danilo


Hi @john_eskilstuna

Is there any indicator to say that the data has been modified/updated?

If so, then you could build that logic into a workspace that looks at dates or timestamps to work out whether the data has changed.

Then, using FME Server, you could set this workspace to run every day. When the workspace runs, it checks whether the update date equals yesterday's date: if yes, download; if no, do nothing.

Then you can also set up FME Server to email you if the workspace fails (or succeeds, but you may not want a lot of emails if nothing has changed). There are also email transformers you can build into the workspace to send an email at any point and let you know what's going on (FMEServerEmailGenerator + FMEServerNotifier, or Emailer).
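
The daily check described above can be sketched outside FME in a few lines of Python. This is only an illustration of the logic, not an FME Server feature: the function names, the `download` and `notify` callables, and the retry limit of 3 are all assumptions made for the example. It also bounds the retries rather than looping until success, so a permanently broken link cannot keep the job running forever.

```python
from datetime import date, timedelta
from typing import Callable


def updated_yesterday(update_text: str, today: date) -> bool:
    """True if the ISO date scraped from the page (e.g. '2014-04-30') is yesterday."""
    return date.fromisoformat(update_text.strip()) == today - timedelta(days=1)


def run_daily_check(update_text: str, today: date,
                    download: Callable[[], None],
                    notify: Callable[[str], None],
                    max_attempts: int = 3) -> str:
    """Download only when the page date is yesterday; notify if every attempt fails."""
    if not updated_yesterday(update_text, today):
        return "skipped"
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            download()  # hypothetical callable, e.g. a wrapper around an HTTP request
            return "downloaded"
        except OSError as exc:  # network and file errors derive from OSError
            last_error = f"attempt {attempt}: {exc}"
    notify(f"Download failed after {max_attempts} attempts ({last_error})")
    return "failed"
```

One caveat with the "equals yesterday" test: if a scheduled run is missed, an update can slip through unnoticed. Comparing against a stored last-seen date, as discussed further down in this thread, avoids that.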

Thanks for the replies @jlutherthomas @danilo_fme

This is what I have come up with so far. The only way to know whether the data has been updated is the date displayed on the website, so I found a .fmw used to extract information from a website and adapted it. The problem is that several dates are displayed, and at the moment I'm using the last Tester (Tester_2) to extract the specific row, which only works until they add to or change the HTML. My second picture shows the specific date I want to extract. Is it possible to extract the specific element which contains " Inventering av ädellövskogar i Södermanlands län"? <tr class......</tr>. Then I can extract the date easily without depending on a specific row.


Hi @john_eskilstuna, you can use the HTMLExtractor to populate every <tr> (row of the table) element into a list attribute.

Next, use the ListExploder to create a feature for each row. Each row is a valid XML fragment like this.

<tr class="telerik-reTableOddRow-4">
    <td class="telerik-reTableEvenCol-4">
        <p>
            <a href="http://ext-dokument.lansstyrelsen.se/sodermanland/geodata/Alleinv_dlan.zip">
                <img alt="" class="ms-asset-icon" src="/_layouts/IMAGES/iczip.gif" style="border:0px solid;"/>LstD Alléinventering</a>
        </p>
    </td>
    <td class="telerik-reTableOddCol-4">
        <p> Alléinventering i Södermanlands län</p>
    </td>
    <td class="telerik-reTableEvenCol-4">
        <p> 2014-04-30</p>
    </td>
</tr>

You can then use the XMLXQueryExtractor to extract the URL of a zip file, data name, description and date as feature attributes.

XQuery Expression (Example)

fme:set-attribute('Zip URL', data(//td[1]/p/a/@href)),
fme:set-attribute('Data Name', normalize-space(data(//td[1]/p/a))),
fme:set-attribute('Description', normalize-space(data(//td[2]/p))),
fme:set-attribute('Update', normalize-space(data(//td[3]/p)))

Note: Some <tr> elements are missing the internal <p> tags. The XQuery expression above cannot extract valid values from such irregular <tr> elements. If you need to extract values from them, a slightly more complex expression will be required.
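
For readers who want to see what the XQuery is doing, here is a rough Python equivalent using the standard library's `xml.etree.ElementTree`, applied to the sample row above. It is a sketch, not part of the FME workspace: `squash`, `parse_row` and `update_date_for` are names made up for this example. The last function shows one way to pick a row by its description text instead of by its position in the table, which is what the question was after.

```python
import xml.etree.ElementTree as ET

# One row as produced by the ListExploder step (copied from the sample above).
SAMPLE_ROW = """<tr class="telerik-reTableOddRow-4">
    <td class="telerik-reTableEvenCol-4">
        <p>
            <a href="http://ext-dokument.lansstyrelsen.se/sodermanland/geodata/Alleinv_dlan.zip">
                <img alt="" class="ms-asset-icon" src="/_layouts/IMAGES/iczip.gif" style="border:0px solid;"/>LstD Alléinventering</a>
        </p>
    </td>
    <td class="telerik-reTableOddCol-4">
        <p> Alléinventering i Södermanlands län</p>
    </td>
    <td class="telerik-reTableEvenCol-4">
        <p> 2014-04-30</p>
    </td>
</tr>"""


def squash(text):
    """Trim and collapse runs of whitespace, similar to XQuery's normalize-space()."""
    return " ".join(text.split())


def parse_row(row_xml):
    """Return the four attributes for one regular row (cells wrapped in <p>)."""
    cells = ET.fromstring(row_xml).findall("td")
    link = cells[0].find("p/a")
    return {
        "Zip URL": link.get("href"),
        "Data Name": squash("".join(link.itertext())),
        "Description": squash("".join(cells[1].find("p").itertext())),
        "Update": squash("".join(cells[2].find("p").itertext())),
    }


def update_date_for(rows, description):
    """Return the Update value of the first row whose description matches, else None."""
    for row in rows:
        if row["Description"] == description:
            return row["Update"]
    return None


rows = [parse_row(SAMPLE_ROW)]
print(update_date_for(rows, "Alléinventering i Södermanlands län"))  # prints 2014-04-30
```

Inside FME, the same selection would presumably be a Tester on the Description attribute after the XMLXQueryExtractor, so the workspace no longer depends on the row order of the page.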


This expression is simpler and can also be applied to the irregular <tr> elements.

fme:set-attribute('Zip URL', data(//td[1]/p/a/@href)),
fme:set-attribute('Data Name', normalize-space(data(//td[1]))),
fme:set-attribute('Description', normalize-space(data(//td[2]))),
fme:set-attribute('Update', normalize-space(data(//td[3])))
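
The reason this works on irregular rows is that taking the text of the whole <td> does not care whether a <p> wrapper is present. A minimal Python sketch of the same idea follows. The two rows and their example.com URLs are hypothetical, invented for illustration, and looking up the link at any depth (`.//a`) is an assumption of this sketch rather than what the expression above does for the Zip URL.

```python
import xml.etree.ElementTree as ET


def cell_text(cell):
    """All text inside a cell, whitespace-collapsed, with or without a <p> wrapper."""
    return " ".join("".join(cell.itertext()).split())


def parse_any_row(row_xml):
    """Extract the four attributes from a regular or irregular <tr>."""
    cells = ET.fromstring(row_xml).findall("td")
    link = cells[0].find(".//a")  # assumption: accept the link at any depth
    return {
        "Zip URL": link.get("href") if link is not None else None,
        "Data Name": cell_text(cells[0]),
        "Description": cell_text(cells[1]),
        "Update": cell_text(cells[2]),
    }


# Hypothetical rows: the first has <p> wrappers, the second does not.
REGULAR = ('<tr><td><p><a href="http://example.com/a.zip">Name A</a></p></td>'
           '<td><p> Desc A</p></td><td><p> 2014-04-30</p></td></tr>')
IRREGULAR = ('<tr><td><a href="http://example.com/b.zip">Name B</a></td>'
             '<td> Desc B</td><td> 2015-01-15</td></tr>')

for row in (REGULAR, IRREGULAR):
    print(parse_any_row(row))
```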

Thanks! That helped a lot. My idea is to write the "Update" date to a text file whenever a download occurs and use that file for comparison: when the date in the text file and the date on the website don't match, the download takes place. What is the best method for the comparison? I have tried connecting both to a Tester (text_line_data = Update) and attaching an Inspector to "Failed", but even when both dates match it still lets them through.

 

I managed to solve this by using the FeatureMerger.
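
A likely reason the Tester approach did not work is that a Tester evaluates one feature at a time, so the feature read from the text file and the feature scraped from the website never carried both attributes together; merging them into one feature first makes the comparison possible. The same stored-date comparison can be sketched in plain Python. The file name `last_update.txt` and the function names are assumptions for this example. Stripping whitespace before comparing matters here, because the scraped value in the sample row has a leading space.

```python
from pathlib import Path


def needs_download(state_file: Path, website_date: str) -> bool:
    """True if no date is stored yet or the stored date differs from the website's."""
    if not state_file.exists():
        return True
    stored = state_file.read_text(encoding="utf-8").strip()
    return stored != website_date.strip()


def record_download(state_file: Path, website_date: str) -> None:
    """Remember the date of the file that was just downloaded successfully."""
    state_file.write_text(website_date.strip() + "\n", encoding="utf-8")


if __name__ == "__main__":
    state = Path("last_update.txt")  # hypothetical location of the stored date
    scraped = " 2014-04-30"          # value taken from the sample row in this thread
    if needs_download(state, scraped):
        # ... perform the download here, and only record the date if it succeeded ...
        record_download(state, scraped)
```

Writing the date only after a successful download means a failed run is retried automatically on the next scheduled execution.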

 

 


Hi @john_eskilstuna, I wrote a quick article about this almost 2 years ago that might interest you... same goals 🙂 https://www.linkedin.com/pulse/automating-software-downloads-david-baldacchino/


Since then I have replicated this workspace for more software packages by making REST API calls (where that was an option), or by web scraping and sleuthing through Chrome dev tools to infer the latest version. Once you have that data and compare it to your text files, you decide whether to download and, if so, write a new text file with the updated version information to compare against next time. I also launch a VBS script through a SystemCaller to distribute the downloaded file to all offices, and email a group of people about it. I took it a little further and used Microsoft Flow (now Power Automate) to watch my inbox and re-post those emails as posts to certain MS Teams channels. The difficulty I have come across is software that can only be downloaded via an account (not a public link), where you have to supply credentials to gain access; I have not been able to find a solution for those cases.

