Hi @john_eskilstuna
I believe you can use transformers to extract HTML elements before your HTTPCaller.
The idea is this: when a new file appears on the website, your workspace compares its update date with the date of the files it handled previously.
The HTMLExtractor transformer could be used to extract that information from the website.
This is a suggestion.
Thanks,
Danilo
Hi @john_eskilstuna
Is there any indicator to say that the data has been modified/updated?
If so, then you could build that logic into a workspace that looks at date or timestamps to work out if data has been changed.
Then, using FME Server, you could set this workspace to run every day. When the workspace runs, it checks whether the update date equals yesterday's date: if yes, download; if no, do nothing.
You can also set up FME Server to email you if the workspace fails (or succeeds, though you may not want a lot of emails when nothing has changed). There are also email transformers you can build into the workspace to send an email at any point and let you know what's going on (FMEServerEmailGenerator + FMEServerNotifier, or Emailer).
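As a rough illustration of the daily check described above, here is a minimal Python sketch (for example, something a PythonCaller or a standalone script could do). The function name `should_download` is hypothetical, and it assumes the website publishes its update date in ISO format (YYYY-MM-DD).

```python
from datetime import date, timedelta


def should_download(update_date: str, today: date) -> bool:
    """Return True when the published update date is the day before `today`."""
    # Assumption: the date is an ISO string such as "2014-04-30",
    # possibly padded with whitespace from the HTML.
    published = date.fromisoformat(update_date.strip())
    return published == today - timedelta(days=1)
```

Note that a strict "equals yesterday" test would skip an update if a scheduled run is missed, so comparing against the date of the last successful download may be the safer variant.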
Thanks for the replies @jlutherthomas @danilo_fme
This is what I have come up with so far. The only way to know whether the data has been updated is the date displayed on the website. I found an .fmw for extracting information from a website, which I have used. The problem is that several dates are displayed, and at the moment I'm using the last Tester (Tester_2) to extract the specific row, which only works until they add to or change the HTML. My second picture shows the specific date I want to extract. Is it possible to extract the specific element that contains "Inventering av ädellövskogar i Södermanlands län"? <tr class......</tr>. Then I could extract the date easily without depending on a specific row.
Hi @john_eskilstuna, you can use the HTMLExtractor to populate every <tr> (row of the table) element into a list attribute.
Next, use the ListExploder to create a feature for each row. Each row is a valid XML fragment like this:
<tr class="telerik-reTableOddRow-4">
    <td class="telerik-reTableEvenCol-4">
        <p>
            <a href="http://ext-dokument.lansstyrelsen.se/sodermanland/geodata/Alleinv_dlan.zip">
                <img alt="" class="ms-asset-icon" src="/_layouts/IMAGES/iczip.gif" style="border:0px solid;"/>LstD Alléinventering</a>
        </p>
    </td>
    <td class="telerik-reTableOddCol-4">
        <p> Alléinventering i Södermanlands län</p>
    </td>
    <td class="telerik-reTableEvenCol-4">
        <p> 2014-04-30</p>
    </td>
</tr>
You can then use the XMLXQueryExtractor to extract the URL of a zip file, data name, description and date as feature attributes.
XQuery Expression (Example)
fme:set-attribute('Zip URL', data(//td[1]/p/a/@href)),
fme:set-attribute('Data Name', normalize-space(data(//td[1]/p/a))),
fme:set-attribute('Description', normalize-space(data(//td[2]/p))),
fme:set-attribute('Update', normalize-space(data(//td[3]/p)))
Note: Some <tr> elements on the page are missing the internal <p> tags. The XQuery expression above cannot extract valid values from such irregular <tr> elements. If you need to extract values from them, a slightly more complex expression is required.
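For readers who want to check the expected values outside FME, the same extraction can be sketched in Python with the standard library. This is only an illustration of what the XQuery expression above pulls out of a regular row, not part of the workspace; the helper names `text_of` and `parse_row` are made up for the example.

```python
import xml.etree.ElementTree as ET

# The sample row from the page, as produced by HTMLExtractor + ListExploder.
ROW = '''<tr class="telerik-reTableOddRow-4">
    <td class="telerik-reTableEvenCol-4">
        <p>
            <a href="http://ext-dokument.lansstyrelsen.se/sodermanland/geodata/Alleinv_dlan.zip">
                <img alt="" class="ms-asset-icon" src="/_layouts/IMAGES/iczip.gif" style="border:0px solid;"/>LstD Alléinventering</a>
        </p>
    </td>
    <td class="telerik-reTableOddCol-4">
        <p> Alléinventering i Södermanlands län</p>
    </td>
    <td class="telerik-reTableEvenCol-4">
        <p> 2014-04-30</p>
    </td>
</tr>'''


def text_of(element):
    # Join all descendant text and collapse whitespace,
    # roughly what normalize-space(data(...)) does in the XQuery.
    return " ".join("".join(element.itertext()).split())


def parse_row(row_xml):
    row = ET.fromstring(row_xml)
    cells = row.findall("td")
    link = cells[0].find("p/a")  # same path as //td[1]/p/a
    return {
        "Zip URL": link.get("href"),
        "Data Name": text_of(link),
        "Description": text_of(cells[1].find("p")),
        "Update": text_of(cells[2].find("p")),
    }


print(parse_row(ROW))
```

Like the XQuery version, this relies on the <p> tags being present, so it fails on the irregular rows mentioned in the note.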
This expression is simpler and can also be applied to the irregular <tr> elements.
fme:set-attribute('Zip URL', data(//td[1]/p/a/@href)),
fme:set-attribute('Data Name', normalize-space(data(//td[1]))),
fme:set-attribute('Description', normalize-space(data(//td[2]))),
fme:set-attribute('Update', normalize-space(data(//td[3])))
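To pick out one specific dataset without depending on its row position, the parsed rows can be filtered on the description text. Below is a hedged Python sketch of that idea; the function names are hypothetical, and looking up the link with `.//a` (at any depth) is an assumption of this example rather than what the XQuery above does. Inside FME, a Tester on the Description attribute after the XMLXQueryExtractor would play the same role.

```python
import xml.etree.ElementTree as ET


def cell_text(cell):
    # Whole-cell text with whitespace collapsed; works with or without <p>.
    return " ".join("".join(cell.itertext()).split())


def parse_row_tolerant(row_xml):
    cells = ET.fromstring(row_xml).findall("td")
    if len(cells) < 3:
        return None  # not a data row (for example a header row)
    link = cells[0].find(".//a")  # assumption: accept the link at any depth
    return {
        "Zip URL": link.get("href") if link is not None else None,
        "Data Name": cell_text(cells[0]),
        "Description": cell_text(cells[1]),
        "Update": cell_text(cells[2]),
    }


def find_by_description(rows, wanted):
    # Return the first parsed row whose description contains `wanted`.
    for row_xml in rows:
        parsed = parse_row_tolerant(row_xml)
        if parsed and wanted in parsed["Description"]:
            return parsed
    return None
```

For example, `find_by_description(rows, "Inventering av ädellövskogar i Södermanlands län")` would return the attributes of that row wherever it appears in the table, or `None` if the text is no longer on the page.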
Thanks! That helped a lot. When a download occurs, my idea is to write the "Update" date to a text file and use that for comparison. When the date in the text file and the date on the website don't match, the download will take place. What is the best method for the comparison? I tried connecting them to a Tester (text_line_data = Update) and attaching an Inspector to "Failed", but even when both dates match it still lets them through.
I managed to solve this by using the FeatureMerger.
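For comparison, the same "stored date versus website date" check can be sketched in plain Python. This is not the FeatureMerger setup itself, only an illustration of the logic; the function names and the state file path are hypothetical, and trimming whitespace before comparing is an assumption made to guard against stray spaces around the date.

```python
from pathlib import Path


def needs_download(update: str, state_file: Path) -> bool:
    """True when the website date differs from the date saved at the last download."""
    if not state_file.exists():
        return True  # first run: nothing to compare against
    stored = state_file.read_text(encoding="utf-8").strip()
    return stored != update.strip()


def record_download(update: str, state_file: Path) -> None:
    # Overwrite the state file with the date of the data just downloaded.
    state_file.write_text(update.strip() + "\n", encoding="utf-8")
```

A run would call `needs_download(update, Path("last_update.txt"))`, download the zip file when it returns True, and then call `record_download` with the same arguments.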
Hi @john_eskilstuna, I wrote a quick article about this almost 2 years ago that might interest you. It has the same goals: https://www.linkedin.com/pulse/automating-software-downloads-david-baldacchino/
Since then I have replicated this workspace for more software packages, either by making REST API calls (where that was an option) or by web scraping and sleuthing through Chrome dev tools to infer the latest version. Once you have that data and compare it to your text files, you decide whether to download and, if so, write a new text file with the updated version information to compare against next time. I also launch a VBS script from a SystemCaller to distribute the downloaded file to all offices, and email a group of people about it. I took it a little further and used Microsoft Flow (now Power Automate) to watch my inbox and re-post those emails to certain MS Teams channels. The difficulty I have come across is software that can only be downloaded through an account (not a public link), where you have to supply credentials to gain access; I have not been able to find a solution for those cases.