
I have a custom transformer that was written with a lot of help from 1Spatial (thank you 🙂!!). The idea is that it goes through an Atom (GeoRSS) feed and identifies links within the feed that are zip files.

However, due to the nature of some of the feeds I am working with, sometimes the download is on a second page (see, for example: http://www.catastro.minhap.es/INSPIRE/CadastralParcels/ES.SDGC.CP.atom.xml), and therefore I have a loop which goes through all the non-zip links that are in the initial feed, and checks if the subsequent feed contains zip files.
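For readers who cannot open the workbench, the two-level logic described above can be sketched roughly in Python. This is only an illustration of the idea, not the custom transformer itself: the function names and the `fetch` callable (anything that returns the XML text for a URL) are hypothetical, and it assumes the links appear as Atom `<link href="...">` elements.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def extract_links(atom_xml):
    """Return the href of every Atom <link> element, in document order."""
    root = ET.fromstring(atom_xml)
    return [el.get("href") for el in root.iter(ATOM_NS + "link") if el.get("href")]


def split_links(links):
    """Separate links ending in .zip from all other links."""
    zips = [u for u in links if u.lower().endswith(".zip")]
    others = [u for u in links if not u.lower().endswith(".zip")]
    return zips, others


def find_zip_links(feed_xml, fetch):
    """Collect zip links from the feed and from each non-zip link, one level down.

    `fetch` is a hypothetical callable: fetch(url) -> XML text.
    """
    zips, others = split_links(extract_links(feed_xml))
    for url in others:
        try:
            sub_zips, _ = split_links(extract_links(fetch(url)))
        except ET.ParseError:
            # The linked resource is not XML at all, so it cannot be a feed.
            continue
        zips.extend(sub_zips)
    return zips
```

Note that the initial feed is parsed exactly once here; only the non-zip links are followed.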

When I run this using the URL https://www.ign.es/atom/dataset_feeds/lin_lim_mun.es.xml, I find that it goes back into the initial Input loop and therefore identifies the same zip file twice.

Can anyone shed some light on why this is occurring, and suggest how to stop this issue? I'm really struggling, and need to avoid this duplicated data!!

 

I have attached a workbench which runs the custom transformer, along with the CSV file containing the URL I mention above.

 

Many thanks in advance for your help

Fiona

Hello @fionahf​ 

 

This looks to be happening because the attached URL returns a feature with 2 links. The links are:

(1) https://centrodedescargas.cnig.es/CentroDescargas/documentos/atom/au/au_AdministrativeUnit_1stOrder0.gml
(2) https://centrodedescargas.cnig.es/CentroDescargas/documentos/atom/au/lineas_limite_gml.zip

Based on the workflow in your Custom Transformer, these are split into 2 features (listexploder_5). The zip file (link 2 above) continues on, whereas the non-zip link (link 1 above) is passed back into the loop with that URL as URL2.

URL2 = https://centrodedescargas.cnig.es/CentroDescargas/documentos/atom/au/au_AdministrativeUnit_1stOrder0.gml

So it looks like either a data issue on the server side, or you would need to remove the duplicated URLs, perhaps based on the file names.
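As a rough illustration of removing duplicates by file name, here is a minimal Python sketch. The helper names `file_name` and `drop_duplicates` are made up for this example; inside the workspace, a DuplicateFilter keyed on the URL attribute would likely play the same role.

```python
import posixpath
from urllib.parse import urlparse


def file_name(url):
    """Return the last path segment of a URL, e.g. 'lineas_limite_gml.zip'."""
    return posixpath.basename(urlparse(url).path)


def drop_duplicates(urls, key=file_name):
    """Keep the first URL seen for each key (the file name by default)."""
    seen = set()
    unique = []
    for url in urls:
        k = key(url)
        if k in seen:
            continue  # the same file was already found earlier
        seen.add(k)
        unique.append(url)
    return unique
```

Passing `key=str` instead would compare whole URLs rather than file names, which is safer if different folders can contain files with the same name.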

 

Let me know if that helps!

 


Thanks for your response - from my understanding the first 'Input' seems to get called twice though - it certainly produces two messages, both reporting the same initial URL https://www.ign.es/atom/dataset_feeds/lin_lim_mun.es.xml

 

And that's where I get confused - I am expecting the URL2 to go into the 'Loop Entry' - that's fine (although not ideal in this particular case, in others it's what I need).

