
Hi. I'm looking for a way for FME to check a folder full of GMLs and either delete the ones that don't fall within a shapefile area feature, or move the ones that do to a different folder. I assume the first option is easier, but I could be wrong.

 

So, for instance, if I had a GML location for each coronavirus case in a folder, I want it to delete the GMLs that fall outside my shapefile of the UK.

 

Any ideas? Do it for the children - and the elderly!

 

J

Do you know the spatial extents of each gml file?

 

If they are small files, you could just read every GML file in, expose the fme_dataset attribute, do the spatial comparison with the boundary, then use a FileCopy writer to either delete those that fall outside the boundary or move those that fall inside it.
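If it helps to see that logic outside of FME, here's a rough standalone Python sketch of the same idea: read each small GML, test its point against the boundary, then move or delete the file. The shapely and fiona libraries, the assumption of one point per file in a pos or coordinates element, the matching coordinate systems and the folder names are all my own assumptions, not part of the answer above.

# Rough standalone sketch (not an FME workspace) of the same filter.
# Assumes shapely and fiona are installed, each GML file holds a single
# point in a <gml:pos> or <gml:coordinates> element, the GML and the
# shapefile share a coordinate system, and the folder names are made up.
import glob
import os
import shutil
import xml.etree.ElementTree as ET

import fiona
from shapely.geometry import Point, shape
from shapely.ops import unary_union

GML_FOLDER = "cases"             # hypothetical folder of *.gml files
KEEP_FOLDER = "cases_in_uk"      # hypothetical folder for files inside the boundary
BOUNDARY_SHP = "uk_boundary.shp"

# Merge every polygon in the shapefile into one boundary geometry
with fiona.open(BOUNDARY_SHP) as src:
    boundary = unary_union([shape(f["geometry"]) for f in src])

os.makedirs(KEEP_FOLDER, exist_ok=True)

for path in glob.glob(os.path.join(GML_FOLDER, "*.gml")):
    root = ET.parse(path).getroot()
    # Take the first pos/coordinates element, whatever GML namespace is used
    elem = next((e for e in root.iter()
                 if e.tag.endswith("pos") or e.tag.endswith("coordinates")), None)
    if elem is None or not elem.text:
        continue  # no point found; leave the file alone
    x, y = map(float, elem.text.replace(",", " ").split()[:2])  # swap if your axis order differs
    if boundary.contains(Point(x, y)):
        shutil.move(path, os.path.join(KEEP_FOLDER, os.path.basename(path)))
    else:
        os.remove(path)  # or leave it in place if you only want to move the matches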



I don't know them without loading them. They are small files, yes: single-point data with a small table of date, recent history, diagnosis, outcome, etc. About 5 KB each.


Use a Directory and File Pathnames (PATH) reader with *.gml to generate a feature per file, a FeatureReader to read each file, a Shapefile reader to read your region features, and a SpatialRelator to filter for those GML files that are contained in your regions. I'm not sure how many features are contained in each file. If there is only one feature per file, then this workflow should work. If there are multiple features per file, then you may need a BoundingBoxAccumulator with the group-by set to fme_dataset.
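To illustrate the multiple-features-per-file case, here's a hedged sketch of the bounding-box-per-file idea in plain Python rather than FME transformers. Same assumptions as the sketch above (shapely and fiona, matching coordinate systems, illustrative folder and file names): it collects every coordinate in a file, builds one box per file, and reports whether that box sits inside the boundary.

# Sketch only: build one bounding box per GML file (the BoundingBoxAccumulator
# group-by idea) and test it against the boundary. Assumes shapely and fiona.
import glob
import xml.etree.ElementTree as ET

import fiona
from shapely.geometry import box, shape
from shapely.ops import unary_union

with fiona.open("uk_boundary.shp") as src:
    boundary = unary_union([shape(f["geometry"]) for f in src])

def file_bounds(path):
    """Return (minx, miny, maxx, maxy) over all pos/posList/coordinates elements."""
    xs, ys = [], []
    for e in ET.parse(path).getroot().iter():
        tag = e.tag.rsplit("}", 1)[-1]       # strip the GML namespace
        if tag in ("pos", "posList", "coordinates") and e.text:
            nums = [float(v) for v in e.text.replace(",", " ").split()]
            xs.extend(nums[0::2])
            ys.extend(nums[1::2])
    return (min(xs), min(ys), max(xs), max(ys)) if xs else None

for path in glob.glob("cases/*.gml"):
    b = file_bounds(path)
    if b and boundary.contains(box(*b)):
        print(path, "falls inside the boundary")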


I covered this question in my question-of-the-week.

There are lots of methods (@ebygomm and @deanatsafe gave a couple here). But I think the fastest method might be to use the Envelope tag in the GML data to extract the data extents, rather than having to read the entire dataset.

In my demo workspace, it takes 2.8 seconds to process 80 files, which I think is pretty quick.

So check out the article and I hope you find it useful.
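For anyone curious what the Envelope approach looks like outside FME, here's a minimal Python sketch. The assumptions are mine, not from the article: standard library only, each file carries a gml:boundedBy/Envelope with lowerCorner and upperCorner elements, and the folder name is made up. It stops parsing each file as soon as both corners are found, which is why this route avoids reading the whole dataset.

# Minimal sketch: grab just the Envelope corners from each GML file with
# iterparse and stop early, so the rest of the file is never parsed.
import glob
import xml.etree.ElementTree as ET

def envelope_bounds(path):
    """Return (minx, miny, maxx, maxy) from the first gml:Envelope, or None."""
    lower = upper = None
    for _, elem in ET.iterparse(path, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]    # strip the GML namespace
        if tag == "lowerCorner" and elem.text:
            lower = [float(v) for v in elem.text.split()[:2]]
        elif tag == "upperCorner" and elem.text:
            upper = [float(v) for v in elem.text.split()[:2]]
        if lower and upper:
            return (lower[0], lower[1], upper[0], upper[1])
    return None

for path in glob.glob("cases/*.gml"):
    print(path, envelope_bounds(path))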

