
Hey everyone,

I've been trying to find suggestions on how to deal with big datasets. In numbers: I have a line shapefile dataset with 10,000,000 lines; the .shp is approx. 5 GB and the .dbf is 7 GB. Starting the reader took 3+ hours before I stopped it manually. In QGIS it also takes some time, but let's say several minutes. Is there a more efficient format I should transform the dataset into before using FME?

Hi @magdalenakoubek,

 

Depending on what you want to do with your data in FME, you may not need to read it all in. With FME you can leverage the spatial index and provide a search envelope to limit the amount of data that needs to be read. Here is the doc on that.

If you have an area of interest, you can use the FeatureReader transformer instead of a standard reader and supply a polygon as the search envelope.
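Purely to illustrate the envelope idea in code, here is a minimal Python sketch of the same concept outside FME, using the Fiona library (my assumption, not something FME ships with); the file name and the envelope coordinates are placeholders. Only records whose bounds intersect the envelope are handed back to the script.

# Minimal sketch: read only the features that intersect an area of interest.
# "lines.shp" and the coordinates are placeholders -- adjust to your data/CRS.
import fiona

aoi = (530000.0, 5330000.0, 545000.0, 5345000.0)  # (min_x, min_y, max_x, max_y)

count = 0
with fiona.open("lines.shp") as src:
    # filter(bbox=...) yields only features whose bounds intersect the envelope;
    # if a spatial index (.sbn/.qix) is present the driver can use it to skip scanning.
    for feature in src.filter(bbox=aoi):
        count += 1  # do the real processing here

print(f"Features inside the search envelope: {count}")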

I suggest using the latest FME 2018.0 beta if you want the fastest performance; our shapefile reader is pretty snappy. Keep in mind that when FME reads a shapefile it reads the attributes and geometry at the same time and, depending on what you do, keeps them in memory. This is likely different from how QGIS handles reading the data.
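As a rough illustration of why memory use can differ so much between tools (this is a Python sketch using GeoPandas and Fiona as stand-ins, which is my assumption and not how FME or QGIS work internally): loading everything at once keeps all geometry and attributes in RAM, while streaming record by record keeps memory roughly flat.

import fiona
# import geopandas as gpd
# gdf = gpd.read_file("lines.shp")   # everything-in-memory: all 10M geometries
#                                    # plus attributes are held in RAM at once

records = 0
with fiona.open("lines.shp") as src:  # streaming: one record at a time
    for feature in src:
        records += 1                  # memory use stays roughly constant

print(records)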

 

When reading big data like this, be sure you have plenty of free memory (I would recommend at least 16 GB for this). Please also ensure you are running 64-bit FME (not 32-bit) so that FME can use more than the 4 GB memory limit that applies to 32-bit applications.

 

 

When working with large data like this, it might be worth considering a database format and letting the database do the work. Take a look at PostGIS/PostgreSQL as an option.
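To sketch what "letting the database do the work" could look like, assuming the lines had already been loaded into a PostGIS table (for example with ogr2ogr or shp2pgsql), here is a hedged Python example; the table name "big_lines", the column names, the SRID 25832, the coordinates and the connection details are all placeholders. The bounding-box filter runs on the database side against its spatial index, so only the matching rows ever leave the server.

import psycopg2

conn = psycopg2.connect("dbname=gis user=postgres password=secret host=localhost")
with conn, conn.cursor() as cur:
    # && is PostGIS's bounding-box overlap operator; with a GiST index on geom
    # the database filters the millions of rows itself and returns only the matches.
    cur.execute(
        """
        SELECT id, ST_AsText(geom)
        FROM big_lines
        WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 25832)
        """,
        (530000, 5330000, 545000, 5345000),
    )
    for row in cur:
        pass  # process each matching feature here

conn.close()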

You may run into issues reading this shapefile: as I understand it, the size limit for both the .shp and the .dbf is 2 GB, so I'm surprised and interested to hear about your very large shapefile. Here is a wiki link to further info on that.


I'd further add that, if possible, we'd love to get a copy of that dataset to test with. Please contact support@safe.com and we'll do our best to make that easy.


Thanks a lot for your ideas! Interesting that shapefiles cannot exceed 2 GB; I will investigate why my files are so huge then! Anyway, I will try out the bounding box approach. If it doesn't work I guess I will have to work with a database anyway, although I wanted to avoid that. Low memory may also be a factor: I am currently running with 8 GB and my computer is really working at its limits, so let's see if I can upgrade it a bit!



Upgrading your system would be pretty helpful here, but it will still take some time to read the data into FME Workbench in full. Leveraging the spatial index of the shapefile is the key to performance, and that is where the bounding box comes in.

 

 

That said, we would be interested in taking a look at your file to run a few tests. Please follow this link to file a technical support case: https://www.safe.com/support/report-a-problem/ We have an FTP site where you can upload your shapefile for us to look at; its location is on the submission web form. It would really help us make FME that much better!

 

 

