Question

How to deal with big data?

  • November 20, 2017
  • 4 replies
  • 90 views

Hey everyone,

I've tried to find suggestions on how to deal with big datasets. In numbers: I have a shapefile line dataset with 10,000,000 lines; the .shp is approx. 5 GB and the .dbf file 7 GB. Starting the reader took 3+ hours before I stopped it manually. In QGIS it also takes some time, but let's say several minutes. Is there a more efficient format I should transform the dataset into before using FME?

4 replies

fmelizard
Contributor
  • November 20, 2017

Hi @magdalenakoubek,

 

Depending on what you want to do with your data in FME, you may not need to read it all in. FME can leverage the shapefile's spatial index and a search envelope to limit the amount of data that needs to be read. Here is the doc on that.

If you use the FeatureReader transformer instead of a standard reader, you can supply a polygon as a search envelope if you have an area of interest.
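To illustrate the idea behind a search envelope (this is not FME's API, just a minimal pure-Python sketch with made-up feature names and coordinates): a spatial index stores each feature's bounding box, and an envelope query only keeps features whose boxes overlap the query box, so most of a huge dataset is never touched.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)

def intersects(a: BBox, b: BBox) -> bool:
    """True when two bounding boxes overlap; this cheap test is what a
    spatial index uses to skip features outside the search envelope."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def filter_by_envelope(features: List[Tuple[str, BBox]], envelope: BBox) -> List[str]:
    """Return the ids of features whose bounding box overlaps the envelope."""
    return [fid for fid, bbox in features if intersects(bbox, envelope)]

# Hypothetical features with precomputed bounding boxes.
features = [
    ("line-1", (0.0, 0.0, 1.0, 1.0)),
    ("line-2", (5.0, 5.0, 6.0, 6.0)),
    ("line-3", (0.5, 0.5, 5.5, 5.5)),
]
print(filter_by_envelope(features, (0.0, 0.0, 2.0, 2.0)))  # ['line-1', 'line-3']
```

A real index (like the .qix/.sbn sidecar files some tools build for shapefiles) organizes the boxes in a tree so the overlap test runs on a few nodes rather than all 10 million features, but the filtering principle is the same.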

I suggest using the latest FME 2018.0 beta if you want the fastest performance; our shapefile reader is pretty snappy. Keep in mind that when FME reads a shapefile it reads the attributes and geometry at the same time and, depending on what you do, keeps these in memory. This is likely different from how QGIS handles reading the data.

 

When reading big data like this, be sure you have plenty of free memory (I would recommend at least 16 GB for this). Please also ensure you are running 64-bit FME (not 32-bit), so that FME can use more than the 4 GB memory limit of 32-bit applications.
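The 4 GB figure for 32-bit processes follows directly from the address width, as a quick back-of-envelope check shows:

```python
# A 32-bit process can address at most 2**32 distinct bytes of memory,
# which is where the 4 GB ceiling for 32-bit applications comes from.
addressable_bytes = 2 ** 32
print(addressable_bytes // 1024 ** 3)  # 4 (GiB)
```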


When working with large data like this, it might be worth considering a database format and letting the database do the work. Take a look at PostgreSQL with PostGIS as an option.
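As a sketch of what that migration might look like (database name, user, and table name here are placeholders for your own setup), a shapefile can be loaded into PostGIS with GDAL's ogr2ogr:

```shell
# Load the shapefile into an existing PostGIS-enabled database.
ogr2ogr -f PostgreSQL PG:"dbname=gis user=postgres" big_lines.shp \
  -nln big_lines -lco GEOMETRY_NAME=geom -progress

# Then, in SQL, a spatial index makes envelope queries fast:
#   CREATE INDEX ON big_lines USING GIST (geom);
```

After that, a bounding-box query (`WHERE geom && ST_MakeEnvelope(...)`) touches only the rows in your area of interest instead of the whole 10-million-feature table.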

You may run into issues reading this shapefile: as I understand it, the file-size limit for both the .shp and the .dbf is 2 GB, so I'm surprised and interested to hear of your very large shapefile - here is a wiki link with further info on that.
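For background on where that limit comes from: the .shp header stores the file length as a signed 32-bit count of 16-bit words, so the format itself tops out just under 4 GB, and readers that treat record offsets as signed byte counts cap out at 2 GB. A small Python sketch of reading that header field, using a synthetic in-memory header rather than a real file:

```python
import struct

def shp_declared_size(header: bytes) -> int:
    """Return the file size in bytes declared in a .shp header.
    The length field at byte offset 24 is a big-endian signed 32-bit
    integer counting 16-bit words, which caps what a .shp can describe."""
    (length_words,) = struct.unpack_from(">i", header, 24)
    return length_words * 2

# Build a minimal synthetic 100-byte header (file code 9994, length field).
header = bytearray(100)
struct.pack_into(">i", header, 0, 9994)        # shapefile magic number
struct.pack_into(">i", header, 24, 2**31 - 1)  # max representable length
print(shp_declared_size(bytes(header)))        # 4294967294 bytes, just under 4 GB
```

A 5 GB .shp is therefore outside what the format can correctly describe, which may explain the extremely slow read.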


fmelizard
Contributor
  • November 21, 2017

I'd further add that if possible we'd love to get a copy of that dataset in here to test with. Please contact support@safe.com and we'll do our best to make that easy.


magdalenakoubek

Thanks a lot for your ideas! Interesting that shapefiles cannot exceed 2 GB; I will investigate why my files are then so huge! Anyway, I will try the bounding-box approach. If it doesn't work, I guess I'll have to use a database anyway, although I wanted to avoid that. Limited memory may also be a factor: I am currently running with 8 GB and my computer is really working at its limits, so let's see if I can upgrade it a bit!


fmelizard
Contributor
  • November 21, 2017
magdalenakoubek wrote:

Thanks a lot for your ideas! Interesting that shapefiles cannot exceed 2 GB; I will investigate why my files are then so huge! Anyway, I will try the bounding-box approach. If it doesn't work, I guess I'll have to use a database anyway, although I wanted to avoid that. Limited memory may also be a factor: I am currently running with 8 GB and my computer is really working at its limits, so let's see if I can upgrade it a bit!

Upgrading your system would be pretty helpful here, but it will still take some time to read the dataset fully into FME Workbench. Leveraging the spatial index of the shapefile is the key to performance, and that is where the bounding box comes in.


That said, we would be interested in taking a look at your file to run a few tests. Please follow this link to file a technical support case: https://www.safe.com/support/report-a-problem/ We have an FTP site where you can upload your shapefile for us to look at; its location is on the submission web form. It would really help us make FME that much better!

