
Is there any way to optimize reading a 500 MB GeoJSON file? I'm already using 64-bit Windows with 10 GB RAM and am trying to squeeze down the time it takes to read it.

 

Maybe downloading the file locally would help, but if you are interested in dynamic data, then it might not be what you are looking for.
It's already downloaded locally (and there is also an option in the reader settings to keep it locally).
Hi,

 

 

Unfortunately, I don't have any tricks for such huge GeoJSON files, but here is an alternative strategy if speed is crucial:

 

 

Read the file inside a PythonCreator, using the geojson module to parse the features and convert them into FMEFeature objects. I'd be surprised if that wasn't a fair bit quicker.
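
To make the idea a bit more concrete, here is a minimal sketch of just the parsing step. It is only an illustration under some assumptions: it uses the standard-library json module instead of the geojson module mentioned above (so it runs without extra installs), the helper name iter_features is made up for this example, and the conversion of each result into an FMEFeature inside the PythonCreator is deliberately left out.

```python
import json


def iter_features(path):
    """Yield (properties, geometry) pairs from a GeoJSON file.

    Hypothetical helper for illustration. It loads the whole document
    into memory, so a 500 MB file needs a correspondingly large amount of RAM.
    """
    with open(path, "r", encoding="utf-8") as fh:
        doc = json.load(fh)

    # Accept either a FeatureCollection or a single Feature object.
    if doc.get("type") == "FeatureCollection":
        features = doc.get("features", [])
    else:
        features = [doc]

    for feat in features:
        # "properties" may be null in valid GeoJSON; normalise it to a dict.
        yield feat.get("properties") or {}, feat.get("geometry")
```

Inside a PythonCreator, each yielded pair would then be turned into one output feature; whether this actually beats the built-in reader would have to be measured on the real file.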

 

 

Good luck

 

 

David
How about a GeoJSON to FFS conversion, and then using the FFS?
Itay: That's cheating, since the benchmark test is GeoJSON -> SQLite.

 

 

It takes 12 minutes to read the GeoJSON with 64-bit, 10 GB RAM, and an SSD, and 4 minutes to write the SQLite. Not a long time, but I just wanted to see if it was possible to cut it even more.
Ha 🙂 You're just testing... 4 minutes is not a lot of time; I've seen worse cases.
Agree with Itay, 12 minutes to serialize 500MB of text into the internal feature representations is actually quite impressive.

 

 

I'm curious about the reasoning behind these tests.

 

 

David
My guess is the unfortunate human desire for faster and more... 🙂
Just comparing with Arc software and other open-source tools, to brand FME as the fastest 🙂
Just say the word if you want some competition from a pure Python solution using the geojson and sqlite3 modules ;-)
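
For the curious, a pure Python GeoJSON -> SQLite run could look roughly like the sketch below. Treat it as a rough outline, not a tuned benchmark entry: it swaps the geojson module for the standard-library json module, the function name geojson_to_sqlite and the table layout (properties and geometry stored as JSON text) are assumptions made up for this example, and no spatial geometry encoding is attempted.

```python
import json
import sqlite3


def geojson_to_sqlite(geojson_path, db_path, table="features"):
    """Copy the features of a GeoJSON FeatureCollection into an SQLite table.

    Hypothetical helper: properties and geometry are stored as JSON text.
    Returns the number of rows in the table afterwards.
    """
    with open(geojson_path, "r", encoding="utf-8") as fh:
        collection = json.load(fh)  # whole file is parsed in memory

    # Generator, so rows are serialized lazily while inserting.
    rows = (
        (json.dumps(f.get("properties") or {}), json.dumps(f.get("geometry")))
        for f in collection.get("features", [])
    )

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # a single transaction for all inserts
            conn.execute(
                f'CREATE TABLE IF NOT EXISTS "{table}" '
                "(id INTEGER PRIMARY KEY, properties TEXT, geometry TEXT)"
            )
            conn.executemany(
                f'INSERT INTO "{table}" (properties, geometry) VALUES (?, ?)',
                rows,
            )
        count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    finally:
        conn.close()
    return count
```

Keeping all inserts in one transaction and using executemany is usually what matters most for SQLite write speed; the read side is still dominated by parsing the JSON text.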

 

 

But if user-friendliness enters as a parameter in the tests, there is no question that FME will win hands down, regardless!

 

 

David
Since we are in competitive mode, have a look at this thread with comments regarding ArcPy and Dissolve. 4.1 seconds with FME 🙂:

 

 

http://www.mindland.com/wp/solving-the-arcpy-dissolve/

 

 


Yeah, I saw that post, very interesting.

 

 

But comparing 4.1 (FME) vs 4.5 (Python shapely) vs 1.5 (JEQL) seconds is a bit moot when everybody is sitting on wildly different hardware ;-)

 

 

Still, fascinating discussion.

 

 

David
