Question

How to find the closest point using SQLExecutor?


dataninja

Hi everyone,

For each record in the CSV file, I want to find the closest point to it in a SpatiaLite database using SQLExecutor. What's the most efficient query I can use for this, given that my SpatiaLite DB contains over 35 million points? Thanks in advance!

 

3 replies

erik_jan
Contributor
  • October 11, 2019

If the CSV file has a small number of points, I would use a Bufferer to create small areas (sized so that each should contain at least one point from the SpatiaLite database).

Then use the buffers to select points from the SpatiaLite database, using the FeatureReader (with a spatial query).

Then feed both the original CSV points and the returned SpatiaLite points into a NeighborFinder transformer to find only the closest point.

Hope this helps.
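Outside FME, the same idea (a cheap bounding-box pre-filter followed by an exact nearest-point check) can be sketched with plain SQLite. This is only a minimal illustration: the `points` table, its `x`/`y` columns, and the radius value are all assumptions standing in for your real 35M-point SpatiaLite schema.

```python
import math
import sqlite3

def closest_point(conn, x, y, radius):
    """Find the stored point nearest to (x, y), pre-filtering candidates
    with a +/- radius bounding box (the 'buffer' idea), then taking the
    exact minimum distance among the survivors."""
    rows = conn.execute(
        "SELECT id, x, y FROM points "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?",
        (x - radius, x + radius, y - radius, y + radius),
    ).fetchall()
    if not rows:
        return None  # buffer too small: no candidate fell inside the box
    return min(rows, key=lambda r: math.hypot(r[1] - x, r[2] - y))

# Tiny in-memory demo standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?, ?)",
                 [(1, 0.0, 0.0), (2, 5.0, 5.0), (3, 5.2, 4.9)])
conn.execute("CREATE INDEX idx_xy ON points (x, y)")  # makes the box filter cheap

print(closest_point(conn, 5.0, 5.1, radius=1.0))  # → (2, 5.0, 5.0)
```

The index on (x, y) is what keeps the box filter from scanning all rows; on a real SpatiaLite table a spatial index plays that role.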


dataninja
  • Author
  • October 11, 2019

That's exactly how I do it right now. I just want to try SQLExecutor to see if it's faster.


steveatsafe
Safer
  • October 11, 2019

I don't play with SpatiaLite much... but if I treat it like other relational databases that support spatial queries...

I would make the query in SQLExecutor compare each point in the CSV with your dataset in SpatiaLite using something like this for SpatiaLite KNN:

https://www.gaia-gis.it/fossil/libspatialite/wiki?name=KNN
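I can't run SpatiaLite here, but going by the KNN wiki page linked above, the per-feature query might look roughly like the string this small helper formats. The table name `mypoints` and the result columns are assumptions you'd swap for your own schema; this is a sketch, not a tested query.

```python
def knn_query(x: float, y: float, max_items: int = 1) -> str:
    """Format a SpatiaLite KNN lookup for one CSV point.

    Query shape follows the KNN virtual table described in the wiki
    linked above; 'mypoints' is a placeholder for your point table."""
    return (
        "SELECT fid, distance FROM KNN "
        "WHERE f_table_name = 'mypoints' "
        f"AND ref_geometry = MakePoint({x}, {y}) "
        f"AND max_items = {max_items}"
    )

print(knn_query(11.25, 43.5))
```

In SQLExecutor you wouldn't format the SQL in Python, of course; you'd write it once and substitute each CSV record's coordinates with FME attribute references such as `@Value(x)` and `@Value(y)`.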

You may also find that loading the CSV into the SpatiaLite database improves the process...

I've got to be honest though... I'm really just throwing this out as an idea after only a few minutes of research. By no means is this the only way to solve it, or the best way; it's just one way I might attempt to tackle it.

As @erik_jan has pointed out, that also seems like a valid approach, and it works with fewer data points at a time.

Either way, 35,000,000 is a lot of rows... patience will be needed when working with this sort of database.

