Hi

 

I need to find coincident points in two different geodatabases and assign a unique ID to those features.

 

 

Facts:
  1. I have two databases (A and B) with the same feature classes.
  2. Database A is a subset of Database B from a content perspective.
  3. From a schema perspective, Databases A and B are the same.
  4. So if feature class BXX has 2000 features, feature class AXX would probably have 1800 features.
  5. I need to assign the same incremental ID to all 1800 features that appear in both AXX and BXX.
  6. E.g. the first coincident feature in AXX and BXX gets ID 101, the second coincident feature in both gets ID 102, and so on.
  7. The remaining 200 features in BXX also get a unique ID, but nothing is assigned in AXX because it has no corresponding features.
  8. Most (not all) of the attributes are the same in both feature classes.
  9. I also have lat and long fields in addition to the shape field; they should match as well.

 

The challenge is in finding the coincident features in a fashion that is not so memory intensive. We are talking about around 800,000 points in Database A and 600,000 points in database B.

 

 

I have tried using the Matcher, Spatial Relator and spatial finder. The Spatial Relator and spatial finder were not able to process this much data. Perhaps I am doing something wrong. It appeared the Matcher was able to process the data, but I am not sure if that is the right tool.

 

 

Please help 🙂.

 

 

 
Hi

 

 

One strategy could be to first split your data into more manageable chunks. You can use a Tiler (use the same seed coordinate for both datasets) and a PointOnAreaOverlayer to assign a tile ID to each point. You can then fan out the points to different feature classes depending on the tile ID.

 

 

Finally, create a master workspace that iterates over the tiles and calls a second workspace, which loads and compares the two datasets for the specified tile ID.

 

 

Depending on the size of your tiles, this should be much less memory intensive.
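
To illustrate the idea outside of FME, here is a rough Python sketch of the same tiling logic. It is not the Tiler or the PointOnAreaOverlayer, just an assumption of how a shared seed coordinate keeps tile IDs comparable between the two datasets. The function names, the tile size and the sample points are all made up for the example.

```python
import math
from collections import defaultdict


def tile_id(x, y, origin_x, origin_y, tile_size):
    """Return the (column, row) of the square tile that contains the point."""
    col = math.floor((x - origin_x) / tile_size)
    row = math.floor((y - origin_y) / tile_size)
    return (col, row)


def group_by_tile(points, origin_x, origin_y, tile_size):
    """Group (point_id, x, y) tuples by the tile they fall in."""
    tiles = defaultdict(list)
    for point_id, x, y in points:
        key = tile_id(x, y, origin_x, origin_y, tile_size)
        tiles[key].append((point_id, x, y))
    return tiles


# Hypothetical sample data: (id, x, y)
points_a = [("A1", 10.2, 5.1), ("A2", 25.0, 5.0)]
points_b = [("B7", 10.2, 5.1), ("B8", 25.0, 5.0), ("B9", 99.0, 99.0)]

# The same origin (seed) and tile size must be used for both datasets,
# otherwise a coincident pair could end up in different tiles.
tiles_a = group_by_tile(points_a, 0.0, 0.0, 20.0)
tiles_b = group_by_tile(points_b, 0.0, 0.0, 20.0)

# Only one tile's worth of points from each dataset is compared at a time.
for key in sorted(tiles_b):
    print(key, len(tiles_a.get(key, [])), len(tiles_b[key]))
```

Because the points must be exactly coincident, a matching pair always lands in the same tile, so there is no need to look at neighbouring tiles.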

 

 

David

 

 
Alternatively,

 

1. Assign a unique ID in both databases (ignore any ID that is already present).

2. Since your 9th point says that lat and long should match, create an attribute that combines them (like "Lat,Long").

3. Use a FeatureMerger and merge the data based on the attribute created in step 2 (the 9th condition).

4. At this stage a record such as A001 may be combined with the ID of B203. (Use only the ID and lat-long fields during translation so the workbench runs faster.)

5. Now you will know which records of database A and database B should be the same.

6. We can then overwrite the B user ID value with the A user ID value.
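
The steps above can be sketched in plain Python as follows. This is only an illustration of joining on a combined "Lat,Long" key, not the FeatureMerger itself. The function names, the record layout (id, lat, long) and the rounding to 6 decimals are assumptions for the example; choose a precision that suits how your lat/long fields are stored.

```python
def coord_key(lat, lon, decimals=6):
    """Build a 'Lat,Long' text key, rounded so tiny float noise is ignored."""
    return f"{lat:.{decimals}f},{lon:.{decimals}f}"


def match_by_coord(records_a, records_b, decimals=6):
    """Return ({a_id: b_id} for coincident points, [a_ids with no match])."""
    # Index B by key. If several B points share a location, the first one wins.
    b_index = {}
    for b_id, lat, lon in records_b:
        b_index.setdefault(coord_key(lat, lon, decimals), b_id)

    pairs = {}
    unmatched_a = []
    for a_id, lat, lon in records_a:
        b_id = b_index.get(coord_key(lat, lon, decimals))
        if b_id is None:
            unmatched_a.append(a_id)
        else:
            pairs[a_id] = b_id
    return pairs, unmatched_a


# Hypothetical records carrying only the ID and lat/long fields
records_a = [("A001", 12.5, 77.6)]
records_b = [("B203", 12.5, 77.6), ("B204", 1.0, 2.0)]

pairs, unmatched_a = match_by_coord(records_a, records_b)
print(pairs)
```

Only the B side is held in the lookup, and only three fields per record are kept, which is the same reason for dropping the other attributes during translation.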

 

 

Hope it works

 

 

Pratap
Hi,

 

 

I agree that the tiling strategy can be applied in general.

 

 

Another thought.

 

I think the SpatialFilter may also be a workaround, if your conditions allow you to set its "Filter Type" parameter to "Filters First". Since the Matcher stores all input features, it may consume a huge amount of memory, as you mentioned. The SpatialFilter (Filter Type: Filters First) also stores the Filter features, but it does not store the Candidate features. This could use less memory than the Matcher. In my experience, there was a case where the SpatialFilter was able to process a million Filters vs. a million Candidates efficiently in "Filters First" mode. I think it's also worth a try.

 

 

Takashi
Thank you David, Pratap and Takashi. 

 

I started off with Takashi's suggestion, as that offered the path of least resistance. 

 

 

You are correct - I was able to go through all the features with Filters First. However, the SpatialFilter only gave me info about the target, as opposed to info about the source AND the target. Note that I need to assign the same ID to both features. 

 

 

I ended up going back to the SpatialRelator and removed a lot of (sort of) unnecessary fields so that I am not holding too much in memory. 

 

 

It appears to be working, except the SpatialRelator only provides output for the related features. If I also want to assign unique IDs to the ones that are not related, then I will probably have to re-read all of it and reassign the IDs. 

 

 

Takashi - Do you have any thoughts on how to make it work with SpatialFilter? 
If the ID just needs to be unique, add a unique ID to the B features first. Then send the B features to the Filter port and the A features to the Candidate port. As a result, every A feature will have the same ID as the matched B feature, since feature class A is a subset of feature class B.

 

However, if there are conditions on the ID values other than "unique", another approach has to be considered. 
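
As a rough sketch of that ordering (number B first, then copy the ID across to A), here is the logic in plain Python. It is not SpatialFilter code. The function name is made up, and the "location keys" stand in for whatever identifies a coincident point, e.g. a lat/long key. The start value 101 is taken from the example in the question.

```python
from itertools import count


def assign_shared_ids(keys_b, keys_a, start=101):
    """Number every B location first, then copy the ID to the matching A location."""
    counter = count(start)

    # Every B feature gets an ID, including those with no partner in A.
    ids_b = {}
    for key in keys_b:
        if key not in ids_b:
            ids_b[key] = next(counter)

    # A is assumed to be a subset of B, so each A key should be found.
    # A key missing from B would get None here and needs separate handling.
    ids_a = {key: ids_b.get(key) for key in keys_a}
    return ids_b, ids_a


# Hypothetical location keys
keys_b = ["p1", "p2", "p3"]
keys_a = ["p2", "p1"]

ids_b, ids_a = assign_shared_ids(keys_b, keys_a)
print(ids_b)
print(ids_a)
```

Note that the numbering follows the order of the B features, so the IDs are unique and shared, but the matched features are not guaranteed to get the lowest or a contiguous range of numbers.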
