Hi
One strategy could be to first split your data into more manageable chunks. You can use a Tiler (use the same seed coordinate for both datasets) and a PointOnAreaOverlayer to assign a tile ID to each point. You can then fan out the points to different feature classes depending on the tile ID.
Finally, create a master workspace that iterates over the tiles and calls a second workspace that loads and compares the two datasets using the specified tile id.
Depending on the size of your tiles, this should be much less memory intensive.
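The tile-by-tile idea can be sketched outside FME in plain Python. This is only an illustration, not FME's API: the function names, the (id, x, y) tuple layout, the tile size and the exact-coordinate comparison are all assumptions. It also keeps everything in memory for brevity, whereas the real workflow would write each tile to its own feature class and load one tile at a time.

```python
import math
from collections import defaultdict

def tile_id(x, y, seed_x, seed_y, tile_size):
    """Return the (column, row) of the tile containing the point."""
    return (math.floor((x - seed_x) / tile_size),
            math.floor((y - seed_y) / tile_size))

def group_by_tile(points, seed_x, seed_y, tile_size):
    """Group (id, x, y) tuples by the tile they fall in."""
    tiles = defaultdict(list)
    for pid, x, y in points:
        tiles[tile_id(x, y, seed_x, seed_y, tile_size)].append((pid, x, y))
    return tiles

def compare_by_tile(points_a, points_b, seed=(0.0, 0.0), tile_size=1.0):
    """Compare the two datasets one tile at a time.

    Both datasets use the same seed, so identical coordinates always
    land in the same tile. Matching here is exact coordinate equality
    (an assumption standing in for the real comparison).
    """
    tiles_a = group_by_tile(points_a, seed[0], seed[1], tile_size)
    tiles_b = group_by_tile(points_b, seed[0], seed[1], tile_size)
    matches = []
    # Only tiles present in both datasets can contain matches.
    for tid in sorted(tiles_a.keys() & tiles_b.keys()):
        b_by_xy = {(x, y): pid for pid, x, y in tiles_b[tid]}
        for pid, x, y in tiles_a[tid]:
            if (x, y) in b_by_xy:
                matches.append((pid, b_by_xy[(x, y)]))
    return matches
```

Only the features of a single tile need to be indexed at any one time, which is where the memory saving comes from.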
David
Alternatively,
1. Assign a unique ID in both DBs (skip this if one is already present).
2. Since your 9th point says that the lat-long should match, create an attribute holding the lat-long (like Lat,Long).
3. Use a FeatureMerger and merge the data based on the attribute created in step 2 (the 9th condition).
4. At this stage you will get results such as record A001 combined with the ID of B203. (Use only the ID and lat-long fields during translation so the workspace runs faster.)
5. Now you will know which records of A-DB and B-DB should be the same.
6. You can then replace the B user ID value with the A user ID.
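The steps above can be sketched in plain Python to show the logic of the key-based merge. This is not the FeatureMerger itself: the record layout (dicts with "id", "lat", "lon"), the function names and the rounding to six decimals when building the key are assumptions made for illustration.

```python
def latlong_key(lat, lon, decimals=6):
    """Build the 'Lat,Long' join attribute from step 2."""
    return f"{lat:.{decimals}f},{lon:.{decimals}f}"

def merge_ids(records_a, records_b, decimals=6):
    """Return {A id: B id} for records that share a lat-long key."""
    # Index B by key, carrying only the ID (steps 3 and 4).
    b_index = {}
    for rec in records_b:
        key = latlong_key(rec["lat"], rec["lon"], decimals)
        if key not in b_index:
            b_index[key] = rec["id"]
    pairs = {}
    for rec in records_a:
        key = latlong_key(rec["lat"], rec["lon"], decimals)
        if key in b_index:
            pairs[rec["id"]] = b_index[key]
    return pairs

def relabel_b(records_b, pairs):
    """Step 6: give each matched B record the ID of its A partner."""
    b_to_a = {b_id: a_id for a_id, b_id in pairs.items()}
    return [dict(rec, id=b_to_a.get(rec["id"], rec["id"]))
            for rec in records_b]
```

Keeping only the ID and the key in the lookup mirrors the advice to carry as few fields as possible through the merge.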
Hope it works
Pratap
Hi,
I agree that the tiling strategy can be applied in general.
Another thought.
I think the SpatialFilter may also be a workaround, if your conditions allow you to set its "Filter Type" parameter to "Filters First". Since the Matcher stores all input features, it may consume a huge amount of memory, as you mentioned. The SpatialFilter (Filter Type: Filters First) also stores the Filter features, but it does not store the Candidate features, so it could use less memory than the Matcher. In my experience, there was a case where the SpatialFilter was able to process a million Filters vs. a million Candidates efficiently in "Filters First" mode. I think it's also worth trying.
Takashi
Thank you David, Pratap and Takashi.
I started off with Takashi's suggestion as that offered the path of least resistance.
You are correct - I was able to go through all the features with Filters First. However, the SpatialFilter only gave me info regarding the target, as opposed to giving info about the source AND the target. Note that I need to assign the same ID to both features.
I ended up going back to the SpatialRelator and removed a lot of (sort of) unnecessary fields so that I am not holding too much in memory.
It appears to be working, except that the SpatialRelator only provides output for the related features. If I also want to assign unique IDs to the ones that are not related, then I will probably have to re-read all of them and reassign the IDs.
Takashi - Do you have any thoughts on how to make it work with SpatialFilter?
If the ID just needs to be unique, first add a unique ID to the B features. Then send the B features to the Filter port and the A features to the Candidate port. As a result, every A feature will have the same ID as the matched B feature, since the A feature class is a subset of the B feature class.
However, if there are other conditions about ID values other than "unique", another way has to be considered.
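As a rough illustration of the "unique ID on B, then pass A through as Candidates" approach, here is a plain Python sketch. It is not the SpatialFilter API: the function names, the "ID" prefix, the (x, y) tuple layout and the use of exact coordinate equality as the spatial test are all assumptions. The point it tries to show is that only the B (Filter) side is held in memory, while A features are streamed one at a time.

```python
from itertools import count

def assign_ids_to_b(features_b, prefix="ID"):
    """Give every distinct B location a unique ID and index it by coordinate."""
    counter = count(1)
    index = {}
    for x, y in features_b:
        if (x, y) not in index:
            index[(x, y)] = f"{prefix}{next(counter)}"
    return index

def transfer_ids(filter_index, candidates_a):
    """Stream A features, yielding (x, y, id) with the matched B ID.

    Only filter_index is stored; candidates_a can be any iterator and is
    consumed one feature at a time. An A feature with no match gets None.
    """
    for x, y in candidates_a:
        yield (x, y, filter_index.get((x, y)))
```

For example, `list(transfer_ids(assign_ids_to_b(b_points), iter(a_points)))` would pair each A point with the ID of the B point at the same location, where `b_points` and `a_points` are hypothetical lists of (x, y) tuples.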