Question

How to debug Matcher vs DuplicateFilter vs FeatureMerger = different answers!

  • 10 November 2016
  • 2 replies
  • 12 views

Again, not really a question, more something that might help those trying to understand different results from these transformers.

Two source File Geodatabases with identical schemas.

2152 features in one, 148 in the other.

Running FeatureMerger (merging Attributes and Geometry) on the data's unique ID field (not ObjectID) gave 93 Merged (exists in both), 2059 NotMerged (only in Requestor) and 55 Unreferenced (only in Supplier), plus 93 Referenced which, as I understand it and as it appears in Inspector, should be the same features that come out of the Merged port.

Merged, NotMerged and Unreferenced are sent to Sorter, then to DuplicateFilter and Matcher, neither of which finds any duplicates, which is what I was hoping for. This results in 2207 unique features (93 + 2059 + 55).
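For anyone wanting to sanity-check these port counts outside FME, here is a minimal Python sketch that models the two datasets as sets of unique IDs. The ID values are invented (only the counts come from the post), and a plain set join is only a rough stand-in for what FeatureMerger does when joining on a key.

```python
# Invented IDs: "B" = in both datasets, "R" = Requestor only, "S" = Supplier only.
both = {f"B{i}" for i in range(93)}
requestor_ids = both | {f"R{i}" for i in range(2059)}  # 2152 features
supplier_ids = both | {f"S{i}" for i in range(55)}     # 148 features

merged = requestor_ids & supplier_ids        # ID present in both
not_merged = requestor_ids - supplier_ids    # only in Requestor
unreferenced = supplier_ids - requestor_ids  # only in Supplier

unique_total = len(merged) + len(not_merged) + len(unreferenced)
print(len(merged), len(not_merged), len(unreferenced), unique_total)
```

The total is simply the size of the union of the two ID sets, which is why it equals 2152 + 148 minus the 93 shared IDs.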

However, I also sent Merged and Referenced to Matcher, which says that 186 (93+93) were NotMatched! I used Match Selected Attributes and ticked everything except Object ID and numReferences. You might ask why. Changing Matcher to NOT differentiate between Empty, Missing and Null makes no difference, nor does Lenient Geometry Matching. I know Matcher compares geometry and the other two only compare attributes, but I compared some features in Inspector and couldn't see any differences in field types, values or geometry.

To confuse things even more, I sent the two source datasets straight to Matcher. This time, 182 were Matched (so 91 SingleMatched) and 2118 were NotMatched. This results in 2209 unique features.

I also sent the two source datasets direct to Sorter then DuplicateFilter and got 2207 unique features.

My next move was to send the SingleMatched (91) and NotMatched (2118) features via Sorter to DuplicateFilter. This gave me the 2 features that made the difference between 2207 and 2209. For simplicity let's say these have unique IDs of UID1 and UID2.
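The arithmetic behind the 2207 versus 2209 gap can be laid out as a short Python sketch. All numbers are taken from the counts above; the variable names are just for illustration.

```python
total_input = 2152 + 148             # features read from both geodatabases
matched = 182                        # Matcher "Matched" count (both copies of each pair)
not_matched = 2118                   # Matcher "NotMatched" count
assert matched + not_matched == total_input

single_matched = matched // 2        # one representative per matched pair
unique_by_matcher = single_matched + not_matched

shared_ids = 93                      # IDs found in both datasets by FeatureMerger
unique_by_id = total_input - shared_ids

# Pairs that share a unique ID but do not match on every compared attribute
mismatched_pairs = shared_ids - single_matched
print(unique_by_matcher, unique_by_id, mismatched_pairs)
```

The two extra "unique" features from Matcher are therefore the two ID pairs that differ somewhere other than the ID.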

I then applied a Tester to the Unique output port of the DuplicateFilter to get the other copies of those 2 records (Passed = where unique ID In UID1,UID2) and sent the results (4 features) to Inspector.

I still couldn't see any differences so I copied the Feature Information for each feature out of Inspector into separate text files in TextPad (any text editor that does good file compare will do).
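If you would rather script the comparison than use a text editor, Python's standard `difflib` module does the same job. This is only a sketch: the dump layout, attribute names and values below are invented for illustration and will not look exactly like what Inspector produces.

```python
import difflib

# Hypothetical Feature Information text for two copies of the same record.
feature_a = """\
UniqueID (utf-16): UID1
Comments (utf-16): <null>
Status (utf-16): Open"""

feature_b = """\
UniqueID (utf-16): UID1
Comments (utf-16e):
Status (utf-16): Open"""

diff = list(difflib.unified_diff(
    feature_a.splitlines(), feature_b.splitlines(),
    fromfile="feature_a.txt", tofile="feature_b.txt", lineterm=""))

# Keep only the added/removed lines, skipping the "---" and "+++" file headers.
changed = [line for line in diff
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]
print("\n".join(changed))
```

Only the lines that differ are printed, which makes a one-field difference much easier to spot than scanning two Inspector panes by eye.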

This revealed that a couple of fields were utf-16 with a value of <null> in one feature and utf-16e with a blank or empty value in the other, hence NotMatched. Mystery solved! I now know that unless I'm bothered about the difference between <null> and blank/empty I can take the 2207 features output from FeatureMerger (Merged + NotMerged + Unreferenced) as the unique features from the 2 source datasets.
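To show why a null and an empty value break a strict comparison, here is a small Python sketch. It models features as plain dictionaries with invented attribute names; it is not how Matcher is implemented, just the same idea in miniature.

```python
def normalise(feature, fields):
    """Treat a missing attribute, None (null) and "" (empty) as the same 'no value'."""
    result = {}
    for name in fields:
        value = feature.get(name)
        result[name] = None if value in (None, "") else value
    return result

fields = ["UniqueID", "Comments"]
copy_1 = {"UniqueID": "UID1", "Comments": None}  # <null>
copy_2 = {"UniqueID": "UID1", "Comments": ""}    # blank/empty

strict_match = copy_1 == copy_2
lenient_match = normalise(copy_1, fields) == normalise(copy_2, fields)
print(strict_match, lenient_match)
```

Whether the lenient comparison is acceptable depends on whether the difference between null and blank matters for your data.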

I've still no idea why my other Matcher output 186 NotMatched rather than 93 Matched but it's not so important now I've reconciled the different answers between FeatureMerger, DuplicateFilter and Matcher.


2 replies

Userlevel 2
Badge +17

Hi @tim_wood, regarding the first question: "However, I also sent Merged and Referenced to Matcher which says that 186 (93+93) were NotMatched! I used Match Selected Attributes and ticked everything except Object ID and numReferences."

A possible reason I can think of is that you have ticked an exposed format attribute such as fme_feature_type. Naturally its value could be different between the Merged feature (from Requestor) and the Referenced feature (from Supplier).

Badge +7

No format attributes are exposed in the Readers. I tried using an AttributeRemover before the Matcher to get rid of ObjectID and numReferences but still got the same result.

 

I guess it must be something between Merged and Referenced as you say. Maybe something like the utf-16/utf-16e difference.

 

Update: I did the TextPad comparison on the first 2 features and found that one was an fme_line, which is down to the Geometry Merge Type I selected in FeatureMerger and the fact that I selected Merge Geometry and Attributes. The coordinates appear to be the same, so I don't think the feature has moved.

 

Comparing to the FeatureMerger in another Workspace, this one had Process Duplicate Suppliers set to Yes whereas in the other Workspace it was set to No. I set it to No in this Workspace (zero features had come out of the Duplicate Suppliers port anyway) and hey presto, the Matcher now produces the expected result.

 

From this I deduce that if Process Duplicate Suppliers is set to Yes and therefore the Geometry Merge Type is active, features that exist in Requestor and Supplier will have their point geometry turned into a line (if that's what you choose) even if the coordinates of the input points are identical.
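A toy Python sketch of why this trips up a geometry-aware comparison: the two features below are hypothetical, with identical attributes and coordinates but a different geometry type. How FME actually stores the coordinates of the resulting line is an assumption here; the point is only that the type alone is enough to make the pair differ.

```python
# Hypothetical Merged and Referenced copies of the same record.
merged_feature = {"UniqueID": "UID1", "fme_type": "fme_line", "coords": ((1.0, 2.0),)}
referenced_feature = {"UniqueID": "UID1", "fme_type": "fme_point", "coords": ((1.0, 2.0),)}

# List every key whose value differs between the two copies.
differing = sorted(key for key in merged_feature
                   if merged_feature[key] != referenced_feature[key])
print(differing)
```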

 
