I don't know if it's reasonable the FeatureJoiner will assign the <null> to missing attributes with an outer join mode (Left or Full).
This post is closed to further activity.
It may be an old question, an answered question, an implemented idea, or a notification-only post.
Please check post dates before relying on any information in a question or answer.
For follow-up or related questions, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.
Very good point. I know this was discussed but I've raised it with the team and will report back.
It feels like, at first blush, Missing is the traditional FME way to handle this. BUt I wonder if we were trying to mimic database behaviour more completely.
Very good point. I know this was discussed but I've raised it with the team and will report back.
It feels like, at first blush, Missing is the traditional FME way to handle this. BUt I wonder if we were trying to mimic database behaviour more completely.
Yes, I can imagine that you were trying to mimic database behavior.
However, in my understanding, we have distinguished between "an attribute does exist and has the Null value" and "an attribute doesn't exist (is missing)" since 2014, and many functions have been implemented on the premise that those are different. As far as I know, FME had never performed conversion between <missing> and <null> implicitly, except where FME writes a feature containing missing attributes into a destination dataset that supports NULL. I therefore feel something inconsistent here, even if this would not cause any critical problem.
This was discussed during implementation, where it was agreed to use <null> instead of <missing>.The arguments at that time were:
1. Using <null> is more SQL-like.
2. Producing <missing> values would mean there is very little difference between an Inner and Left/Full join, other than the port that features come out of. For example, the difference between an Inner and Left join would be that in an Inner join the unmatched features from the left input come out the UnjoinedLeft port, whereas in a Left join they come out the Joined port. In this case, I would argue we could just remove the join type option altogether, and let users make their own join by routing the Joined/UnjoinedLeft/UnjoinedRight features. As it is, there is actually a practical difference between an Inner and Left/Full join.
3. It is possibly slightly more difficult to go from <missing> to <null> (since you need to know exactly what attributes are missing), than it is from <null> to <missing> since you can use the NullValueMapperSo that was the thinking behind what we did.
With the Full join mode, if you leave the Attribute Conflict Resolution parameter default (i.e. Use Left), the result seems to be correct.
However, if you set Use Right to the Attribute Conflict Resolution parameter, most attributes of the Left features including format attributes will be null. I don't think it's intentional.
There could be other issues than this. I think you will have to perform a comprehensive test on the FeatureJoiner more carefully.
With the Full join mode, if you leave the Attribute Conflict Resolution parameter default (i.e. Use Left), the result seems to be correct.
However, if you set Use Right to the Attribute Conflict Resolution parameter, most attributes of the Left features including format attributes will be null. I don't think it's intentional.
There could be other issues than this. I think you will have to perform a comprehensive test on the FeatureJoiner more carefully.
If you send an additional feature that matches a Left feature to the Right port, you will get more strange result.
Hi @takashi PR82923 has now been resolved and the fix is available in 2018.1 builds 18480+
After discussion across various teams the FeatureJoiner will now preserve key attributes and set non-key attributes to <null>. This is moving away from the initial mirroring SQL approach but should be more usable within FME.
Under this new scheme, the correct values for the example should be:
Use Left
---------
id empty null missing
0 "" <null> x
1 "" <null> <missing>
2 "" <null> <missing>
3 <null> <null> <null>
4 <null> <null> <null>
5 <null> <null> <null>
Use Right
---------
id empty null missing
0 <null> <null> x
1 <null> <null> <missing>
2 <null> <null> <missing>
3 "" <null> <null>
4 "" <null> <null>
5 "" <null> <null>
Hi @takashi PR82923 has now been resolved and the fix is available in 2018.1 builds 18480+
After discussion across various teams the FeatureJoiner will now preserve key attributes and set non-key attributes to <null>. This is moving away from the initial mirroring SQL approach but should be more usable within FME.
Under this new scheme, the correct values for the example should be:
Use Left
---------
id empty null missing
0 "" <null> x
1 "" <null> <missing>
2 "" <null> <missing>
3 <null> <null> <null>
4 <null> <null> <null>
5 <null> <null> <null>
Use Right
---------
id empty null missing
0 <null> <null> x
1 <null> <null> <missing>
2 <null> <null> <missing>
3 "" <null> <null>
4 "" <null> <null>
5 "" <null> <null>
Hi @lizsanderson, thanks for the update on the PR.
I'm still feeling something strange...
If it was the FULL OUTER JOIN operation in a database, not matched records won't be changed at all. Right?
In this case, with the Full join mode, I thought all records would be preserved without any change, regardless of Attribute Conflict Resolution setting, since no records matched between the tables.
I cannot find any reasonable explanation on the reason why the column values in the not-matched records have changed with the Full join mode.
I think that every record belonging to the blue area in this screenshot should not be changed with the full join operation, regardless of whether the two tables intersect on a join condition (upper) or not (lower). Then, the Attribute Conflict Resolution parameter should affect only the records belonging to the orange area. Am I missing something?
Hi @lizsanderson, thanks for the update on the PR.
I'm still feeling something strange...
If it was the FULL OUTER JOIN operation in a database, not matched records won't be changed at all. Right?
In this case, with the Full join mode, I thought all records would be preserved without any change, regardless of Attribute Conflict Resolution setting, since no records matched between the tables.
I cannot find any reasonable explanation on the reason why the column values in the not-matched records have changed with the Full join mode.
I think that every record belonging to the blue area in this screenshot should not be changed with the full join operation, regardless of whether the two tables intersect on a join condition (upper) or not (lower). Then, the Attribute Conflict Resolution parameter should affect only the records belonging to the orange area. Am I missing something?
Hi @lizsanderson, thanks for the update on the PR.
I'm still feeling something strange...
If it was the FULL OUTER JOIN operation in a database, not matched records won't be changed at all. Right?
In this case, with the Full join mode, I thought all records would be preserved without any change, regardless of Attribute Conflict Resolution setting, since no records matched between the tables.
I cannot find any reasonable explanation on the reason why the column values in the not-matched records have changed with the Full join mode.
I think that every record belonging to the blue area in this screenshot should not be changed with the full join operation, regardless of whether the two tables intersect on a join condition (upper) or not (lower). Then, the Attribute Conflict Resolution parameter should affect only the records belonging to the orange area. Am I missing something?
Hi @lizsanderson, thanks for the update on the PR.
I'm still feeling something strange...
If it was the FULL OUTER JOIN operation in a database, not matched records won't be changed at all. Right?
In this case, with the Full join mode, I thought all records would be preserved without any change, regardless of Attribute Conflict Resolution setting, since no records matched between the tables.
I cannot find any reasonable explanation on the reason why the column values in the not-matched records have changed with the Full join mode.
I think that every record belonging to the blue area in this screenshot should not be changed with the full join operation, regardless of whether the two tables intersect on a join condition (upper) or not (lower). Then, the Attribute Conflict Resolution parameter should affect only the records belonging to the orange area. Am I missing something?
Another example. Is it intentional that the values in column "b" will be replaced with <null>?
Hi @lizsanderson, thanks for the update on the PR.
I'm still feeling something strange...
If it was the FULL OUTER JOIN operation in a database, not matched records won't be changed at all. Right?
In this case, with the Full join mode, I thought all records would be preserved without any change, regardless of Attribute Conflict Resolution setting, since no records matched between the tables.
I cannot find any reasonable explanation on the reason why the column values in the not-matched records have changed with the Full join mode.
I think that every record belonging to the blue area in this screenshot should not be changed with the full join operation, regardless of whether the two tables intersect on a join condition (upper) or not (lower). Then, the Attribute Conflict Resolution parameter should affect only the records belonging to the orange area. Am I missing something?
I got confused...
FeatureJointer: Left Join (FME 2018.1 beta build 18052)
Attribute Conflict Resolution: Use Left => It seems to be correct.
Attribute Conflict Resolution: Use Right => Why have the values of field "b" in unmatched records from the left table been replaced with <null>?
Additionally please test swapping Table 1 and Table 2.
Thanks for continuing to dig into this one Takashi. We had a meeting today and went through your findings. While things are indeed working as we'd designed them, and the theory behind them is sound, we do agree that they are violating the "principle of least surprise". We're going to dig into designing some additional options that provide more natural (if theoretically less pure) behaviours. Really really appreciate your insights here.
Thanks for continuing to dig into this one Takashi. We had a meeting today and went through your findings. While things are indeed working as we'd designed them, and the theory behind them is sound, we do agree that they are violating the "principle of least surprise". We're going to dig into designing some additional options that provide more natural (if theoretically less pure) behaviours. Really really appreciate your insights here.
Thanks for your consideration. I just hope FME follows the "principle of least surprise".