Question

FME 2018.0 FeatureJoiner: Why assign Null to Missing Attributes with Outer Join Mode?


Userlevel 2
Badge +17

I don't know if it's reasonable the FeatureJoiner will assign the <null> to missing attributes with an outer join mode (Left or Full).


14 replies

Userlevel 3
Badge +13

Very good point. I know this was discussed but I've raised it with the team and will report back.

 

 

It feels like, at first blush, Missing is the traditional FME way to handle this. BUt I wonder if we were trying to mimic database behaviour more completely.
Userlevel 2
Badge +17

Very good point. I know this was discussed but I've raised it with the team and will report back.

 

 

It feels like, at first blush, Missing is the traditional FME way to handle this. BUt I wonder if we were trying to mimic database behaviour more completely.
Yes, I can imagine that you were trying to mimic database behavior.

 

However, in my understanding, we have distinguished between "an attribute does exist and has the Null value" and "an attribute doesn't exist (is missing)" since 2014, and many functions have been implemented on the premise that those are different. As far as I know, FME had never performed conversion between <missing> and <null> implicitly, except where FME writes a feature containing missing attributes into a destination dataset that supports NULL. I therefore feel something inconsistent here, even if this would not cause any critical problem.

 

Userlevel 3
Badge +13

Hi Takashi, here's what the team says:

 

 

This was discussed during implementation, where it was agreed to use <null> instead of <missing>.The arguments at that time were: 1. Using <null> is more SQL-like. 2. Producing <missing> values would mean there is very little difference between an Inner and Left/Full join, other than the port that features come out of. For example, the difference between an Inner and Left join would be that in an Inner join the unmatched features from the left input come out the UnjoinedLeft port, whereas in a Left join they come out the Joined port. In this case, I would argue we could just remove the join type option altogether, and let users make their own join by routing the Joined/UnjoinedLeft/UnjoinedRight features. As it is, there is actually a practical difference between an Inner and Left/Full join. 3. It is possibly slightly more difficult to go from <missing> to <null> (since you need to know exactly what attributes are missing), than it is from <null> to <missing> since you can use the NullValueMapperSo that was the thinking behind what we did.
Userlevel 2
Badge +17

Thanks for detailed explanation. I'm still feeling something inconsistent but would respect your decision.

By the way, probably I found a critical bug in the FeatureJoiner. See the attached workspace.

b18284-featurejoiner-full-join.fmw (FME 2018.0 build 18284)

With the Full join mode, if you leave the Attribute Conflict Resolution parameter default (i.e. Use Left), the result seems to be correct.

However, if you set Use Right to the Attribute Conflict Resolution parameter, most attributes of the Left features including format attributes will be null. I don't think it's intentional.

There could be other issues than this. I think you will have to perform a comprehensive test on the FeatureJoiner more carefully.

Userlevel 2
Badge +17

Thanks for detailed explanation. I'm still feeling something inconsistent but would respect your decision.

By the way, probably I found a critical bug in the FeatureJoiner. See the attached workspace.

b18284-featurejoiner-full-join.fmw (FME 2018.0 build 18284)

With the Full join mode, if you leave the Attribute Conflict Resolution parameter default (i.e. Use Left), the result seems to be correct.

However, if you set Use Right to the Attribute Conflict Resolution parameter, most attributes of the Left features including format attributes will be null. I don't think it's intentional.

There could be other issues than this. I think you will have to perform a comprehensive test on the FeatureJoiner more carefully.

If you send an additional feature that matches a Left feature to the Right port, you will get more strange result.

 

Userlevel 2
Badge +14

Hi @takashi PR82923 has now been resolved and the fix is available in 2018.1 builds 18480+

After discussion across various teams the FeatureJoiner will now preserve key attributes and set non-key attributes to <null>. This is moving away from the initial mirroring SQL approach but should be more usable within FME.

Under this new scheme, the correct values for the example should be:

Use Left

---------

id empty null missing

0 "" <null> x

1 "" <null> <missing>

2 "" <null> <missing>

3 <null> <null> <null>

4 <null> <null> <null>

5 <null> <null> <null>

Use Right --------- id empty null missing 0 <null> <null> x 1 <null> <null> <missing> 2 <null> <null> <missing> 3 "" <null> <null> 4 "" <null> <null> 5 "" <null> <null>
Userlevel 2
Badge +17
Hi @lizsanderson, thanks for the update on the PR.

 

I'm still feeling something strange...

 

If it was the FULL OUTER JOIN operation in a database, not matched records won't be changed at all. Right?

 

In this case, with the Full join mode, I thought all records would be preserved without any change, regardless of Attribute Conflict Resolution setting, since no records matched between the tables.

 

I cannot find any reasonable explanation on the reason why the column values in the not-matched records have changed with the Full join mode.

 

I think that every record belonging to the blue area in this screenshot should not be changed with the full join operation, regardless of whether the two tables intersect on a join condition (upper) or not (lower). Then, the Attribute Conflict Resolution parameter should affect only the records belonging to the orange area. Am I missing something?

Userlevel 2
Badge +17

Hi @takashi PR82923 has now been resolved and the fix is available in 2018.1 builds 18480+

After discussion across various teams the FeatureJoiner will now preserve key attributes and set non-key attributes to <null>. This is moving away from the initial mirroring SQL approach but should be more usable within FME.

Under this new scheme, the correct values for the example should be:

Use Left

---------

id empty null missing

0 "" <null> x

1 "" <null> <missing>

2 "" <null> <missing>

3 <null> <null> <null>

4 <null> <null> <null>

5 <null> <null> <null>

Use Right --------- id empty null missing 0 <null> <null> x 1 <null> <null> <missing> 2 <null> <null> <missing> 3 "" <null> <null> 4 "" <null> <null> 5 "" <null> <null>
Hi @lizsanderson, thanks for the update on the PR.

 

I'm still feeling something strange...

 

If it was the FULL OUTER JOIN operation in a database, not matched records won't be changed at all. Right?

 

In this case, with the Full join mode, I thought all records would be preserved without any change, regardless of Attribute Conflict Resolution setting, since no records matched between the tables.

 

I cannot find any reasonable explanation on the reason why the column values in the not-matched records have changed with the Full join mode.

 

I think that every record belonging to the blue area in this screenshot should not be changed with the full join operation, regardless of whether the two tables intersect on a join condition (upper) or not (lower). Then, the Attribute Conflict Resolution parameter should affect only the records belonging to the orange area. Am I missing something?

Userlevel 2
Badge +17
Hi @lizsanderson, thanks for the update on the PR.

 

I'm still feeling something strange...

 

If it was the FULL OUTER JOIN operation in a database, not matched records won't be changed at all. Right?

 

In this case, with the Full join mode, I thought all records would be preserved without any change, regardless of Attribute Conflict Resolution setting, since no records matched between the tables.

 

I cannot find any reasonable explanation on the reason why the column values in the not-matched records have changed with the Full join mode.

 

I think that every record belonging to the blue area in this screenshot should not be changed with the full join operation, regardless of whether the two tables intersect on a join condition (upper) or not (lower). Then, the Attribute Conflict Resolution parameter should affect only the records belonging to the orange area. Am I missing something?

This table is from the help on the FeatureJoiner - 2018.1.

 

Can the user imagine that the unmatched Left and Right features could be changed with the Full join mode?

 

Userlevel 2
Badge +17
Hi @lizsanderson, thanks for the update on the PR.

 

I'm still feeling something strange...

 

If it was the FULL OUTER JOIN operation in a database, not matched records won't be changed at all. Right?

 

In this case, with the Full join mode, I thought all records would be preserved without any change, regardless of Attribute Conflict Resolution setting, since no records matched between the tables.

 

I cannot find any reasonable explanation on the reason why the column values in the not-matched records have changed with the Full join mode.

 

I think that every record belonging to the blue area in this screenshot should not be changed with the full join operation, regardless of whether the two tables intersect on a join condition (upper) or not (lower). Then, the Attribute Conflict Resolution parameter should affect only the records belonging to the orange area. Am I missing something?

 

Another example. Is it intentional that the values in column "b" will be replaced with <null>?

 

FME 2018.1 beta build 18496 on Mac OS X

 

Attribute Conflict Resolution: Use Left

 

Attribute Conflict Resolution: Use Right

 

Userlevel 2
Badge +17
Hi @lizsanderson, thanks for the update on the PR.

 

I'm still feeling something strange...

 

If it was the FULL OUTER JOIN operation in a database, not matched records won't be changed at all. Right?

 

In this case, with the Full join mode, I thought all records would be preserved without any change, regardless of Attribute Conflict Resolution setting, since no records matched between the tables.

 

I cannot find any reasonable explanation on the reason why the column values in the not-matched records have changed with the Full join mode.

 

I think that every record belonging to the blue area in this screenshot should not be changed with the full join operation, regardless of whether the two tables intersect on a join condition (upper) or not (lower). Then, the Attribute Conflict Resolution parameter should affect only the records belonging to the orange area. Am I missing something?

I got confused...

 

FeatureJointer: Left Join (FME 2018.1 beta build 18052)

 

 

Attribute Conflict Resolution: Use Left => It seems to be correct.

 

 

Attribute Conflict Resolution: Use Right => Why have the values of field "b" in unmatched records from the left table been replaced with <null>?

 

 

 

Userlevel 2
Badge +17
It's still weird :-(

 

b18496-featurejoiner-outer-join-for-list-attribute.fmw (FME 2018.1 beta build 18496, Max OS X)

 

 

Test 1: Inner Join, Use Left

 

 

Test 2: Inner Join, Use Right

 

 

Test 3: Left Join, Use Left

 

Test 4: Left Join, Use Right

 

 

Test 5: Full Jion, Use Left

 

Test 6: Full Join, Use Right

 

Additionally please test swapping Table 1 and Table 2.
Userlevel 3
Badge +13
It's still weird :-(

 

b18496-featurejoiner-outer-join-for-list-attribute.fmw (FME 2018.1 beta build 18496, Max OS X)

 

 

Test 1: Inner Join, Use Left

 

 

Test 2: Inner Join, Use Right

 

 

Test 3: Left Join, Use Left

 

Test 4: Left Join, Use Right

 

 

Test 5: Full Jion, Use Left

 

Test 6: Full Join, Use Right

 

Additionally please test swapping Table 1 and Table 2.
Thanks for continuing to dig into this one Takashi. We had a meeting today and went through your findings. While things are indeed working as we'd designed them, and the theory behind them is sound, we do agree that they are violating the "principle of least surprise". We're going to dig into designing some additional options that provide more natural (if theoretically less pure) behaviours. Really really appreciate your insights here.

 

 

Userlevel 2
Badge +17
Thanks for continuing to dig into this one Takashi. We had a meeting today and went through your findings. While things are indeed working as we'd designed them, and the theory behind them is sound, we do agree that they are violating the "principle of least surprise". We're going to dig into designing some additional options that provide more natural (if theoretically less pure) behaviours. Really really appreciate your insights here.

 

 

Thanks for your consideration. I just hope FME follows the "principle of least surprise".

 

 

Reply