Question

Retain original attribute value over revised null value

7 years ago
February 12, 2018
10 replies
41 views

cj
19 replies

I am using the UpdateDetector to update GDB_A.

GDB_A is entering the UpdateDetector through the Original Port.

GDB_B is an updated version of the same data and enters the UpdateDetector through the Revised Port.

GDB_B more often than not includes all of GDB_A plus some new features. These new features correctly exit through the Insert Port.

GDB_A has had many changes made to its attributes since the last update (basically, attribute data has been entered manually, replacing Null values with more useful data).

As these manually added values do not exist in GDB_B, any rows that have been manually adjusted in GDB_A exit the UpdateDetector through the Update Port, with the manually added data in GDB_A having been replaced with the Null value from GDB_B.

I am looking for an elegant way to prioritize any value over a null value for every attribute, regardless of whether the value is sourced from the Original (GDB_A) or the Revised (GDB_B) dataset. However, if there is a true update, i.e. both GDB_A and GDB_B have values and they differ, then the feature should exit via the Update Port with the GDB_B value.

Thanks in advance.

takashi
7715 replies
7 years ago
February 12, 2018

Hi @cj, assuming A and B have a common ID attribute, a possible way is:

1. Remove all <null> attributes from B features with the NullAttributeMapper (If Attribute Value Is: Null, Map To: Missing).
2. Transfer values that have been manually set to the original <null> fields from A features to B features with the FeatureMerger, using the ID as join key.

FeatureMerger Setting: Send B features to the Requestor port, send A features to the Supplier port, and set ID as the Join On parameter for both Requestor and Supplier. With this setting, if an attribute of a B feature is <missing> and the corresponding A feature has a value in the same attribute field, the attribute value will be merged from A to B.
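To make the intended rule concrete, here is a minimal Python sketch of the logic described above. It is only a model of the idea using plain dictionaries, not FME code: the function names and the sample attributes (`owner`, `status`, `notes`) are hypothetical, and Python's `None` stands in for <null>.

```python
from typing import Any, Dict, Optional

Feature = Dict[str, Any]


def drop_nulls(feature: Feature) -> Feature:
    """Model of 'map null to missing': remove attributes whose value is None."""
    return {name: value for name, value in feature.items() if value is not None}


def merge_prefer_revised(revised: Feature, original: Optional[Feature]) -> Feature:
    """Fill attributes that are missing on the revised (B) feature from the
    original (A) feature. Where B has a value, B's value is kept."""
    merged = dict(original or {})       # start from A's attributes
    merged.update(drop_nulls(revised))  # any non-null B value overrides A
    return merged


# Hypothetical sample: 'owner' was entered manually in A, 'status' truly changed in B.
a = {"id": 1, "owner": "Smith", "status": "old", "notes": None}
b = {"id": 1, "owner": None, "status": "new", "notes": None}
merged = merge_prefer_revised(b, a)
print(merged)
```

In this sample the manually entered `owner` survives, while `status` takes the revised value because both sides have a value and they differ.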

cj
Author
19 replies
7 years ago
February 13, 2018

takashi wrote:

Hi @cj, assuming A and B have a common ID attribute, a possible way is:

Remove all <null> attributes from B features with the NullAttributeMapper (If Attribute Value Is: Null, Map To: Missing).
Transfer values that have been manually set to the original <null> fields from A features to B features with the FeatureMerger, using the ID as join key.

A and B do not have a reliable common ID attribute. Could I use the CRCCalculator to generate a unique ID instead? Would you use the FeatureMerger before sending data to the UpdateDetector, i.e. send the result from the Merged port to the Revised port and GDB_A straight to the Original port?

takashi
7715 replies
7 years ago
February 13, 2018

I assumed that A and B have a common ID attribute, since you are going to use the UpdateDetector, which requires a Key Attribute parameter in order to compare original and revised features. That Key Attribute is what I mean by a common ID attribute.

My intention is to apply the NullAttributeMapper and the FeatureMerger before the UpdateDetector, of course.

Send GDB_A to the Original port of the UpdateDetector.
Send the features output from the Merged port and the UnmergedRequestor port of the FeatureMerger to the Revised port together.
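As a rough illustration of this ordering (fill the gaps first, then compare), the following Python sketch models the two stages on lists of dictionaries. It is a simplified stand-in and does not reproduce how the FME transformers are implemented: the function names, the `id` key and the sample data are all assumptions made for the example.

```python
from typing import Any, Dict, List

Feature = Dict[str, Any]


def fill_missing_from_original(original: List[Feature], revised: List[Feature],
                               key: str) -> List[Feature]:
    """Stage 1: drop null attributes from each revised feature, then fill the
    gaps from the original feature with the same key (if there is one)."""
    by_key = {feat[key]: feat for feat in original}
    filled = []
    for feat in revised:
        non_null = {k: v for k, v in feat.items() if v is not None}
        supplier = by_key.get(feat[key], {})    # empty when there is no match
        filled.append({**supplier, **non_null})  # revised values win
    return filled


def detect_changes(original: List[Feature], revised: List[Feature],
                   key: str) -> Dict[str, List[Feature]]:
    """Stage 2: classify each revised feature against the original by key."""
    by_key = {feat[key]: feat for feat in original}
    result: Dict[str, List[Feature]] = {"inserted": [], "updated": [], "unchanged": []}
    for feat in revised:
        old = by_key.get(feat[key])
        if old is None:
            result["inserted"].append(feat)
        elif old != feat:
            result["updated"].append(feat)
        else:
            result["unchanged"].append(feat)
    return result


gdb_a = [{"id": 1, "owner": "Smith", "status": "old"},
         {"id": 2, "owner": "Lee", "status": "ok"}]
gdb_b = [{"id": 1, "owner": None, "status": "new"},
         {"id": 2, "owner": None, "status": "ok"},
         {"id": 3, "owner": None, "status": "ok"}]

result = detect_changes(gdb_a, fill_missing_from_original(gdb_a, gdb_b, "id"), "id")
print(result)
```

Feature 2 differs from the original only by a null, so after filling it counts as unchanged. Feature 1 is a true update that keeps the manually entered `owner`, and feature 3 has no match in the original, so it is reported as inserted.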

takashi
7715 replies
7 years ago
February 13, 2018

This mock-up workflow illustrates my intention.

cj
Author
19 replies
7 years ago
February 14, 2018

takashi wrote:

This mock-up workflow illustrates my intention.

It was only after posting the issue that I discovered that what I thought was a unique ID was in fact not unique. A possible way forward: use a combination of fields that I believe have not been manually adjusted to create a CRC value, then use that value as the unique ID attribute in the UpdateDetector in your workflow above.

cj
Author
19 replies
7 years ago
February 16, 2018

takashi wrote:

This mock-up workflow illustrates my intention.

I have used a CRC value as the unique ID. Set up as you have it above, the UpdateDetector does not output the correct values. The correct values are output from the Merged port, i.e. the data from B merged with the added attributes from A. Those should then be output through the Updated port of the UpdateDetector, but I am finding they are not output anywhere. Only if I connect all FeatureMerger output ports, except the Rejected port, to the UpdateDetector Revised port do I get a correct result. Would this be expected?

takashi
7715 replies
7 years ago
February 16, 2018

It sounds like the join key for A and B consists of two or more attributes. If so, you can set those attributes as the Key Attribute parameter in the UpdateDetector and also use them as the Join On parameter in the FeatureMerger in the workflow above. I don't think a CRC is needed.

cj
Author
19 replies
7 years ago
February 21, 2018

The CRC is required as it also takes into account the coordinates. In some cases the coordinates are the only thing that can be trusted to be unique.
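For anyone following along, the idea of a key built from trusted attributes plus coordinates can be sketched in Python as below. This is only an illustration with `zlib.crc32`, not what the CRCCalculator does internally. The field names, the coordinate rounding and the helper names are assumptions, and it presumes the geometry is identical in both datasets. A 32-bit checksum can collide, so the sketch also checks for duplicate keys.

```python
import zlib
from collections import Counter
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Feature = Dict[str, Any]
Coord = Tuple[float, float]


def crc_key(feature: Feature, trusted_fields: Sequence[str],
            coords: Iterable[Coord], precision: int = 3) -> int:
    """Build a CRC-32 key from attributes that were never edited manually
    plus the feature's coordinates, rounded to a fixed precision."""
    parts = [str(feature.get(name, "")) for name in trusted_fields]
    parts += [f"{x:.{precision}f},{y:.{precision}f}" for x, y in coords]
    return zlib.crc32("|".join(parts).encode("utf-8"))


def find_duplicate_keys(keys: Iterable[int]) -> List[int]:
    """Return keys that occur more than once (candidates for collisions
    or genuinely duplicated features)."""
    return [key for key, count in Counter(keys).items() if count > 1]


# Hypothetical features: 'owner' was edited manually, so it is left out of the key.
a = {"road": "A1", "class": "primary", "owner": "Smith"}
b = {"road": "A1", "class": "primary", "owner": None}
line = [(0.0, 0.0), (10.5, 3.25)]
moved = [(1.0, 0.0), (10.5, 3.25)]

key_a = crc_key(a, ["road", "class"], line)
key_b = crc_key(b, ["road", "class"], line)
key_moved = crc_key(b, ["road", "class"], moved)
print(key_a == key_b, key_a == key_moved)
```

Leaving manually edited attributes out of the key is what lets an edited A feature still match its B counterpart. Any key reported by `find_duplicate_keys` would need a closer look before it is trusted as an ID.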

I am still having to send all four result ports from the FeatureMerger to the Revised port of the UpdateDetector to get a correct result.

takashi
7715 replies
7 years ago
February 21, 2018

If it is certain that every B feature has the same geometry as the original A feature, a CRC (calculated from the geometry only) can be used as the unique ID.

However, I cannot understand why you would get a correct result by sending the features from all four ports into the Revised port.

cj
Author
19 replies
7 years ago
April 16, 2018

takashi wrote:

If it's sure that every B feature has the same geometry as the original A feature, CRC (calculated based on only geometry) can be used as unique ID.

However, I cannot understand why you could get a correct result by sending the features from all four ports into the Revised port.

Finally getting a chance to look back at this! Thanks for your help so far! After some testing, the way you have it set up above is working. However, I am finding that the UpdateDetector is very slow: it took just over 2 hours to process only 1,000 lines (I have 550,000 lines in the entire dataset). My workbench is much more complex than the one above, but the only part I can see slowing it down is the UpdateDetector. Are there any suggestions for things I can do to speed it up? I have the parameters set up as follows:

capture.png

Can possibly share whole workbench if required. Thanks again.
