Solved

How to merge two datasets based on matching attributes

7 years ago
August 10, 2017
8 replies
281 views

bey_atkins2009
5 replies

Pl. see attached zipped excel file to find the explanation of the problem statement

Best answer by takashi

Hi @bey_atkins2009, if I understand the requirement correctly, this workflow might help you.

Did this help you find an answer to your question?

Forum|alt.badge.img

+50

redgeographics
Celebrity
3638 replies
7 years ago
August 10, 2017

A combination of FeatureMerger (or DatabaseJoiner) and ListExploder would be a good place to start.

Forum|alt.badge.img

+19

fmelizard
Safer
3725 replies
7 years ago
August 10, 2017

Hi @bey_atkins2009, Combining Multiple Streams of Data goes through a couple examples of merging data. Like @redgeographics mentioned, you'll want to look at the FeatureMerger.

Forum|alt.badge.img

+3

xiaomengatsafe
Safer
327 replies
7 years ago
August 11, 2017

Hi,

@bey_atkins2009

I had a closer look at the example you provided. And I think both @redgeographics and @TiaAtSafe are definitely correct in suggesting FeatureMerger, also ListExploder. A bit more detail on what I'm thinking:

1. Merge the FROM NODE and END NODE from Dataset 2 with the ID in dataset 1 in parallel, and make sure to process duplicate suppliers, and Generate a list from them. (because, each From and End node will be matched with more than one ID in Dataset 1)

2. for each merged result, use the ListExploder to break down the list into individual features, and use a NullAttributeMapper to map missing attribute (from the UnmergedRequestor and UnusedSupplier port) to 0, and 0:00, depending on what attribute it is.

3. Note, when you explode the list, consider adding a prefix to each stream, to differentiate the From and To matches. this will help when you merge in the next step.

4. Use another FeatureMerger to merge the two streams together, basedon pipeID, and the time attributes from both streams.

5. After the last FeatureMerger, you might want to look at mapping missing attributes, like we did in step 2, as well.

6. in the end, you can clean up the unwanted attributes, using AttributeManager, AttributetKeeper, or AttributeRemover.

Here is a screenshot of the workflow. Not sure if it the most efficient approach. Hope it gives you some ideas, anyway.

bey_atkins2009
Author
5 replies
7 years ago
August 14, 2017

xiaomengatsafe wrote:

Hi,

@bey_atkins2009

I had a closer look at the example you provided. And I think both @redgeographics and @TiaAtSafe are definitely correct in suggesting FeatureMerger, also ListExploder. A bit more detail on what I'm thinking:

1. Merge the FROM NODE and END NODE from Dataset 2 with the ID in dataset 1 in parallel, and make sure to process duplicate suppliers, and Generate a list from them. (because, each From and End node will be matched with more than one ID in Dataset 1)

2. for each merged result, use the ListExploder to break down the list into individual features, and use a NullAttributeMapper to map missing attribute (from the UnmergedRequestor and UnusedSupplier port) to 0, and 0:00, depending on what attribute it is.

3. Note, when you explode the list, consider adding a prefix to each stream, to differentiate the From and To matches. this will help when you merge in the next step.

4. Use another FeatureMerger to merge the two streams together, basedon pipeID, and the time attributes from both streams.

5. After the last FeatureMerger, you might want to look at mapping missing attributes, like we did in step 2, as well.

6. in the end, you can clean up the unwanted attributes, using AttributeManager, AttributetKeeper, or AttributeRemover.

Here is a screenshot of the workflow. Not sure if it the most efficient approach. Hope it gives you some ideas, anyway.

Many thanks for your response. I am new to FME. I could get what comes out of the 1st sets of FeatureMerger and the ListExploders after that. Could you kindly elaborate on what all attributes you are selecting in "NullAttributeMapper" and how it works.

For the 1450 unmatched requestors, I thought that NullAttributeMapper will expand the list and put in zeroes for pressure and elevation.

Could you kindly clarify.

Thanks

Yogish

takashi
7683 replies
Best Answer
7 years ago
August 14, 2017

Hi @bey_atkins2009, if I understand the requirement correctly, this workflow might help you.

takashi
7683 replies
7 years ago
August 14, 2017

takashi wrote:

Hi @bey_atkins2009, if I understand the requirement correctly, this workflow might help you.

Update. If it's guaranteed that there isn't any exception in the format for TIME values (%H:%M hrs) at all, it's not essential to calculate the minutes on the Dataset 1. The workflow could also be a little simpler,

bey_atkins2009
Author
5 replies
7 years ago
August 17, 2017

takashi wrote:

Update. If it's guaranteed that there isn't any exception in the format for TIME values (%H:%M hrs) at all, it's not essential to calculate the minutes on the Dataset 1. The workflow could also be a little simpler,

Hi @takashi,

Thanks for your solutions. This last solution you provided works for me. Many thanks again.

As a further step, I would like to segregate the data of 'From_Pres_Elev' and 'To_Pres_Elev' into several ranges like <5, 5-10, 10-20, 20-30, 30-40, 40-50 and finally >50 and obtain a sum and a count for each of the ranges. I am thinking of using "TestFilter" and "StatisticsCalculator" transformers and hoping they would do the job.

Do you have any suggestions or corrections on this?

Looking forward to your response.

Regards

Yogish

Forum|alt.badge.img

+50

redgeographics
Celebrity
3638 replies
7 years ago
August 17, 2017

bey_atkins2009 wrote:

Hi @takashi,

Thanks for your solutions. This last solution you provided works for me. Many thanks again.

As a further step, I would like to segregate the data of 'From_Pres_Elev' and 'To_Pres_Elev' into several ranges like <5, 5-10, 10-20, 20-30, 30-40, 40-50 and finally >50 and obtain a sum and a count for each of the ranges. I am thinking of using "TestFilter" and "StatisticsCalculator" transformers and hoping they would do the job.

Do you have any suggestions or corrections on this?

Looking forward to your response.

Regards

Yogish

Instead of the TestFilter you could use an AttributeRangeFilter but other than that, yes, that should be the way to go.

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

×

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing