Skip to main content
Solved

How to merge two datasets based on matching attributes


Pl. see attached zipped excel file to find the explanation of the problem statement

Best answer by takashi

Hi @bey_atkins2009, if I understand the requirement correctly, this workflow might help you.

View original
Did this help you find an answer to your question?

8 replies

redgeographics
Celebrity
Forum|alt.badge.img+49

A combination of FeatureMerger (or DatabaseJoiner) and ListExploder would be a good place to start.


fmelizard
Contributor
Forum|alt.badge.img+17
  • Contributor
  • August 10, 2017

Hi @bey_atkins2009, Combining Multiple Streams of Data goes through a couple examples of merging data. Like @redgeographics mentioned, you'll want to look at the FeatureMerger.


xiaomengatsafe
Safer
Forum|alt.badge.img+3

Hi,

@bey_atkins2009

I had a closer look at the example you provided. And I think both @redgeographics and @TiaAtSafe are definitely correct in suggesting FeatureMerger, also ListExploder. A bit more detail on what I'm thinking:

 

1. Merge the FROM NODE and END NODE from Dataset 2 with the ID in dataset 1 in parallel, and make sure to process duplicate suppliers, and Generate a list from them. (because, each From and End node will be matched with more than one ID in Dataset 1)

 

2. for each merged result, use the ListExploder to break down the list into individual features, and use a NullAttributeMapper to map missing attribute (from the UnmergedRequestor and UnusedSupplier port) to 0, and 0:00, depending on what attribute it is.

3. Note, when you explode the list, consider adding a prefix to each stream, to differentiate the From and To matches. this will help when you merge in the next step.

4. Use another FeatureMerger to merge the two streams together, basedon pipeID, and the time attributes from both streams.

 

5. After the last FeatureMerger, you might want to look at mapping missing attributes, like we did in step 2, as well.

6. in the end, you can clean up the unwanted attributes, using AttributeManager, AttributetKeeper, or AttributeRemover.

 

Here is a screenshot of the workflow. Not sure if it the most efficient approach. Hope it gives you some ideas, anyway.


xiaomengatsafe wrote:

Hi,

@bey_atkins2009

I had a closer look at the example you provided. And I think both @redgeographics and @TiaAtSafe are definitely correct in suggesting FeatureMerger, also ListExploder. A bit more detail on what I'm thinking:

 

1. Merge the FROM NODE and END NODE from Dataset 2 with the ID in dataset 1 in parallel, and make sure to process duplicate suppliers, and Generate a list from them. (because, each From and End node will be matched with more than one ID in Dataset 1)

 

2. for each merged result, use the ListExploder to break down the list into individual features, and use a NullAttributeMapper to map missing attribute (from the UnmergedRequestor and UnusedSupplier port) to 0, and 0:00, depending on what attribute it is.

3. Note, when you explode the list, consider adding a prefix to each stream, to differentiate the From and To matches. this will help when you merge in the next step.

4. Use another FeatureMerger to merge the two streams together, basedon pipeID, and the time attributes from both streams.

 

5. After the last FeatureMerger, you might want to look at mapping missing attributes, like we did in step 2, as well.

6. in the end, you can clean up the unwanted attributes, using AttributeManager, AttributetKeeper, or AttributeRemover.

 

Here is a screenshot of the workflow. Not sure if it the most efficient approach. Hope it gives you some ideas, anyway.

Many thanks for your response. I am new to FME. I could get what comes out of the 1st sets of FeatureMerger and the ListExploders after that. Could you kindly elaborate on what all attributes you are selecting in "NullAttributeMapper" and how it works.

 

 

For the 1450 unmatched requestors, I thought that NullAttributeMapper will expand the list and put in zeroes for pressure and elevation.

 

 

Could you kindly clarify.

 

 

Thanks

 

Yogish

 

 

 


takashi
Influencer
  • Best Answer
  • August 14, 2017

Hi @bey_atkins2009, if I understand the requirement correctly, this workflow might help you.


takashi
Influencer
  • August 14, 2017
takashi wrote:

Hi @bey_atkins2009, if I understand the requirement correctly, this workflow might help you.

Update. If it's guaranteed that there isn't any exception in the format for TIME values (%H:%M hrs) at all, it's not essential to calculate the minutes on the Dataset 1. The workflow could also be a little simpler,

 

 


takashi wrote:
Update. If it's guaranteed that there isn't any exception in the format for TIME values (%H:%M hrs) at all, it's not essential to calculate the minutes on the Dataset 1. The workflow could also be a little simpler,

 

 

Hi @takashi,

 

Thanks for your solutions. This last solution you provided works for me. Many thanks again.

 

 

As a further step, I would like to segregate the data of 'From_Pres_Elev' and 'To_Pres_Elev' into several ranges like <5, 5-10, 10-20, 20-30, 30-40, 40-50 and finally >50 and obtain a sum and a count for each of the ranges. I am thinking of using "TestFilter" and "StatisticsCalculator" transformers and hoping they would do the job.

 

 

Do you have any suggestions or corrections on this?

 

 

Looking forward to your response.

 

 

Regards

 

Yogish

 


redgeographics
Celebrity
Forum|alt.badge.img+49
bey_atkins2009 wrote:
Hi @takashi,

 

Thanks for your solutions. This last solution you provided works for me. Many thanks again.

 

 

As a further step, I would like to segregate the data of 'From_Pres_Elev' and 'To_Pres_Elev' into several ranges like <5, 5-10, 10-20, 20-30, 30-40, 40-50 and finally >50 and obtain a sum and a count for each of the ranges. I am thinking of using "TestFilter" and "StatisticsCalculator" transformers and hoping they would do the job.

 

 

Do you have any suggestions or corrections on this?

 

 

Looking forward to your response.

 

 

Regards

 

Yogish

 

Instead of the TestFilter you could use an AttributeRangeFilter but other than that, yes, that should be the way to go.

 

 


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings