Skip to main content
Solved

Join based on two fields + maximum difference between distance value in join field

  • February 14, 2020
  • 6 replies
  • 30 views

lambertus
Enthusiast
Forum|alt.badge.img+15

I have two datasets which need to joined. Both datasets store the attributes 'CI_nummer' (numeric) and 'NaamObject'. An example value of the last looks like: ZN VKS MSI A16L 46,935 1

I want to join the datasets if both these attributes match. . Beside this, the NaamObject also needs to match for a certain minimum of characters.

ZN VKS MSI A16L 46,935 1

46,935 these numbers represent a distance value. 46 kilometres and 935 meters.

When the difference between two distance values (dataset A versus B) is less than 500 meters I want FME to see these values as identical. For example:

These two values should be considered as identical:

ZN VKS MSI A16L 46,935 1

ZN VKS MSI A16L 46,835 3

These two values should be considered as identical

N VKS MSI A16L 46,935 1

ZN VKS MSI A16L 47,122 3

These two values should not be considered as identical

N VKS MSI A16L 46,935 1

ZN VKS MSI A16L 48,122 3

I am thinking about something with a regular expression? Any ideas how to solve this issue?

 

 

 

Best answer by ebygomm

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match

View original
Did this help you find an answer to your question?

6 replies

ebygomm
Influencer
Forum|alt.badge.img+38
  • Influencer
  • Best Answer
  • February 14, 2020

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match


lambertus
Enthusiast
Forum|alt.badge.img+15
  • Author
  • Enthusiast
  • February 14, 2020
ebygomm wrote:

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match

Thanks for your reply! It sounds promising. I am going to try your suggestion.


lambertus
Enthusiast
Forum|alt.badge.img+15
  • Author
  • Enthusiast
  • February 14, 2020
ebygomm wrote:

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match

It works perfect! :) However, why not creating just 'attr2' and populate it with _list{4} ?


ebygomm
Influencer
Forum|alt.badge.img+38
  • Influencer
  • February 14, 2020
lambertus wrote:

It works perfect! :) However, why not creating just 'attr2' and populate it with _list{4} ?

I swapped the comma for a decimal point which would be necessary for doing a numeric comparison with my locale settings. I can't recall whether FME will accept a comma as the decimal separator dependent on the system settings.


lambertus
Enthusiast
Forum|alt.badge.img+15
  • Author
  • Enthusiast
  • February 15, 2020
ebygomm wrote:

I swapped the comma for a decimal point which would be necessary for doing a numeric comparison with my locale settings. I can't recall whether FME will accept a comma as the decimal separator dependent on the system settings.

I get it. Is there also a way to check the consistency of my data?

This would mean that every value of het attribute NaamObject should have the same number of spaces? Maybe you could check for the length of the list?


ebygomm
Influencer
Forum|alt.badge.img+38
  • Influencer
  • February 15, 2020
lambertus wrote:
I get it. Is there also a way to check the consistency of my data?

This would mean that every value of het attribute NaamObject should have the same number of spaces? Maybe you could check for the length of the list?

Yes, checking the length of the list would confirm the number of spaces in the attribute. You could also test for contains regex with a more specific match, e.g. 2 letters space 3 letters space 3 letters space etc.


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings