Skip to main content
Solved

Join based on two fields + maximum difference between distance value in join field

  • February 14, 2020
  • 6 replies
  • 41 views

lambertus
Enthusiast
Forum|alt.badge.img+23

I have two datasets which need to joined. Both datasets store the attributes 'CI_nummer' (numeric) and 'NaamObject'. An example value of the last looks like: ZN VKS MSI A16L 46,935 1

I want to join the datasets if both these attributes match. . Beside this, the NaamObject also needs to match for a certain minimum of characters.

ZN VKS MSI A16L 46,935 1

46,935 these numbers represent a distance value. 46 kilometres and 935 meters.

When the difference between two distance values (dataset A versus B) is less than 500 meters I want FME to see these values as identical. For example:

These two values should be considered as identical:

ZN VKS MSI A16L 46,935 1

ZN VKS MSI A16L 46,835 3

These two values should be considered as identical

N VKS MSI A16L 46,935 1

ZN VKS MSI A16L 47,122 3

These two values should not be considered as identical

N VKS MSI A16L 46,935 1

ZN VKS MSI A16L 48,122 3

I am thinking about something with a regular expression? Any ideas how to solve this issue?

 

 

 

Best answer by ebygomm

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match

This post is closed to further activity.
It may be an old question, an answered question, an implemented idea, or a notification-only post.
Please check post dates before relying on any information in a question or answer.
For follow-up or related questions, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

6 replies

ebygomm
Influencer
Forum|alt.badge.img+44
  • Influencer
  • 3427 replies
  • Best Answer
  • February 14, 2020

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match


lambertus
Enthusiast
Forum|alt.badge.img+23
  • Author
  • Enthusiast
  • 141 replies
  • February 14, 2020

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match

Thanks for your reply! It sounds promising. I am going to try your suggestion.


lambertus
Enthusiast
Forum|alt.badge.img+23
  • Author
  • Enthusiast
  • 141 replies
  • February 14, 2020

Assuming the format is consistent, you could split by space and then create new attributes where the number is separated

Depending on your input you could then join on the first attribute and then test for the difference in numbers between the two joined attributes and discard any where the difference is greater than 500m.

Otherwise you'll probably need to use an inlinejoiner to do the range match

It works perfect! :) However, why not creating just 'attr2' and populate it with _list{4} ?


ebygomm
Influencer
Forum|alt.badge.img+44
  • Influencer
  • 3427 replies
  • February 14, 2020

It works perfect! :) However, why not creating just 'attr2' and populate it with _list{4} ?

I swapped the comma for a decimal point which would be necessary for doing a numeric comparison with my locale settings. I can't recall whether FME will accept a comma as the decimal separator dependent on the system settings.


lambertus
Enthusiast
Forum|alt.badge.img+23
  • Author
  • Enthusiast
  • 141 replies
  • February 15, 2020

I swapped the comma for a decimal point which would be necessary for doing a numeric comparison with my locale settings. I can't recall whether FME will accept a comma as the decimal separator dependent on the system settings.

I get it. Is there also a way to check the consistency of my data?

This would mean that every value of het attribute NaamObject should have the same number of spaces? Maybe you could check for the length of the list?


ebygomm
Influencer
Forum|alt.badge.img+44
  • Influencer
  • 3427 replies
  • February 15, 2020
I get it. Is there also a way to check the consistency of my data?

This would mean that every value of het attribute NaamObject should have the same number of spaces? Maybe you could check for the length of the list?

Yes, checking the length of the list would confirm the number of spaces in the attribute. You could also test for contains regex with a more specific match, e.g. 2 letters space 3 letters space 3 letters space etc.