Solved

Identify matching substrings from multiple attributes

7 months ago
November 18, 2024
5 replies
105 views

pkno
Contributor
21 replies

Hi everyone,

I am trying to match attributes and write the matching part of the substrings into a new attribute. The following table explains it best:

1	2	3	4	match
03.02.06.03.02.01	03.02.06.03.04	03.02.06.03.02.02	03.02.06.04.03.01	03.02.06
03.02.06.03.04.02	03.02.06.03.09.01	03.02.06.03.01.01	03.02.06.03.04.01	03.02.06.03
03.01.04.05	03.01.04			03.01.04

Note that fields may be empty, in which case they should be ignored, and not all strings are the same length. I also only want to match from the start of the string.

The only idea I have had so far is to create a list for each attribute using the AttributeSplitter, then use conditionals to compare each two-digit part separately, build each sequence and string them back together. Maybe there is an easier way?

Best regards.

Best answer by bwn

AttributeSplitter will work, but you have to think more laterally about the logic 😉 Channelling our inner @takashi, who had excellent tips on transforming non-spatial coordinates into spatial coordinates to find complex intersections using spatial intersection transformers, we can conceptualise the data as actually being linestrings.

Eg. “03.02.06.03.02.01” is actually a line with 6 vertices where:

X = Substring Position Number

Y = Substring Numerical Value

Becomes:

LINESTRING(0 3, 1 2, 2 6, 3 3, 4 2, 5 1)
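To make the mapping concrete, here is a minimal sketch in plain Python (outside FME) that builds the same WKT from a dotted code. The function name `code_to_wkt` is made up for illustration and is not part of the workspace.

```python
def code_to_wkt(code: str) -> str:
    """Turn a dotted code such as '03.02.06' into a WKT LINESTRING.

    X is the zero-based position of each part, Y is its numeric value.
    """
    values = [int(part) for part in code.split(".")]
    if len(values) < 2:
        # A single value would only be a point, not a line.
        raise ValueError("a linestring needs at least two vertices")
    coords = ", ".join(f"{x} {y}" for x, y in enumerate(values))
    return f"LINESTRING({coords})"


print(code_to_wkt("03.02.06.03.02.01"))
# LINESTRING(0 3, 1 2, 2 6, 3 3, 4 2, 5 1)
```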

What we can then do is transform the substring number sequences into line geometries, with vertex coordinates corresponding to each substring “point”, and then use LineOnLineOverlayer to find which “partial lines” overlap. Those are just the parts of the total string that contain overlapping substrings, per Row Group.

Sample data, with Row Group Number added with a Counter:

Flowing through to AttributeSplitter, this now gives the Y coordinates of the Line vertices, being the Substring numeric values

Build line geometries out of this data, which LineOnLineOverlayer will then use to find the coincident line parts

Now post-process: using a Tester, select only the sub-line that contains a vertex at X coordinate 0 (i.e. a line that includes the very first substring value of the total string) AND whose overlap count equals the count of valid columns. That is, the line must overlap 4 times for 4 valid columns. If it only overlapped 3 times, then one column did not have the same starting substring value, and there would be no common substring for that row.

The output of the Tester will be the longest sequence of “points” that begins from the starting substring and matches across all of the columns
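The overlay-and-test logic can be emulated in plain Python to check the idea on the sample rows. This is only a sketch of the reasoning, not the FME workspace: `common_prefix_by_overlay` is a hypothetical name, segments are counted with a `Counter` instead of a LineOnLineOverlayer, and two-digit parts are assumed. Because a line needs two vertices, rows that share only their first part produce no common segment and return an empty string here.

```python
from collections import Counter


def common_prefix_by_overlay(codes):
    """Count shared line segments, then walk the shared chain from x = 0."""
    # Empty fields are ignored, as the question requires.
    valid = [c.strip() for c in codes if c and c.strip()]
    if not valid:
        return ""

    overlap = Counter()
    for code in valid:
        ys = [int(p) for p in code.split(".")]
        # One segment per pair of consecutive vertices.
        for x in range(len(ys) - 1):
            overlap[((x, ys[x]), (x + 1, ys[x + 1]))] += 1

    # Follow the first line from x = 0 while every valid column shares the segment.
    ys = [int(p) for p in valid[0].split(".")]
    matched = []
    for x in range(len(ys) - 1):
        if overlap[((x, ys[x]), (x + 1, ys[x + 1]))] != len(valid):
            break
        if not matched:
            matched.append(ys[x])
        matched.append(ys[x + 1])

    # Put the leading zero back and rebuild the dotted form.
    return ".".join(f"{y:02d}" for y in matched)


row1 = ["03.02.06.03.02.01", "03.02.06.03.04", "03.02.06.03.02.02", "03.02.06.04.03.01"]
print(common_prefix_by_overlay(row1))  # 03.02.06
```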

You can see that for Row 1 in the original table, the longest matching sequence output in the LineOnLineOverlayer list is “03, 02, 06”. I haven’t shown the extra ListExploder, the AttributeCreator to put the leading “0” back onto each number, or the Aggregator to put them back into concatenated, comma-delimited form.


ebygomm
Influencer
3288 replies
7 months ago
November 18, 2024

I’m not sure there’s an elegant way to do it in FME. A Python method is probably quite straightforward and probably takes fewer lines of code than FME transformers.
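As a rough illustration of what such a Python method might look like, here is a minimal sketch. The function name `common_prefix` is made up, and how the four attribute values are read from and written back to the feature (for example in a PythonCaller) is left out. Parts are compared as text, so “3” and “03” would count as different.

```python
from itertools import takewhile


def common_prefix(codes, sep="."):
    """Return the leading parts shared by all non-empty codes."""
    # Skip empty or missing fields, then split each code into its parts.
    parts = [c.strip().split(sep) for c in codes if c and c.strip()]
    if not parts:
        return ""
    # zip stops at the shortest code; keep positions while all parts agree.
    shared = takewhile(lambda column: len(set(column)) == 1, zip(*parts))
    return sep.join(column[0] for column in shared)


print(common_prefix(["03.01.04.05", "03.01.04", "", ""]))  # 03.01.04
```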


bwn
Evangelist
562 replies
Best Answer
7 months ago
November 19, 2024


geomancer
Evangelist
881 replies
7 months ago
November 19, 2024

Interesting approach!

I was thinking this should be possible using looping, but your solution is so much more elegant!


bwn
Evangelist
562 replies
6 months ago
November 21, 2024

Attached completed workspace as sample.

1 Attachments

SubstringMatcher.zip

pkno
Author
Contributor
21 replies
6 months ago
November 21, 2024

That's a cool approach, thanks so much!

Tbh I had already implemented my AttributeSplitter → AttributeManager with a bunch of conditions idea before you answered, which is good enough for now as things are piling up at work. If I have some time I will revisit this later. For now I will mark this as the best answer.
