Question

Filter rows by partial match to a variable set of values

  • 8 April 2022
  • 3 replies
  • 54 views

Hi all, I am trying to filter and extract rows from dataset A based on partial and exact match to values of dataset B. Dataset B will be different each time I run the workbench.

B is actually a subset of A, generated from a separate query as part of another process. So There will always be an exact match, but there may or may not be partial matches. Where partial matches are found it will be in the form of a suffix to the ID (eg ‘v1’ at the end).

My reading of this forum suggests there are two issues that make this filter more complicated:

1. dataset B is an output of another query so the values will always vary

2. I need to extract instances of partial matching as well as exact matching.

example:

dataset A (n=4000)

ID

MRR_1234v1

MRR_1234v2

MRR_1234

TR_5678

TR_5678v2

FG_9876

 

dataset B (same table structure at B, n<20)

ID

MRR_1234

TR_5678

 

i want an excel output of dataset A, that contains only the rows where the ID contains that of the ID field values for B. Attribute only, no geometry. In the example above that would filter out the last value in A.

my reading suggests that Feature merger only works with exact matches, and there are lots of transformers that work well with known values added into the expression but not a ‘dynamic’ input or @value(attribute). I played with testerfilter and expressions but never got it to work. I feel like I might be able to use stringsearcher somehow but can’t get to grips with how I would apply it. I think InlineQuerier is also an option but i am only a beginner so need some guidance.

my question is firstly, are my above assumptions correct or am I missing something obvious? If correct, could someone give me a few pointers to start me off in the right direction with either stringsearcher or inlinequerier? Many thanks

 


3 replies

Userlevel 2
Badge +17

Hi @krio​,

The FeatureMerger can match on expressions as well as attribute values.

If table A is the Requestor, in the Join On section of the FeatureMerger, click the triangle beside the Requestor box and choose Text Editor. You can then use the following expression to test the ID values of A with the version number removed:

@ReplaceRegularExpression(@Value(ID),"v[0-9]$","",caseSensitive=TRUE)

This expression will remove the "v1", "v2", etc. at the end of the ID values before testing, but doesn't change the value of attribute.

@daveatsafe​ thank you, that looks promising. I’ve had a look through dataset A - looks like I need to factor in inconsistencies in application of the naming convention. So instead of v1, there could be V01 or possibly even .1

unfortunately, I’m not able to ‘clean’ dataset A so I’ll have to work with it, as is.

I assume the expression can be changed to be case insensitive but how to i account for the other variations? Can I pipe to include other variants? Or is there a way to look at it as a ‘begins with’ search to find the ID?

many thanks!

Userlevel 2
Badge +17

@daveatsafe​ thank you, that looks promising. I’ve had a look through dataset A - looks like I need to factor in inconsistencies in application of the naming convention. So instead of v1, there could be V01 or possibly even .1

unfortunately, I’m not able to ‘clean’ dataset A so I’ll have to work with it, as is. 

I assume the expression can be changed to be case insensitive but how to i account for the other variations? Can I pipe to include other variants? Or is there a way to look at it as a ‘begins with’ search to find the ID? 

many thanks! 

Hi @krio​,

That's the beauty of using regular expression - they can be configured to handle many different strings.

@ReplaceRegularExpression(@Value(ID),"v[0-9,\.]+$","",caseSensitive=FALSE)

The string above should be able to handle cases like:

MRR_1234v02

MRR_1234v.2

MRR_1234v2.4

A good site to interactively test your regular expressions is https://regexr.com/

Reply