Question

Replace regex match with whole new string

  • 9 September 2019
  • 11 replies
  • 17 views

Badge

Hi,

I have a series of regular-expression and new-string pairs.

If the regular expression gets a match, I don't just want to replace the characters that match; I want to replace the whole matched string with the new string.

As an example:

If the old string is - CAROLA 1.6ABC

The regex is - C[AO]?[CR]R?O?O?L?L?L?LI?A?

The new string needs to be - COROLLA

To make matters more difficult, there are about 400+ unique pairings.

 

Thanks in advance


11 replies

Badge

Put a .* at the end? And a .* at the start if you want to get rid of prefixes as well. Or am I underthinking this?

.*C[AO]?[CR]R?O?O?L?L?L?LI?A?.*
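
For what it's worth, here's how that anchored pattern behaves with Python's re module (just a sketch, using the example string from the question):

import re

# Wrapping the pattern in .* makes the match consume the whole string,
# so the substitution replaces everything, not just the matched part.
pattern = r".*C[AO]?[CR]R?O?O?L?L?L?LI?A?.*"

print(re.sub(pattern, "COROLLA", "CAROLA 1.6ABC"))  # prints COROLLA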

Badge

Put a .* at the end? And a .* at the start if you want to get rid of prefixes as well. Or am I underthinking this?

.*C[AO]?[CR]R?O?O?L?L?L?LI?A?.*

But this actually sounds like a job for SchemaMapper instead.

Badge +3

@cj

As you apparently are trying to find all (or as many as possible) forms of something that sounds or looks like "COROLLA", you cannot capture it to be replaced, as you would be capturing any incorrectly written version.

 

You would be filtering and mapping.

You could use an AttributeCreator:

[regexp -all {C[AO]?[CR]R?O?O?L?L?L?LI?A?} {@Value(tt)}]!=0?"COROLLA":"NOPE"

(a conditional statement)

Or the provided conditional statement structure.
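
For illustration, the same filter-and-map logic in plain Python (a sketch only; the FME expression above is what you would put in the AttributeCreator, and the non-matching test string below is just an arbitrary example):

import re

# If the pattern matches anywhere in the value, map it to "COROLLA",
# otherwise flag it as "NOPE".
pattern = re.compile(r"C[AO]?[CR]R?O?O?L?L?L?LI?A?")

def map_model(value):
    return "COROLLA" if pattern.search(value) else "NOPE"

print(map_model("CAROLA 1.6ABC"))  # COROLLA
print(map_model("CIVIC 1.8"))      # NOPE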

Badge

But this actually sounds like a job for SchemaMapper instead.

The .* at each end would help with getting rid of suffixes and prefixes. The real challenge, however, is how to apply the 400+ combinations.

Badge

@cj

As you apparently are trying to find all (or as many as possible) forms of something that sounds or looks like "COROLLA", you cannot capture it to be replaced, as you would be capturing any incorrectly written version.

 

You would be filtering and mapping.

You could use an AttributeCreator:

[regexp -all {C[AO]?[CR]R?O?O?L?L?L?LI?A?} {@Value(tt)}]!=0?"COROLLA":"NOPE"

(a conditional statement)

Or the provided conditional statement structure.

Thanks, a conditional statement is the logic I am trying to achieve (if MATCH then replace string). The additional difficulty, however, is how to scale that to the 400+ unique combinations.

Badge

I have come up with the following Python script:

import re

old_string = "CAROLA 1.6ABC"
match = re.match("C[AO]?[CR]R?O?O?L?L?L?LI?A?", old_string)
if match is not None:
    # Replace the entire string, not just the matched characters
    old_string = "COROLLA"
    print(old_string)
else:
    print(match)

This does the search and replace well, but now I'm trying to figure out a way to scale it. Ideally it would run off some sort of lookup table containing all the unique regex/new-string combinations.
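
One possible shape for that, as a rough sketch (the table contents below are placeholders for the real 400+ pairs):

import re

# Lookup table of (regex, new string) pairs; only one real entry is
# shown, the rest stand in for the 400+ pairings.
lookup = [
    (r"C[AO]?[CR]R?O?O?L?L?L?LI?A?", "COROLLA"),
    # ... remaining regex / new-string pairs ...
]

def standardise(value):
    for pattern, new_string in lookup:
        if re.match(pattern, value):
            return new_string   # replace the whole string on the first match
    return value                # no pattern matched, leave the value as is

print(standardise("CAROLA 1.6ABC"))  # COROLLA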

Badge

Have managed to achieve the desired outcome with the following workflow;

 

Basically, merge the REGEX and NEWMODEL values as a list onto the incoming data, then use the StringSearcher to test all the old models against the regular expressions for that MAKE, and replace the old model with NEWMODEL for those that match.

It feels a bit inefficient; I still think there is a way using the above Python script, the lookup table, and maybe looping in a custom transformer.

Badge

I have come up with the following Python script:

import re

old_string = "CAROLA 1.6ABC"
match = re.match("C[AO]?[CR]R?O?O?L?L?L?LI?A?", old_string)
if match is not None:
    # Replace the entire string, not just the matched characters
    old_string = "COROLLA"
    print(old_string)
else:
    print(match)

This does the search and replace well, but now I'm trying to figure out a way to scale it. Ideally it would run off some sort of lookup table containing all the unique regex/new-string combinations.

An important thing here is whether you are doing a one-time translation of a fixed dataset or if you need to build a robust system to tackle incoming data of varying quality.

If it's the first, a fixed dataset, I would just set up the mapping in a spreadsheet and use SchemaMapper. But if you get new data all the time, you need to predict the errors in the data and/or build a system to catch the new variants that you need to incorporate in the mapping, making the system more and more robust as you go along.
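
If you do go the scripted route instead of SchemaMapper, the spreadsheet idea still applies; a rough sketch, assuming the mapping is exported as a CSV with hypothetical "regex" and "newstring" columns:

import csv

# Load the regex -> new-string mapping from a CSV exported from the
# spreadsheet. The file name and column headers are assumptions.
with open("model_mapping.csv", newline="") as f:
    lookup = [(row["regex"], row["newstring"]) for row in csv.DictReader(f)]

# "lookup" can then drive the matching loop from the earlier sketch,
# so new variants only need a new row in the spreadsheet.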

Badge

An important thing here is whether you are doing a one-time translation of a fixed dataset or if you need to build a robust system to tackle incoming data of varying quality.

If it's the first, a fixed dataset, I would just set up the mapping in a spreadsheet and use SchemaMapper. But if you get new data all the time, you need to predict the errors in the data and/or build a system to catch the new variants that you need to incorporate in the mapping, making the system more and more robust as you go along.

This is not a one-time translation, this will be processing new and updated data on a regular schedule.

Userlevel 1
Badge +10

Have managed to achieve the desired outcome with the following workflow;

[workflow screenshot]

 

Basically, merge the REGEX and NEWMODEL values as a list onto the incoming data, then use the StringSearcher to test all the old models against the regular expressions for that MAKE, and replace the old model with NEWMODEL for those that match.

It feels a bit inefficient; I still think there is a way using the above Python script, the lookup table, and maybe looping in a custom transformer.

I'd look at keeping the first part of your workflow as far as the FeatureMerger, then use a PythonCaller to loop through the list

e.g.

import fme
import fmeobjects
import re

def processFeature(feature):
    model = feature.getAttribute('model')
    # Pair each list entry's replacement string with its regex
    ziplist = zip(feature.getAttribute('_list{}.newstring'),
                  feature.getAttribute('_list{}.regex'))
    for new_string, pattern in ziplist:
        match = re.match(pattern, model)
        if match is not None:
            feature.setAttribute("newstring", new_string)
Badge +3

Actually, I don't think RegEx is the best solution here, as you would struggle to catch all varieties. Plus you would need to set up an entirely new RegEx when the next model is on your list.

 

Have you looked into the FuzzyStringComparer (or its big sister, FuzzyStringCompareFrom2Datasets)? Effectively it compares two strings and gives a difference ratio / similarity score between 0 and 1: 1 = the strings are identical, 0 = the strings are entirely different.
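
To illustrate the similarity-score idea outside FME (a sketch using Python's difflib as a stand-in; this is not necessarily how the FuzzyStringComparer scores internally, and the comparison strings are just examples):

from difflib import SequenceMatcher

# Similarity ratio between two strings: 1 = identical, 0 = entirely different
def similarity(a, b):
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

print(similarity("CAROLA", "COROLLA"))   # roughly 0.77, i.e. very similar
print(similarity("CAROLA", "CAMRY"))     # a much lower score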

See attached workbench
