I have a very simple workspace that tests whether a string has a "." or a ":" in it and, if it does, deletes these characters. I am using the TestFilter to test for the characters, then a StringReplacer to delete them, and then I must make sure the string is all caps. Is there a way to do these three operations by using the String Functions in the TestFilter? I haven't been able to figure out how the string functions work. Also, will using the string functions speed up the translation? I have 47 million records to test and right now it takes 4 hours to run. I am using FME 2018. I have included the workspace and source CSV file. Thanks.

Hi @bd,

You can indeed do all of this using String Functions; however, I do not believe they are available in the TestFilter, so I would use the AttributeManager and set up Conditional Values for the STRING attribute.

Set up two tests that check whether the string contains "." or ":" and then set the new attribute value to

@UpperCase(@ReplaceString(@Value(STRING),.,""))

Here @UpperCase makes sure the outcome is all caps, and it wraps

@ReplaceString(<string>,<before>,<after>)

where you change <before> to . or : accordingly. For <after> you must put "", otherwise FME doesn't recognize that the match should be replaced with nothing, and the expression will fail.

Lastly, set the value for anything else to also be changed to upper case.

Having quickly tried both methods on your test data, I did find the string functions faster: 4.1 vs 7.1 seconds using FME 2018.1 (roughly 40% less run time), so I do believe this would improve performance. P.S. Your .fmw file was not uploaded, so this was done with a mock-up based on how I believe you did it before.


You can also do this with a single statement in an AttributeManager, using the UpperCase and ReplaceRegEx string functions.


PythonCaller:

class FeatureProcessor(object):
    def __init__(self):
        pass

    def input(self, feature):
        # Remove every '.' and ':' from STRING, then upper-case the result.
        value = feature.getAttribute('STRING')
        cleaned = value.replace('.', '').replace(':', '').upper()
        feature.setAttribute('STRING', cleaned)
        # Pass the modified feature on to the output port.
        self.pyoutput(feature)

    def close(self):
        pass

I think this is better, because it will also replace both characters if they occur in the same string.

No need to do any checks, just bruteforce every string.
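
To make the difference concrete, here is a minimal Python sketch, runnable outside FME, of the two behaviours. It assumes that the conditional values act like an if / else-if chain in which only the first passing test is applied; the function names and sample strings are made up for illustration.

```python
def clean_conditional(value):
    """Mimic two conditional tests where only the first passing one is applied."""
    if "." in value:
        return value.replace(".", "").upper()
    if ":" in value:
        return value.replace(":", "").upper()
    # Neither character present: only upper-case the value.
    return value.upper()


def clean_unconditional(value):
    """Strip both characters from every string, then upper-case it."""
    return value.replace(".", "").replace(":", "").upper()


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the poster's CSV.
    for sample in ["a.b", "c:d", "e.f:g", "plain"]:
        print(sample, clean_conditional(sample), clean_unconditional(sample))
```

The two functions agree on strings that hold only one of the characters, and differ on a string such as "e.f:g", where the conditional version leaves the colon in place.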


A similar approach would be a StringReplacer followed by a StringCaseChanger. I'm not sure whether it would be faster than the Python one that @paalped suggested, but it's worth a try with a small sample.

 


You can use a Tcl expression such as

[regexp {\.|:} {@Value(STRING)}] = 1

to test in the Tester, if you need to.

But as you intend to replace the characters anyway, why test at all?

Just use a StringReplacer.

Mode: Replace Regular Expression

Text to replace: \.|:

Replacement: none (leave it empty)

(You can use the AttributeManager or AttributeCreator to do the same.)
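
For anyone who wants to try the regular expression outside FME first, the same idea can be sketched with Python's `re` module. This is only an illustration: it uses Python's regex engine rather than FME's, the character class `[.:]` is an equivalent way of writing "a period or a colon", and the function name is made up.

```python
import re

# Matches a single period or a single colon.
PATTERN = re.compile(r"[.:]")


def strip_and_upper(value):
    """Delete every '.' and ':' in one pass, then upper-case the result."""
    return PATTERN.sub("", value).upper()


if __name__ == "__main__":
    # Hypothetical sample values.
    for sample in ["ab.cd", "ab:cd", "a.b:c.d", "abcd"]:
        print(sample, "->", strip_and_upper(sample))
```

Because the substitution runs on every string, no separate test step is needed: a string without either character simply comes back upper-cased.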


Hi @bd,

Hopefully one of the solutions provided has worked for you. We here at Safe would love to use a sample of your data to test whether changes we are currently working on improve the performance of string functions. If possible, could you please upload a larger dataset (about 20 times the size of the current test.csv) to ftp://ftp.safe.com? You should be able to log in as a guest and submit the file to the top level. This would be greatly appreciated, as with a bigger chunk of data we will really be able to see whether we are speeding things up.

Many thanks,

Holly

Can I send you an email about uploading a larger CSV file, so we can discuss it off-forum?

Of course @bd - please use the Report a Problem form and I will pick up the case from this inbox.


Thanks for the help. Using the AttributeManager will be about 33% faster. That will save a lot of time when I am processing millions of records.

 

 



 

@redgeographics Yeah, I tried that too, but it was 2 seconds slower.


It is a tiny bit faster to use only a function call, since methods are slightly slower:

 

def processFeature(feature):
    feature.setAttribute('STRING',feature.getAttribute('STRING')\
        .replace('.','').replace(':','').upper())
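
A claim like this is easy to check on your own machine with `timeit`. The sketch below is an assumption-laden stand-in, not a measurement of FME itself: `FakeFeature` is a hypothetical replacement for the FME feature object that supports only `getAttribute` and `setAttribute`, and the class version omits the `self.pyoutput(feature)` call so that it runs outside a PythonCaller.

```python
import timeit


class FakeFeature:
    """Hypothetical stand-in for an FME feature (only the two calls used here)."""

    def __init__(self, attrs):
        self._attrs = dict(attrs)

    def getAttribute(self, name):
        return self._attrs.get(name)

    def setAttribute(self, name, value):
        self._attrs[name] = value


class FeatureProcessor:
    def input(self, feature):
        value = feature.getAttribute("STRING")
        feature.setAttribute("STRING", value.replace(".", "").replace(":", "").upper())


def processFeature(feature):
    value = feature.getAttribute("STRING")
    feature.setAttribute("STRING", value.replace(".", "").replace(":", "").upper())


def time_both(repeats=100_000):
    """Return (method_seconds, function_seconds) for `repeats` calls each."""
    processor = FeatureProcessor()
    feature = FakeFeature({"STRING": "ab.cd:ef"})
    # After the first call the string is already clean, so this mostly
    # measures call overhead rather than replacement work.
    method_time = timeit.timeit(lambda: processor.input(feature), number=repeats)
    function_time = timeit.timeit(lambda: processFeature(feature), number=repeats)
    return method_time, function_time


if __name__ == "__main__":
    method_time, function_time = time_both()
    print(f"method: {method_time:.3f}s  function: {function_time:.3f}s")
```

Timings from a stand-in like this say nothing certain about a real translation, so it is still worth comparing both PythonCaller variants on a sample of the actual data.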

 


 

It is a little bit faster with the AttributeManager:

@UpperCase(@ReplaceString(@ReplaceString(@Value(STRING),.,""),:,""))

No conditionals.

 

 

 



Hi all and @hollyatsafe, I am trying to do something similar, but replacing the month with the number of the month (e.g. JAN replaced with 01). I am using the AttributeManager (I noticed paalped also commented on this method in this thread).

See below: [image: AM]

This is the condition: [image: SET]

It is detecting that it has JAN inside, but not replacing it with 01. Why? What am I doing wrong?

Here is the output: [image: MONTH]

Thanks!


Hi, remove the quotes in the formula in AttributeValue. Also there appears to be no need for @UpperCase.

@ReplaceString(@Value(CALENDAR_FORMATTED_DATE),JAN,01)

If you want to do this for all months, take a look at the StringPairReplacer.
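
The idea of replacing from a table of pairs can be sketched in plain Python as below. This is an illustration of the lookup idea only, not of how the StringPairReplacer is implemented; the function name and the sample date layout are assumptions, since the thread does not show the actual CALENDAR_FORMATTED_DATE values.

```python
import re

# Upper-case month abbreviation -> two-digit month number.
MONTHS = {
    "JAN": "01", "FEB": "02", "MAR": "03", "APR": "04",
    "MAY": "05", "JUN": "06", "JUL": "07", "AUG": "08",
    "SEP": "09", "OCT": "10", "NOV": "11", "DEC": "12",
}

# One alternation that matches any of the twelve abbreviations.
MONTH_PATTERN = re.compile("|".join(MONTHS))


def replace_months(value):
    """Swap every month abbreviation found in `value` for its number."""
    return MONTH_PATTERN.sub(lambda match: MONTHS[match.group(0)], value)


if __name__ == "__main__":
    # Hypothetical date layout, for illustration only.
    print(replace_months("15-JAN-2018"))
```

The match is case-sensitive and purely textual, so it would also rewrite the first three letters of a longer word such as "MARCH"; that is acceptable only if the attribute is known to hold abbreviations.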

