Solved

Using the PythonCaller to extract substring between two known substrings



 

 

Hello, fellow FMEers.

 

 

I'm very new to Python and to the PythonCaller, but we all need to start somewhere. I have wrestled with this problem basically all day and I still can't get it to work.

 

 

I have a text string "A=0,5m, B = Bredd 0 m, C= Reduktionstal". From this string I want to extract the value between "Bredd " and " m". In this case it is 0, but it can be 10, or it can be 10.2.

 

 

I'm attaching an image of my progress so far... And the PythonCaller script so far:

 

import fme
import fmeobjects
import re

def FeatureProcessor(feature):
    substr = re.search('Bredd (.*) m', feature.getAttribute('text'))
    feature.setAttribute("substr", substr)

 

 

I have often come across this problem and I really want to learn how to solve it. Pls help!

 

 

Peter

Best answer by takashi 13 November 2015, 01:50


11 replies

Hi Peter,

 

 

try this:

 

 

nameOfYourStr = "A=0,5m, B = Bredd 0 m, C= Reduktionstal"
start = nameOfYourStr.find('Bredd ') + 6
end = nameOfYourStr.find(' m, C=')
substr = nameOfYourStr[start:end]

 

 

Yann
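The find-based approach above hard-codes the offset 6 and the end marker ' m, C='. A minimal sketch of a more general helper follows, assuming you only know the two delimiters; the function name extract_between is hypothetical and not part of FME or the original answer.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Returns None when either marker is missing, instead of a wrong slice.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    # Move past the start marker itself rather than hard-coding its length.
    start += len(start_marker)
    # Search for the end marker only after the start marker.
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]


sample = "A=0,5m, B = Bredd 0 m, C= Reduktionstal"
print(extract_between(sample, "Bredd ", " m"))
```

Inside a PythonCaller, the returned value could be passed to feature.setAttribute in the same way as in the other answers.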

 

Hi

 


 


You do not have to use Python for this; a simple StringSearcher will suffice. Use the expression

 


 

Bredd\s*([\d.]+)\s*m

You will then get your value as _matched_parts{0}.

 

 

David
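To see what this expression captures before wiring it into a StringSearcher, the same pattern can be tried in plain Python. This is only a sketch: it assumes that Python's re module and the StringSearcher's regex engine treat this simple pattern the same way, and the helper name captured_width is made up for illustration.

```python
import re

# Same expression as suggested for the StringSearcher, tried with Python's re module.
pattern = re.compile(r'Bredd\s*([\d.]+)\s*m')


def captured_width(text):
    """Return the first capture group, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


# Try the three kinds of value mentioned in the question.
for value in ("0", "10", "10.2"):
    text = "A=0,5m, B = Bredd %s m, C= Reduktionstal" % value
    print(value, "->", captured_width(text))
```

Note that the character class [\d.]+ also accepts input such as 1.2.3; the stricter \d+\.?\d* used elsewhere in this thread does not.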
Hi Peter,

 

 

In one line:

 

feature.setAttribute('substr', feature.getAttribute('text').split('Bredd')[1].split('m')[0].strip())
And step by step:

 

#Get the text attribute
att = feature.getAttribute('text')
#Right part of Bredd
right = att.split('Bredd')[1]
#Left part of m
left = right.split('m')[0]
#Trim extra spaces
substr = left.strip()
feature.setAttribute('substr', substr)

 

Larry
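One caveat with the split approach: split('Bredd')[1] raises an IndexError when the text does not contain 'Bredd'. A minimal sketch of the same idea with str.partition, which never raises on a missing marker, might look like this; the function name width_from_text is hypothetical.

```python
def width_from_text(text):
    """Return the stripped text between 'Bredd' and the next 'm', or None."""
    # partition returns (before, separator, after); separator is '' when not found.
    _, marker, after = text.partition('Bredd')
    if not marker:
        return None
    value, unit, _ = after.partition('m')
    if not unit:
        return None
    # Trim the spaces around the value, as in the step-by-step version.
    return value.strip()


print(width_from_text("A=0,5m, B = Bredd 0 m, C= Reduktionstal"))
print(width_from_text("A=0,5m, C= Reduktionstal"))
```

In a PythonCaller you would then only call feature.setAttribute when the result is not None.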
Hi Peter,

 


 


I would also use the StringSearcher in this case. However, if you want to learn Python regex operations, this is good practice, of course. There are several possible implementations; this is one example.

 


import fme
import fmeobjects
import re
def FeatureProcessor(feature):
    m = re.search(r'Bredd\s*(\d+\.?\d*)\s*m', feature.getAttribute('text'))
    if m:
        feature.setAttribute('substr', m.group(1))

The forum editor does not reliably display the backslash before the dot in the regex. The pattern needs a single backslash before the dot:

Bredd\s*(\d+\.?\d*)\s*m

 

 


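Note that the "re.search" method returns a MatchObject instance (or None when nothing matches), not the matched substring. That is why the original script did not produce the expected value: it stored the match object itself in the attribute.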

 


See here to learn more about regex operations with Python.

 


https://docs.python.org/2.7/library/re.html#module...

 


 


Takashi
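The difference between the match object and the matched text can be checked outside FME with a few lines of plain Python. This is a small sketch using only the standard re module; the sample strings are made up for illustration.

```python
import re

pattern = r'Bredd\s*(\d+\.?\d*)\s*m'
text = "A=0,5m, B = Bredd 10.2 m, C= Reduktionstal"

m = re.search(pattern, text)

# m is a match object, not a string; the text is obtained through group().
whole_match = m.group(0)   # everything the pattern matched
width = m.group(1)         # only the part inside the parentheses
print(whole_match)
print(width)

# When the pattern is not found, re.search returns None,
# which is why the "if m:" guard is needed before calling group().
no_match = re.search(pattern, "A=0,5m, C= Reduktionstal")
print(no_match)
```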
Hello,

 

 

I measured the performance of all the proposed solutions; here are the numbers:

Test case                     No. of features   Run 1   Run 2   Run 3   Average
PythonCaller + string.find    1 000 000         20.5    20.3    20.1    20.3
PythonCaller + string.split   1 000 000         20.5    20.4    20.9    20.6
PythonCaller + regex          1 000 000         23.3    21.8    23.6    22.9
StringSearcher                1 000 000         32.7    32.0    32.4    32.4

 

Larry
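For a rough comparison of the three Python approaches on their own, something like the following timeit sketch could be used. It is not the setup behind the table above: it runs outside FME, so it excludes all PythonCaller and feature-handling overhead, and the function names and the iteration count are assumptions chosen for illustration.

```python
import re
import timeit

SAMPLE = "A=0,5m, B = Bredd 0 m, C= Reduktionstal"


def with_find(text):
    # Slice between 'Bredd ' and the next ' m'.
    start = text.find('Bredd ') + 6
    end = text.find(' m', start)
    return text[start:end]


def with_split(text):
    # Take what follows 'Bredd', cut at the first 'm', trim spaces.
    return text.split('Bredd')[1].split('m')[0].strip()


def with_regex(text):
    # Capture the number between 'Bredd' and 'm'.
    m = re.search(r'Bredd\s*(\d+\.?\d*)\s*m', text)
    return m.group(1) if m else None


if __name__ == "__main__":
    for func in (with_find, with_split, with_regex):
        seconds = timeit.timeit(lambda: func(SAMPLE), number=100000)
        print(func.__name__, round(seconds, 3))
```

Absolute timings will differ from machine to machine; only the relative order of the three functions is of interest.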

 

yanntt, thank you so much for your time. I am very grateful for this. Peter


Thank you, david_r! I'm grateful that you took your time to answer this. Peter


Larry, thank you very much for this enlightening answer. Very much appreciated. Peter


Takashi, thank you very much. This was perfect and educational for me. I will use this and solve all my string-related problems in a flash. Much appreciated. Peter


Interesting, Larry, in this case I only have about a thousand objects, but anyways, interesting. Thanks! Peter



Nice analysis, very helpful. I suspect that the StringSearcher will be faster starting with FME 2016, as they've replaced the Tcl regex engine with something (hopefully) more efficient. As you can see, the StringSearcher currently calls the Tcl engine for each feature that enters, leading to quite a bit of overhead.


For the sake of performance, you could also use a pre-compiled regex; it shaves off a couple of seconds:

import re

# Compile the pattern once, when the script is loaded, instead of once per feature
precomp_regex = re.compile(r'Bredd\s*(\d+\.?\d*)\s*m')
def FeatureProcessor(feature):
    m = precomp_regex.search(feature.getAttribute('text'))
    if m:
        feature.setAttribute('substr', m.group(1))


David
