Solved

Using the PythonCaller to extract substring between two known substrings

9 years ago
November 12, 2015
11 replies
21 views

peteralstorp
Contributor
91 replies

Hello, fellow FMEers.

I'm very new to Python and to the PythonCaller, but we all need to start somewhere. I have wrestled with this problem basically all day and I still don't get it to work.

I have a text string "A=0,5m, B = Bredd 0 m, C= Reduktionstal". From this string I want to extract the value between "Bredd " and " m". In this case it is 0, but it can also be 10 or 10.2.

I'm attaching an image of my progress so far... And the PythonCaller script so far:

import fme
import fmeobjects
import re

def FeatureProcessor(feature):
    substr = re.search('Bredd (.*) m', feature.getAttribute('text'))
    feature.setAttribute("substr", substr)

I have often come across this problem and I really want to learn how to solve it. Pls help!

Peter

Best answer by takashi

Hi Peter,

I would also use the StringSearcher in this case. However, if you want to learn Python regex operations, this is good practice, of course. There are several possible implementations; this is one example.

import fme
import fmeobjects
import re
def FeatureProcessor(feature):
    m = re.search(r'Bredd\s*(\d+\.?\d*)\s*m', feature.getAttribute('text'))
    if m:
        feature.setAttribute('substr', m.group(1))

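The forum editor is not good: it sometimes fails to display the backslash before the dot in the regex. Make sure there is a single backslash before the dot, so that the dot is matched literally:

Bredd\s*(\d+\.?\d*)\s*m

Note that the "re.search" method returns a MatchObject instance (or None if nothing matches), not the matched substring. Call its group() method to get the captured text.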

See here to learn more about regex operations with Python.

https://docs.python.org/2.7/library/re.html#module...

Takashi
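To illustrate the note about match objects, here is a minimal sketch in plain Python, run outside FME so there is no `feature` object. It uses the sample string from the question; the helper name `extract_bredd` is hypothetical and not part of any FME API.

```python
import re

# Pattern from the answer above: "Bredd", optional whitespace, a number with
# an optional decimal part, optional whitespace, then "m".
PATTERN = r'Bredd\s*(\d+\.?\d*)\s*m'

def extract_bredd(text):
    """Return the captured number as a string, or None if there is no match."""
    m = re.search(PATTERN, text)
    # m is a match object (or None), not the matched text itself.
    if m is None:
        return None
    # group(0) is the whole match; group(1) is the first parenthesised group.
    return m.group(1)

sample = "A=0,5m, B = Bredd 0 m, C= Reduktionstal"
print(extract_bredd(sample))               # 0
print(extract_bredd("B = Bredd 10.2 m"))   # 10.2
print(extract_bredd("no width here"))      # None
```

Passing the match object itself to setAttribute, as in the script from the question, stores the wrong thing; passing `m.group(1)` stores the text.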


yanntt
8 replies
9 years ago
November 12, 2015

Hi Peter,

try this:

nameOfYourStr = "A=0,5m, B = Bredd 0 m, C= Reduktionstal"
start = nameOfYourStr.find('Bredd ') + 6
end = nameOfYourStr.find(' m, C=')
substr = nameOfYourStr[start:end]

Yann
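The find-based approach can be wrapped in a small reusable function. This is only a sketch: the function name `extract_between` and its parameters are hypothetical, and it adds a check for the -1 that `str.find` returns when a marker is missing.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    # Move past the start marker itself.
    start += len(start_marker)
    # Look for the end marker only after the start marker.
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]

sample = "A=0,5m, B = Bredd 0 m, C= Reduktionstal"
print(extract_between(sample, 'Bredd ', ' m'))  # 0
```

Because the search for the end marker begins after the start marker, the short marker ' m' is enough here and the trailing ', C=' does not need to be hard-coded.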

david_r
8355 replies
9 years ago
November 12, 2015

You do not have to use Python for this, a simple StringSearcher will suffice. Use the expression

Bredd\s*([\d.]+)\s*m

And you will get your value as _matched_parts{0}

David
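The character class `[\d.]+` is slightly more permissive than the `\d+\.?\d*` form used elsewhere in this thread. The sketch below compares the two with Python's `re` module; it does not run the StringSearcher, and the assumption is that both engines treat these simple patterns the same way. The helper `first_group` is hypothetical.

```python
import re

loose = re.compile(r'Bredd\s*([\d.]+)\s*m')        # any run of digits and dots
strict = re.compile(r'Bredd\s*(\d+\.?\d*)\s*m')    # digits, optional dot, digits

def first_group(pattern, text):
    """Return the first captured group, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None

for text in ("Bredd 0 m", "Bredd 10.2 m", "Bredd 1.2.3 m", "Bredd . m"):
    print(text, '->', first_group(loose, text), first_group(strict, text))
```

For well-formed values such as 0, 10 or 10.2 both patterns capture the same text; they differ only on malformed input such as "1.2.3" or a lone dot, which the loose pattern accepts and the strict one rejects.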

larry
173 replies
9 years ago
November 12, 2015

Hi Peter,

In one line:

feature.setAttribute('substr', feature.getAttribute('text').split('Bredd')[1].split('m')[0].strip())

And step by step:

#Get the text attribute
att = feature.getAttribute('text')
#Right part of Bredd
right = att.split('Bredd')[1]
#Left part of m
left = right.split('m')[0]
#Trim extra spaces
substr = left.strip()
feature.setAttribute('substr', substr)

Larry
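One thing to watch with the split-based version: if "Bredd" is absent, `split('Bredd')[1]` raises an IndexError. A possible variant, sketched here with `str.partition` and a hypothetical function name `extract_with_partition`, returns None instead when either marker is missing.

```python
def extract_with_partition(text, before='Bredd', after='m'):
    """Split-based extraction that returns None when a marker is missing."""
    # partition() returns (head, separator, tail); separator is '' if not found.
    _, sep, tail = text.partition(before)
    if not sep:
        return None
    value, sep, _ = tail.partition(after)
    if not sep:
        return None
    # Trim the spaces around the value, as in the step-by-step version.
    return value.strip()

sample = "A=0,5m, B = Bredd 0 m, C= Reduktionstal"
print(extract_with_partition(sample))  # 0
```

In a PythonCaller the result would only be written with `feature.setAttribute` when it is not None.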

larry
173 replies
9 years ago
November 13, 2015

Hello,

I ran some performance measurements on all the proposed solutions, and here are the numbers:

Test case                    Nb. of features  Run 1  Run 2  Run 3  Average
PythonCaller + string.find   1 000 000        20.5   20.3   20.1   20.3
PythonCaller + string.split  1 000 000        20.5   20.4   20.9   20.6
PythonCaller + regex         1 000 000        23.3   21.8   23.6   22.9
StringSearcher               1 000 000        32.7   32.0   32.4   32.4

Larry
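For anyone who wants to compare the three Python approaches themselves, a rough micro-benchmark can be sketched with the standard `timeit` module. This is not the setup used for the numbers above: it runs outside FME, measures only the string handling (no feature or transformer overhead), uses a pre-compiled regex, and the function names are hypothetical. Its timings will not be comparable to the table.

```python
import re
import timeit

SAMPLE = "A=0,5m, B = Bredd 0 m, C= Reduktionstal"
REGEX = re.compile(r'Bredd\s*(\d+\.?\d*)\s*m')

def by_find(text):
    # Slice between the end of 'Bredd ' and the next ' m'.
    start = text.find('Bredd ') + 6
    end = text.find(' m', start)
    return text[start:end]

def by_split(text):
    # Take what follows 'Bredd', then what precedes the next 'm'.
    return text.split('Bredd')[1].split('m')[0].strip()

def by_regex(text):
    m = REGEX.search(text)
    return m.group(1) if m else None

if __name__ == '__main__':
    for func in (by_find, by_split, by_regex):
        # Total time in seconds for 100 000 calls on the sample string.
        seconds = timeit.timeit(lambda: func(SAMPLE), number=100000)
        print('%-10s %.3f s' % (func.__name__, seconds))
```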

peteralstorp
Author
Contributor
91 replies
9 years ago
November 13, 2015

yanntt, thank you so much for your time. I am very grateful for this. Peter

peteralstorp
Author
Contributor
91 replies
9 years ago
November 13, 2015

Thank you, david_r! I'm grateful that you took the time to answer this. Peter

peteralstorp
Author
Contributor
91 replies
9 years ago
November 13, 2015

Larry, thank you very much for this enlightening answer. Very much appreciated. Peter

peteralstorp
Author
Contributor
91 replies
9 years ago
November 13, 2015

Takashi, thank you very much. This was perfect and educational for me. I will use this to solve all my string-related problems in a flash. Much appreciated. Peter

peteralstorp
Author
Contributor
91 replies
9 years ago
November 13, 2015

Interesting, Larry, in this case I only have about a thousand objects, but anyways, interesting. Thanks! Peter

david_r
8355 replies
9 years ago
November 13, 2015

Nice analysis, very helpful. I suspect that the StringSearcher will be faster starting with FME 2016, as they've replaced the Tcl regex engine with something (hopefully) more efficient. As you can see, the StringSearcher currently calls the Tcl engine for each feature that enters, leading to quite a bit of overhead.

For the sake of performance, you could also use a pre-compiled regex, it shaves off a couple of seconds:


import re

# Compile the pattern once, when the script is loaded, rather than per feature.
precomp_regex = re.compile(r'Bredd\s*(\d+\.?\d*)\s*m')

def FeatureProcessor(feature):
    m = precomp_regex.search(feature.getAttribute('text'))
    if m:
        feature.setAttribute('substr', m.group(1))

David
