Question

FME Python script help


Badge

I have a dataset containing a registry of addresses (street, number, postal code, etc.) and an additional field containing entries such as 24-44, meaning all the house numbers from 24 to 44 are also included, or - 44, etc.

I need to match these addresses with another official address dataset (through the featuremerger), and I want to be able to find matches for all the numbers that are between 24 and 44.

I've managed to "mine" the text in the additional field and I've created a new attribute containing the upper range number (in this case 44 )

Now, I just need to have features that have the housenumbers that fall between the lower and upper range, and for this I obviously need a python script that can be used within the "pythoncaller" transformer and then run them through the feature merger.

I know a little bit of Python, but the little bit that I know apparently isn't enough to make the translation to Python in FME.

I know I need to define a function that can register this number range, with a for loop that creates, for every number within that range, a feature with an address containing that house number. The function also needs to keep track of even and odd numbers. If the upper and lower bounds aren't both odd or both even, it can create an address for every number within that range. Otherwise, it only needs to create features with either odd or even house numbers.
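A minimal sketch of the rule described above, in plain Python and independent of FME. The function name expand_range is hypothetical; it assumes both bounds are already integers and that the upper bound is meant to be included.

```python
def expand_range(start, end):
    """Return the house numbers from start to end, inclusive.

    If both bounds have the same parity (both odd or both even), only
    numbers of that parity are returned; otherwise every number is.
    """
    step = 2 if start % 2 == end % 2 else 1
    # range() excludes its stop value, so add 1 to keep the upper bound
    return list(range(start, end + 1, step))


print(expand_range(24, 44))  # even-even: every second number
print(expand_range(3, 11))   # odd-odd: every second number
print(expand_range(20, 33))  # mixed parity: every number
```

Inside a PythonCaller, each returned number would become one output feature.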

I hope my explanation was clear enough. Can someone help me along the way? Writing the whole script isn't necessary since I also like to figure it out on my own.


21 replies

Badge +16

Hi, in this sample the tool MakeReferenceAddress has a number of examples of PythonCaller being used to explode implied subaddress unit ranges of a number of types. You can easily adapt this to house numbers:

http://pm.maps.arcgis.com/home/item.html?id=67fd0eaa9aa24d9d97fd93aefe0585b8

I think I'm attaching the right source here:

makereferenceaddress.fmw

Beware: if you need to respect parity, the increment in the code will be 2, not 1.

Badge

Hi @fmenco

Great choice in task as it covers quite a few python basics! 

# 'housenumbers' is the column name, e.g. housenumbers = '24-44'
housenumbers = feature.getAttribute('housenumbers')

# split into a list, e.g. houseSplit = ['24', '44']
houseSplit = housenumbers.split("-")
if len(houseSplit) == 2:
    # convert from text to int
    StartNum = int(houseSplit[0])  # 24
    FinishNum = int(houseSplit[1])  # 44

    # test whether StartNum and FinishNum are both odd or both even,
    # using modulo
    if StartNum % 2 == FinishNum % 2:
        # range(start, stop, step) excludes stop, so use FinishNum + 1
        # e.g. range(24, 45, 2)
        for X in range(StartNum, FinishNum + 1, 2):
            feature.setAttribute('newhousenumber', X)
            # this outputs a row with all the other columns as well:
            # a new record for 24, 26, 28, 30 ... 42, 44
            self.pyoutput(feature)
    else:
        # e.g. 24-27 becomes range(24, 28, 1)
        for X in range(StartNum, FinishNum + 1, 1):
            feature.setAttribute('newhousenumber', X)
            # a new record for 24, 25, 26, 27
            self.pyoutput(feature)
else:
    # no range found: keep the single number
    feature.setAttribute('newhousenumber', houseSplit[0])
    self.pyoutput(feature)

That should be the main bulk of the input function.

Remember to expose the new column, in this case "newhousenumber".

Badge +3

You can do that without any Python or Tcl (or without SQL through the InlineQuerier).

Use a StringReplacer; for instance, search for (\d+)-(\d+) on the attribute with AllMatches (AM) and submatches (SM).

Test the values SM{0} and SM{1} to check parity: if mod(SM{0},2) = mod(SM{1},2), set parity to 2; otherwise set it to 1.

Divide SM{1} - SM{0} by parity. Clone the record by the result, then recalculate the house number as SM{0} + _copynum*@Value(parity).

For including house letters, it is easier to convert them to their character code.
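The arithmetic behind this clone-based approach can be sketched in plain Python. This is an illustration only, not FME code: copy_count and the loop variable copynum are hypothetical stand-ins for the number of clones and the _copynum attribute, and the + 1 assumes clone numbering starts at 0 and that the upper bound should be included. The letters_between helper is likewise a hypothetical illustration of the character-code idea.

```python
import re


def numbers_from_text(text):
    """Expand a 'low-high' range using the clone arithmetic."""
    match = re.search(r"(\d+)-(\d+)", text)
    if match is None:
        return []
    low, high = int(match.group(1)), int(match.group(2))
    # step 2 when both bounds share parity, otherwise 1
    step = 2 if low % 2 == high % 2 else 1
    # number of clones needed; + 1 so the upper bound is included
    copy_count = (high - low) // step + 1
    # each clone gets low + copynum * step
    return [low + copynum * step for copynum in range(copy_count)]


def letters_between(first, last):
    """Expand a house-letter range via character codes, e.g. A to D."""
    return [chr(code) for code in range(ord(first), ord(last) + 1)]
```

A reversed range such as 44-24 yields an empty list here, because the computed clone count is not positive.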

Userlevel 4


Just a small comment: I seem to remember that each time you call self.pyoutput(), that feature object became undefined. It's therefore better to create a new instance of FMEFeature before each call to self.pyoutput().

 

Not 100% sure if that's still the case, however, but might be worth watching out for.
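The precaution can be illustrated without FME. The sketch below is a plain-Python analogy, not a description of FME's behaviour: FakeFeature and the collected list are hypothetical stand-ins for an FMEFeature and for whatever receives the output. It only shows why handing out the same mutable object repeatedly is fragile, and why handing out a copy per output is safer.

```python
import copy


class FakeFeature:
    """Hypothetical stand-in for a feature: just a bag of attributes."""

    def __init__(self):
        self.attributes = {}

    def setAttribute(self, name, value):
        self.attributes[name] = value

    def getAttribute(self, name):
        return self.attributes.get(name)

    def clone(self):
        return copy.deepcopy(self)


def emit_reused(feature, numbers):
    """Hand out the same object each time (fragile)."""
    collected = []
    for n in numbers:
        feature.setAttribute('newhousenumber', n)
        collected.append(feature)
    return collected


def emit_cloned(feature, numbers):
    """Hand out a fresh copy each time (safer)."""
    collected = []
    for n in numbers:
        new_feature = feature.clone()
        new_feature.setAttribute('newhousenumber', n)
        collected.append(new_feature)
    return collected


reused = [f.getAttribute('newhousenumber')
          for f in emit_reused(FakeFeature(), [24, 26, 28])]
cloned = [f.getAttribute('newhousenumber')
          for f in emit_cloned(FakeFeature(), [24, 26, 28])]
print(reused)  # every entry shows the last value written
print(cloned)  # each entry keeps its own value
```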
Badge +22

I admit I would probably do this in Python myself, but I thought I would try to do it in pure FME, and it's relatively straightforward (no looping required), though in production I would add extra checks for legitimate values.

 

kb-addresses.fmw

Badge


Hi,

 

I think I do need a for loop, since the ranges are random. The 24- 44 is just an example..., the dataset contains random ranges for certain records.

 

 

Userlevel 2
Badge +17

Hi @fmenco, I'm still unclear what your goal is.

Do you need to create all the values within the range, or need to test if an attribute value from a feature is in a specified range? Which is your purpose?

Badge
Hi, 

 

Thank you.

 

I'm trying to adapt your script, since I have already managed to "mine" the column containing text with possible ranges of addresses and place the "upper range" into a separate column. But now I get a "PythonFactory failed to process feature" error. What am I supposed to do with the FME script template (with lines like import fme, class FeatureProcessor(object):, etc.)? How am I supposed to structure my script to fit in there? That's what I meant when I said I can't make the transition/translation from "regular" Python to "FME Python".

 

 

import fme
import fmeobjects

#Define StartNum and Finishnum
StartNum = feature.getAttribute('housenumber') #24
FinishNum = feature.getAttribute(upper_range_number) #44

#The rest stays more or less the same, except for the last else statement in your script. For now, I've just edited that out.
# test if StartNum and FinishNum or either both odd or even
# using modulo
if StartNum % 2 == FinishNum % 2:
 # range(StartNum,FinishNum,stepsize)
 # range(24,44,2)
 for X in range(StartNum,FinishNum,2):
  feature.setAttribute('newhousenumber',X)
  #This outputs a row with all the other columns as well
  # would create a new record for 24, 26,28,30... 42 44
  self.pyoutput(feature)
else:
 # if its  (24,27,1)
 for X in range(StartNum,FinishNum,1):
  feature.setAttribute('newhousenumber',X)
  # would create a new record for 24,25,26,27
  self.pyoutput(feature)
#else:
 
 #feature.setAttribute('newhousenumber',houseSplit[0])
 #self.pyoutput(feature) 
Userlevel 4

Here's how to implement the code from @davidrich in a PythonCaller.

I've added the necessary boilerplate in the code:

import fmeobjects

class split_house_numbers():

    def input(self, feature):
        # 'housenumbers' is the column name, e.g. housenumbers = '24-44'
        housenumbers = feature.getAttribute('housenumbers')

        # split into a list, e.g. houseSplit = ['24', '44']
        houseSplit = housenumbers.split("-")
        if len(houseSplit) == 2:
            # convert from text to int
            StartNum = int(houseSplit[0])  # 24
            FinishNum = int(houseSplit[1])  # 44

            # test whether StartNum and FinishNum are both odd or both even,
            # using modulo
            if StartNum % 2 == FinishNum % 2:
                # range(start, stop, step) excludes stop, so use FinishNum + 1
                # e.g. range(24, 45, 2)
                for X in range(StartNum, FinishNum + 1, 2):
                    # clone so that every output is its own feature object
                    new_feature = feature.clone()
                    new_feature.setAttribute('newhousenumber', X)
                    # this outputs a row with all the other columns as well:
                    # a new record for 24, 26, 28, 30 ... 42, 44
                    self.pyoutput(new_feature)
            else:
                # e.g. 24-27 becomes range(24, 28, 1)
                for X in range(StartNum, FinishNum + 1, 1):
                    new_feature = feature.clone()
                    new_feature.setAttribute('newhousenumber', X)
                    # a new record for 24, 25, 26, 27
                    self.pyoutput(new_feature)
        else:
            # no range found: keep the single number
            feature.setAttribute('newhousenumber', houseSplit[0])
            self.pyoutput(feature)

Configure the PythonCaller as follows:

0684Q00000ArJZqQAN.png

If a feature enters containing housenumbers = '24-44', then 11 features will exit, with newhousenumber running from 24 to 44 in steps of 2:

0684Q00000ArJiXQAV.png

You can then pass these features into e.g. the FeatureMerger.

Userlevel 2
Badge +17
In many cases, creating clones can substitute for a loop. I believe that @jdh's workflow will help you if your goal is to create the values within a range. Just replace the Sample Data bookmark with a reader feature type that reads your random range definitions one by one.

 

Userlevel 4
I'm not @davidrich, but I posted a more complete version of the script above, including some instructions. All credit for the script goes to @davidrich.
Userlevel 4


Here's also a sample workspace: housenumbers.fmw
Badge


I have a dataset with addresses that looks like this:

 

 

 

 

I need to match this dataset with another address dataset, which is an official address dataset, and which of course doesn't contain all the random comments that are now contained within one of the "address" columns in dataset A.

 

 

Right now Dataset A doesn't contain records for the "untill addresses", and I need to either create them or find a way in which those addresses can still be matched with dataset B.

 

I've managed to "mine" the letter column and get the upper range in a separate column (upper_range_column)

 

So for example now I can find matches in dataset B for fmestreet 1A, fmestreet 3, fmestreet 11 etc. through implementing several feature mergers in my workbench.

 

However, I need to create records for fmestreet 5, 7, 9, etc. (only the odd numbers in the range 3-11, etc.)

 

I figured I needed a Python script for that.

 

 

Badge +22

 

 

This workspace determines the range based on the value of an attribute. It produces the appropriate addresses for each input feature.

 

 

Userlevel 2
Badge +17


I think the Python script provided by them or @jdh's workflow could be applied to create individual numbers from these ranges in your example.

 

Number   Letter   Required Numbers
3        -11      3, 5, 7, 9, 11
20       -33      20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33

How do you need to process these cases? Are there any other rules?

 

Number   Letter            Required Numbers?
1        A
13       15,17 included
2        untill 10
Userlevel 2
Badge +17


The attachment contains a workspace example that demonstrates how to extract individual numbers within the ranges shown in your example: [3, 11] (odd-odd) and [20, 33] (even-odd).

 

extract-hosenumbers-within-range.fmwt (FME 2017.1.2.1)

 

 

Badge

 

 

import fmeobjects
class house_numbers():

    def input(self, feature):
        StartNum = feature.getAttribute('housenumber')
        StartNum = int(StartNum)
        FinishNum = feature.getAttribute('upper_range_number')
        FinishNum = int(FinishNum)

        if StartNum % 2 == FinishNum % 2:
            for X in range(StartNum,FinishNum,2):
                new_feature = feature.clone()
                new_feature.setAttribute('newhousenumber',X)
                self.pyoutput(new_feature)

        else:
            for X in range(StartNum,FinishNum,1):
                new_feature = feature.clone()
                new_feature.setAttribute('newhousenumber',X)
                self.pyoutput(new_feature)

 

Hi @takashi 

I would really prefer to use Python since my workbench is already relatively large and going the other route would add even more transformers.

 

I had already managed to create the "upper_range_number" attribute through applying regular expressions in an attribute creator transformer, before I posted the question, so my initial dataset looks like this now.

 

0684Q00000ArMyyQAF.jpg

Now I just need a way to match the numbers in between, and I've adapted the script provided in here by @davidrich  and @david_r (see my adapted version above in this post),  but I get an error "PythonFactory failed to process feature".  My indents look funny when posting the script on here, but in FME they are fine.

 

 

 

 

Userlevel 4

 

 

 

 

If you get an error like that, please post the entire error message from the log window. The message you posted doesn't really tell us much about why it fails.

 

However, I'm going to guess that it's something to do with "upper_range_number" not always being defined in your initial dataset. Try filling out the missing values and see if that makes a difference.
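One way to guard against missing values before calling int(), sketched in plain Python. The helper name to_int_or_none is hypothetical; it assumes a missing attribute arrives as None or as an empty string.

```python
def to_int_or_none(value):
    """Return value as an int, or None if it is missing or not a whole number."""
    if value is None:
        return None
    try:
        return int(str(value).strip())
    except ValueError:
        return None
```

Inside input(), the range would then only be expanded when both converted values are not None.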

 

Badge

 

 

Hi @fmenco, I agree with @david_r that "upper_range_number" will cause a problem, as it would be a null value. Try something like this:

 

 

import fmeobjects
class house_numbers():
    def input(self, feature):
        StartNum = feature.getAttribute('housenumber')
        StartNum = int(StartNum)
        FinishNumText = feature.getAttribute('upper_range_number')

        # only do this if FinishNumText has a value
        if FinishNumText:
            FinishNum = int(FinishNumText)

            if StartNum % 2 == FinishNum % 2:
                # FinishNum + 1 so that the upper number itself is included
                for X in range(StartNum, FinishNum + 1, 2):
                    new_feature = feature.clone()
                    new_feature.setAttribute('newhousenumber', X)
                    self.pyoutput(new_feature)

            else:
                for X in range(StartNum, FinishNum + 1, 1):
                    new_feature = feature.clone()
                    new_feature.setAttribute('newhousenumber', X)
                    self.pyoutput(new_feature)
        else:
            # no upper range: pass the feature through unchanged
            self.pyoutput(feature)

Badge

Hi @david_r and @davidrich

 

I fixed it, .. the error wasn't due to the upper range containing missing or null since I had already added a tester to filter for these. It was just a stupid mistake (wrong attribute name).

 

Anyway, it does run now. However, the PythonCaller seems to keep on creating features long after the last record has entered the transformer. I have run the workbench with breakpoints etc., and it does what I want it to do initially. But after that it just doesn't stop, and I don't have that many additional addresses. At one point it was at 6 million new features (from about 500... lol).

 

 

I think it's because it's also creating features across records with this script, if you can understand what I mean. Shouldn't we apply indices? Take a look at record 3 in the image of my example data, (fmestreet 13 untill 21). In the conditions for this script, is the check only done to create records within this range? Or is it possible that for fmestreet 13, also the range fmestreet 13- 33 ( the upper range from the 4th record) is considered?

 

I hope you guys can understand what I mean, sorry

 

Badge

 

The code only runs per record, so there is no chance of other records affecting it.

 

 

Your best bet is to run it with just a few records and check that it all works, to track down the error.
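To check a small sample, it can help to work out how many features each record should produce and compare the total with the count leaving the PythonCaller. A minimal sketch, assuming inclusive ranges and the parity rule used in this thread; expected_count and the sample list are hypothetical:

```python
def expected_count(start, end):
    """Number of house numbers from start to end inclusive, honouring parity."""
    if end < start:
        return 0
    step = 2 if start % 2 == end % 2 else 1
    return (end - start) // step + 1


# (housenumber, upper_range_number) pairs taken from examples in this thread
sample = [(3, 11), (20, 33), (24, 44)]
total = sum(expected_count(start, end) for start, end in sample)
print(total)
```

If the transformer emits far more features than this total for the same records, that suggests the extra output is not coming from the per-record range expansion itself.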

 
