
Hello,

 

I'm trying to split an attribute into a list with the AttributeSplitter, using a delimiter.

The problem is that the same delimiter must sometimes be ignored.

 

The attribute I'm trying to split is:

'3Ku_PFwo2YcPZnv0WeDVdW',#25,$,$,#113, (#13017,#14678,#19207,#20344,#20598,#20727,#20858)

 

The list values should be:

0: 3Ku_PFwo2YcPZnv0WeDVdW

1: #25

2: $

3: $

4: #113

5: #13017,#14678,#19207,#20344,#20598,#20727,#20858

 

When I use the delimiter ",", the values in the last part are split as well...

Another problem is that the format of the attribute differs each time. So sometimes it could be: $,$,$,($,$,$,),$,$,($,$,$),$ or $,$,$,$ or ($,$,$),($,$,$),$,($) etc...

The order of the list must remain the same as the original order.

Any suggestion would be more than welcome ;)

 

cheers,

ronald

 

Well... what you could try is to use a regex to find anything between ( and ) and store that in a substring, replace the commas in that substring with something else, put it back into the main string, split that, and then replace the something else back with a comma (although if the format is dynamic, that might be difficult too).
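A minimal plain-Python sketch of that protect-then-split idea, assuming only one level of parentheses. The function name and the `\x00` placeholder are hypothetical choices; any character that never occurs in your data works as the placeholder.

```python
import re
from typing import List

# Hypothetical placeholder: must be a character that never occurs in the data.
PLACEHOLDER = "\x00"

def split_protecting_parens(value: str) -> List[str]:
    # 1. Find each (...) group and hide its commas behind the placeholder.
    #    [^()]* only matches a single level of parentheses.
    protected = re.sub(
        r"\([^()]*\)",
        lambda m: m.group(0).replace(",", PLACEHOLDER),
        value,
    )
    # 2. Split on the commas that are left (all of them outside parentheses),
    # 3. then turn the placeholders back into commas and trim spaces.
    return [part.replace(PLACEHOLDER, ",").strip() for part in protected.split(",")]

print(split_protecting_parens("$,$,$,($,$,$,),$"))
# ['$', '$', '$', '($,$,$,)', '$']
```

Nested groups such as ($,($)) are only partly protected by this pattern, because the outer group contains parentheses and `[^()]*` cannot match across them.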


Hi @rva1,

The fact that the attribute value will change makes it quite difficult to come up with a completely automated solution.

There are many ways to go about it, I would try the following:

  1. Search for everything between the parentheses,
  2. and use that match to erase the parenthesized part from the original value.
  3. Remove the parentheses from the substring.
  4. Split the remaining value and put the substring back in its place.
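The four steps above could be sketched in plain Python roughly like this. It assumes a single level of parentheses; the function name and the `\x00` marker (left behind where a group was erased, so each substring goes back into its original position) are hypothetical.

```python
import re
from typing import List

GROUP = re.compile(r"\(([^()]*)\)")  # one level of (...); the contents are captured
MARKER = "\x00"                      # hypothetical stand-in; must not occur in the data

def split_with_groups(value: str) -> List[str]:
    # Steps 1 + 3: everything between the parentheses, without the parentheses.
    groups = GROUP.findall(value)
    # Step 2: erase each group from the original value, leaving a marker behind.
    remaining = GROUP.sub(MARKER, value)
    # Step 4: split what is left and put the substrings back, left to right.
    substrings = iter(groups)
    result = []
    for part in remaining.split(","):
        part = part.strip()
        while MARKER in part:
            part = part.replace(MARKER, next(substrings), 1)
        result.append(part)
    return result

print(split_with_groups("$,$,($,$,$),$"))
# ['$', '$', '$,$,$', '$']
```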

Hope this helps,

Itay



It took a little work to find the right regex... but it seems \(#(.*?)\)\)|\(#(.*?)\) is working...

The problem was that sometimes I also had to handle values like $,$,($,$,$),$,($,($))... ;)

So I'm using a pipe (|) in the regex to match both cases.

I also used the All Matches List Name parameter to get all the results... hope it's watertight...
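For nested values like $,$,($,$,$),$,($,($)), a regex has to special-case each nesting depth, and Python's built-in `re` module has no recursive patterns. A different technique (not the regex approach above) is a small scanner that counts parentheses and splits only on commas at depth 0. This is a plain-Python sketch; `split_top_level` is a hypothetical name.

```python
from typing import List

def split_top_level(value: str, delimiter: str = ",") -> List[str]:
    """Split on delimiters that are not inside any (possibly nested) parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in value:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate a stray ')' instead of going negative
        if ch == delimiter and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())  # last field after the final delimiter
    return parts

print(split_top_level("$,$,($,$,$),$,($,($))"))
# ['$', '$', '($,$,$)', '$', '($,($))']
```

The list order always matches the original order, because the string is read once from left to right.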


I'd probably use a StringReplacer with a regex to replace all commas outside the parentheses with another character that's not going to cause a conflict elsewhere, then use that character to split the string in an AttributeSplitter.

,\s*(?![^()]*\))
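To check what that pattern does outside FME, here is a plain-Python sketch of the same two steps: a regex replace (the StringReplacer part) followed by a split on the new delimiter (the AttributeSplitter part). The pipe is an assumed replacement character, and the lookahead only understands one level of parentheses.

```python
import re
from typing import List

# A comma (plus any following whitespace) that is NOT followed by a ')'
# before the next '(' or ')', i.e. a comma outside a single level of parentheses.
OUTSIDE_PARENS_COMMA = re.compile(r",\s*(?![^()]*\))")

def replace_then_split(value: str, new_delim: str = "|") -> List[str]:
    replaced = OUTSIDE_PARENS_COMMA.sub(new_delim, value)  # StringReplacer step
    return replaced.split(new_delim)                       # AttributeSplitter step

print(replace_then_split("$,$,$,($,$,$,),$"))
# ['$', '$', '$', '($,$,$,)', '$']
```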

Or use Python to split at commas outside the brackets:

import fme
import fmeobjects
import re

# Template Function interface:
# When using this function, make sure its name is set as the value of
# the 'Class or Function to Process Features' transformer parameter
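def splitString(feature):
    # Read the attribute to split (here it is called 'string')
    string = feature.getAttribute('string')
    # Split on commas (and any following whitespace) that are not inside parentheses
    split_string = re.split(r',\s*(?![^()]*\))', string)
    # Write each part to a list attribute: string{0}.split, string{1}.split, ...
    for i, val in enumerate(split_string):
        feature.setAttribute('string{' + str(i) + '}.split', val)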

In either case, you'd still need to remove the brackets themselves.
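Removing the brackets after the split can be as simple as trimming one enclosing pair from each part. This plain-Python sketch also strips the single quotes around the first value, which is an assumption based on the expected list in the question; `clean_parts` is a hypothetical name.

```python
from typing import List

def clean_parts(parts: List[str]) -> List[str]:
    cleaned = []
    for part in parts:
        part = part.strip()
        # Remove one pair of enclosing parentheses, e.g. "(#1,#2)" -> "#1,#2".
        if part.startswith("(") and part.endswith(")"):
            part = part[1:-1]
        # Assumption: quotes around a value are also unwanted, e.g. "'abc'" -> "abc".
        if len(part) >= 2 and part[0] == part[-1] == "'":
            part = part[1:-1]
        cleaned.append(part)
    return cleaned

print(clean_parts(["'3Ku_PFwo2YcPZnv0WeDVdW'", "#25", "(#13017,#14678)"]))
# ['3Ku_PFwo2YcPZnv0WeDVdW', '#25', '#13017,#14678']
```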



Ahh, very nice!

This helps with some other issues as well.

Thanks!


@rva1 Leveraging @egomm's great regular expression to replace the comma delimiter with something else (here a pipe (|) character), here's an equivalent workspace (2018.1) for those of us not so comfortable in Python!

stringsplitter.zip

