Question

split attribute with delimiter but also ignore the same delimiter

  • February 18, 2019
  • 6 replies
  • 161 views

rva1
Contributor
  • Contributor
  • 15 replies

Hello,

I'm trying to split an attribute into a list with the AttributeSplitter using a delimiter.

The problem is that the same delimiter must sometimes be ignored.

The attribute I'm trying to split is:

'3Ku_PFwo2YcPZnv0WeDVdW',#25,$,$,#113, (#13017,#14678,#19207,#20344,#20598,#20727,#20858)

The list values should be:

0: 3Ku_PFwo2YcPZnv0WeDVdW

1: #25

2: $

3: $

4: #113

5: #13017,#14678,#19207,#20344,#20598,#20727,#20858

When I use the delimiter ",", the values inside the parentheses are split as well.

Another problem is that the format of the attribute differs each time. Sometimes it could be: $,$,$,($,$,$,),$,$,($,$,$),$ or $,$,$,$ or ($,$,$),($,$,$),$,($) etc.

The order of the list must remain the same as the original order.

Any suggestions would be more than welcome ;)

Cheers,

Ronald

6 replies

redgeographics
Celebrity
  • Celebrity
  • 3700 replies
  • February 18, 2019

Well... what you could try is to use a regex to find anything between ( and ) and store that in a substring, replace the commas in that substring with something else, put it all back into the main string, split that, and then replace the something else back with a comma (although if the format is dynamic, that might be difficult too).
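A minimal Python sketch of this placeholder idea, outside FME. It assumes the parenthesized groups are not nested and that the placeholder character "§" (an arbitrary choice) never appears in the data; the function name split_outside_parens is hypothetical.

```python
import re

# Hypothetical placeholder; assumed never to occur in the attribute values.
PLACEHOLDER = "\u00a7"  # the "§" character


def split_outside_parens(value):
    # 1. Find every non-nested "( ... )" group and swap its commas for the placeholder.
    protected = re.sub(
        r"\([^()]*\)",
        lambda m: m.group(0).replace(",", PLACEHOLDER),
        value,
    )
    # 2. Split the whole string on the remaining (top-level) commas.
    parts = protected.split(",")
    # 3. Put the original commas back and trim surrounding whitespace.
    return [p.replace(PLACEHOLDER, ",").strip() for p in parts]


sample = "'3Ku_PFwo2YcPZnv0WeDVdW',#25,$,$,#113, (#13017,#14678,#19207,#20344,#20598,#20727,#20858)"
print(split_outside_parens(sample))
```

The order of the parts is preserved because the string is only split once, after the commas inside the brackets have been hidden.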


itay
Supporter
  • Supporter
  • 1442 replies
  • February 18, 2019

Hi @rva1,

The fact that the attribute value changes each time makes it quite difficult to come up with a completely automated solution.

There are many ways to go about it, I would try the following:

  1. Search for everything between the parentheses.
  2. Use that result to erase the parenthesized part from the original value.
  3. Remove the parentheses from the substring.
  4. Split the remaining value and use the substring.
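The four steps above could be sketched in plain Python roughly as follows. This assumes non-nested parentheses and that the original value has no genuinely empty fields, because an empty slot after splitting is taken to mark where a group was erased; split_erase_and_fill is a hypothetical name.

```python
import re

# One non-nested "( ... )" group.
GROUP = re.compile(r"\([^()]*\)")


def split_erase_and_fill(value):
    # Step 1: collect every parenthesized group, in order of appearance.
    groups = GROUP.findall(value)
    # Step 2: erase those groups from the original value (leaves empty slots).
    remainder = GROUP.sub("", value)
    # Step 3: drop the surrounding parentheses from each extracted group.
    inner = iter(g[1:-1] for g in groups)
    # Step 4: split the remainder and refill each empty slot, in order,
    # with the next extracted substring so the original order is kept.
    result = []
    for part in remainder.split(","):
        part = part.strip()
        result.append(next(inner, part) if part == "" else part)
    return result
```

For the sample from the question, the erased value ends in "#113, ", so the last field is empty and gets the contents of the single group.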

Hope this helps,

Itay


rva1
Contributor
  • Author
  • Contributor
  • 15 replies
  • February 19, 2019

It took a little work to find the right regex, but \\(#(.*?)\\)\\)|\\(#(.*?)\\) seems to work...
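A quick way to see what this pattern picks up is to run it through Python's re.finditer, which yields every match in order of appearance. This assumes the doubled backslashes above are escaping in the post, so the pattern itself uses single backslashes; note that it only matches groups whose first item starts with #. The name all_matches is hypothetical.

```python
import re

# Assumption: the pattern from the post with single backslashes.
# First alternative: a group ending in "))" (one level of nesting);
# second alternative: a plain group ending in ")".
PATTERN = re.compile(r"\(#(.*?)\)\)|\(#(.*?)\)")


def all_matches(value):
    # Every full match, in order, similar to a list of all matches.
    return [m.group(0) for m in PATTERN.finditer(value)]


print(all_matches("#1,(#2,(#3)),#4"))
```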

The problem was that sometimes I had to account for nested values like $,$,($,$,$),$,($,($)) ;)

So I'm using a pipe (|) in the regex to match both cases.

I also used the all matches list name to get all results... hope it's waterproof...
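Python's re module can't track arbitrary nesting depth, so a different technique from the regexes in this thread is a small character scanner that counts parentheses and only splits on commas at depth 0. A sketch under that assumption, with split_top_level as a hypothetical name:

```python
def split_top_level(value, delimiter=","):
    """Split on the delimiter only at parenthesis depth 0 (handles nesting)."""
    parts, current, depth = [], [], 0
    for ch in value:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate a stray ")"
        if ch == delimiter and depth == 0:
            # Top-level delimiter: close off the current item.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())  # the last item
    return parts


print(split_top_level("$,$,($,$,$),$,($,($))"))
```

The brackets are kept in each part, so they still need to be stripped afterwards if they are not wanted.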


ebygomm
Influencer
  • Influencer
  • 3427 replies
  • February 20, 2019

I'd probably use a StringReplacer with a regex to replace all commas outside the brackets with another character that won't cause a conflict elsewhere, then use that character to split the string in an AttributeSplitter.

,\s*(?![^()]*\))
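Sketched in Python, the StringReplacer + AttributeSplitter idea amounts to a regex substitution followed by a plain split. The pipe separator is an assumption (it must not occur in the data), the lookahead only handles non-nested brackets, and replace_then_split is a hypothetical name.

```python
import re

# A comma (plus trailing whitespace) whose next parenthesis is not a ")",
# i.e. a comma that sits outside the brackets.
OUTSIDE_COMMA = re.compile(r",\s*(?![^()]*\))")


def replace_then_split(value, sep="|"):
    # StringReplacer step: swap top-level commas for a safe separator.
    replaced = OUTSIDE_COMMA.sub(sep, value)
    # AttributeSplitter step: split on that separator instead of ",".
    return replaced.split(sep)


print(replace_then_split("$,$,$,($,$,$,),$"))
```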

Or use Python to split at the commas outside the brackets:

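import fme
import fmeobjects
import re

# Template Function interface:
# When using this function, make sure its name is set as the value of
# the 'Class or Function to Process Features' transformer parameter
def splitString(feature):
    # Read the attribute to split (named 'string' in this example)
    string = feature.getAttribute('string')
    if string is None:
        return  # attribute missing: leave the feature unchanged
    # Split on commas (plus trailing whitespace) that are outside the brackets
    split_string = re.split(r',\s*(?![^()]*\))', string)
    # Write each part to the list attribute string{0}.split, string{1}.split, ...
    for i, val in enumerate(split_string):
        feature.setAttribute('string{' + str(i) + '}.split', val)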

You'd still need to remove the brackets themselves with either option.
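One way to do that cleanup in Python, assuming each grouped value is wrapped in exactly one pair of brackets and the first value in single quotes, as in the sample from the question (split_and_clean is a hypothetical name):

```python
import re


def split_and_clean(value):
    # Split at commas outside the brackets (same regex as in the post).
    parts = re.split(r",\s*(?![^()]*\))", value)
    cleaned = []
    for part in parts:
        part = part.strip()
        # Remove one pair of wrapping brackets, if present.
        if part.startswith("(") and part.endswith(")"):
            part = part[1:-1]
        # Remove surrounding single quotes, if present.
        cleaned.append(part.strip("'"))
    return cleaned
```

For the sample attribute this yields exactly the six list values asked for in the question.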


rva1
Contributor
  • Author
  • Contributor
  • 15 replies
  • February 20, 2019

Ahh, very nice!

This helps with some other issues as well.

tnx!


  • 1891 replies
  • February 22, 2019

@rva1 Leveraging @egomm's great regular expression to replace the comma delimiter with something else, i.e. a pipe (|) character, here's an equivalent workspace (2018.1) for those of us not so comfortable with Python!

stringsplitter.zip