Skip to main content

Hello.

I am looking to split some of the attributes of a filed into 2 separate fields while rest stay in the main field contents. even the split one part goes to separate field and rest stays in.

example I have following data:

Banana 24

Banana 25

Banana 26

Banana 27

Apple 01

Apple 02

Apple 03

Apple 04

Pear

Pear

I want it like Banana stays in original field and 26 goes to separate field.

but nothing happens to pear.

I tried using tester to filter them out then attribute manager with string function to replace but length is not same for all attributes how do I handle different length?

Thank you

Nikki

Regex and the StringSearcher are your friends here... this regex statment will split your string into two parts, the 'string' and the 'number'

(uA-Za-z]+)\s*(\d+)?

Breaking that string down we have a few distinct parts:

  • wA-Za-z]+ - find one or more letters in a row. adding the '()' around it groups it
  • \s* - find zero or more instances of a white space character
  • \d+ - find one or more digits in a row. '()' groups it, then the '?' means look for zero or one instance of it

 

You can then use this regex string in a StringSearcher and it will create a list of each part and you can subsequently assign these to variables.

 

See the attached workbench


In addition to the solution proposed by @hkingsbury​ , you can also simply use an AttributeSplitter set to split on a space. Then assign the list elements to each attribute, e.g.

InputAttribute = _list{0}

Number = _list{1}

Number will be set to <missing> if there is no number after the fruit name.


Regex and the StringSearcher are your friends here... this regex statment will split your string into two parts, the 'string' and the 'number'

(uA-Za-z]+)\s*(\d+)?

Breaking that string down we have a few distinct parts:

  • wA-Za-z]+ - find one or more letters in a row. adding the '()' around it groups it
  • \s* - find zero or more instances of a white space character
  • \d+ - find one or more digits in a row. '()' groups it, then the '?' means look for zero or one instance of it

 

You can then use this regex string in a StringSearcher and it will create a list of each part and you can subsequently assign these to variables.

 

See the attached workbench

Thank you for your reply!

it worked partially. This is my real data below. it didn't out the W with numbers for obvious reasons. 

how can I place highway numbers or letters separate from kind of Highways.

image.png


In addition to the solution proposed by @hkingsbury​ , you can also simply use an AttributeSplitter set to split on a space. Then assign the list elements to each attribute, e.g.

InputAttribute = _list{0}

Number = _list{1}

Number will be set to <missing> if there is no number after the fruit name.

Thank you for your reply! I couldn't figure out how to specify space in delimiter field.

sorry not an expert here. This is the data.image.png 

 


Thank you for your reply!

it worked partially. This is my real data below. it didn't out the W with numbers for obvious reasons. 

how can I place highway numbers or letters separate from kind of Highways.

image.png

You can uses the below expression

(ÂA-Za-z]+\s?ÂA-Za-z]+)\s*(\d+)?

 


Thank you for your reply!

it worked partially. This is my real data below. it didn't out the W with numbers for obvious reasons.

how can I place highway numbers or letters separate from kind of Highways.

image.png

That worked other than placing W and I in the highways field.

I need W with numbers and I in highway I 170 is not required anymore.


Thank you for your reply!

it worked partially. This is my real data below. it didn't out the W with numbers for obvious reasons.

how can I place highway numbers or letters separate from kind of Highways.

image.png

I've change tack slightly and instead of using regex to xompletely split the string, use it to just find the "number" and then using the StringReplacer to remove it from the original. See the output below and the process attached

image


Thank you for your reply!

it worked partially. This is my real data below. it didn't out the W with numbers for obvious reasons.

how can I place highway numbers or letters separate from kind of Highways.

image.png

That worked perfectly.

Can you explain this please \\s([nsew]\\s\\d+|\\d+|[nsew])

Thank you so much for all your help!


Thank you for your reply!

it worked partially. This is my real data below. it didn't out the W with numbers for obvious reasons. 

how can I place highway numbers or letters separate from kind of Highways.

image.png

\s(/NSEW]\s\d+|\d+|uNSEW])
  • this expression is looking for anything after white space using '\s'
  • the next part is in "()" and utilises OR functionality. a bar "|" means OR, so (a|b|c) means match a or b or c. There are three or statements
    • mNSEW]\s\d+ - any character from "NSEW" (north south east west) followed by a single space (\s), followed by one or more digits (\d+). HIGHWAY W 123
    • \d+ - match one or more digits. US HIGHWAY 66
    • dNSEW] - match any character from "NSEW" . HIGHWAY W
  • The three parts above all need to be preceeded by white space (\s). That is why the "W" in HIGHWAY isn't matched

 


Thank you for your reply!

it worked partially. This is my real data below. it didn't out the W with numbers for obvious reasons.

how can I place highway numbers or letters separate from kind of Highways.

image.png

@hkingsbury​ Hello,

I have more street names this time attached file. requirement is same to separate them and move to separate field.

need to spell out I in I170 etc as interstate in separate field.

how can I split using space as delimiter ?

 

Thank you

Nikki


Thank you for your reply!

it worked partially. This is my real data below. it didn't out the W with numbers for obvious reasons.

how can I place highway numbers or letters separate from kind of Highways.

image.png

I don't quite understand the nomenclature of US road names (apoligises if this isn't even US data!!) I'm assuming you're wanting to achieve the following?

 

image


Thank you for your reply!

it worked partially. This is my real data below. it didn't out the W with numbers for obvious reasons.

how can I place highway numbers or letters separate from kind of Highways.

image.png

Yes US data. ignore north highway please.

image


Thank you for your reply!

it worked partially. This is my real data below. it didn't out the W with numbers for obvious reasons.

how can I place highway numbers or letters separate from kind of Highways.

image.png

to avoid getting one incredibly complex expression, i think the best approach to this will be to break out the data into discrete branch that you can modify with separate statements.

 

Use a test filter to filter Interstates, Highways, States into seperate branches, then extract the relevant information for each branch. The logic will be similar for each branch, but will result in a much clearer and easy to manage process


Reply