Solved

Remove multiple underscores that separate words

  • August 18, 2017
  • 10 replies
  • 586 views

geospatiallover
Participant

I'd like my regex to replace the middle underscores with a space. I cannot seem to escape the underscore character to use with curly brackets for repetitions.

I have strings with multiple underscores and spaces as word separators. I'd like my result to look like this:

word1_worda wordb wordc wordd_word3

Sample strings look like:

sample1: Autobooster 1_Autobooster Unit_Unit_Autobooster Unit

sample2: Anchor_FIELD_VERIFIED_DATE_NonDisplay

In that format, these samples should become:

sample1: Autobooster 1_Autobooster Unit Unit_Autobooster Unit

sample2: Anchor_FIELD VERIFIED DATE_NonDisplay

I used lookahead and lookbehind, but what I'm getting is the whole middle part of the string, including the underscore between word 1 and word 2 before it and the underscore between word 2 and word 3 after it. My regex is below:

\\w(?<=_).*_?\\w(?<=_)
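To see why this pattern returns the whole middle section, here is a minimal Python sketch. It assumes the pattern was meant with single backslashes and that Python's `re` module treats these constructs the same way as the regex engine in FME; the function name is hypothetical. Because `\w` also matches an underscore, `\w(?<=_)` is just a roundabout way of matching a literal `_`, which needs no escaping, and the greedy `.*` then runs to the last underscore.

```python
import re
from typing import Optional

# The asker's pattern, assumed to be intended with single backslashes.
ASKER_PATTERN = re.compile(r"\w(?<=_).*_?\w(?<=_)")

def asker_match(text: str) -> Optional[str]:
    """Return the text matched by the asker's pattern, or None if no match."""
    m = ASKER_PATTERN.search(text)
    return m.group(0) if m else None

# The match runs from the first underscore to the last one, inclusive.
print(asker_match("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
# _FIELD_VERIFIED_DATE_
```

So the pattern selects the middle block rather than the individual underscores inside it, which is why a plain replace cannot finish the job.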

Test string looks like below

To finish my translation this way, I would have to use a StringReplacer and a StringConcatenator and then merge the parts back together.

Best answer by courtney_m

I was able to do this with 2 StringSearchers and an AttributeManager:

The first StringSearcher finds the first word and underscore by searching the text for the RegEx ^[^_]*_ and saving it as _first_word:

The second StringSearcher finds the last underscore and word by searching the text for the RegEx _[^_]*$ and saving it as _last_word:

Then, in the AttributeManager, I created the _middle_text attribute by trimming _first_word off the left of the text, trimming _last_word off the right of the text, then replacing _ with a space. I used the following notation:

@ReplaceString(@TrimRight(@TrimLeft(@Value(text_line_data),@Value(_first_word)),@Value(_last_word)),"_"," ")

Then, I created the final_text attribute by concatenating the attributes _first_word, _middle_text, and _last_word. Finally, I removed the unneeded attributes.
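Outside FME, the same three-step idea can be sketched in Python. This is only an illustration, not the attached workspace: the function name is hypothetical, and it takes the middle by slicing at the match positions rather than by calling @TrimLeft and @TrimRight.

```python
import re

def replace_middle_by_parts(text: str) -> str:
    """Keep the first and last underscores; turn the ones between into spaces."""
    first = re.search(r"^[^_]*_", text)  # first word plus its underscore
    last = re.search(r"_[^_]*$", text)   # last underscore plus the last word
    # With fewer than two underscores there is no middle part to change.
    if first is None or last is None or first.end() > last.start():
        return text
    middle = text[first.end():last.start()].replace("_", " ")
    return first.group(0) + middle + last.group(0)

print(replace_middle_by_parts("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
# Anchor_FIELD VERIFIED DATE_NonDisplay
```

Slicing by position keeps the sketch independent of how a trim function treats its second argument.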

From inspector, you can see what the value of final_text is....

I have also attached the workspace, if you want it. I hope this helps!

 

-Courtney


10 replies

ebygomm
Influencer
  • August 18, 2017

So you want all but the first and last underscores replaced with spaces?


salvaleonrp
Enthusiast
  • August 18, 2017

Yes, that's the ideal.

 


courtney_m
Contributor
  • Best Answer
  • August 18, 2017


salvaleonrp
Enthusiast
  • August 18, 2017

Thanks @courtney_m for explaining and providing your workspace. I appreciate that.

 


danilo_fme
Celebrity
  • August 18, 2017

That is a powerful article. Great! @courtney_m

 


courtney_m
Contributor
  • August 18, 2017

Thank you, @danilo_inovacao!

courtney_m
Contributor
  • August 18, 2017

You're very welcome, @salvaleonrp. I'm glad I could help.

 


salvaleonrp
Enthusiast
  • August 18, 2017

The accepted answer is good enough but I wonder if there's a single regex string to remove the extra underscores. Any takers?


takashi
Celebrity
  • August 19, 2017

@salvaleonrp, I accept your challenge: "single regex string to remove the extra underscores"

[2017-08-20: Update] Simplified the regex.

Use a StringReplacer with these parameters.

  • Mode: Replace Regular Expression
  • Text To Replace: (?<=_)(.*?)_(?=.*_)
  • Replacement Text: \1<space>

Setting a transformer parameter to this string expression works as well. Assume a feature attribute called "text" contains the source text string.

@ReplaceRegEx(@Value(text),(?<=_)(.*?)_(?=.*_),\1 )
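For anyone who wants to try the pattern outside FME, here is a minimal Python sketch. It assumes Python's `re` module handles the lookbehind, lazy group, and lookahead the same way for these inputs; the function name is hypothetical. The lookbehind `(?<=_)` makes each match start just after an underscore, so the first one is never consumed, and the lookahead `(?=.*_)` requires another underscore later on, so the last one is never replaced.

```python
import re

# Pattern from the answer above: a lazily captured chunk plus the underscore
# that follows it, only when an underscore precedes it and another follows.
MIDDLE_UNDERSCORE = re.compile(r"(?<=_)(.*?)_(?=.*_)")

def replace_middle_single_regex(text: str) -> str:
    """Replace every underscore except the first and last with a space."""
    return MIDDLE_UNDERSCORE.sub(r"\1 ", text)

print(replace_middle_single_regex("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
# Anchor_FIELD VERIFIED DATE_NonDisplay
print(replace_middle_single_regex("Autobooster 1_Autobooster Unit_Unit_Autobooster Unit"))
# Autobooster 1_Autobooster Unit Unit_Autobooster Unit
```

Strings with two or fewer underscores come back unchanged, because no underscore has both a preceding and a following one.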

Another thought:

1. StringSearcher: Split the source text into 3 parts.

  • Contains Regular Expression: ^(.*?_)(.*_.*)(_.*)$
  • Subexpression Matches List Name: _sub

2. StringReplacer: Replace every underscore in the middle part with space.

  • Attributes: _sub{1}.part
  • Mode: Replace Text
  • Text To Replace: _
  • Replacement Text: <space>

3. StringConcatenator etc.: Simply concatenate the three elements of the "_sub{}.part" list.

@Value(_sub{0}.part)@Value(_sub{1}.part)@Value(_sub{2}.part)

The replacement and concatenation can also be performed with a single string expression.

@Value(_sub{0}.part)@ReplaceString(@Value(_sub{1}.part),_," ")@Value(_sub{2}.part)
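The split-and-rejoin steps can be sketched in Python as well, with the three capture groups standing in for the `_sub{}.part` list elements. The function name is hypothetical, and the sketch assumes Python's `re` module splits these strings the same way as the StringSearcher. Note that the pattern only matches when the text contains at least three underscores; anything else is returned unchanged, which is also the desired result.

```python
import re

# Three groups: text through the first underscore, the middle part (which
# must itself contain an underscore), and the last underscore with the rest.
THREE_PARTS = re.compile(r"^(.*?_)(.*_.*)(_.*)$")

def split_and_rejoin(text: str) -> str:
    """Split into three parts, replace underscores in the middle, rejoin."""
    m = THREE_PARTS.match(text)
    if m is None:
        return text  # fewer than three underscores: nothing to replace
    head, middle, tail = m.groups()
    return head + middle.replace("_", " ") + tail

print(split_and_rejoin("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
# Anchor_FIELD VERIFIED DATE_NonDisplay
```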

salvaleonrp
Enthusiast
  • August 21, 2017

Awesome! I learned something new today that will be valuable in the future. I used an AttributeManager with @ReplaceRegEx to create a new attribute. Thanks, @takashi. I have a better understanding of lookahead and lookbehind now.