Solved

Remove multiple underscores that separate words



I'd like my regex to replace the middle underscores with a space. I cannot seem to escape the underscore character to use with curly brackets for repetitions.

I have strings with multiple underscores and spaces as word separators. I'd like my result to look like this:

word1_worda wordb wordc wordd_word3

Sample strings look like this:

sample1: Autobooster 1_Autobooster Unit_Unit_Autobooster Unit

sample2: Anchor_FIELD_VERIFIED_DATE_NonDisplay

With my format, these samples should become:

sample1: Autobooster 1_Autobooster Unit Unit_Autobooster Unit

sample2: Anchor_FIELD VERIFIED DATE_NonDisplay

I tried lookahead and lookbehind, but what I get is the middle string together with the underscores on either side of it (the ones between word 1 and word 2, and between word 2 and word 3). My regex:

\\w(?<=_).*_?\\w(?<=_)
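The symptom described above can be reproduced with a minimal Python sketch. Python's re module stands in for FME's regex engine here (an assumption; details may differ), the doubled backslashes in the post are assumed to be escaping for a single backslash, and the helper name show_match is hypothetical. \w also matches the underscore, and the lookbehind (?<=_) only confirms that the character just consumed was an underscore, so each \w(?<=_) behaves like a literal _. The greedy .* then stretches from the first underscore to the last, so both outer underscores end up in the match. Note also that the underscore is not a regex metacharacter in flavors such as Python's, so it needs no escaping: _{2,} already matches two or more underscores.

```python
import re

# The question's regex as a Python raw string (assumption: the doubled
# backslashes in the post were escaping for a single backslash).
QUESTION_PATTERN = r"\w(?<=_).*_?\w(?<=_)"


def show_match(text: str) -> str:
    """Hypothetical helper: return the text the question's regex matches, or ''."""
    m = re.search(QUESTION_PATTERN, text)
    return m.group(0) if m else ""


# \w(?<=_) can only match an underscore, and the greedy .* runs to the last one,
# so the match spans from the first underscore to the last, both included.
print(show_match("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))  # _FIELD_VERIFIED_DATE_
```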

Test string looks like below

To finish my translation this way, I would have to use a StringReplacer and a StringConcatenator and then merge the parts back together.

Best answer by courtney_m


10 replies

ebygomm
Influencer
  • August 18, 2017

So you want all but the first and last underscores replaced with spaces?


salvaleonrp
Enthusiast
  • August 18, 2017
ebygomm wrote:

So you want all but the first and last underscores replaced with spaces?

Yes, that's the ideal.


courtney_m
Contributor
  • Best Answer
  • August 18, 2017

I was able to do this with two StringSearchers and an AttributeManager.

The first StringSearcher finds the first word and its underscore by searching the text for the regex ^[^_]*_ and saves the match as _first_word.

The second StringSearcher finds the last underscore and the last word by searching the text for the regex _[^_]*$ and saves the match as _last_word.

Then, in the AttributeManager, I created the _middle_text attribute by trimming _first_word off the left of the text, trimming _last_word off the right of the text, and then replacing each _ with a space. I used the following expression:

@ReplaceString(@TrimRight(@TrimLeft(@Value(text_line_data),@Value(_first_word)),@Value(_last_word)),"_"," ")

Then I created the final_text attribute by concatenating the attributes _first_word, _middle_text, and _last_word. Finally, I removed the unneeded attributes.
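For readers outside FME, the same three steps can be sketched in Python. This is an illustration under stated assumptions, not Courtney's workspace: re.search stands in for the two StringSearchers, and the helper name middle_to_spaces is hypothetical. As a design choice, the sketch cuts the first and last parts off by match position rather than calling a trim function.

```python
import re


def middle_to_spaces(text: str) -> str:
    """Hypothetical helper mirroring the accepted answer's steps."""
    first = re.search(r"^[^_]*_", text)  # like _first_word: first word and its underscore
    last = re.search(r"_[^_]*$", text)   # like _last_word: last underscore and last word
    # With fewer than two underscores, both searches hit the same underscore (or none).
    if first is None or last is None or first.end() > last.start():
        return text
    # Like _middle_text: everything between the two parts, underscores -> spaces.
    middle = text[first.end():last.start()].replace("_", " ")
    # Like final_text: concatenate the three parts.
    return first.group(0) + middle + last.group(0)


print(middle_to_spaces("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
# Anchor_FIELD VERIFIED DATE_NonDisplay
```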

In the Inspector, you can see the resulting value of final_text.

I have also attached the workspace, if you want it. I hope this helps!

-Courtney


salvaleonrp
Enthusiast
  • August 18, 2017
Thanks @courtney_m for explaining and providing your workspace. I appreciate that.


danilo_fme
Evangelist
  • August 18, 2017
That is a powerful article. Great! @courtney_m


courtney_m
Contributor
  • August 18, 2017
danilo_fme wrote:
That is a powerful article. Great! @courtney_m

Thank you, @danilo_inovacao!

courtney_m
Contributor
  • August 18, 2017
salvaleonrp wrote:
Thanks @courtney_m for explaining and providing your workspace. I appreciate that.

You're very welcome, @salvaleonrp. I'm glad I could help.


salvaleonrp
Enthusiast
  • August 18, 2017

The accepted answer is good enough, but I wonder if there's a single regex string to remove the extra underscores. Any takers?


takashi
Supporter
  • August 19, 2017

@salvaleonrp, I accept your challenge: "single regex string to remove the extra underscores"

[2017-08-20: Update] Simplified the regex.

Use a StringReplacer with these parameters.

  • Mode: Replace Regular Expression
  • Text To Replace: (?<=_)(.*?)_(?=.*_)
  • Replacement Text: \1<space>

This string expression, set as a transformer parameter, works as well. It assumes a feature attribute called "text" contains the source text string.

@ReplaceRegEx(@Value(text),(?<=_)(.*?)_(?=.*_),\1 )
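The single-regex replacement above can be tried outside FME with Python's re.sub, using the same pattern and the same \1-plus-space replacement. This is a sketch, assuming Python's regex behavior matches FME's for this pattern; the helper name is hypothetical. (?<=_) forces each match to start right after an underscore, (.*?)_ lazily captures one word and consumes the underscore that ends it, and (?=.*_) only allows that if another underscore follows, so the last underscore is never touched. re.sub evaluates the lookbehind against the original string, so a match can still "see" the underscore consumed by the previous match.

```python
import re

# Takashi's pattern, unchanged except for Python's raw-string syntax.
MIDDLE_UNDERSCORE = re.compile(r"(?<=_)(.*?)_(?=.*_)")


def replace_middle_underscores(text: str) -> str:
    """Hypothetical helper: one regex replace, like the StringReplacer setup."""
    # (?<=_)  the match must start right after an underscore
    # (.*?)_  lazily capture a word, then consume the underscore that ends it
    # (?=.*_) only if another underscore follows, so the last one is kept
    # r"\1 "  puts the captured word back, followed by a space
    return MIDDLE_UNDERSCORE.sub(r"\1 ", text)


print(replace_middle_underscores("Autobooster 1_Autobooster Unit_Unit_Autobooster Unit"))
# Autobooster 1_Autobooster Unit Unit_Autobooster Unit
```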

Another thought:

1. StringSearcher: Split the source text into 3 parts.

  • Contains Regular Expression: ^(.*?_)(.*_.*)(_.*)$
  • Subexpression Matches List Name: _sub

2. StringReplacer: Replace every underscore in the middle part with space.

  • Attributes: _sub{1}.part
  • Mode: Replace Text
  • Text To Replace: _
  • Replacement Text: <space>

3. StringConcatenator etc.: Simply concatenate the three elements of the "_sub{}.part" list.

@Value(_sub{0}.part)@Value(_sub{1}.part)@Value(_sub{2}.part)

The replacement and concatenation can also be performed with a single string expression.

@Value(_sub{0}.part)@ReplaceString(@Value(_sub{1}.part),_," ")@Value(_sub{2}.part)
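The split-replace-concatenate idea can be sketched in Python as well, with re.match standing in for the StringSearcher and the three capture groups playing the role of the _sub{0..2}.part list elements. The helper name is hypothetical, and Python's regex behavior is assumed to match FME's for this pattern. The lazy first group stops at the first underscore, and backtracking makes the third group start at the last underscore, so only the middle part gets its underscores replaced.

```python
import re

# Step 1: split into head (first word + "_"), middle, and tail ("_" + last word).
THREE_PARTS = re.compile(r"^(.*?_)(.*_.*)(_.*)$")


def split_replace_join(text: str) -> str:
    """Hypothetical helper following the StringSearcher/StringReplacer/concatenate steps."""
    m = THREE_PARTS.match(text)
    if m is None:
        # Fewer than three underscores: the pattern needs one per group, so no
        # match; with at most two underscores there is nothing to replace anyway.
        return text
    head, middle, tail = m.groups()  # like _sub{0}.part, _sub{1}.part, _sub{2}.part
    # Steps 2 and 3: replace underscores in the middle part only, then concatenate.
    return head + middle.replace("_", " ") + tail


print(split_replace_join("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
# Anchor_FIELD VERIFIED DATE_NonDisplay
```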

salvaleonrp
Enthusiast
  • August 21, 2017
Awesome! I learned something new today that will be valuable in the future. I used an AttributeManager with @ReplaceRegEx to create a new attribute. Thanks @takashi. I have a better understanding of lookahead and lookbehind now.

