
I'd like my regex to replace the middle underscores with spaces. I can't seem to escape the underscore character so that I can use it with curly brackets for repetition.

I have strings that use both underscores and spaces as word separators. I'd like the result to look like this:

word1_worda wordb wordc wordd_word3

Sample strings look like:

sample1: Autobooster 1_Autobooster Unit_Unit_Autobooster Unit

sample2: Anchor_FIELD_VERIFIED_DATE_NonDisplay

In that format, these samples should become:

sample1: Autobooster 1_Autobooster Unit Unit_Autobooster Unit

sample2: Anchor_FIELD VERIFIED DATE_NonDisplay

I tried lookahead and lookbehind, but what I get is the middle string, with the underscores between word 1 and word 2 and between word 2 and word 3 still attached at the start and end of the result. My regex:

\\w(?<=_).*_?\\w(?<=_)
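To see why this happens, the pattern can be run in Python outside FME. This is a sketch that assumes the `\\w` above is the string-escaped form of the regex token `\w`. `\w` matches letters, digits and `_`, so `\w(?<=_)` only succeeds on an underscore, and the greedy `.*` stretches the match from the first underscore to the last one.

```python
import re

# Assumption: "\\w" in the post is the escaped form of the regex token \w.
# \w also matches "_", and the lookbehind (?<=_) then requires that the
# character just consumed was "_", so "\w(?<=_)" matches one underscore.
ATTEMPT = re.compile(r"\w(?<=_).*_?\w(?<=_)")

def attempted_match(text: str) -> str:
    """Return the part of text the attempted pattern matches, or "" if none."""
    m = ATTEMPT.search(text)
    return m.group(0) if m else ""

print(attempted_match("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
# prints: _FIELD_VERIFIED_DATE_
```

The match includes both outer underscores, which is why they appear at the start and end of the result.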

Test string looks like below

To finish my translation this way, I'd have to use a StringReplacer and a StringConcatenator and then merge the pieces back together.

So you want all but the first and last underscores replaced with spaces?



Yes, that's the idea.
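As a reference for the requirement ("keep the first and last underscores, turn the others into spaces"), here is a minimal Python sketch using plain split/join rather than a regex. It is not one of the FME approaches in this thread; the function name is hypothetical and it is only meant as a way to check expected outputs.

```python
def replace_middle_underscores(text: str) -> str:
    """Keep the first and last underscores; turn every other "_" into a space."""
    parts = text.split("_")
    if len(parts) < 3:
        # Fewer than two underscores: there is no first/last pair to keep.
        return text
    head, *middle, tail = parts
    return head + "_" + " ".join(middle) + "_" + tail

print(replace_middle_underscores("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
# prints: Anchor_FIELD VERIFIED DATE_NonDisplay
```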

 


I was able to do this with 2 StringSearchers and an AttributeManager:

The first StringSearcher finds the first word and its underscore by searching the text for the regex ^[^_]*_ and saving the match as _first_word.

The second StringSearcher finds the last word and its underscore by searching the text for the regex _[^_]*$ and saving the match as _last_word.
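The two searches can be checked in Python (a sketch outside FME; the function name is hypothetical). `^[^_]*_` takes everything up to and including the first underscore, and `_[^_]*$` takes the last underscore and everything after it.

```python
import re

FIRST_WORD = re.compile(r"^[^_]*_")  # up to and including the first "_"
LAST_WORD = re.compile(r"_[^_]*$")   # the last "_" and everything after it

def first_and_last(text: str) -> tuple[str, str]:
    """Return what the two StringSearchers would store as _first_word/_last_word."""
    first = FIRST_WORD.search(text)
    last = LAST_WORD.search(text)
    return (first.group(0) if first else "", last.group(0) if last else "")

print(first_and_last("Anchor_FIELD_VERIFIED_DATE_NonDisplay"))
# prints: ('Anchor_', '_NonDisplay')
```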

Then, in the AttributeManager, I created the _middle_text attribute by trimming _first_word off the left of the text, trimming _last_word off the right of the text, and then replacing each _ with a space. I used the following expression:

@ReplaceString(@TrimRight(@TrimLeft(@Value(text_line_data),@Value(_first_word)),@Value(_last_word)),"_"," ")

Then, I created the final_text attribute by concatenating the attributes _first_word, _middle_text, and _last_word. Finally, I removed the attributes that were no longer needed.

In the Inspector, you can see the resulting value of final_text.
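The whole workflow can be sketched in Python as follows. Assumption: this sketch removes the prefix and suffix by position (string slicing), which is what the trimming step is meant to do; it does not try to reproduce FME's @TrimLeft/@TrimRight behavior exactly. The function name is hypothetical.

```python
import re

def build_final_text(text: str) -> str:
    """Mimic: find _first_word and _last_word, fix the middle, concatenate."""
    first_m = re.search(r"^[^_]*_", text)
    last_m = re.search(r"_[^_]*$", text)
    # With fewer than two underscores the two matches overlap: leave text as is.
    if not first_m or not last_m or first_m.end() > last_m.start():
        return text
    first_word = first_m.group(0)
    last_word = last_m.group(0)
    # Cut the prefix and suffix off by position, then swap "_" for " ".
    middle_text = text[len(first_word):len(text) - len(last_word)].replace("_", " ")
    return first_word + middle_text + last_word

print(build_final_text("Autobooster 1_Autobooster Unit_Unit_Autobooster Unit"))
# prints: Autobooster 1_Autobooster Unit Unit_Autobooster Unit
```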

I have also attached the workspace, if you want it. I hope this helps!

 

-Courtney



Thanks @courtney_m for explaining and providing your workspace. I appreciate that.

 



That is a powerful article. Great! @courtney_m

 



 

Thank you, @danilo_inovacao!

 

You're very welcome, @salvaleonrp. I'm glad I could help.

 


The accepted answer is good enough, but I wonder if there's a single regex that removes the extra underscores. Any takers?


@salvaleonrp, I accept your challenge: "single regex string to remove the extra underscores"

[2017-08-20 Update] Simplified the regex.

Use a StringReplacer with these parameters.

  • Mode: Replace Regular Expression
  • Text To Replace: (?<=_)(.*?)_(?=.*_)
  • Replacement Text: \1<space>

This string expression, set as a transformer parameter, works as well. It assumes a feature attribute called "text" contains the source string.

@ReplaceRegEx(@Value(text),(?<=_)(.*?)_(?=.*_),\1 )
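The same pattern behaves the same way in Python's `re` module, which makes it easy to see which pieces get matched (a sketch outside FME; the names are hypothetical). Each match is a run that starts right after an underscore `(?<=_)`, ends in an underscore, and still has another underscore somewhere ahead `(?=.*_)`; the replacement `\1 ` keeps the run and turns the consumed underscore into a space.

```python
import re

# (?<=_)  : the match must start right after an underscore
# (.*?)_  : the shortest run up to the next underscore (captured as group 1)
# (?=.*_) : there must still be another underscore later in the string
MIDDLE_UNDERSCORE = re.compile(r"(?<=_)(.*?)_(?=.*_)")

def replace_middle(text: str) -> str:
    return MIDDLE_UNDERSCORE.sub(r"\1 ", text)

sample = "Anchor_FIELD_VERIFIED_DATE_NonDisplay"
print([m.group(0) for m in MIDDLE_UNDERSCORE.finditer(sample)])
# prints: ['FIELD_', 'VERIFIED_']
print(replace_middle(sample))
# prints: Anchor_FIELD VERIFIED DATE_NonDisplay
```

Matching runs over the original string, so after each match the lookbehind is satisfied by the underscore that match just consumed. The first underscore is never replaced because nothing before it is preceded by `_`, and the last one is never replaced because the lookahead then finds no further underscore.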

Another thought:

1. StringSearcher: Split the source text into 3 parts.

  • Contains Regular Expression: ^(.*?_)(.*_.*)(_.*)$
  • Subexpression Matches List Name: _sub

2. StringReplacer: Replace every underscore in the middle part with space.

  • Attributes: _sub{1}.part
  • Mode: Replace Text
  • Text To Replace: _
  • Replacement Text: <space>

3. StringConcatenator etc.: Simply concatenate the three elements of "_sub{}.part" list.

@Value(_sub{0}.part)@Value(_sub{1}.part)@Value(_sub{2}.part)

The replacement and concatenation can also be performed with a single string expression.

@Value(_sub{0}.part)@ReplaceString(@Value(_sub{1}.part),_," ")@Value(_sub{2}.part)
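The three-part approach can be sketched in Python as well (outside FME; names are hypothetical). The three groups correspond to `_sub{0}.part`, `_sub{1}.part` and `_sub{2}.part`. Because the middle group must itself contain an underscore, strings with fewer than three underscores don't match; in this sketch they are simply returned unchanged.

```python
import re

# Group 1: up to the first "_" (lazy); group 3: from the last "_" to the end;
# group 2: everything between, which must contain at least one "_".
SPLIT3 = re.compile(r"^(.*?_)(.*_.*)(_.*)$")

def split_and_join(text: str) -> str:
    m = SPLIT3.match(text)
    if m is None:
        return text  # fewer than three underscores: nothing to replace
    first, middle, last = m.groups()
    return first + middle.replace("_", " ") + last

print(SPLIT3.match("Anchor_FIELD_VERIFIED_DATE_NonDisplay").groups())
# prints: ('Anchor_', 'FIELD_VERIFIED_DATE', '_NonDisplay')
```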

Awesome! I learned something new today that will be valuable in the future. I used an AttributeManager with @ReplaceRegEx to create a new attribute. Thanks, @takashi. I have a better understanding of lookahead and lookbehind now.

 

