So you want all but the first and last underscores replaced with spaces?
I was able to do this with 2 StringSearchers and an AttributeManager:
The first StringSearcher finds the first word and underscore by searching the text for the regex ^[^_]*_ and saving it as _first_word:
The second StringSearcher finds the last word and underscore by searching the text for the regex _[^_]*$ and saving it as _last_word:
Then, in the AttributeManager, I created the _middle_text attribute by trimming _first_word off the left of the text, trimming _last_word off the right of the text, then replacing _ with a space. I used the following expression:
@ReplaceString(@TrimRight(@TrimLeft(@Value(text_line_data),@Value(_first_word)),@Value(_last_word)),"_"," ")
Then, I created the final_text attribute by concatenating the attributes _first_word, _middle_text, and _last_word. Finally, I removed the un-needed attributes.
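For anyone who wants to check the logic outside of FME, here is a rough Python sketch of the same steps. It is not the workspace itself: the function name and the sample string are made up for illustration, and it cuts the first and last parts off by position rather than calling FME's @TrimLeft/@TrimRight.

```python
import re

def replace_middle_two_searches(text: str) -> str:
    """Keep the first and last underscores, replace the ones in between with spaces."""
    # Same patterns as the two StringSearchers.
    first = re.search(r"^[^_]*_", text)   # first word plus its underscore
    last = re.search(r"_[^_]*$", text)    # last underscore plus the last word
    # With zero or one underscore there is no middle part to change.
    if first is None or last is None or first.end() > last.start():
        return text
    # Everything between the two matches is the middle text.
    middle = text[first.end():last.start()]
    # Concatenate first word, cleaned middle text, and last word.
    return first.group() + middle.replace("_", " ") + last.group()

# Hypothetical sample value, not taken from the original data.
print(replace_middle_two_searches("first_second_third_fourth_last"))
# prints: first_second third fourth_last
```

Strings with fewer than three underscores come back unchanged, since there is nothing in the middle to replace.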
In the Inspector, you can see the resulting value of final_text.
I have also attached the workspace, if you want it. I hope this helps!
-Courtney
Thanks @courtney_m for explaining and providing your workspace. I appreciate that.
That is a powerful article. Great! @courtney_m
Thank you, @danilo_inovacao!
You're very welcome, @salvaleonrp. I'm glad I could help.
The accepted answer is good enough but I wonder if there's a single regex string to remove the extra underscores. Any takers?
@salvaleonrp, I accept your challenge: "single regex string to remove the extra underscores"
[2017-08-20: Update] Simplified the regex.
Use a StringReplacer with these parameters.
- Mode: Replace Regular Expression
- Text To Replace: (?<=_)(.*?)_(?=.*_)
- Replacement Text: \1<space>
The following string expression, set as a transformer parameter, works as well. Assume a feature attribute called "text" contains the source text string.
@ReplaceRegEx(@Value(text),(?<=_)(.*?)_(?=.*_),\1 )
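If you want to try the pattern outside of FME first, the same regex behaves the same way in Python's re module. This is only a quick sketch: the function name and the sample string are made up for illustration. The lookbehind (?<=_) requires an underscore just before the match, and the lookahead (?=.*_) requires at least one more underscore after it, so the first and last underscores are never consumed.

```python
import re

# Same pattern as the StringReplacer "Text To Replace" parameter.
MIDDLE_UNDERSCORE = re.compile(r"(?<=_)(.*?)_(?=.*_)")

def replace_middle_single_regex(text: str) -> str:
    """Replace every underscore except the first and last with a space."""
    # \1 puts back the text captured before the underscore, followed by a space.
    return MIDDLE_UNDERSCORE.sub(r"\1 ", text)

# Hypothetical sample value, not taken from the original data.
print(replace_middle_single_regex("first_second_third_fourth_last"))
# prints: first_second third fourth_last
```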
Another thought:
1. StringSearcher: Split the source text into 3 parts.
- Contains Regular Expression: ^(.*?_)(.*_.*)(_.*)$
- Subexpression Matches List Name: _sub
2. StringReplacer: Replace every underscore in the middle part with space.
- Attributes: _sub{1}.part
- Mode: Replace Text
- Text To Replace: _
- Replacement Text: <space>
3. StringConcatenator etc.: Simply concatenate the three elements of "_sub{}.part" list.
@Value(_sub{0}.part)@Value(_sub{1}.part)@Value(_sub{2}.part)
The replacement and concatenation can also be performed with a single string expression.
@Value(_sub{0}.part)@ReplaceString(@Value(_sub{1}.part),_," ")@Value(_sub{2}.part)
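The three-part split can be sketched in Python as well, in case it helps to see how the capture groups divide the string. The function name and the sample string are made up for illustration. One thing to keep in mind is that the pattern needs at least three underscores to match, so the sketch returns the text unchanged when it does not match.

```python
import re

# Same pattern as the StringSearcher "Contains Regular Expression" parameter.
THREE_PARTS = re.compile(r"^(.*?_)(.*_.*)(_.*)$")

def replace_middle_three_parts(text: str) -> str:
    """Split into head, middle and tail, then clean only the middle part."""
    match = THREE_PARTS.match(text)
    if match is None:
        # Fewer than three underscores: no middle underscore to replace.
        return text
    head, middle, tail = match.groups()
    # Replace underscores in the middle part only, then concatenate.
    return head + middle.replace("_", " ") + tail

# Hypothetical sample value, not taken from the original data.
print(replace_middle_three_parts("first_second_third_fourth_last"))
# prints: first_second third fourth_last
```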
Awesome! I learned something new today that will be valuable in the future. I used an AttributeManager with @ReplaceRegEx to create a new attribute. Thanks @takashi. I have a better understanding of lookahead and lookbehind now.