Question

How do I find and remove funny characters from a string?


Badge

Hi,

I have a CSV data set that contains addresses, and some of them contain funny characters, e.g. ^, and I am not sure how to find and remove them all easily. I've been using the Tester with "attribute contains ^", but this seems to be a really long process, especially with a large data set.

Also, some of the addresses contain brackets at the beginning, e.g. (house) 1 valley road, which I also want to remove. However, I want to keep the brackets at the end, e.g. Pyramids (swimming pool), as these contain useful information. I can't get the Tester to narrow down how many addresses contain brackets at the beginning, and I don't know how to remove them.

Which transformer shall I use to find and remove funny characters and brackets at the beginning of text please?

Apologies if this doesn't make much sense, I am very new to FME.

Kind Regards,

Jess


14 replies

Userlevel 4
Badge +30

Hi @jessbeach,

I simulated your case with an AttributeCreator transformer: att = ^Canada

The StringReplacer transformer does what you want.

In Text to Replace you write your symbol: ^

The template file (FMWT) is attached.

Thanks,

Danilo - -workspace-replace.fmwt

Userlevel 4
Badge +13

Hi @jessbeach, the method @danilo_inovacao pointed out with the StringReplacer will help you take out those odd characters such as ^. For removing the first set of brackets, I recommend checking out @courtney_m's answer to Remove Multiple Underscores.

Badge +2

Just to add to @danilo_inovacao's answer, which replaces any ^ symbol: you mention funny characters, so possibly you have additional characters. You can still use the StringReplacer, but in Replace Regular Expression mode, where you can use a regex to list the characters. Be aware that some regex characters have a special meaning, including ^, so you would have to escape them.

Here is an example with some funny characters.
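For anyone who wants to try the character-list idea outside FME first, here is a minimal Python sketch of the same approach. The list of characters in FUNNY_CHARS is only an assumption for illustration, not taken from Jess's data, and re.escape takes care of escaping the characters that have a special regex meaning (such as ^).

```python
import re

# Hypothetical list of unwanted characters; adjust it to match your data.
FUNNY_CHARS = "^#~|*"

# re.escape neutralises characters with a special regex meaning (such as ^),
# so they are treated literally inside the [...] character class.
PATTERN = re.compile("[" + re.escape(FUNNY_CHARS) + "]")


def remove_funny_chars(text: str) -> str:
    """Delete every character listed in FUNNY_CHARS from text."""
    return PATTERN.sub("", text)


print(remove_funny_chars("12 ^Valley# Road~"))
```

In the StringReplacer the equivalent would be a character class with the special characters escaped by hand, with an empty Replacement Text.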

Userlevel 4
Badge +30

Thanks @mark_1spatial for adding this information to the answer :)

 

 

Badge

Why would you want to remove the funny characters? If a character makes you laugh, it should be worth keeping - I say, remove the non-funny ones! ;-)

Userlevel 4
Badge +30

uaahaaauhahaua :P
Badge

Hi @jessbeach

 

if you have a number of 'funny characters' to remove, you might want to use StringPairReplacer to replace all of them with... 'something' that can then be replaced with 'nothing' using StringReplacer.

In the attached example I am getting rid of #, ^, and ( in my string:

  • first I replace each unwanted character with ©, which is not used in my data;
  • then I replace all © characters with nothing;
  • then I am left with some unwanted multiple spaces (the removed characters were surrounded by spaces, and taking the characters away leaves the spaces in place), which we can fix using regex.

removefunnycharacters.fmw
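The three steps above can be sketched in Python roughly as follows. This is only an illustration of the logic, not the attached workspace: the function name clean and the UNWANTED list are assumptions, and © is assumed not to occur in the data, as in the description above.

```python
import re

PLACEHOLDER = "©"            # assumed not to occur anywhere in the data
UNWANTED = ["#", "^", "("]   # the characters removed in the example above


def clean(text: str) -> str:
    # Step 1: replace each unwanted character with the placeholder
    # (the StringPairReplacer step).
    for ch in UNWANTED:
        text = text.replace(ch, PLACEHOLDER)
    # Step 2: replace the placeholder with nothing (the StringReplacer step).
    text = text.replace(PLACEHOLDER, "")
    # Step 3: collapse the runs of spaces left behind and trim the ends.
    return re.sub(r" {2,}", " ", text).strip()


print(clean("1 # Valley ^ Road"))
```

Outside FME the placeholder step is not strictly needed, since each character could be replaced with an empty string directly; it is kept here only to mirror the workflow described.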

Badge

@jessbeach

 

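To remove anything in brackets (together with the brackets) at the beginning of the string, you can use this regex in a StringReplacer:

^[(].*[)]

Note that .* is greedy, so if an address has brackets at both the beginning and the end, the match runs to the last closing bracket. Using ^[(][^)]*[)] instead stops at the first closing bracket.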
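A small Python sketch for testing this kind of pattern outside FME. It is a variation rather than the exact expression posted: it uses [^)]* so that the match stops at the first closing bracket, and it also swallows the spaces after the bracket. The function name is hypothetical.

```python
import re

# Anchored at the start of the string: an opening bracket, anything that is
# not a closing bracket, the closing bracket, then any trailing whitespace.
LEADING_BRACKETS = re.compile(r"^\s*\([^)]*\)\s*")


def strip_leading_brackets(address: str) -> str:
    """Remove one bracketed group from the start of the address, if present."""
    return LEADING_BRACKETS.sub("", address, count=1)


for sample in ["(house) 1 valley road",
               "Pyramids (swimming pool)",
               "(house) 1 valley road (pool)"]:
    print(repr(strip_leading_brackets(sample)))
```

Because the pattern is anchored with ^, brackets at the end of the address, such as Pyramids (swimming pool), are left untouched.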

Badge +1

Jess,

A tool you can try is the StringCleaner Transformer available from the FME Hub.

Download directly from within FME thru the Transformer Panel - FME Hub - Strings - StringCleaner OR visit:

StringCleaner Transformer Link

..............

Regular Expressions - VERY useful within FME - these will help with your particular question.

A good place to start learning / testing them is:

http://rubular.com/

 

..................

Hope this helps

Howard L'

 

Badge +2

Great suggestion @howard_l ... and nice custom transformer @jeroenstiers !
Badge

Hi Lena, thanks for getting back to me. Is there also a way to remove brackets from the middle of a string, e.g. Jessica (Jess) Beach, please?

 

Badge

Thanks for all getting back to me. I will try your suggestions and get back to you if I have any more questions.

Userlevel 1
Badge +21

Sometimes it's easier to keep the non-funny characters than to remove the specific characters you do not want,

e.g. removing everything that's not a space or a word character in a StringReplacer with the following regular expression:

[^\s\w]
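As a rough illustration of the keep-list idea, here is the same expression in Python. The function name is made up for the example, and note that Python's \w also matches digits, underscores and non-ASCII letters, which may differ slightly from how a given regex engine treats it.

```python
import re

# Matches any single character that is neither whitespace nor a word character.
NOT_WORD_OR_SPACE = re.compile(r"[^\s\w]")


def keep_words_and_spaces(text: str) -> str:
    """Drop everything except letters, digits, underscores and whitespace."""
    return NOT_WORD_OR_SPACE.sub("", text)


print(keep_words_and_spaces("12 ^Valley# Road, (pool)"))
```

One thing to watch: this also strips the brackets themselves (and commas, hyphens and apostrophes), so if the brackets at the end of an address need to stay, they would have to be added to the keep-list, e.g. [^\s\w()].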

Badge +16

Don't remove anything, use a geocoding solution that handles the junk :-)

Look at ExInfo returned by this call:

http://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?SingleLine=(House)+380+New+York+Street,+(Swimming%20pool)++Redlands,+CA+92373&category=&outFields=ExInfo&forStorage=false&f=pjson&maxLocations=1
