
Hi, Safers!

 

I read Text Files as whole text at once, but I need to convert files that are not in UTF-8 to UTF-8 before reading. This is because FME needs to know the encoding when it reads the file.

 

I figure one easy way to do this is to store the files in a temp folder and, via the SystemCaller, run some Windows command to convert the files that are not already in UTF-8 before reading them with the Text File reader with Character Encoding set to UTF-8. There can be thousands of files, so preferably I would convert only the files that need converting.

 

Does anyone have any suggestions on how to solve this in an easy way? Preferably within the FME script if possible...

 

Kind regards, Peter

Have you tried using the AttributeEncoder? It can convert between different encodings, including utf-8.


Do you already know what encoding the files have which are not UTF-8?


Have you tried using the AttributeEncoder? It can convert between different encodings, including utf-8.

I have to set the encoding when reading the file so this problem has to be solved before reading the file, no?


Do you already know what encoding the files have which are not UTF-8?

No, I have to check for each file.



So how are you determining the encoding you need to read from? Most programs that show you the encoding of a text file are giving a best guess, not an absolute answer.



Ok, I didn't know that. Right now I open the file in Notepad++ and check the encoding there. When it comes to this problem I wish I lived in an English-speaking country and not Sweden...


I have to set the encoding when reading the file so this problem has to be solved before reading the file, no?

The reader encoding only tells FME what to expect so that it can correctly encode the attributes for you, but you can override it in the AttributeEncoder. Of course, if you know the input encoding then that will simplify the task quite a bit.

Worst case, there is also the "Data File" reader which lets you read the binary contents of the file.



No BOM in the utf-8 files? It might be worth checking, as it would make it easier for you.
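If it helps, here is a minimal Python sketch of that check, assuming you can get at the raw bytes of each file (for example from a Python script, or from the binary contents mentioned above). The function name `classify_utf8` and its return labels are made up for illustration. It looks for the UTF-8 BOM first and otherwise tries a strict UTF-8 decode. Note that pure ASCII files also count as valid UTF-8, and a file in another encoding could in theory decode cleanly by accident, although that is unlikely for text containing characters like å, ä and ö.

```python
import codecs


def classify_utf8(raw):
    """Return 'utf-8-bom', 'utf-8' or 'not-utf-8' for a bytes object."""
    # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"
```

Only the files that come back as `not-utf-8` would then need converting.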



Encoding issues are the bane of my life at the moment, living in the UK isn't helping!

 

 



The files in my case can be created in any text editor or other program, so I can never know what to expect. Somehow I need to check.

 



Haha, really! Yeah, there are probably many of us who would like to see a well thought out handling of this problem within FME.



If you can use Python, this may be a viable option, which could potentially save you a lot of guesswork: https://pypi.org/project/chardet/

You will notice that even this library can only detect the encoding with a certain confidence.
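A rough sketch of how that could look, assuming chardet is installed in the Python environment you run it from. The function name `decode_with_guess`, the 0.8 confidence threshold and the cp1252 fallback are my own assumptions for illustration, not anything prescribed by chardet or FME. `chardet.detect` takes the raw bytes and returns a dict that includes `encoding` and `confidence`; when the guess is missing or weak, the sketch falls back to a default encoding instead of trusting it.

```python
def decode_with_guess(raw, min_confidence=0.8, fallback="cp1252", detect=None):
    """Decode bytes using a detected encoding, or a fallback if the guess is weak.

    Returns a (text, encoding_used) tuple.
    """
    if detect is None:
        # Third-party package: pip install chardet
        import chardet
        detect = chardet.detect

    guess = detect(raw)  # dict with "encoding" and "confidence" keys
    encoding = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0

    # Do not trust a missing or low-confidence guess; use the fallback instead.
    if encoding is None or confidence < min_confidence:
        encoding = fallback

    # errors="replace" keeps the run going if the chosen encoding is still wrong.
    return raw.decode(encoding, errors="replace"), encoding
```

The returned text can then be written back out as UTF-8, and the second value tells you which encoding was actually used, so files with a doubtful guess can be logged and checked by hand.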



Thanks, @david_r, I was hoping for a simpler answer, but don't we always? I'll look into your proposed Python solution!

