Question

Problem with accented DBF attribute names on FME Cloud


Badge

Hello,

I have an Esri shapefile in which one attribute name contains the character 'ž'. When I load it locally on my computer and specify the encoding in which the DBF file is saved, FME reads it fine and I can store it in a PostGIS database.

However, when I publish that workspace to FME Cloud and run it there, this character is not displayed; all I see in its place is a '?'.

Looking at the translation logs, this one is from running the workspace on my computer:

2016-06-13 10:48:31|   1.6|  0.0|INFORM|DBF Reader: Detected character encoding `cp1250'
2016-06-13 10:48:31|   1.6|  0.0|INFORM|DBF File 'C:\Users\Radek\Desktop\Prace\FME\kvarnerCAD-rijeka\09062016_shift_pokusy\CRPNA_STANICA_1250.dbf' has fields: ID number(10,0), Ime char(50), Ulica_loka char(50), NUS char(50), Izvor_poda char(50), Napomena char(100), Fotografij char(150), Uporabna_d char(250), Godina_izg number(5,0), Vrsta_crpk char(250), Broj_crpki char(50), Kapacitet_ char(50), Snaga_moto char(50), Instaliran char(50), Angažirana char(50), Broj_okret char(50), Godina_crp char(50), Sito char(1), Otvor_sita char(50)
2016-06-13 10:48:31|   1.6|  0.0|INFORM|DBF Reader: Using user specified character encoding `cp1250'

And this one is from running the workspace on FME Cloud, as shown when I look the log up on my instance:

2016-06-13 08:19:41|   0.2|  0.0|INFORM|DBF Reader: Detected character encoding `cp1250'
2016-06-13 08:19:41|   0.2|  0.0|INFORM|DBF File '/data/fmeserver/resources/system/temp/upload/import_test/unip_import9.fmw/5B53176DDAEE316E75A8D92DF9890C32/CRPNA_STANICA_1250.dbf' has fields: ID number(10,0), Ime char(50), Ulica_loka char(50), NUS char(50), Izvor_poda char(50), Napomena char(100), Fotografij char(150), Uporabna_d char(250), Godina_izg number(5,0), Vrsta_crpk char(250), Broj_crpki char(50), Kapacitet_ char(50), Snaga_moto char(50), Instaliran char(50), Anga?irana char(50), Broj_okret char(50), Godina_crp char(50), Sito char(1), Otvor_sita char(50)
2016-06-13 08:19:41|   0.2|  0.0|INFORM|DBF Reader: Using user specified character encoding `cp1250'

However, when I download that log from FME Cloud, the '?' appears as 'ž' in the downloaded file.

My first question is: How to solve this?

My second question is: Is there any way to 'unaccent' attribute names, so that, for example, Angažirana becomes Angazirana? I have found on these forums how to unaccent attribute values (which, again, works fine when run locally from my Windows laptop), but I need to change attribute names, not values. My Python knowledge is very basic.

Any help is more than welcome.

Thanks, 

Radek


4 replies

Badge +5

Hi Radek @drakez

The data is obviously in code page WIN1250. Can I assume your local machine also has that as its system locale? Also, is the PostGIS database set up with the WIN1250 character set?

FME Cloud uses a UTF8 locale, so FME should read the data as WIN1250, convert it to UTF8, and write it to PostGIS where (I hope) it would get converted back to WIN1250.

That last part is what I'm unsure about, so I am going to check with our developers and get back to you. But right now I suspect either:

1) We aren't converting back to WIN1250 when we write the data, or...

2) The data is correct, but the log from FME Cloud is not displaying correctly

I'll let you know what I find.

Regards

Mark

Badge +5

OK. I consulted with a developer and the likely problem is this: you are authoring on a machine with a specific locale, and those attribute names are stored inside your workspace file using that locale. FME Cloud uses UTF8 and so doesn't understand those attribute names.
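To illustrate the kind of mismatch described above, here is a minimal Python sketch. It only shows generic codec behaviour (text stored as Windows-1250 bytes and then interpreted as UTF-8); it is an assumption that something comparable happens to the attribute name inside the workspace, and it is not FME's actual code.

```python
# Illustration only: generic Python codec behaviour, not FME internals.
name = "Angažirana"

# On a Windows-1250 machine the name is stored as single-byte cp1250 text.
raw = name.encode("cp1250")

# A UTF-8 system that interprets those bytes as UTF-8 finds an invalid byte
# where 'ž' was and substitutes a replacement character for it.
misread = raw.decode("utf-8", errors="replace")

# Decoding with the correct code page recovers the original name.
recovered = raw.decode("cp1250")

print(raw)
print(misread)
print(recovered)
```

The byte that represents 'ž' in cp1250 is not valid UTF-8 on its own, which is why a placeholder character shows up instead of the letter.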

It's a known problem and we are working on improving it - but it sounds like it is a big job to reconfigure FME workspaces to be UTF8 compatible, so a solution on our end is not going to arrive soon.

There are some workarounds. Firstly, Linux and Mac both use UTF8 by default, so any workspace authored on one of those platforms would be OK. I thought you might be able to use a UTF8 Windows locale, but our developer said no, you can't do that (maybe I misunderstood, because that doesn't seem logical, since FME Cloud uses that).

Another solution - depending on what your workspace does - is to use a dynamic workspace. A dynamic translation doesn't store the source attribute names in the workspace file, therefore it would not have the same issue.

Your other option is to change the characters as you suggested. I don't really know how to do that, but there are folk at Safe who might be able to help. If you want to try that, file a support case with us (safe.com/support) and ask for Lena's help!
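As a starting point for that idea, here is a minimal sketch in plain Python of how accents could be stripped from a list of names using the standard `unicodedata` module. The function names `unaccent` and `unaccent_attribute_names` are hypothetical, and how to apply the resulting mapping to attribute names inside an FME workspace is not shown here.

```python
import unicodedata


def unaccent(name: str) -> str:
    """Return name with combining diacritical marks removed."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def unaccent_attribute_names(names):
    """Map each original attribute name to its unaccented form."""
    return {n: unaccent(n) for n in names}


# Sample names taken from the DBF field list in the log above.
mapping = unaccent_attribute_names(["Ime", "Angažirana", "Godina_izg"])
print(mapping)
```

Two caveats: letters that have no Unicode decomposition (for example 'đ') are left unchanged by this approach, and two different names could collapse to the same unaccented name, so the mapping should be checked for collisions before it is used.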

I hope this helps to explain the issue, and maybe gives you a workaround. I'm going to file a problem report with the developers too, just to make sure this particular scenario is documented and covered by the work they are planning to do.

Mark

Badge

Hello,

Sorry for the late response (Czech Republic here, so I was asleep when you responded), and thank you @mark2catsafe for clarifying the issue.

Yes, my computer uses the Central European locale (windows-1250). So the problem might have arisen because I am editing my workspace on a Windows computer? If I completely redid my workspace on a Linux machine, would it work on the cloud instance?

Regarding the dynamic translation - unfortunately, that is not an option.

Our PostGIS database uses UTF-8 encoding, of that I am sure. I see the bad character (?) when I write to the database from the cloud instance using input data with accented characters in attribute names, and this character breaks some of the triggers that are created in the database.

I will try to contact your support on Wednesday, just to see if there is a solution to this.

Thanks

Badge +5

Yes, it's not a very good workaround to have to suggest, but if you redo your workspace on a Linux machine it should (I'm told) work OK. But please do contact our support team, as they will be able to go into this in more detail and try to find you a solution.
