Solved

I have a folder filled with HTML files encoded in windows-1252. I need to change the encoding of all the files to utf-8. Is that possible in FME?

  • November 3, 2021
  • 9 replies
  • 269 views

lily
  • Contributor
  • 16 replies

I think it takes a lot of time to convert one by one.


9 replies

daveatsafe
  • Safer
  • 1637 replies
  • November 3, 2021

Hi @lily,

Yes, it's possible with a simple Text File to Text File conversion. Set the encoding on the Text File reader to Latin-1 (windows-1252) and the encoding on the Text File writer to Unicode 8-bit (utf-8). This creates an output file identical to the input except for the encoding.

However, if the HTML contains any tags that identify the encoding, these need to be changed as well. You can do this by adding a StringReplacer to the workspace to replace the declared encoding string (for example 'iso-8859-1') with 'UTF-8'.
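For anyone who wants to see what this conversion amounts to outside FME, here is a minimal Python sketch. It is not how FME implements it; the function name `convert_html` and the regular expression are illustrative assumptions. It decodes the bytes as windows-1252 (Python's `cp1252` codec), rewrites a declared charset of `iso-8859-1` or `windows-1252` to `UTF-8`, and re-encodes the text as UTF-8.

```python
import re

# Assumption: the encoding is declared as charset=... inside the HTML,
# e.g. <meta charset="windows-1252"> or content="text/html; charset=iso-8859-1".
CHARSET_RE = re.compile(
    r'(charset\s*=\s*["\']?)(iso-8859-1|windows-1252)',
    re.IGNORECASE,
)


def convert_html(data: bytes) -> bytes:
    """Decode windows-1252 bytes, fix the declared charset, re-encode as UTF-8."""
    text = data.decode("cp1252")                 # reader side: windows-1252
    text = CHARSET_RE.sub(r"\g<1>UTF-8", text)   # the StringReplacer step
    return text.encode("utf-8")                  # writer side: UTF-8
```

Python's `cp1252` codec raises `UnicodeDecodeError` on the few byte values that windows-1252 leaves undefined (such as 0x81), so a file that fails here is probably not really windows-1252.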


mark2atsafe
  • Safer
  • 2554 replies
  • November 5, 2021

@daveatsafe​ has the correct solution here - but to illustrate it I made this one of my question-of-the-week and added it to a video here: https://youtu.be/uyF7MEuBdK0


lily
  • Author
  • Contributor
  • 16 replies
  • November 5, 2021

Thank you Dave and Mark! I will try Dave's solution and give a reply as soon as I can!


lily
  • Author
  • Contributor
  • 16 replies
  • November 7, 2021

Hi @daveatsafe,

Thank you for your solution!

I have tried it and it works with one file at a time.

Then I tried using a zip file instead, since I want to process all the files with the encoding workspace. But I ended up with one big HTML file, instead of the same number of HTML files as in the original.

So I tried batch processing with the "Directory and File Pathnames" reader, but now I'm facing the problem that the destination folder option is not available. Instead, it writes everything to a single file too.

Any tips?


lily
  • Author
  • Contributor
  • 16 replies
  • November 7, 2021

Thank you @mark2atsafe! I have watched your YouTube video and it helps a lot! =)


daveatsafe
  • Safer
  • 1637 replies
  • Best Answer
  • November 8, 2021

Hi @lily,

You can use the Dataset Fanout to separate the output files:

  • Open the input Text File feature type properties, pick the Format Attributes tab, then check the box beside fme_basename if it is not already checked.
  • In the Navigator pane of Workbench, expand the parameters for the Text File writer, then double-click on Fanout Dataset.
  • Set the Destination Fanout Directory to the output zip file (zip files are considered folders by FME).
  • Set the Fanout Expression to '@Value(fme_basename).html'.

This should write each input file to a separate output file in the output zip file.
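The fanout idea above (one output file per input file, named from the input's base name) can be sketched outside FME as follows. This is a plain Python illustration under stated assumptions, not FME's mechanism: the function name `convert_folder_to_zip` and its parameters are hypothetical, and `path.stem` plays a role similar to `fme_basename`.

```python
import zipfile
from pathlib import Path


def convert_folder_to_zip(src_dir, out_zip):
    """Re-encode every .html file in src_dir from windows-1252 to UTF-8,
    writing each one as a separate entry in out_zip.

    Returns the list of entry names written, in sorted order.
    """
    written = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(src_dir).glob("*.html")):
            text = path.read_bytes().decode("cp1252")
            # Equivalent of the fanout expression '@Value(fme_basename).html':
            # the entry name is built from the input file's base name.
            entry_name = f"{path.stem}.html"
            zf.writestr(entry_name, text.encode("utf-8"))
            written.append(entry_name)
    return written
```

Because every entry gets its own name, the number of files in the zip matches the number of input files, rather than everything being appended to a single output.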


lily
  • Author
  • Contributor
  • 16 replies
  • November 9, 2021

Thank you @daveatsafe!

I will give feedback as soon as I can! BeSafe =)


lily
  • Author
  • Contributor
  • 16 replies
  • November 12, 2021

It works perfectly!! Now I can move on to my next assignment =)


lily
  • Author
  • Contributor
  • 16 replies
  • November 12, 2021

Thank you!! @daveatsafe​ @mark2atsafe​