Solved

I have a folder full of HTML files encoded in windows-1252. I need to change the encoding of all the files to UTF-8. Is that possible in FME?

  • November 3, 2021
  • 9 replies
  • 209 views

lily
  • Participant

Converting them one by one would take a lot of time.


9 replies

daveatsafe
  • Safer
  • November 3, 2021

Hi @lily,

Yes, it's possible with a simple Text File to Text File conversion. Set the encoding on the Text File reader to Latin-1 (windows-1252) and the encoding on the Text File writer to Unicode 8-bit (utf-8). This will create an output file identical to the input except for the encoding.

However, if you have any tags within the HTML identifying the encoding, these will need to be changed as well. You can do this by adding a StringReplacer to the workspace to replace the string 'iso-8859-1' (or whichever charset name the files declare, such as 'windows-1252') with 'UTF-8'.
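For readers who want to see the same two steps outside FME, here is a minimal Python sketch: decode the bytes as windows-1252, rewrite the declared charset, and encode as UTF-8. It is only an illustration of the idea, not part of the workspace described above. The function name `reencode_html` and the regular expression are assumptions made for this example, and the pattern only covers the two charset names mentioned in this thread.

```python
import re

# Matches the charset name in declarations such as
# <meta charset="iso-8859-1"> or content="text/html; charset=windows-1252".
# Only these two names are handled; extend the pattern if your files differ.
CHARSET_RE = re.compile(
    r'(charset\s*=\s*["\']?)(iso-8859-1|windows-1252)', re.IGNORECASE
)


def reencode_html(raw: bytes) -> bytes:
    """Decode windows-1252 bytes, update the declared charset, return UTF-8 bytes."""
    # Python raises UnicodeDecodeError for the few byte values that
    # windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    text = raw.decode("windows-1252")
    text = CHARSET_RE.sub(r"\1UTF-8", text)
    return text.encode("utf-8")


# Small in-memory demonstration with one accented character.
sample = '<meta charset="iso-8859-1"><p>café</p>'.encode("windows-1252")
converted = reencode_html(sample)
print(converted.decode("utf-8"))
```

The charset replacement plays the role of the StringReplacer step: without it, a browser may still interpret the UTF-8 bytes using the old declaration.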


mark2atsafe
  • Safer
  • November 5, 2021

@daveatsafe has the correct solution here, but to illustrate it I made this one of my question-of-the-week picks and added it to a video here: https://youtu.be/uyF7MEuBdK0


lily
  • Author
  • Participant
  • November 5, 2021

Thank you Dave and Mark! I will try Dave's solution and give a reply as soon as I can!


lily
  • Author
  • Participant
  • November 7, 2021

Hi @daveatsafe​ ,

Thank you for your solution!

I have tried it and it works with one file at a time.

Then I tried using a zip file instead, since I want to process all the files with the encoding workspace. But I ended up with one big HTML file instead of several HTML files, which should match the number of files in the original.

So I tried batch processing with the "Directory and File Pathnames" reader.

But now I am facing the problem that the destination folder option is not available. Instead, it writes everything to a single file too.

Any tips?


lily
  • Author
  • Participant
  • November 7, 2021

Thank you @mark2atsafe! I have seen your YouTube video and it helps a lot! =)


daveatsafe
  • Safer
  • Best Answer
  • November 8, 2021

Hi @lily​,

You can use the Dataset Fanout to distinguish the output files:

  • Open the input Text File feature type properties, select the Format Attributes tab, then check the box beside fme_basename if it is not already checked.
  • In the Navigator pane of Workbench, expand the parameters for the Text File writer, then double-click Fanout Dataset.
  • Set the Destination Fanout Directory to the output zip file (FME treats zip files as folders).
  • Set the Fanout Expression to '@Value(fme_basename).html'

This should write each input file to a separate output file in the output zip file.
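The fanout idea (one output file per input, named after the input's base name) can also be sketched in plain Python. This is a rough analogue under stated assumptions, not what FME does internally: the function name `convert_folder_to_zip` is hypothetical, the inputs are assumed to be `.html` files directly inside one folder, and the charset declaration inside each file is left untouched here.

```python
import zipfile
from pathlib import Path


def convert_folder_to_zip(src_dir: Path, zip_path: Path) -> list[str]:
    """Write a UTF-8 copy of every .html file in src_dir into zip_path.

    Each input becomes its own entry in the archive, so nothing is merged
    into a single output file. Returns the entry names that were written.
    """
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for src in sorted(src_dir.glob("*.html")):
            # Read the source as windows-1252 text.
            text = src.read_bytes().decode("windows-1252")
            # Name the entry after the input's base name, similar in spirit
            # to the fanout expression '@Value(fme_basename).html'.
            name = f"{src.stem}.html"
            archive.writestr(name, text.encode("utf-8"))
            written.append(name)
    return written
```

Calling `convert_folder_to_zip(Path("input_folder"), Path("output.zip"))` (both paths are placeholders) would produce one archive entry per source file, which is the behaviour the fanout settings are meant to achieve.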


lily
  • Author
  • Participant
  • November 9, 2021

Thank you @daveatsafe​ !

I will give feedback as soon as I can! BeSafe =)


lily
  • Author
  • Participant
  • November 12, 2021

It works perfectly!! Now I can move on to my next assignment =)


lily
  • Author
  • Participant
  • November 12, 2021

Thank you!! @daveatsafe​ @mark2atsafe​ 

