
[FME 2025] How to write .docx files to HTML pages

  • July 1, 2026

joy
Enthusiast

Hello everyone,

Funnily enough, I am trying to read .docx files, which apparently I have never done before in FME Form. I am trying to convert several .docx files to HTML pages.

I read the help page for the Word reader, but unfortunately I am not much wiser. It says to use the Word reader together with the MSWordStyler transformer. I looked into the MSWordStyler, but that didn't make me much wiser either. When I simply connect a Word reader to an HTML writer, I get no output (which I more or less expected). I really have no idea how to get from .docx to HTML.

If I read all the .docx files into FME Form, I get a mess of all the Word pages mashed together in the Data Inspector.



j.botterill
Influencer
  • July 1, 2026

What the Word reader actually does is read the document into FME's rich text model, preserving things like paragraphs, headings, lists, tables, formatting runs and (optionally) images.
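To get a feel for what "paragraphs, headings and formatting runs" look like underneath: a .docx file is a ZIP archive whose main body lives in `word/document.xml`. The sketch below uses only the Python standard library and is not how FME's reader is implemented; `read_paragraphs` is a hypothetical helper that pulls out each paragraph's style ID and its runs with bold/italic flags. It deliberately ignores lists, tables and images.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_paragraphs(docx_path):
    """Return [(style, runs), ...] where runs is [(text, bold, italic), ...].

    docx_path may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(W + "p"):
        # Paragraph style ID such as "Heading1"; None means the default style
        style_el = p.find(f"{W}pPr/{W}pStyle")
        style = style_el.get(W + "val") if style_el is not None else None

        runs = []
        for r in p.iter(W + "r"):
            # A run is a stretch of text that shares the same formatting
            text = "".join(t.text or "" for t in r.iter(W + "t"))
            rpr = r.find(W + "rPr")
            # Simplification: <w:b w:val="0"/> (explicitly not bold) still counts as bold
            bold = rpr is not None and rpr.find(W + "b") is not None
            italic = rpr is not None and rpr.find(W + "i") is not None
            runs.append((text, bold, italic))
        paragraphs.append((style, runs))
    return paragraphs
```

Dumping this for one of your files is a quick way to see which style names your documents really use before deciding how to map them to HTML.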

The MSWordStyler is what converts FME's internal Word formatting into HTML/CSS styling attributes that the HTML writer can understand. The transformer generates HTML fragments from the Word formatting information, but it probably does so poorly!
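As a rough illustration of the kind of mapping involved (not MSWordStyler's actual logic or attribute names), here is a hypothetical `run_to_html` that turns one formatting run into inline HTML, assuming each run carries bold/italic flags and an optional hex colour string:

```python
from html import escape


def run_to_html(text, bold=False, italic=False, color=None):
    """Wrap one formatting run in inline HTML.

    `color` is assumed to be a hex string such as "FF0000" (hypothetical input format).
    """
    out = escape(text)  # escape <, >, & and quotes so the text cannot break the markup
    if bold:
        out = f"<strong>{out}</strong>"
    if italic:
        out = f"<em>{out}</em>"
    if color:
        out = f'<span style="color:#{color}">{out}</span>'
    return out
```

For example, `run_to_html("x", bold=True, italic=True, color="FF0000")` gives `<span style="color:#FF0000"><em><strong>x</strong></em></span>`.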

The trick would be to inspect all the format attributes and organise them into appropriate html_content parts. You could even write them out to a CSV first, before trying the HTML writer.
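A minimal sketch of that workflow in plain Python, assuming each paragraph has already been flattened into a dict with hypothetical keys `source` (originating file name), `style` and `text`; these are not FME's real attribute names. `dump_csv` writes everything out for inspection, and `build_pages` groups records by source file, which is also one way to untangle the "all pages mashed together" effect, producing one html_content string per document:

```python
import csv
from html import escape

# Hypothetical mapping from Word paragraph style names to HTML tags;
# adjust it to whatever style values your documents actually use
STYLE_TO_TAG = {"Title": "h1", "Heading1": "h1", "Heading2": "h2", "Heading3": "h3"}


def dump_csv(records, csv_path):
    """Write the raw attribute records to a CSV file for inspection."""
    fieldnames = sorted({key for rec in records for key in rec})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing keys become ""
        writer.writeheader()
        writer.writerows(records)


def build_pages(records):
    """Group records by their 'source' file and join them into one
    html_content string per file; unknown styles fall back to <p>."""
    pages = {}
    for rec in records:
        tag = STYLE_TO_TAG.get(rec.get("style"), "p")
        pages.setdefault(rec["source"], []).append(
            f"<{tag}>{escape(rec.get('text', ''))}</{tag}>"
        )
    return {source: "\n".join(parts) for source, parts in pages.items()}
```

In FME terms the grouping step is roughly what an Aggregator with a group-by on the source file would do; the Python version just makes the logic explicit.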

Another option is to use a PythonCaller with a library that has the conversion smarts built in, e.g. mammoth:

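
```python
import mammoth

docx_path = "example.docx"  # placeholder: path to the .docx file to convert

with open(docx_path, "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file)

html = result.value         # the generated HTML
messages = result.messages  # conversion warnings, e.g. unrecognised styles
```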
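Since the goal is several separate HTML pages, and mammoth returns an HTML fragment rather than a complete page, a small stdlib helper can wrap each fragment and save it under the .docx file's name. `wrap_fragment` and `save_page` are my own hypothetical names, not part of mammoth or FME:

```python
from html import escape
from pathlib import Path


def wrap_fragment(fragment, title):
    """Turn a body-only HTML fragment into a complete HTML page."""
    return (
        "<!DOCTYPE html>\n"
        '<html>\n<head>\n<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{fragment}\n"
        "</body>\n</html>\n"
    )


def save_page(docx_path, fragment, out_dir):
    """Write <out_dir>/<docx file stem>.html and return the output path."""
    stem = Path(docx_path).stem
    out_path = Path(out_dir) / (stem + ".html")
    out_path.write_text(wrap_fragment(fragment, stem), encoding="utf-8")
    return out_path
```

After the mammoth call you could do `save_page(docx_path, html, "output")`, assuming the `output` folder already exists; looping over your .docx paths (or doing this per feature in the PythonCaller) gives one page per document.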