Question

Is there a simple way to read Microsoft .docx and .doc files?

  • 15 February 2021

The most recent info I have been able to find is from 2017. Do later builds have a solution? The methods I have seen so far:

  • Call 7-Zip and extract the archive
  • Convert to HTML
  • Use a PythonCaller

A PythonCaller is probably the simplest solution, but I would like to try to achieve this using native FME if I can, as this is a training exercise in the software.
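
For context on the extract-the-archive route: a .docx file is a ZIP package, and the main body text normally lives in word/document.xml. Below is a minimal sketch of that idea using only the Python standard library, for instance as the core of a PythonCaller. The function name `read_docx_paragraphs` is made up for illustration, and the sketch does not cover legacy .doc files, which use a binary format rather than ZIP.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_paragraphs(source):
    """Return the text of each paragraph in a .docx file as a list of strings.

    `source` may be a file path or a binary file-like object.
    (Hypothetical helper, not part of FME or any library.)
    """
    # A .docx is a ZIP archive; the body is stored as XML inside it.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its text is split across <w:t> runs.
    for p in root.iter(W_NS + "p"):
        text = "".join(t.text or "" for t in p.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs


# Example use (the path is a placeholder):
# for line in read_docx_paragraphs("report.docx"):
#     print(line)
```

Headers, footers, footnotes, and images sit in other parts of the archive and are not read here.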

 

https://community.safe.com/s/question/0D54Q000080hDL7SAM/read-microsoft-word-file


1 reply


I'd say your best bet at this stage is to use a PythonCaller. There seem to be a few libraries that can achieve this, specifically the one described here, which also appears to be able to extract images and tables:

 

http://theautomatic.net/2019/10/14/how-to-read-word-documents-with-python/
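
If installing a third-party library is not an option, tables can also be pulled out with the Python standard library alone, since a .docx file is a ZIP package with the body stored in word/document.xml. This is only a rough sketch and not the method from the linked article; the function name `read_docx_tables` is hypothetical, and nested tables and merged cells are not handled specially.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_tables(source):
    """Return every table in a .docx file as a list of rows of cell text.

    `source` may be a file path or a binary file-like object.
    (Hypothetical helper for illustration only.)
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    # <w:tbl> is a table, <w:tr> a row, <w:tc> a cell.
    for tbl in root.iter(W_NS + "tbl"):
        rows = []
        for tr in tbl.findall(W_NS + "tr"):
            cells = []
            for tc in tr.findall(W_NS + "tc"):
                # A cell holds one or more paragraphs; join them with newlines.
                paras = [
                    "".join(t.text or "" for t in p.iter(W_NS + "t"))
                    for p in tc.findall(W_NS + "p")
                ]
                cells.append("\n".join(paras))
            rows.append(cells)
        tables.append(rows)
    return tables
```

Each table comes back as a list of rows, and each row as a list of strings, which should map reasonably well onto list attributes in a PythonCaller.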
