Question

Read Microsoft Word file


I have a bunch of Microsoft Word files. I want to count the words in each one and then use the most common word with geographic content to generate a polygon for the place the file is about.

My problem is that I can't read a Word file in FME. I have built the whole workspace to do what I want, but I have about 2,000 Word files and I don't want to convert all of them to .txt.

Does anyone have a solution to this?

5 replies

danilo_fme
Evangelist
  • March 13, 2017

Hi @linhgg2, are you trying to read .doc files?

I have FME Desktop 2017 installed and I didn't see a Reader for Microsoft Word.


erik_jan
Contributor
  • March 13, 2017

No MS Word Reader available in FME 2017 yet.

So, I do not see any other option than converting to Text.


david_r
Evangelist
  • March 13, 2017

If your document is a .docx type file, it is actually a zip file containing several XML files etc. that you can read with FME. Opened in 7-Zip, it typically shows a [Content_Types].xml file plus _rels, docProps and word folders; the body text lives in word/document.xml.

But I agree that unless you feel adventurous, it is probably easier to convert it to text first.
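For anyone who does feel adventurous, here is a minimal sketch of the zip approach in plain Python, which could be adapted for a PythonCaller or run as a batch pre-processing step. It is an illustration under assumptions, not an FME feature: the function name docx_paragraphs is made up for this example, it only reads the main body part (word/document.xml), and it ignores headers, footers, footnotes and the old binary .doc format.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the text of every paragraph in the main body of a .docx.

    `source` may be a file path or a binary file-like object.
    The function name is hypothetical, chosen for this sketch only.
    """
    with zipfile.ZipFile(source) as archive:
        # The main document body is stored as one XML part inside the zip
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    # w:p is a paragraph; its visible text is split across w:t runs.
    # Paragraphs inside table cells are picked up by this loop as well.
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

Looping this over a folder of .docx files and writing each result to a .txt file would give the existing text-based workspace its input without opening Word 2,000 times.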


tim_wood
Contributor
  • November 1, 2017

I have the same problem with the Ordnance Survey Local Custodians table:

https://www.ordnancesurvey.co.uk/docs/product-schemas/addressbase-products-local-custodian-codes.zip

I tried using the XML Reader but it won't open the .docx.

Converting the file to text in Word seems to result in the loss of the table structure.

I found saving the Word doc as HTML worked quite well (although still a manual step). Once it is in that format, FME will read it using the HTML Table Reader, and even strips out the title and text above the table.

I did find that the column headings got treated as a data row. Maybe the headings are formatted as HTML TD tags rather than TH ones. A handy update to the Reader would be to have something similar to CSV where you can specify whether there's a header row. A workaround is to tell the HTML Table Reader to start at feature 2.
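As an alternative to the manual HTML export, the table can in principle be pulled straight out of the .docx, since the file is a zip archive and tables are stored as rows and cells in word/document.xml. The sketch below is not what I actually did and has not been run against the Ordnance Survey file; docx_tables and rows_as_records are names invented for the example, and it assumes simple tables without nested tables or merged cells. Treating the first row as the header is an explicit choice here, which sidesteps the TD/TH issue.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace prefix in ElementTree's {uri}tag notation
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables(source):
    """Return every table in the main body as a list of rows of cell text.

    Hypothetical helper: assumes rows are direct children of the table
    and cells are direct children of the row (no content-control wrappers).
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            cells = []
            for tc in tr.findall(W + "tc"):
                # Join all text runs in the cell; multi-paragraph cells
                # are concatenated without a separator in this sketch.
                cells.append("".join(t.text or "" for t in tc.iter(W + "t")))
            rows.append(cells)
        tables.append(rows)
    return tables


def rows_as_records(rows):
    """Treat the first row as column headings and the rest as data rows."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

The records could then be written to CSV for the CSV Reader, where the header row setting already exists.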


gio
Contributor
  • November 1, 2017

I had some data handed to me in Word not long ago.

I just stuffed it into a txt/csv file and proceeded from there.

If it is formatted somehow in Word, I basically used VariableSetters and VariableRetrievers and a lot of regexp in StringSearchers etc.

