Question

Read Microsoft Word file

March 13, 2017
5 replies
466 views

linhgg2
1 reply

I have a bunch of Microsoft Word files. I want to count the words in each one, then use the most common words with geographic content to generate a polygon of the area each file is about.

My problem is that I can't read a Word file in FME. I have built the whole workspace to do what I want, but I have about 2,000 Word files and I don't want to convert all of them to .txt.

Does anyone have a solution to this?
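
Once the text is out of the Word files, the counting step itself is small. Below is a minimal Python sketch of it, assuming the text has already been extracted. The function name `most_common_words` and the `min_length` filter are illustrative assumptions and are not part of the original FME workspace.

```python
import re
from collections import Counter


def most_common_words(text, n=10, min_length=3):
    """Return the n most frequent words in text as (word, count) pairs.

    Matching is case-insensitive. Words shorter than min_length are dropped
    as a crude stop-word filter (an illustrative assumption).
    """
    # [^\W\d_]+ matches runs of letters, including non-ASCII ones such as "ø"
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(w for w in words if len(w) >= min_length).most_common(n)


if __name__ == "__main__":
    sample = "Oslo harbour and the Oslo fjord; Bergen is west of Oslo."
    print(most_common_words(sample, n=3))
```

Short function words like "and" or "the" will still rank highly. In practice you would probably need a proper stop-word list and a lookup against a list of place names to keep only the geographic words.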

danilo_fme
Evangelist
2060 replies
March 13, 2017

Hi @linhgg2, are you trying to read .doc files?

I have FME Desktop 2017 installed, and I didn't see a reader for Microsoft Word.

erik_jan
Contributor
2181 replies
March 13, 2017

No MS Word Reader available in FME 2017 yet.

So I don't see any option other than converting to text.

david_r
8356 replies
March 13, 2017

If your document is a .docx file, it is actually a zip archive containing several XML files (the body text is in word/document.xml) that you can read with FME. You can see this structure by opening the file in an archive tool such as 7-Zip.

But I agree that unless you feel adventurous, it is probably easier to convert it to text first.
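
The .docx-as-zip route can also be used to batch-convert the files before they go into FME. Below is a minimal Python sketch that reads `word/document.xml` from each file and writes its paragraph text to a .txt file. The function names and the command-line usage are hypothetical. The sketch only handles .docx, not the older binary .doc format, and it ignores headers, footers, tabs and line breaks.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of the WordprocessingML elements (w:p, w:t, ...) in document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(W_NS + "p"):
        # Visible text lives in <w:t> elements inside the paragraph's runs
        lines.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(lines)


def convert_folder(src_dir, out_dir):
    """Write one UTF-8 .txt file into out_dir for every .docx in src_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for docx in sorted(Path(src_dir).glob("*.docx")):
        if docx.name.startswith("~$"):
            continue  # skip Word's temporary lock files, which are not zips
        (out / (docx.stem + ".txt")).write_text(docx_to_text(docx), encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example usage: python convert_docx.py <folder with .docx files> <output folder>
    convert_folder(sys.argv[1], sys.argv[2])
```

The .txt files can then be read with FME's Text File reader as usual.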

tim_wood
Contributor
311 replies
November 1, 2017

I have the same problem with the Ordnance Survey Local Custodians table:

https://www.ordnancesurvey.co.uk/docs/product-schemas/addressbase-products-local-custodian-codes.zip

I tried using the XML Reader but it won't open the .docx.

Converting the file to text in Word seems to result in the loss of the table structure.

I found saving the Word doc as HTML worked quite well (although still a manual step). Once it is in that format, FME will read it using the HTML Table Reader, and even strips out the title and text above the table.

I did find that the column headings got treated as a data row. Maybe the headings are formatted as HTML TD tags rather than TH ones. A handy update to the Reader would be to have something similar to CSV where you can specify whether there's a header row. A workaround is to tell the HTML Table Reader to start at feature 2.
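
A .docx table is stored as `w:tbl`, `w:tr` and `w:tc` elements in `word/document.xml`, so the manual save-as-HTML step could probably be scripted away. Below is a minimal Python sketch that assumes simple tables without merged or nested cells. `docx_table_rows` and `docx_table_to_csv` are hypothetical names. The table is written out as CSV with the first row as the header line, so the CSV reader can take the attribute names from it instead of treating the headings as a data row.

```python
import csv
import zipfile
import xml.etree.ElementTree as ET

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _cell_text(tc):
    """Concatenate all text inside a table cell (<w:tc>)."""
    return "".join(t.text or "" for t in tc.iter(W_NS + "t")).strip()


def docx_table_rows(path, index=0):
    """Return the index-th table of a .docx as a list of rows of cell strings."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    table = list(root.iter(W_NS + "tbl"))[index]
    # Rows (<w:tr>) are direct children of <w:tbl>; cells (<w:tc>) of <w:tr>.
    # Paragraphs outside the table (title, intro text) are never visited.
    return [[_cell_text(tc) for tc in tr.findall(W_NS + "tc")]
            for tr in table.findall(W_NS + "tr")]


def docx_table_to_csv(path, csv_path, index=0):
    """Write the table to CSV; the first table row becomes the header line."""
    rows = docx_table_rows(path, index)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```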

gio
Contributor
2252 replies
November 1, 2017

I had some data handed to me in Word not long ago.

I just pasted it into a txt/csv file and proceeded from there.

Where the content was formatted in some way in Word, I basically used VariableSetters and VariableRetrievers and a lot of regular expressions in StringSearchers etc.
