
I’m looking to read in a load of Word documents that contain tables of information. I’d like to extract all the text from those tables so that I can then search it for specific information. I’ve tried the Microsoft Word Reader and the MSWordStyler, but only the paragraph text comes through as attributes. Any ideas how to extract the table text? 

 

Hi @hannahwh05 

Which version of FME Form are you using?

Thanks in advance,


Hi @danilo_fme, it's 2024.2.4.0.


Thank you! Could you share an example?


I’m using 2025.1, but I suspect it behaves the same in 2024.2. I created a test Word document containing some tables.

I read it with the MS Word Reader, and the content comes through as unexposed attributes. The relevant attributes for the tables are two lists: msword_table_col_header{} and msword_table_col_value{}. It doesn’t seem to matter whether the top row is defined as a header row in Word or not: FME treats the top row of the table as the header.

Each row of the table is its own feature. Because my tables have 3 columns, each list has 3 elements. 
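If it helps to picture that structure, here is a rough sketch in plain Python (not FME's own API) that models each row feature as a dict holding the two parallel lists, so that header and value line up by list index. It assumes the header row itself is not emitted as a separate feature, and the sample table is made up for illustration.

```python
# Plain-Python model of the row features described above (not the fmeobjects API).
# ASSUMPTION: the top row only supplies the header list and is not its own feature.


def table_to_row_features(table):
    """Turn a table (list of rows, each a list of cell strings) into one
    dict-based 'feature' per body row, using the top row as the header."""
    if not table:
        return []
    header, *body = table
    features = []
    for row in body:
        features.append({
            "msword_table_col_header{}": list(header),
            "msword_table_col_value{}": list(row),
        })
    return features


def row_as_dict(feature):
    """Pair each header with the value at the same list index."""
    return dict(zip(feature["msword_table_col_header{}"],
                    feature["msword_table_col_value{}"]))


if __name__ == "__main__":
    # Hypothetical 3-column table, matching the 3-column case above.
    table = [
        ["Name", "Type", "Status"],
        ["Pump A", "Centrifugal", "Active"],
        ["Pump B", "Diaphragm", "Retired"],
    ]
    features = table_to_row_features(table)
    print(len(features))             # 2
    print(row_as_dict(features[1]))  # {'Name': 'Pump B', 'Type': 'Diaphragm', 'Status': 'Retired'}
```

Because the two lists are parallel, element `{i}` of the value list always belongs under element `{i}` of the header list.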


You’ll have to expose those attributes as required and search them using a ListSearcher.
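The search step can be sketched the same way. This is a plain-Python illustration of the idea (find the first list element that matches, then use its index to look up the matching header), not the ListSearcher transformer itself; the case-insensitive regex matching is an assumption, so check the ListSearcher parameters for the matching options it actually offers.

```python
import re


def first_match_index(values, pattern):
    """Index of the first list element matching the regex `pattern`
    (case-insensitive search), or None if nothing matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    for i, value in enumerate(values):
        if regex.search(value):
            return i
    return None


def search_row_features(features, pattern):
    """For each row feature, return (header, value) for its first matching cell.
    Features are dicts holding the two parallel lists (a hypothetical model)."""
    hits = []
    for feature in features:
        values = feature["msword_table_col_value{}"]
        idx = first_match_index(values, pattern)
        if idx is not None:
            # Same index in the header list gives the column name.
            hits.append((feature["msword_table_col_header{}"][idx], values[idx]))
    return hits


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    features = [
        {"msword_table_col_header{}": ["Name", "Type", "Status"],
         "msword_table_col_value{}": ["Pump A", "Centrifugal", "Active"]},
        {"msword_table_col_header{}": ["Name", "Type", "Status"],
         "msword_table_col_value{}": ["Pump B", "Diaphragm", "Retired"]},
    ]
    print(search_row_features(features, r"retired"))  # [('Status', 'Retired')]
```

Rows with no match are simply skipped here, which is roughly what routing non-matching features away from the main stream would look like in a workspace.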

 

Hope that helps.