
Hi,

I have hundreds of Word documents that I am trying to extract the text from in order to run analysis on the content. I tried using the MS Word reader to open a .docx file, but with no success, so I have sought alternative workarounds to achieve this.

One of the solutions I am looking at is to run ‘Mammoth .docx to HTML converter’ via SystemCaller to convert the Word document into HTML.

My issue is that, although the command runs fine when run directly in CMD, when run through the SystemCaller in FME it does not create the output file.

From the log file I can see that it finds the input .docx and extracts the information; it just doesn't write the output .html for me to read into FME with the AttributeFileReader. I have installed Python, Mammoth, and Node.js on my PC. Below is the command used in SystemCaller:

"mammoth "@Value(Input File Path)" "@Value(Output File Path)"

Any help with this would be appreciated.

Many Thanks

 

Mike

I think your problem here might come from the values of @Value(Input File Path) or @Value(Output File Path).

Could you provide the complete command you’re using, with the attribute values resolved?


Hi Alex, the command in SystemCaller is:

"mammoth "@Value(Input File Path)" "@Value(Output File Path)"

 

I have put that into an AttributeCreator to see the actual file paths. They come out like this:

"mammoth "C:\Users\michael.france\Desktop\FME QA\02_06-05-25 - QA\02_QA DIrectory\00000_Sample Project\01. Project Information\05. PQD\00000-SVP-QA-XX-T-R-0001-P01 - Project QA.docx

" "C:\Users\michael.france\Desktop\FME QA\02_06-05-25 - QA\02_QA DIrectory\00000_Sample Project\01. Project Information\05. PQD\00000-SVP-QA-XX-T-R-0001-P01 - Project QA.html

"

FME seems to be adding a newline at the end of each path. I'm not sure if this would cause the issue.


Yeah, you might want to trim off those newlines and clean your attributes before using them in SystemCaller.


Thanks Alex, that did the trick. I just had to run the complete command through a StringReplacer in regular expression mode, replacing \v with nothing, to strip out the stray line breaks. All running fine now.
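
For anyone doing the same cleanup in a script rather than with StringReplacer, here is a minimal Python sketch of the idea: strip the embedded line breaks from each path, then assemble a single-line command. The function names and sample paths are hypothetical and not from this thread. Note that in Python's `re` module `\v` matches only the vertical tab, so the sketch lists the newline characters explicitly instead of relying on `\v` alone.

```python
import re

# Carriage returns, line feeds, vertical tabs and form feeds.
# In Python's re module \v is only the vertical tab, so \r and \n
# are listed explicitly.
_LINE_BREAKS = re.compile(r"[\r\n\v\f]+")


def clean_path(value: str) -> str:
    """Remove line breaks embedded in a path and trim surrounding spaces."""
    return _LINE_BREAKS.sub("", value).strip()


def build_mammoth_command(input_path: str, output_path: str) -> str:
    """Return a one-line command string with each cleaned path double-quoted."""
    return 'mammoth "{}" "{}"'.format(
        clean_path(input_path), clean_path(output_path)
    )


if __name__ == "__main__":
    # Hypothetical paths carrying the kind of trailing newline seen above.
    print(build_mammoth_command("C:\\docs\\Project QA.docx\n",
                                "C:\\docs\\Project QA.html\n"))
```

The resulting string has no line breaks inside the quotes, which is the form that worked once the attributes were cleaned.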

