Solved

Running Mammoth .docx to HTML with SystemCaller


michaelfrance
Contributor

Hi,

I have hundreds of Word documents that I am trying to extract the text from in order to run analysis on the content. I tried opening a .docx file with the MS Word reader but had no success, so I have been looking for alternative workarounds.

One of the solutions I am looking at is to run the ‘Mammoth .docx to HTML converter’ via SystemCaller to convert the Word document to HTML.

My issue is that, although the command runs fine when run directly in CMD, when run through the SystemCaller in FME it does not create the output file.

From the log file I can see that it finds the input .docx and extracts the information; it just doesn't write the output .html for me to read into FME with the AttributeFileReader. I have installed Python, Mammoth, and Node.js on my PC. Below is the command used in SystemCaller:

"mammoth "@Value(Input File Path)" "@Value(Output File Path)"

Any help with this would be appreciated.

Many thanks,
Mike


4 replies

alexbiz
Influencer
  • June 18, 2025

I think your problem here might come from @Value(Input File Path) or @Value(Output File Path).

Could you provide the complete command you’re using?


michaelfrance
Contributor
  • Author
  • June 18, 2025

Hi Alex, the command in SystemCaller is:

"mammoth "@Value(Input File Path)" "@Value(Output File Path)"

 

I have put that into an AttributeCreator to see the actual file paths. They come out like this:

"mammoth "C:\Users\michael.france\Desktop\FME QA\02_06-05-25 - QA\02_QA DIrectory\00000_Sample Project\01. Project Information\05. PQD\00000-SVP-QA-XX-T-R-0001-P01 - Project QA.docx

" "C:\Users\michael.france\Desktop\FME QA\02_06-05-25 - QA\02_QA DIrectory\00000_Sample Project\01. Project Information\05. PQD\00000-SVP-QA-XX-T-R-0001-P01 - Project QA.html

"

FME seems to be adding newlines. Not sure if this would cause the issue.


alexbiz
Influencer
  • Best Answer
  • June 18, 2025

Yeah, you might want to trim off those newlines and clean your attributes before using them in SystemCaller.
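
For anyone who would rather do this cleanup in Python (for example in a PythonCaller or a standalone script), here is a minimal sketch. It assumes the two paths arrive as strings with stray trailing line breaks, as in the output posted above, and that a `mammoth` executable is on PATH. `clean_path`, `build_mammoth_command` and `convert` are hypothetical helper names, not part of FME or Mammoth.

```python
import subprocess
from typing import List


def clean_path(raw: str) -> str:
    """Remove leading/trailing whitespace, including CR/LF, from a path value."""
    return raw.strip()


def build_mammoth_command(input_path: str, output_path: str) -> List[str]:
    """Build the argument list for the Mammoth CLI: mammoth <docx> <html>."""
    return ["mammoth", clean_path(input_path), clean_path(output_path)]


def convert(input_path: str, output_path: str) -> None:
    # Passing a list avoids hand-quoting paths that contain spaces.
    # Assumption: a `mammoth` executable is on PATH (e.g. installed with pip).
    subprocess.run(build_mammoth_command(input_path, output_path), check=True)
```

Building the command as a list rather than one quoted string also sidesteps the quoting problem, since each path is passed as a single argument even if it contains spaces.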


michaelfrance
Contributor
  • Author
  • June 19, 2025

Thanks Alex, that did the trick. I just had to run the complete command through StringReplacer and replace the regular expression \v with nothing to remove vertical tabs. All running fine now.
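
A rough Python equivalent of that StringReplacer step, in case it helps someone doing the cleanup in a script instead. This is only a sketch: the paths are made-up examples and `strip_vertical_whitespace` is a hypothetical name. One thing to watch is that in Python's `re` module `\v` matches only the vertical tab character, so the pattern below lists line feed and carriage return explicitly; whether that mirrors exactly what StringReplacer matches is an assumption.

```python
import re

# Line feed, carriage return, vertical tab, form feed.
# In Python's `re`, \v on its own matches only the vertical tab.
VERTICAL_WS = re.compile(r"[\n\r\v\f]")


def strip_vertical_whitespace(command: str) -> str:
    """Delete every vertical whitespace character from the command string."""
    return VERTICAL_WS.sub("", command)


# Hypothetical command string with line breaks inside the quotes.
raw = 'mammoth "C:\\docs\\in.docx\n" "C:\\docs\\out.html\n"'
print(strip_vertical_whitespace(raw))
# prints: mammoth "C:\docs\in.docx" "C:\docs\out.html"
```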

