Hi,
I have hundreds of Word documents that I am trying to extract the text from in order to run analysis on the content. I tried using the MS Word reader to open a .docx file, but without success, so I have looked for alternative workarounds.
One of the common solutions I have come across is running a 7-Zip command in the system caller to extract the Word document into XML and then loading the XML.
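For context, here is a minimal sketch of the same extract-then-read-XML idea in Python, using only the standard library instead of 7-Zip. It relies on a .docx file being a ZIP archive whose main body is stored in word/document.xml. The function names docx_text and docx_text_from_xml are just illustrative, and the sketch only covers the main body text (no headers, footers, or footnotes).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text_from_xml(xml_bytes):
    """Return the body text of a document.xml payload, one line per paragraph."""
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join every w:t element in it.
        pieces = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


def docx_text(path_or_file):
    """Open a .docx (a ZIP archive) and extract the text of its main body."""
    with zipfile.ZipFile(path_or_file) as archive:
        return docx_text_from_xml(archive.read("word/document.xml"))
```

Because nothing is written to disk, this route never reaches an overwrite prompt, though it may not fit a workflow that needs the extracted XML files themselves.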
My issue is that, although the command runs fine when run directly in CMD, when run through the system caller it prompts for a response about replacing an existing file:
“Would you like to replace the existing file:
Path: .\[Content_Types].xml
Size: 3893 bytes (4 KiB)
Modified: 1980-01-01 00:00:00
with the file from archive:
Path: [Content_Types].xml
Size: 3893 bytes (4 KiB)
Modified: 1980-01-01 00:00:00
? (Y)es / (N)o / (A)lways / (S)kip all / A(u)to rename all / (Q)uit?”
As mentioned, the command works fine in CMD but not through the system caller.
The command used is: ""(7-zip.exe file path)" x "(.docx file path)" -o"(output folder path)""
Both the ‘.docx file path’ and the ‘output folder path’ are set from attributes, so no two .docx files should share the same output folder.
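To make the command structure concrete, here is a rough sketch of how it could be assembled and run from Python rather than through the system caller. It adds two 7-Zip switches that are not in my original command: -aoa (overwrite all existing files without prompting) and -y (assume Yes on all queries). It does not explain why the prompt appears in the first place. The function names and the three path arguments are placeholders, not real paths.

```python
import subprocess


def build_extract_command(seven_zip_exe, docx_path, output_dir):
    """Build the 7-Zip argument list for extracting one .docx with full paths."""
    return [
        seven_zip_exe,
        "x",                 # extract with full paths
        docx_path,
        "-o" + output_dir,   # no space between -o and the folder
        "-aoa",              # assumption: overwrite existing files without asking
        "-y",                # assumption: answer Yes to any remaining query
    ]


def extract_docx(seven_zip_exe, docx_path, output_dir):
    """Run 7-Zip non-interactively and return the completed process."""
    return subprocess.run(
        build_extract_command(seven_zip_exe, docx_path, output_dir),
        stdin=subprocess.DEVNULL,  # nothing can wait on keyboard input
        capture_output=True,
        text=True,
        check=False,               # inspect returncode instead of raising
    )
```

Passing the arguments as a list avoids the nested quoting that the single command string needs, which may also be worth ruling out as a factor.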
Any ideas what may be causing this and how I can resolve the issue?
Thanks for the help in advance.