Hi,
I have hundreds of Word documents that I am trying to extract the text from in order to run analysis on the content. I tried opening a .docx file directly with the MS Word reader, but without success, so I have looked for alternative workarounds.
One of the common solutions I have come across is running a 7-Zip command in the system caller to extract the Word document into XML, and then loading the XML.
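For context, this approach works because a .docx file is a ZIP archive whose main body text is stored in word/document.xml. Below is a minimal Python sketch of the same extract-then-read idea, using the standard library instead of 7-Zip. The function names are hypothetical, and it assumes simple documents where the text of interest sits in ordinary paragraphs (headers, footers, and footnotes live in other parts of the archive and are not read here).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text_from_xml(xml_bytes):
    """Return the paragraph text found in a word/document.xml payload."""
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = (node.text or "" for node in paragraph.iter(W_NS + "t"))
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def docx_text(path):
    """Read word/document.xml straight out of the .docx archive (hypothetical helper)."""
    with zipfile.ZipFile(path) as archive:
        return docx_text_from_xml(archive.read("word/document.xml"))
```

Because the XML is read directly from the archive, nothing is written to disk, so this sketch never reaches the point where an overwrite prompt could appear.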
My issue is that, although the command runs fine when run directly in CMD, when run through the system caller it prompts for a response regarding duplicate files:
"Would you like to replace the existing file:
  Path:     .\[Content_Types].xml
  Size:     3893 bytes (4 KiB)
  Modified: 1980-01-01 00:00:00
with the file from archive:
  Path:     [Content_Types].xml
  Size:     3893 bytes (4 KiB)
  Modified: 1980-01-01 00:00:00
? (Y)es / (N)o / (A)lways / (S)kip all / A(u)to rename all / (Q)uit?"
As mentioned, the command works fine in CMD but not through the system caller.
The command used is: ""(7-zip.exe file path)" x "(.docx file path)" -o"(output folder path)""
Both the ".docx file path" and the "output folder path" are set from attributes, so no two .docx files should have the same output folder.
Any ideas what may be causing this and how I can resolve the issue?
Thanks for the help in advance.


