I have a set of .doc files and need to retrieve the XY coordinates from them. I am attaching an example here. The problem is made worse by the fact that almost everything is in Cyrillic. I tried to read a file as txt and then use the StringSearcher, but my regex knowledge is probably not sufficient. Any help would be appreciated.
Hi,
You can use the following regular expressions to decompose your coordinates:

N (\d+)\D(\d+)\D(\d+)\D
E (\d+)\D(\d+)\D(\d+)\D
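
For anyone who wants to try the two expressions outside FME, here is a minimal Python sketch. The sample string is hypothetical (the real layout in the .doc files may differ), and `decompose` is just an illustrative helper name, not anything from FME.

```python
import re

# Hypothetical sample line; the actual text extracted from the .doc may look different.
sample = "N 59°33'067\" E 152°51'715\""

# The two expressions from the post: three digit groups, each followed by one non-digit.
pattern_n = re.compile(r"N (\d+)\D(\d+)\D(\d+)\D")
pattern_e = re.compile(r"E (\d+)\D(\d+)\D(\d+)\D")


def decompose(text):
    """Return the (N parts, E parts) as tuples of strings, or None where nothing matched."""
    n = pattern_n.search(text)
    e = pattern_e.search(text)
    return (n.groups() if n else None, e.groups() if e else None)


print(decompose(sample))
```

Each `(\d+)` group captures one numeric part, and each `\D` consumes whatever single separator follows it, so the exact degree or quote symbol does not matter.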
David
Hi David, thanks. The problem is that when I try to read the doc file as txt, the structure is somehow corrupted, so the expression you gave me doesn't work. I can send you the doc file if you could have a look... I did not find a way to attach it here. Thanks again!
Hi,

I'm interested in this subject.
You can upload the sample file to a service like Dropbox or Google Drive, and paste its shared link URL here. We can then share the file.

Takashi
Iijima-san, thanks for your response and interest. Here you go: https://drive.google.com/file/d/0B4_4CnzRy4CvWF91M00yVmsxc2c/view?usp=sharing
I was able to retrieve parts of the coordinates with this data flow after converting the Word doc to plain text.

The resulting feature contains these attributes; you can then convert them to degrees.
Attribute(encoded: utf-8)  : `_E{0}' has value `152'
Attribute(encoded: utf-8)  : `_E{1}' has value `51'
Attribute(encoded: utf-8)  : `_E{2}' has value `715'
Attribute(encoded: utf-8)  : `_N{0}' has value `59'
Attribute(encoded: utf-8)  : `_N{1}' has value `33'
Attribute(encoded: utf-8)  : `_N{2}' has value `067'

David's regex worked as expected.
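
The conversion to degrees could look roughly like the sketch below. It rests on an assumption that the thread does not confirm: that the third part ("067", "715") holds the decimal digits of the minutes, since values above 59 cannot be whole seconds. If the source uses a different convention, the arithmetic has to change. `dm_to_decimal` is a hypothetical helper name.

```python
def dm_to_decimal(deg: str, minutes: str, frac: str) -> float:
    """Convert degree / minute / fraction strings to decimal degrees.

    ASSUMPTION: frac is the decimal part of the minutes,
    e.g. ("59", "33", "067") is read as 59 degrees 33.067 minutes.
    """
    full_minutes = int(minutes) + int(frac) / 10 ** len(frac)
    return int(deg) + full_minutes / 60


# Values taken from the _N{} and _E{} list attributes above.
lat = dm_to_decimal("59", "33", "067")
lon = dm_to_decimal("152", "51", "715")
print(lat, lon)
```

Keeping the parts as strings until this step preserves the leading zero in "067", which would be lost if the attribute were converted to a number first.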
I pasted it into a text file. Save it with the encoding set to Unicode; this will keep the Cyrillic characters.
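
To illustrate why the encoding matters, here is a small Python sketch. It assumes that the "Unicode" option in the save dialog means UTF-16 with a byte-order mark, which is common on Windows but not confirmed in this thread. The sample text and file name are made up.

```python
import os
import tempfile

# Hypothetical Cyrillic sample; not taken from the real document.
cyrillic = "Координаты: N 59°33'067\""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Write the file the way a "Unicode" save is ASSUMED to work: UTF-16 with a BOM.
    with open(path, "w", encoding="utf-16") as f:
        f.write(cyrillic)

    # Reading with the matching encoding returns the original text.
    with open(path, encoding="utf-16") as f:
        round_trip = f.read()

    # Reading with a mismatched single-byte encoding yields garbage, not Cyrillic.
    with open(path, encoding="latin-1") as f:
        wrong = f.read()

print(round_trip == cyrillic)
print(wrong == cyrillic)
```

Whatever reads the text file afterwards has to be told the same encoding, otherwise the structure looks corrupted even though the file itself is fine.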

Then you can use a regexp on it. Even if you don't know what the words mean, you can still search for the word preceding the coordinate sets.

This will get both coordinates:

(N[\s\d°’”]+)|(E[\s\d°’”]+)

I used it in the StringSearcher.

But I had to copy the single and double quote characters from the text. I initially entered them through the keyboard, but then the regexp only matched up to the degree mark. With copy and paste it worked.

Gio,

You can avoid messing with the weird quotation marks etc. by using \D rather than string literals. The \D symbol matches any NON-numeric character.

David
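
A quick Python sketch of the difference, assuming for illustration that the document uses the typographic prime and double prime characters (U+2032, U+2033) rather than the ASCII quotes typed on a keyboard. The actual characters in the file are not known from this thread.

```python
import re

# Hypothetical text with typographic marks: degree sign, prime, double prime.
text = "N 59\u00b033\u2032067\u2033"

# Pattern with the separators typed as ASCII literals from the keyboard.
literal = re.compile(r'''N (\d+)°(\d+)'(\d+)"''')

# Pattern using \D, which accepts any single non-digit character as separator.
generic = re.compile(r"N (\d+)\D(\d+)\D(\d+)\D")

print(literal.search(text))
print(generic.search(text).groups())
```

The literal pattern fails as soon as it meets a quote character that only looks like the typed one, while the `\D` version does not care which symbol sits between the numbers.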