Question

Retrieve XY from a set of MS Word documents

  • March 16, 2015
  • 7 replies
  • 20 views

I have a set of .doc files and need to retrieve XY coordinates from them. I am attaching an example here. The problem is made worse by the fact that almost everything is in Cyrillic. I tried to read the files as text and then use StringSearcher, but my regex knowledge is probably not sufficient. Any help would be appreciated.


david_r
Celebrity
  • March 16, 2015
Hi,

You can use the following regular expressions to decompose your coordinates:

N (\d+)\D(\d+)\D(\d+)\D

E (\d+)\D(\d+)\D(\d+)\D

Example: (image not preserved)

David
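
For readers who want to try the two patterns outside FME, here is a minimal Python sketch. The sample line is hypothetical: its digits are taken from the attribute values quoted later in this thread, but the separators (degree sign, comma, apostrophe) are an assumption, since the original document is not reproduced here. The helper name `extract_parts` is also made up for illustration.

```python
import re

# Hypothetical sample line; the separators between the numbers are assumed.
sample = "N 59°33,067' E 152°51,715'"

# The two patterns from the reply: three runs of digits,
# each followed by exactly one non-digit character.
N_PATTERN = re.compile(r"N (\d+)\D(\d+)\D(\d+)\D")
E_PATTERN = re.compile(r"E (\d+)\D(\d+)\D(\d+)\D")


def extract_parts(text: str):
    """Return the captured groups for N and E, or None where nothing matches."""
    n = N_PATTERN.search(text)
    e = E_PATTERN.search(text)
    return (n.groups() if n else None, e.groups() if e else None)


north_parts, east_parts = extract_parts(sample)
print(north_parts, east_parts)
```

Each capture group stays a string, so leading zeros such as `067` are preserved until you decide how to interpret them.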

  • Author
  • March 16, 2015
Hi David, thanks. The problem is that when I try to read the .doc file as text, the structure is somehow corrupted, so the expression you gave me does not work. I can send you the .doc file if you could have a look; I did not find a way to attach it here. Thanks again!

takashi
Celebrity
  • March 17, 2015
Hi,

I'm interested in this subject.

You can upload the sample file to a service such as Dropbox or Google Drive and paste its shared link URL here. We can then share the file.

Takashi

  • Author
  • March 17, 2015
Iijima-san, thanks for your response and interest. Here you go: https://drive.google.com/file/d/0B4_4CnzRy4CvWF91M00yVmsxc2c/view?usp=sharing

takashi
Celebrity
  • March 17, 2015
I was able to retrieve the parts of the coordinates with this data flow after converting the Word document to plain text. (image not preserved)

The resulting feature contains these attributes, which you can then convert to degrees.

Attribute(encoded: utf-8)         : `_E{0}' has value `152'
Attribute(encoded: utf-8)         : `_E{1}' has value `51'
Attribute(encoded: utf-8)         : `_E{2}' has value `715'
Attribute(encoded: utf-8)         : `_N{0}' has value `59'
Attribute(encoded: utf-8)         : `_N{1}' has value `33'
Attribute(encoded: utf-8)         : `_N{2}' has value `067'

David's regex worked as expected :)
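
The conversion to degrees could look roughly like the Python sketch below. It rests on an assumption the thread does not confirm: because the third part has three digits (`067`, `715`) it cannot be whole seconds, so it is treated here as the fractional digits of the minutes (33.067 minutes). If the source document uses a different convention, the arithmetic has to change accordingly. The function name `dm_to_decimal` is hypothetical.

```python
def dm_to_decimal(degrees: str, minutes: str, minute_fraction: str) -> float:
    """Combine degrees, whole minutes and fractional-minute digits into decimal degrees.

    Assumption: minute_fraction holds the digits after the decimal point
    of the minutes value, e.g. "33" and "067" mean 33.067 minutes.
    """
    minutes_value = float(f"{int(minutes)}.{minute_fraction}")
    return int(degrees) + minutes_value / 60.0


# Attribute values as listed in the reply above.
north = dm_to_decimal("59", "33", "067")
east = dm_to_decimal("152", "51", "715")
print(round(north, 6), round(east, 6))
```

The parts are passed as strings on purpose, so that the leading zero in `067` is not lost before the minutes value is assembled.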

gio
Contributor
  • March 17, 2015
I pasted it into a text file and saved it with the encoding set to Unicode; this keeps the Cyrillic characters.

Then you can use a regex on it. Even if you don't know what the words mean, you can still search for the word preceding the coordinate sets.

This will get both coordinates:

(N[\s\d°??]+)|(E[\s\d°??]+)

I used it in the StringSearcher.

But I had to copy the single and double quote characters from the text. I initially entered them from the keyboard, but then the regex only matched up to the degree mark. After copy-pasting them, it worked.

david_r
Celebrity
  • March 17, 2015
Gio,

You can avoid messing with the weird quotation marks etc. by using \D rather than string literals. \D matches any non-digit character.

David
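
The difference between the two approaches can be illustrated with a small Python sketch. The sample string is hypothetical and assumes the document uses the typographic prime marks U+2032 and U+2033 for minutes and seconds; the actual characters in the Word file are not known from this thread.

```python
import re

# Hypothetical sample: degree sign, then prime (U+2032) and double prime (U+2033).
sample = "N 59\u00b033\u203207\u2033"

# Character class with quotes typed from the keyboard (ASCII ' and ").
typed = re.compile(r"N[\s\d°'\"]+")

# The \D variant: accepts whatever single non-digit follows each number.
generic = re.compile(r"N (\d+)\D(\d+)\D(\d+)\D")

typed_match = typed.search(sample).group(0)
generic_match = generic.search(sample).groups()

print(repr(typed_match))   # stops before the first mark that is not in the class
print(generic_match)       # all three numeric parts are captured
```

With the keyboard quotes, the match ends as soon as it meets a mark that is not literally in the class, which is consistent with the behaviour Gio describes. The `\D` pattern does not care which mark is used, at the cost of also accepting any other single non-digit character in that position.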