Solved

Extract specific text after key words

  • February 28, 2018
  • 3 replies
  • 392 views

ngstoke
Contributor

Hello,

I'm attempting to extract strings from a text file that come after specific keywords.

I only need to extract the X,Y coordinates from the text, but I need to preserve which values they correspond to. In my example these are well points, and I need to keep the location for each one.

I have the following keywords:

GEOLOGIC FIRST TAKE POINT:

BOTTOM HOLE LOCATION:

LAST TAKE POINT (TIV):

I only need the X,Y coordinates that come after these keywords, but I'm having trouble getting my regular expression to pull the X,Y data following each keyword, as there is text in between that I don't need.

I've attached a sample of my data below.

FIRST TAKE POINT /
GEOLOGIC FIRST TAKE POINT:
WORDS IN BETWEEN WHAT I WANT TO EXTRACT
X = 5,498,257'
Y = 789,958'
LAT
LONG
LAT
LONG
(NAD27)
(NAD83/86)
LEGEND:
(PSL) - PROPOSED SURFACE LOCATION
(PP) - PENETRATION POINT
(FTP) - FIRST TAKE POINT
(GFTP) - GEOLOGIC FIRST TAKE POINT
(LTP) - LAST TAKE POINT
(BHL) - BOTTOM HOLE LOCATION
(r) - RADIUS
(TIV) - TOE INITIATOR VALVE
- APPROXIMATE SURVEY LINE
- UNIT LINE
- PROPOSED BORE PATH
- AS-DRILLED BORE PATH
- PROPOSED POINTS
- AS-DRILLED POINTS
LAST TAKE POINT (TIV):
TEXT HERE IN BETWEEN WHAT I WANT
X = 1,587,371'
Y = 789,445'
LAT
LONG
LAT
LONG
(NAD27)
(NAD27)
(NAD83/86)
BOTTOM HOLE LOCATION:
TEXT HERE
X = 1,176,480'
Y = 259,265'
LAT
LONG
LAT
LONG


3 replies

jdh
Contributor
  • February 28, 2018

If the text between the keywords and the location values is always the same number of lines, you can use adjacent features as in the attached workspace, takepoint.fmw. (Ignore the AttributeExposer; it only works around a GUI bug in the workspace. If you take it out, you get the same results, but the AttributeCreator shows a parameter error.)
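
For anyone who wants to try the fixed-offset idea outside FME, here is a minimal Python sketch. It is not the attached workspace; it only assumes, as in the sample data, that the X line is the 2nd and the Y line the 3rd non-blank line after each keyword. The function name and the offset parameters are made up for illustration.

```python
# Keyword lines exactly as they appear in the sample data.
KEYWORDS = (
    "GEOLOGIC FIRST TAKE POINT:",
    "BOTTOM HOLE LOCATION:",
    "LAST TAKE POINT (TIV):",
)

def extract_fixed_offset(text, x_offset=2, y_offset=3):
    """Return {keyword: (x_line, y_line)}, assuming fixed line offsets."""
    # Drop blank lines so the offsets count only non-empty lines.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    found = {}
    for i, line in enumerate(lines):
        if line in KEYWORDS and i + y_offset < len(lines):
            found[line[:-1]] = (lines[i + x_offset], lines[i + y_offset])
    return found

# Shortened copy of the sample data from the question.
sample = """\
FIRST TAKE POINT /
GEOLOGIC FIRST TAKE POINT:
WORDS IN BETWEEN WHAT I WANT TO EXTRACT
X = 5,498,257'
Y = 789,958'
LAT
BOTTOM HOLE LOCATION:
TEXT HERE
X = 1,176,480'
Y = 259,265'
"""

result = extract_fixed_offset(sample)
for keyword, (x_line, y_line) in result.items():
    print(keyword, "|", x_line, "|", y_line)
```

This breaks as soon as the number of lines between a keyword and its coordinates changes, which is why it only suits files with a rigid layout.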

 

If the amount of text varies, you can use the VariableSetter/VariableRetriever method shown in https://knowledge.safe.com/questions/3346/translating-a-polygon-from-poly-osmosis-polygon-fo.html where your initial Tester checks for your keywords.
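
The same "remember the last keyword" idea can be sketched in plain Python, which may help when the amount of filler text varies. This is only an analogy to the setter/retriever pattern, not the linked workspace: the variable `current` plays the role of the stored variable, and the function name and the two filler lines in the sample are hypothetical.

```python
import re

KEYWORD_LINES = {
    "GEOLOGIC FIRST TAKE POINT:",
    "BOTTOM HOLE LOCATION:",
    "LAST TAKE POINT (TIV):",
}
# A coordinate line such as "X = 5,498,257'"; the trailing quote is ignored.
COORD = re.compile(r"([XY])\s*=\s*([\d,]+)")

def extract_with_state(text):
    """Attach each X/Y line to the most recently seen keyword."""
    current = None  # the "variable" that remembers the last keyword
    results = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line in KEYWORD_LINES:
            current = line[:-1]  # drop the trailing colon
            results[current] = {}
            continue
        m = COORD.match(line)
        if m and current is not None:
            results[current][m.group(1)] = m.group(2)
            if len(results[current]) == 2:
                current = None  # both coordinates found, stop collecting
    return results

# Coordinates are from the question; the filler lines are invented.
sample = """\
GEOLOGIC FIRST TAKE POINT:
SOME FILLER LINE
ANOTHER FILLER LINE
X = 5,498,257'
Y = 789,958'
LAT
LAST TAKE POINT (TIV):
X = 1,587,371'
Y = 789,445'
"""

result = extract_with_state(sample)
print(result)
```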


takashi
Influencer
  • Best Answer
  • March 1, 2018

Hi @ngstoke, using two StringSearchers might be easy.

0684Q00000ArKUhQAN.png

Regular Expression for the 1st StringSearcher:

(GEOLOGIC FIRST TAKE POINT|BOTTOM HOLE LOCATION|LAST TAKE POINT \(TIV\))\s*:(.*?[XY]\s*=\s*[\d,]+){2}

Regular Expression for the 2nd StringSearcher:

(.+)\s*:.*([XY])\s*=\s*([\d,]+).*([XY])\s*=\s*([\d,]+)

The features output from the 2nd StringSearcher will have a list attribute containing these elements.

_sub{0}.part = <a key word>
_sub{1}.part = 'X' (or 'Y')
_sub{2}.part = <a coordinate value>
_sub{3}.part = 'Y' (or 'X')
_sub{4}.part = <a coordinate value>
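
If you want to check the two patterns outside FME, the following Python sketch chains them the same way. It is an approximation, not the StringSearcher itself: Python's `.` does not match newlines by default, so `re.DOTALL` is set here to let the patterns span lines, and the names `STAGE1`, `STAGE2`, and `parts_per_match` are made up for illustration.

```python
import re

# 1st pattern: a keyword, a colon, then the shortest text containing two X/Y values.
STAGE1 = re.compile(
    r"(GEOLOGIC FIRST TAKE POINT|BOTTOM HOLE LOCATION|LAST TAKE POINT \(TIV\))"
    r"\s*:(.*?[XY]\s*=\s*[\d,]+){2}",
    re.DOTALL,
)
# 2nd pattern: split one matched block into keyword, axis, value, axis, value.
STAGE2 = re.compile(
    r"(.+)\s*:.*([XY])\s*=\s*([\d,]+).*([XY])\s*=\s*([\d,]+)",
    re.DOTALL,
)

# Shortened copy of the sample data, including two legend lines
# that mention the keywords without a trailing colon.
sample = """\
FIRST TAKE POINT /
GEOLOGIC FIRST TAKE POINT:
WORDS IN BETWEEN WHAT I WANT TO EXTRACT
X = 5,498,257'
Y = 789,958'
LAT
LONG
(GFTP) - GEOLOGIC FIRST TAKE POINT
(LTP) - LAST TAKE POINT
LAST TAKE POINT (TIV):
TEXT HERE IN BETWEEN WHAT I WANT
X = 1,587,371'
Y = 789,445'
BOTTOM HOLE LOCATION:
TEXT HERE
X = 1,176,480'
Y = 259,265'
"""

parts_per_match = []
for m1 in STAGE1.finditer(sample):
    m2 = STAGE2.search(m1.group(0))  # run the 2nd pattern on the matched block only
    if m2:
        parts_per_match.append(list(m2.groups()))

for parts in parts_per_match:
    print(parts)
```

Each printed list has five entries in the same order as the `_sub{}` elements listed above. The legend lines are skipped because the keyword there is not followed by a colon.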

You can then map them to your desired destination schema with some additional transformers. e.g.

0684Q00000ArKgvQAF.png
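
As a rough illustration of that mapping step (the screenshot is not reproduced here, so this is not the transformer chain it shows), a five-element list can be turned into a flat record in Python. The field names `location`, `x`, and `y` are hypothetical; the sketch assumes the values are whole numbers with comma thousands separators, as in the sample.

```python
def to_record(parts):
    """Map [keyword, axis, value, axis, value] to a flat record.

    Works whether X or Y comes first in the list.
    """
    keyword, axis_a, value_a, axis_b, value_b = parts
    coords = {axis_a: value_a, axis_b: value_b}
    return {
        "location": keyword,
        # Strip the thousands separators before converting to a number.
        "x": int(coords["X"].replace(",", "")),
        "y": int(coords["Y"].replace(",", "")),
    }

# Y listed before X on purpose, to show that the order does not matter.
record = to_record(["BOTTOM HOLE LOCATION", "Y", "259,265", "X", "1,176,480"])
print(record)
```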


ngstoke
Contributor
  • Author
  • March 1, 2018
@takashi This worked. I appreciate the detailed explanation, thanks for your help!
