Solved

Extract specific text after key words

  • 28 February 2018
  • 3 replies
  • 69 views

Hello,

I'm attempting to extract strings from a text file that come after specific key words.

I only need to extract the X,Y coordinates from the text, but I need to preserve which values they correspond to. In my example these are well points, and I need to keep the location for each one.

I have the following keywords:

GEOLOGIC FIRST TAKE POINT:

BOTTOM HOLE LOCATION:

LAST TAKE POINT (TIV):

I only need the X,Y coordinates that come after these key words, but I'm having trouble getting my regular expression to pull the X,Y data following each key word, as there is text in between that I don't need.

I've attached a sample of my data below.

FIRST TAKE POINT /

 

GEOLOGIC FIRST TAKE POINT:

 

WORDS IN BETWEEN WHAT I WANT TO EXTRACT

 

X = 5,498,257'

 

Y = 789,958'

 

LAT

 

LONG

 

LAT

 

LONG

 

(NAD27)

 

(NAD83/86)

 

LEGEND:

 

(PSL) - PROPOSED SURFACE LOCATION

 

(PP) - PENETRATION POINT

 

(FTP) - FIRST TAKE POINT

 

(GFTP) - GEOLOGIC FIRST TAKE POINT

 

(LTP) - LAST TAKE POINT

 

(BHL) - BOTTOM HOLE LOCATION

 

(r) - RADIUS

 

(TIV) - TOE INITIATOR VALVE

 

- APPROXIMATE SURVEY LINE

 

- UNIT LINE

 

- PROPOSED BORE PATH

 

- AS-DRILLED BORE PATH

 

- PROPOSED POINTS

 

- AS-DRILLED POINTS

 

LAST TAKE POINT (TIV):

 

TEXT HERE IN BETWEEN WHAT I WANT

 

X = 1,587,371'

 

Y = 789,445'

 

LAT

 

LONG

 

LAT

 

LONG

 

(NAD27)

 

(NAD27)

 

(NAD83/86)

 

BOTTOM HOLE LOCATION:

 

TEXT HERE

 

X = 1,176,480'

 

Y = 259,265'

 

LAT

 

LONG

 

LAT

 

LONG

Best answer by takashi 1 March 2018, 03:47

If the text between the keywords and the location values is always the same number of lines, you can use adjacent features, as in the attached workspace takepoint.fmw. (Ignore the AttributeExposer; it only works around a GUI bug in the workspace. If you take it out you get the same results, but the AttributeCreator shows a parameter error.)
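
For readers working outside FME, the fixed-offset idea can be sketched in Python. This is only an illustration, not the contents of takepoint.fmw: the function name `extract_fixed_offset` and the offsets (X two non-blank lines after the keyword, Y three) are assumptions taken from the sample data in the question.

```python
# Hypothetical sketch: assumes X and Y always sit a fixed number of
# non-blank lines after the keyword line, as in the posted sample.
KEYWORDS = (
    "GEOLOGIC FIRST TAKE POINT:",
    "BOTTOM HOLE LOCATION:",
    "LAST TAKE POINT (TIV):",
)

def extract_fixed_offset(text, x_offset=2, y_offset=3):
    # Drop blank lines so the offsets count only lines that carry text.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    results = {}
    for i, line in enumerate(lines):
        # Whole-line comparison, so legend entries such as
        # "(GFTP) - GEOLOGIC FIRST TAKE POINT" are not mistaken for keywords.
        if line in KEYWORDS and i + y_offset < len(lines):
            results[line.rstrip(":")] = (lines[i + x_offset], lines[i + y_offset])
    return results

sample = """GEOLOGIC FIRST TAKE POINT:

WORDS IN BETWEEN WHAT I WANT TO EXTRACT

X = 5,498,257'

Y = 789,958'

LAT

BOTTOM HOLE LOCATION:

TEXT HERE

X = 1,176,480'

Y = 259,265'
"""

fixed_result = extract_fixed_offset(sample)
```

Each keyword maps to the raw X and Y lines that follow it; if the number of lines in between ever changes, this approach silently picks up the wrong lines.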

 

If the amount of text varies, you can use a variable setter/retriever method as shown in https://knowledge.safe.com/questions/3346/translating-a-polygon-from-poly-osmosis-polygon-fo.html where your initial tester checks for your keywords.
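
The same "remember the last keyword" idea can be sketched in Python for comparison. This is a rough analogue of the setter/retriever pattern, not the linked workspace: the function name `extract_stateful` and the rule of stopping once both X and Y have been seen are assumptions made for the example.

```python
import re

# Hypothetical sketch: one variable remembers the most recent keyword, and
# every later "X = ..." or "Y = ..." line is attached to that keyword.
KEYWORDS = {
    "GEOLOGIC FIRST TAKE POINT:",
    "BOTTOM HOLE LOCATION:",
    "LAST TAKE POINT (TIV):",
}
COORD = re.compile(r"^([XY])\s*=\s*(\d[\d,]*)")

def extract_stateful(text):
    current = None          # plays the role of the stored variable
    results = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line in KEYWORDS:
            current = line.rstrip(":")
            results[current] = {}
        elif current is not None:
            m = COORD.match(line)
            if m and m.group(1) not in results[current]:
                # Remove thousands separators before converting to int.
                results[current][m.group(1)] = int(m.group(2).replace(",", ""))
                if len(results[current]) == 2:
                    current = None  # both coordinates found, stop collecting
    return results

sample = """GEOLOGIC FIRST TAKE POINT:
WORDS IN BETWEEN
MORE WORDS
X = 5,498,257'
Y = 789,958'
LAT
LAST TAKE POINT (TIV):
X = 1,587,371'
NOTE
Y = 789,445'
"""

stateful_result = extract_stateful(sample)
```

Because the keyword is carried forward as state, the number of lines between the keyword and the coordinates no longer matters.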

Hi @ngstoke, using two StringSearchers might be an easy way to do this.

0684Q00000ArKUhQAN.png

Regular Expression for the 1st StringSearcher:

(GEOLOGIC FIRST TAKE POINT|BOTTOM HOLE LOCATION|LAST TAKE POINT \(TIV\))\s*:(.*?[XY]\s*=\s*[\d,]+){2}

Regular Expression for the 2nd StringSearcher:

(.+)\s*:.*([XY])\s*=\s*([\d,]+).*([XY])\s*=\s*([\d,]+)

The features output from the 2nd StringSearcher will have a list attribute containing these elements:

_sub{0}.part = <a key word>
_sub{1}.part = 'X' (or 'Y')
_sub{2}.part = <a coordinate value>
_sub{3}.part = 'Y' (or 'X')
_sub{4}.part = <a coordinate value>

You can then map them to your desired destination schema with some additional transformers. e.g.

0684Q00000ArKgvQAF.png
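
To try the two expressions outside FME, here is a minimal Python sketch of the same two-pass idea. It is an adaptation rather than the workspace itself: `re.DOTALL` is added on the assumption that `.` must cross line breaks in Python, the wildcards are made lazy, and the names `BLOCK`, `PARTS`, and `extract_with_regex` are made up for the example.

```python
import re

# Pass 1: find each keyword plus everything up to its second coordinate.
BLOCK = re.compile(
    r"(GEOLOGIC FIRST TAKE POINT|BOTTOM HOLE LOCATION|LAST TAKE POINT \(TIV\))"
    r"\s*:(?:.*?[XY]\s*=\s*[\d,]+){2}",
    re.DOTALL,
)
# Pass 2: split one matched block into keyword, axis names, and values.
PARTS = re.compile(
    r"(.+?)\s*:.*?([XY])\s*=\s*([\d,]+).*?([XY])\s*=\s*([\d,]+)",
    re.DOTALL,
)

def extract_with_regex(text):
    rows = []
    for block in BLOCK.finditer(text):
        m = PARTS.match(block.group(0))
        if m:
            keyword, axis1, val1, axis2, val2 = m.groups()
            # Key the values by axis name so the X/Y order does not matter.
            rows.append({"keyword": keyword, axis1: val1, axis2: val2})
    return rows

sample = """(GFTP) - GEOLOGIC FIRST TAKE POINT

LAST TAKE POINT (TIV):

TEXT HERE IN BETWEEN WHAT I WANT

X = 1,587,371'

Y = 789,445'

LAT

BOTTOM HOLE LOCATION:

TEXT HERE

X = 1,176,480'

Y = 259,265'
"""

regex_rows = extract_with_regex(sample)
```

The captured groups correspond to the `_sub{}` list elements described above; the coordinate values are left as strings with their thousands separators, so strip the commas before converting them to numbers.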

@takashi This worked. I appreciate the detailed explanation, thanks for your help!