Solved

How can I create a regex with a lookahead after a certain text?

  • 1 December 2022
  • 7 replies
  • 1 view

Badge +8

I've got an issue with the StringSearcher. It does not seem to work as I expect it to. It might be my regex skills, but I think I have tried all the possibilities.


I've got the following text: "324324" "maatvoering; hoogte; bouwhoogte; maximum bouwhoogte (m)"="8", "maatvoering; hoogte; goothoogte; maximum goothoogte (m)"="4" asdfasdf sadfsadfsf 343244

 

In this text I would like to find the first number that follows the word 'goot'. In this example that should be the number 4. I'm trying to do this with the following expression: (?=goot.*")[0-9]*(?=")

 

But it does not give any results as you can see in the screenshot below.

image 

Any ideas what I'm missing here?

 


Best answer by geomancer 1 December 2022, 13:54


7 replies

Userlevel 4
Badge +36

Regex always provides nice puzzles 😀 

.*goot.*"="\K\d+

According to https://perldoc.perl.org/perlre#Regular-Expressions, \K means 'Keep the stuff left of the \K, don't include it in $&', so everything matched before the \K is left out of the returned match.
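For anyone who wants to try this outside the StringSearcher, here is a minimal sketch in Python. Two caveats: Python's built-in re module does not support \K, so the sketch swaps in a capture group as a rough equivalent, and the variable names are only illustrative. It also shows why the expression from the question finds nothing: a lookahead does not consume any text, so after (?=goot.*") succeeds the engine is still standing on the 'g' of 'goot', where [0-9]* matches zero digits and (?=") then fails.

```python
import re

# Sample text copied from the question.
TEXT = ('"324324" "maatvoering; hoogte; bouwhoogte; maximum bouwhoogte (m)"="8", '
        '"maatvoering; hoogte; goothoogte; maximum goothoogte (m)"="4" '
        'asdfasdf sadfsadfsf 343244')

# Expression from the question: both lookaheads are tested at the same
# position, so the text there would have to start with 'goot' and with '"'.
original = re.search(r'(?=goot.*")[0-9]*(?=")', TEXT)

# Capture-group stand-in for \K: match the prefix as usual, but read only
# group 1 instead of the whole match.
captured = re.search(r'.*goot.*"="(\d+)', TEXT)

print(original)           # None
print(captured.group(1))  # 4
```

In an engine that does support \K, the whole match is already trimmed to the digits, so no group lookup is needed.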

Badge +8

Thanks for your quick reply @geomancer. That is a completely different approach from mine. It partly works, but not perfectly yet. For example, if I replace 'goot' with 'bouw', I would expect it to return the value 8, but it still returns 4. So at the moment it seems to return the last number that follows a "=" rather than the first number after the keyword. Any idea how to fix this?

Userlevel 4
Badge +36

Ah, I didn't test that. This works for both 'bouw' and 'goot':

.*bouw[\D]*\K\d+
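The difference between the two expressions comes down to greediness: in the first one the second .* runs to the end of the text and backtracks to the last "=", while \D* can only cross non-digit characters and therefore stops at the first number after the keyword. A small Python sketch of that behaviour follows. It uses a capture group in place of \K, because Python's built-in re module has no \K, and number_after is a hypothetical helper name, not something from this thread.

```python
import re

# Sample text copied from the question.
TEXT = ('"324324" "maatvoering; hoogte; bouwhoogte; maximum bouwhoogte (m)"="8", '
        '"maatvoering; hoogte; goothoogte; maximum goothoogte (m)"="4" '
        'asdfasdf sadfsadfsf 343244')


def number_after(keyword, text):
    """Return the first run of digits after the keyword, or None.

    The leading .* is greedy, so the keyword occurrence used is the last
    one that still has digits somewhere after it.
    """
    # re.escape keeps any regex metacharacters in the keyword literal.
    match = re.search(r'.*' + re.escape(keyword) + r'\D*(\d+)', text)
    return match.group(1) if match else None


# First attempt: the second .* is greedy and skips ahead to the last "=".
greedy = re.search(r'.*bouw.*"="(\d+)', TEXT).group(1)

print(greedy)                      # 4
print(number_after('bouw', TEXT))  # 8
print(number_after('goot', TEXT))  # 4
```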

 

Userlevel 3
Badge +26

This could also work, in case the "=" is not always present.

goot(?![\s\S]*goot)\K(\D*)\K\d+
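  • goot(?![\s\S]*goot) 
    • Matches the last goot in the string (the negative lookahead rejects any goot that has another goot after it)
  • \K(\D*)
    • Matches the non-digit text after the last goot
  • \K\d+
    • Drops everything matched so far and returns only the digits after that non-digit text (after the last goot 😁 )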
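
The breakdown above can be sketched in Python as follows. This assumes a capture group as a replacement for \K (Python's built-in re module does not support \K), and number_after_last is a hypothetical helper name used only for illustration.

```python
import re

# Sample text copied from the question.
TEXT = ('"324324" "maatvoering; hoogte; bouwhoogte; maximum bouwhoogte (m)"="8", '
        '"maatvoering; hoogte; goothoogte; maximum goothoogte (m)"="4" '
        'asdfasdf sadfsadfsf 343244')


def number_after_last(keyword, text):
    """Return the digits after the last occurrence of keyword, or None."""
    kw = re.escape(keyword)
    # keyword, not followed anywhere later by the same keyword,
    # then any non-digits, then the digits we want (group 1).
    pattern = kw + r'(?![\s\S]*' + kw + r')\D*(\d+)'
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(number_after_last('goot', TEXT))  # 4
print(number_after_last('bouw', TEXT))  # 8
```

Because the lookahead pins the match to the last occurrence of the keyword, this variant returns None when no digits follow that last occurrence, instead of falling back to an earlier one.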
Badge +8

This works perfectly. Many thanks for your quick reply again!

Badge +8

Many thanks for your input @dustin. This regex looks like some kind of rocket science :-). I also tried this solution, but it seems to return the last digit in all cases where the text contains the word goot. In the meantime geomancer came up with the perfect solution, so I'll go for that one.

Userlevel 3
Badge +26

Regex can be that way sometimes. 😂 Glad you found the solution. 😎
