Solved

Regular Expression works in my regex software but not FME?



Hi, trying to capture a 'payee' name from text attributes for each feature. The payee name always occurs on the line after 12 spaces. I use some software called Regex Buddy to build and test the regex so I know this works. However, when I plug the regex into StringSearcher in FME and put the same text into the test box, it doesn't work. There's a sample text file attached and the regex I'm using is at the bottom of the file. In this particular file I'm trying to pick up the words Hanover Housing Association. Any ideas anyone??

Thanks

James

Best answer by james_rutter

Thanks for the suggestions here. I took another path to solve my problem. The payee name always occurs on line 10, so I used the AttributeSplitter to split the text attribute apart again on the line feed. In the list that's created I exposed element 9, which was the text I needed.


10 replies

bruceharold
Contributor
  • August 16, 2016

(^\s{12})([a-zA-Z\s]*)


bruceharold
Contributor
  • August 16, 2016
bruceharold wrote:

(^\s{12})([a-zA-Z\s]*)

 

Then get the second group

 


redgeographics
Celebrity

In FME 2016.1 the regex ^\s{12}[A-Za-z ]+$ (note there's a space after the z) sort of does the trick, but it also matches lines with more than 12 spaces, which the quick reference seems to imply shouldn't happen.

Please note that if we do track down a regex that finds "Hanover Housing Association", it'll also pick up the 3 lines below it, since they also match the pattern.


takashi
Influencer
  • August 16, 2016

Hi @james_rutter, your expression definitely matches the text, but you need to define a subexpression (group) if you want to extract part of the match using the StringSearcher, e.g.

^ {12}([A-Z][a-z].*)

Then specify a list name in the "Subexpression Matches List Name" parameter in order to store the matched part.

There are many possible expressions, but the AttributeTrimmer might be a simpler way if you just need to remove leading spaces from the input line.
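For anyone who wants to try the capture-group idea outside FME, here is a minimal sketch using Python's `re` module as a stand-in. It is not the StringSearcher's engine, so behaviour may differ. The sample text is made up (the attached file isn't reproduced in this thread), and `extract_payee` is just an illustrative helper name.

```python
import re

# Hypothetical sample; only the payee name comes from the thread.
SAMPLE = (
    "Remittance advice\n"
    "Payee:\n"
    "            Hanover Housing Association\n"
    "Amount: 100.00\n"
)

# Exactly 12 spaces at the start of a line, then capture the rest of that line.
# re.MULTILINE lets ^ match at the start of every line, not just the string.
PATTERN = re.compile(r"^ {12}([A-Z][a-z].*)", re.MULTILINE)


def extract_payee(text):
    """Return the text captured by group 1, or None if no line matches."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(extract_payee(SAMPLE))
```

The whole match (group 0) still includes the 12 leading spaces; only group 1 holds the name, which is roughly what the subexpression list gives you in the StringSearcher.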


ebygomm
Influencer
  • August 16, 2016

(?<=^\s{12})([a-zA-Z]{1}[a-zA-Z\s]*) should match just the text without the spaces at the front, but only where the spaces exist in front.

This will only work in FME 2016, as lookbehind wasn't supported earlier.

I would then probably use the AttributeCreator with "Enable Adjacent Feature Attributes" to determine the first instance (I'm presuming you're reading with a text reader, with each line a separate feature).


ebygomm
Influencer
  • August 16, 2016
bruceharold wrote:

 

Then get the second group

 

Because spaces are allowed in the second group, this will match anything with 12 or more spaces in front of it, rather than exactly 12.
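A quick way to see this is to compare the original pattern with a variant whose second group has to start with a letter. This is a sketch in Python's `re`, not FME, and the 14-space test line is invented for the example.

```python
import re

# Original suggestion: group 2 may start with whitespace.
LOOSE = re.compile(r"(^\s{12})([a-zA-Z\s]*)")
# Variant: group 2 must start with a letter, so a 13th space cannot be absorbed.
STRICT = re.compile(r"(^\s{12})([a-zA-Z][a-zA-Z\s]*)")


def second_group(pattern, line):
    """Return group 2 of a match anchored at the start of line, or None."""
    match = pattern.match(line)
    return match.group(2) if match else None


twelve = " " * 12 + "Hanover Housing Association"
fourteen = " " * 14 + "Some Other Line"  # hypothetical line with extra indent

print(repr(second_group(LOOSE, fourteen)))   # extra spaces end up in group 2
print(repr(second_group(STRICT, fourteen)))  # no match at all
print(repr(second_group(STRICT, twelve)))
```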

 

 


ebygomm
Influencer
  • August 16, 2016

Having said that, surely there's bound to be someone who has a company name that includes a digit?

So probably safer to use (?<=^\s{12})([a-zA-Z0-9]{1}[a-zA-Z0-9\s]*)
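To illustrate the difference, here is a small sketch with both lookbehind patterns in Python's `re` (which accepts this fixed-width lookbehind; FME's engine may behave differently). The payee names containing digits are invented test values.

```python
import re

# Letters-only version and the digit-tolerant version from the posts above.
LETTERS = re.compile(r"(?<=^\s{12})([a-zA-Z]{1}[a-zA-Z\s]*)")
ALNUM = re.compile(r"(?<=^\s{12})([a-zA-Z0-9]{1}[a-zA-Z0-9\s]*)")


def find(pattern, line):
    """Return the matched text (without the leading spaces), or None."""
    match = pattern.search(line)
    return match.group(0) if match else None


# Hypothetical payee names that contain digits.
digit_inside = " " * 12 + "Hanover 2000 Housing"
digit_first = " " * 12 + "3 Rivers Housing"

print(repr(find(LETTERS, digit_inside)))  # cut off at the first digit
print(repr(find(ALNUM, digit_inside)))
print(repr(find(LETTERS, digit_first)))   # no match
print(repr(find(ALNUM, digit_first)))
```

Names with punctuation (ampersands, apostrophes, hyphens) would still be cut short by either character class, so a "rest of the line" capture may be the more forgiving choice.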


james_rutter
ebygomm wrote:

(?<=^\s{12})([a-zA-Z]{1}[a-zA-Z\s]*) should match just the text without the spaces at the front, but only where the spaces exist in front.

This will only work in FME 2016 as look behind wasn't supported earlier.

I would then probably use attributecreator with enable adjacent feature attributes to determine the first instance (i'm presuming you're reading with a text reader with each line a separate feature)

Thanks for the suggestion, but I can only get a match with this if I type 12 spaces followed by Hanover Housing Association into the sample text box. If I paste in the text from the file I uploaded, I don't get any match with this regex.

 

 


ebygomm
Influencer
  • August 16, 2016
james_rutter wrote:
Thanks for the suggestion, but I can only get a match with this if I type 12 spaces followed by Hanover Housing Association into the sample text box. If I paste in the text from the file I uploaded, I don't get any match with this regex.

 

 

 

You probably need a line feed rather than start of line, e.g. \n.{12}[a-zA-Z].*
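The effect can be reproduced in Python's `re`, where `^` also matches only at the very start of a multi-line string unless a multiline flag is set. Whether the StringSearcher behaves the same way is an assumption here, and the sample text is made up.

```python
import re

# Hypothetical multi-line attribute value; the payee is not on the first line.
TEXT = "Payee:\n" + " " * 12 + "Hanover Housing Association\nAmount: 100.00\n"

# 1. ^ without a multiline flag only anchors at the start of the whole string.
start_only = re.search(r"^ {12}([a-zA-Z].*)", TEXT)

# 2. Anchoring on the preceding line feed instead, with a group for the name.
after_newline = re.search(r"\n.{12}([a-zA-Z].*)", TEXT)

# 3. Alternatively, keep ^ and let it match at every line start.
multiline = re.search(r"^ {12}([a-zA-Z].*)", TEXT, re.MULTILINE)

print(start_only)
print(after_newline.group(1))
print(multiline.group(1))
```

Note that `.{12}` accepts any 12 characters, not just spaces, so ` {12}` is the stricter choice if other long lines could precede the payee.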

 

 


james_rutter
  • Author
  • Best Answer
  • August 16, 2016

Thanks for the suggestions here. I took another path to solve my problem. The payee name always occurs on line 10, so I used the AttributeSplitter to split the text attribute apart again on the line feed. In the list that's created I exposed element 9, which was the text I needed.
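For comparison, the same split-and-index approach looks roughly like this in Python. It is only a sketch of the idea, not what the AttributeSplitter does internally; the sample lines are invented, `line_at` is a hypothetical helper, and the trimming of surrounding whitespace is an extra step not mentioned above.

```python
def line_at(text, index=9):
    """Split text on line feeds and return the trimmed element at index.

    Index 9 is the tenth line, since list elements are counted from zero.
    Returns None if the text has too few lines.
    """
    lines = text.split("\n")
    if index >= len(lines):
        return None
    # strip() drops the 12 leading spaces and any stray carriage return.
    return lines[index].strip()


# Hypothetical file content: nine header lines, then the payee on line 10.
sample = "\n".join(
    ["header line %d" % i for i in range(1, 10)]
    + [" " * 12 + "Hanover Housing Association", "footer"]
)

print(line_at(sample))
```

This avoids the regex entirely, but it only holds up as long as the payee really is on line 10 of every file.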

