Hi, I'm trying to capture a 'payee' name from text attributes for each feature. The payee name always appears on a line after 12 leading spaces. I use some software called RegexBuddy to build and test the regex, so I know this works. However, when I plug the regex into the StringSearcher in FME and put the same text into the test box, it doesn't work. There's a sample text file attached and the regex I'm using is at the bottom of the file. In this particular file I'm trying to pick up the words Hanover Housing Association. Any ideas, anyone?

Thanks

James

(^\s{12})([a-zA-Z\s]*)

 

Then get the second group

 


In FME 2016.1 the regex ^\s{12}[A-Za-z ]+$ (note there's a space after the z) sort of does the trick, but it also matches lines with more than 12 leading spaces, which the quick reference seems to imply shouldn't happen.

Please note that if we do track down a regex that finds "Hanover Housing Association", it'll also pick up the 3 lines below it, because they also match the pattern.
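To illustrate why lines with more than 12 leading spaces still match, here is a minimal sketch using Python's `re` module rather than FME. The sample lines are made up (only the payee name comes from this thread), and the `STRICT` pattern is just one possible tightening, not something taken from the quick reference.

```python
import re

# Pattern from the post above; the character class contains a literal space.
PATTERN = re.compile(r"^\s{12}[A-Za-z ]+$")

# Hypothetical lines; only the payee name comes from the thread.
exactly_12 = " " * 12 + "Hanover Housing Association"
more_than_12 = " " * 20 + "Some Street"

# Both match: once \s{12} has consumed 12 spaces, the remaining spaces
# are accepted by [A-Za-z ]+ because that class allows a space.
matches_12 = PATTERN.match(exactly_12) is not None
matches_20 = PATTERN.match(more_than_12) is not None

# Requiring a letter directly after the 12 spaces rejects deeper indents.
STRICT = re.compile(r"^ {12}[A-Za-z][A-Za-z ]*$")
strict_12 = STRICT.match(exactly_12) is not None
strict_20 = STRICT.match(more_than_12) is not None
```

The same idea should carry over to the StringSearcher, although its regex engine may differ from Python's in details.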


Hi @james_rutter, your expression definitely matches the text, but you need to define a subexpression (group) if you want to extract part of the match with the StringSearcher, e.g.

^ {12}([A-Z][a-z].*)

Then specify a list name in the "Subexpression Matches List Name" parameter to store the matched part.

There are many possible expressions, but the AttributeTrimmer might be a simpler way if you just need to remove leading spaces from the input line.
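Both options in this reply can be sketched outside FME. The snippet below uses Python's `re` module, with `group(1)` standing in for the first element of the subexpression list and `lstrip` standing in for the AttributeTrimmer; the input line is made up, and none of this is FME's own API.

```python
import re

# Hypothetical input line; only the payee name comes from the thread.
line = " " * 12 + "Hanover Housing Association"

# The parentheses define the subexpression; group(1) holds just the name.
match = re.search(r"^ {12}([A-Z][a-z].*)", line)
payee = match.group(1) if match else None

# Trimming alternative: drop the leading spaces without any regex.
trimmed = line.lstrip(" ")
```

Trimming is simpler, but it no longer tells you whether the line had exactly 12 leading spaces.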


(?<=^\s{12})([a-zA-Z]{1}[a-zA-Z\s]*) should match just the text without the spaces at the front, but only where the 12 spaces exist in front.

This will only work in FME 2016, as lookbehind wasn't supported earlier.

I would then probably use an AttributeCreator with "Enable Adjacent Feature Attributes" to determine the first instance (I'm presuming you're reading with a text reader, with each line as a separate feature).
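As a rough sketch of the lookbehind approach, assuming one line per feature as described above, the following Python snippet applies the same pattern to a made-up list of lines and keeps the first hit. Picking the first hit with a plain list is only a stand-in for the adjacent-feature logic in FME.

```python
import re

# Lookbehind pattern from the post above: the 12 spaces are required
# but are not part of the returned match.
LOOKBEHIND = re.compile(r"(?<=^\s{12})([a-zA-Z]{1}[a-zA-Z\s]*)")

# Hypothetical lines, one per feature; only the payee name is from the thread.
lines = [
    "Payment advice",
    " " * 12 + "Hanover Housing Association",
    " " * 12 + "Some Street",
    " " * 16 + "Indented further",
]

hits = []
for line in lines:
    m = LOOKBEHIND.search(line)
    if m:
        hits.append(m.group(0))

# The first matching line is taken as the payee name.
first = hits[0] if hits else None
```

The line with 16 leading spaces is skipped, because the character right after the first 12 spaces has to be a letter.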


 


 

Because spaces are allowed in the second group, (^\s{12})([a-zA-Z\s]*) will match any line with 12 or more spaces in front of the text, rather than exactly 12.

 

 


Having said that, surely there's bound to be someone who has a company name that includes a digit?

So it's probably safer to use (?<=^\s{12})([a-zA-Z0-9]{1}[a-zA-Z0-9\s]*)


Thanks for the suggestion, but I can only get a match with this if I type 12 spaces followed by Hanover Housing Association into the sample text box. If I paste in the text from the file I uploaded, I don't get any match with this regex.

 

 



 

 

 

You probably need a line feed rather than start of line, e.g. \n.{12}[a-zA-Z].*
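A small Python sketch of the difference, assuming the whole text sits in a single attribute and that the engine treats ^ as start of text only (whether the StringSearcher behaves this way is an assumption here). The sample text is made up, and the sketch uses a literal space in place of `.` so that only spaces count towards the 12, plus a group to pull out the name.

```python
import re

# Hypothetical multi-line value; only the payee name is from the thread.
text = "Payment advice\n" + " " * 12 + "Hanover Housing Association\nTotal 100"

# Without a multiline flag, ^ only matches at the very start of the text.
anchored = re.search(r"^ {12}([a-zA-Z].*)", text)

# Matching the preceding line feed explicitly works on inner lines.
after_lf = re.search(r"\n {12}([a-zA-Z].*)", text)

# With re.MULTILINE, ^ also matches directly after every line feed.
multiline = re.search(r"^ {12}([a-zA-Z].*)", text, re.MULTILINE)
```

Note that the `\n` version cannot match a payee on the very first line, since there is no line feed in front of it.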

 

 


Thanks for the suggestions here. I took another path to solve my problem. The payee name always occurs on line 10, so I used an AttributeSplitter to split the text attribute apart on the line feed. In the list that's created I exposed element 9 (the tenth line, since list indices start at 0), which was the text I needed.
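The same split-and-pick idea looks roughly like this in Python. The 12-line text is made up for illustration, with the payee placed on line 10, and the final `strip()` is an extra step for removing the leading spaces that the post above does not mention.

```python
# Hypothetical 12-line text with the payee name on line 10.
lines_of_text = ["line %d" % n for n in range(1, 13)]
lines_of_text[9] = " " * 12 + "Hanover Housing Association"
text = "\n".join(lines_of_text)

# Split on the line feed; index 9 is the tenth line (zero-based).
parts = text.split("\n")
payee = parts[9].strip()
```

This only holds while the payee really is on line 10 of every record.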

