Solved

RegEx help - separate road numbers from road names


tim_wood
Contributor

I've written a regular expression for StringSearcher to separate features with road names (e.g. "Acacia Avenue") from features with road numbers (e.g. "A246", "B3276", "A3(M)"). Road numbers start with a single letter followed by a number of one to four or more digits, and are sometimes suffixed with further characters such as "(M)".

However, my original RegEx mistakenly treats values like "Service Road 2" and "14th Avenue" as road numbers when they should be treated as road names. I've updated the RegEx to this: ^[a-zA-Z]{1}[\d]{1,}.{0,}$ but it's still not filtering correctly. If I put square brackets round the dot (any single character), it correctly gets "A246" but misses "A3(M)".
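A minimal sketch of the bracketed-dot behaviour described above, using Python's `re` module as a stand-in (an assumption: StringSearcher's regex engine may differ in details). Inside square brackets a dot is a literal full stop rather than "any character", so `[.]{0,}` can only consume trailing dots and never the "(M)" suffix. The value "A3.." is a made-up illustration, not from the data.

```python
import re

# Unbracketed dot: "any character", so a "(M)" suffix is consumed.
ANY_CHAR = re.compile(r"^[a-zA-Z][\d]{1,}.{0,}$")
# Bracketed dot: a literal ".", so only trailing full stops are allowed.
LITERAL_DOT = re.compile(r"^[a-zA-Z][\d]{1,}[.]{0,}$")


def matches(pattern, value):
    """Return True if the anchored pattern accepts the value."""
    return pattern.search(value) is not None


# "A3.." is a hypothetical value used only to show what [.] accepts.
for value in ["A246", "A3(M)", "A3.."]:
    print(value, matches(ANY_CHAR, value), matches(LITERAL_DOT, value))
```

Under these assumptions "A3(M)" is accepted only by the unbracketed version, which agrees with the behaviour reported in the question.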

Best answer by ebygomm

I'd use the following to match only classified UK road numbers

^[A-Z][0-9]{1,5}.*

15 replies

david_r
Evangelist
  • October 17, 2018

How about

^\w\d{1,3}[\(\w*\)]?$


tim_wood
Contributor
  • Author
  • Contributor
  • October 17, 2018
david_r wrote:

How about

^\w\d{1,3}[\(\w*\)]?$

Brilliant.  Thank you so much :-)

 


tim_wood
Contributor
  • Author
  • Contributor
  • October 17, 2018
david_r wrote:

How about

^\w\d{1,3}[\(\w*\)]?$

Ah. It's not correctly filtering "A3" as a road number. A single letter followed by 2 or more digits is fine though.

Update: done a bit more trial and error and this seems to work: ^\w[\(\w*\)]+$

Presumably the numbers count as part of the word at the start if there's no space after the letter...?

I think the answer may be to look for values without a space in but I'll have to check the data again.

 


tim_wood
Contributor
  • Author
  • Contributor
  • October 17, 2018
david_r wrote:

How about

^\w\d{1,3}[\(\w*\)]?$

D'oh.  There are single word road names - no spaces...

 


david_r
Evangelist
  • October 17, 2018
tim_wood wrote:
Ah. It's not correctly filtering "A3" as a road number. A single letter followed by 2 or more digits is fine though.

Update: done a bit more trial and error and this seems to work: ^\w[\(\w*\)]+$

Presumably the numbers count as part of the word at the start if there's no space after the letter...?

I think the answer may be to look for values without a space in but I'll have to check the data again.

 

Ah, yes, I used the wrong quantifier for the parentheses. Replace the last "+" with a "?" and it should work. I've updated my answer accordingly.
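For anyone following along, here is a rough Python sketch of the difference between the two quantifiers (Python's `re` is an assumption; FME's engine may not behave identically). Note that inside `[...]` the tokens `\(`, `\w`, `*` and `\)` are members of a set of single characters, not a "(word)" group, so the quantifier after the bracket only controls how many such characters may follow the digits.

```python
import re

# "+" requires at least one extra character after the digits.
WITH_PLUS = re.compile(r"^\w\d{1,3}[\(\w*\)]+$")
# "?" allows at most one extra character after the digits.
WITH_QUESTION = re.compile(r"^\w\d{1,3}[\(\w*\)]?$")


def is_match(pattern, value):
    """Return True if the anchored pattern accepts the value."""
    return pattern.search(value) is not None


for value in ["A3", "A246", "A3(M)"]:
    print(value, is_match(WITH_PLUS, value), is_match(WITH_QUESTION, value))
```

In this sketch the "+" version rejects "A3" (nothing is left for the bracket to consume), while the "?" version accepts "A3" but rejects "A3(M)" because the bracket may consume only one of the three suffix characters.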

david_r
Evangelist
  • October 17, 2018
tim_wood wrote:
D'oh. There are single word road names - no spaces...

 

Should work as well now. If not, can you give an example that doesn't match?

tim_wood
Contributor
  • Author
  • Contributor
  • October 17, 2018
david_r wrote:
Should work as well now. If not, can you give an example that doesn't match?
That matches the first 2 in the "Do match" list below but not the rest. If I change {1,3} to {1,6} or {1,} I get everything except the ones with "(M)" in. However, I think this works: ^\w\d{1,}[\(\w*\)]{0,}$

I don't think the "B67856x" example should exist but if it does, it will be filtered correctly. There's still the possibility that the data will throw up other examples that the RegEx doesn't process correctly, but that's data for you...

 

 

Do match:
A3
A246
C765
D98765
A3(M)
A23524657(M)
B67856x

Don't match:
Piccadily
14th Avenue
Service Road 2
Acacia Avenue
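A quick way to check the pattern and the two lists above is a small Python harness (a sketch only: Python's `re` is assumed to be close enough to StringSearcher for this pattern). The `EDGE_CASES` values are hypothetical additions, not from the data, included to show that the bracket is a character set and so also accepts suffixes that are not a well-formed "(M)".

```python
import re

PATTERN = re.compile(r"^\w\d{1,}[\(\w*\)]{0,}$")

DO_MATCH = ["A3", "A246", "C765", "D98765", "A3(M)", "A23524657(M)", "B67856x"]
DONT_MATCH = ["Piccadily", "14th Avenue", "Service Road 2", "Acacia Avenue"]
# Hypothetical values: the trailing set accepts any mix of word
# characters, "(", ")" and "*", and \w also accepts a leading digit.
EDGE_CASES = ["14th", "A3)(*"]


def is_road_number(value):
    """Return True if the anchored pattern accepts the value."""
    return PATTERN.search(value) is not None


for value in DO_MATCH + DONT_MATCH + EDGE_CASES:
    print(f"{value!r}: {is_road_number(value)}")
```

Both lists come out as intended here, but the two edge cases are accepted as well, so whether that matters depends on what the real data contains.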

 


ebygomm
Influencer
  • Influencer
  • Best Answer
  • October 17, 2018

I'd use the following to match only classified UK road numbers

^[A-Z][0-9]{1,5}.*
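A hedged Python sketch of what this pattern accepts (assuming Python's `re` behaves like StringSearcher for these constructs). The values "a246", "A3 Bypass" and "A1234567" are made up for illustration. Two properties are worth knowing: `[A-Z]` is case-sensitive, and because the pattern ends in `.*` with no `$`, anything may follow the first digit, so the upper bound of 5 does not actually restrict the length.

```python
import re

UK_ROAD_NUMBER = re.compile(r"^[A-Z][0-9]{1,5}.*")


def is_road_number(value):
    """Return True if the value starts with a capital letter and a digit."""
    return UK_ROAD_NUMBER.match(value) is not None


samples = [
    "A3", "A246", "A3(M)", "B67856x",            # from the thread
    "Piccadily", "14th Avenue", "Service Road 2", "Acacia Avenue",
    "a246", "A3 Bypass", "A1234567",              # hypothetical values
]
for value in samples:
    print(f"{value!r}: {is_road_number(value)}")
```

In this sketch "a246" is rejected, while "A3 Bypass" and "A1234567" are accepted. That is fine if the data only holds capitalised road numbers and ordinary road names, which is the local-knowledge assumption the pattern relies on.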

david_r
Evangelist
  • October 17, 2018

This one should work according to your specification:

^\w\d+((\(\w*\))|\w)?$

Matches:
A3
A246
C765
D98765
A3(M)
A23524657(M)
B67856x

Doesn't match:
Piccadily
14th Avenue
Service Road 2
Acacia Avenue
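The same check can be reproduced in Python (a sketch, assuming Python's `re` treats this pattern the way FME does). Here the suffix is a real group: either a parenthesised run of word characters or a single trailing word character. The `MALFORMED` values are hypothetical and only there to show that incomplete or multi-character suffixes are rejected.

```python
import re

PATTERN = re.compile(r"^\w\d+((\(\w*\))|\w)?$")

MATCHES = ["A3", "A246", "C765", "D98765", "A3(M)", "A23524657(M)", "B67856x"]
NON_MATCHES = ["Piccadily", "14th Avenue", "Service Road 2", "Acacia Avenue"]
# Hypothetical values: unbalanced parenthesis, two-letter suffix.
MALFORMED = ["A3(M", "14th"]


def is_road_number(value):
    """Return True if the anchored pattern accepts the value."""
    return PATTERN.search(value) is not None


for value in MATCHES + NON_MATCHES + MALFORMED:
    print(f"{value!r}: {is_road_number(value)}")
```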

david_r
Evangelist
  • October 17, 2018
ebygomm wrote:

I'd use the following to match only classified UK road numbers

^[A-Z][0-9]{1,5}.*
Very nice. Matches exactly the same as my much more complex regex. Goes to show that local knowledge goes a long way!

tim_wood
Contributor
  • Author
  • Contributor
  • October 17, 2018
ebygomm wrote:

I'd use the following to match only classified UK road numbers

^[A-Z][0-9]{1,5}.*
OK. I think that works but I'm getting slightly confused by the Test String box in StringSearcher. Even if I put an end-of-line $ in my RegEx, it still matches Acacia Avenue in a list which includes values I do want to match. But Acacia Avenue on its own doesn't match, which is correct. It's a bit annoying as I thought my carriage returns/line feeds in the Test String box would match the end-of-line $ but they don't seem to. That means I can't use a list for testing and have to delete and type all my test values separately.

 



david_r
Evangelist
  • October 17, 2018
tim_wood wrote:
OK. I think that works but I'm getting slightly confused by the Test String box in StringSearcher. Even if I put an end-of-line $ in my RegEx, it still matches Acacia Avenue in a list which includes values I do want to match. But Acacia Avenue on its own doesn't match, which is correct. It's a bit annoying as I thought my carriage returns/line feeds in the Test String box would match the end-of-line $ but they don't seem to. That means I can't use a list for testing and have to delete and type all my test values separately.

 

I'm guessing that's a limitation of the test string box in FME, it doesn't seem to allow you to let ^$ match linebreaks. You may want to consider something more powerful when working on more complex expressions.

 

Personally I use RegexBuddy; it's fantastic but you have to pay for it. I'm sure there are free alternatives, though.

takashi
Contributor
  • Contributor
  • October 17, 2018
ebygomm wrote:

I'd use the following to match only classified UK road numbers

^[A-Z][0-9]{1,5}.*
Try this one.

 

^[A-Z][0-9]{1,5}.*?$

 


ebygomm
Influencer
  • Influencer
  • October 18, 2018
david_r wrote:
I'm guessing that's a limitation of the test string box in FME, it doesn't seem to allow you to let ^$ match linebreaks. You may want to consider something more powerful when working on more complex expressions.

 

Personally I use RegexBuddy; it's fantastic but you have to pay for it. I'm sure there are free alternatives, though.
Yes, it's a limitation of the test string box. I tend to use Rubular for checking these sorts of things.

 

 

 

Edit: "limitation" probably isn't quite the right word. The FME test string box is working correctly if you are sending it a block of text spread over multiple lines in a single attribute. But in this scenario, where your list actually represents multiple features, with each line being the value of an attribute in a different feature, it can be confusing.

 


takashi
Contributor
  • Contributor
  • October 18, 2018
david_r wrote:
I'm guessing that's a limitation of the test string box in FME, it doesn't seem to allow you to let ^$ match linebreaks. You may want to consider something more powerful when working on more complex expressions.

 

Personally I use RegexBuddy; it's fantastic but you have to pay for it. I'm sure there are free alternatives, though.
No. In the Ruby regex specification, the metacharacter '.' (dot) doesn't match a newline character by default, unlike regex in FME. If you add the 'm' option to the regex, the result will be the same as with FME regex.
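The effect can be illustrated in Python, where `re.DOTALL` plays roughly the role of Ruby's 'm' option (this mapping, and the idea that it reproduces the test box behaviour, are assumptions based on the description above rather than on FME documentation). `TEXT` is a hypothetical two-line test string.

```python
import re

PATTERN = r"^[A-Z][0-9]{1,5}.*$"
TEXT = "A246\nAcacia Avenue"  # hypothetical two-line test string

# Default: "." stops at the newline and "$" only matches at the very end,
# so the pattern cannot match this two-line string at all.
default_match = re.search(PATTERN, TEXT)

# DOTALL: "." also matches the newline, so ".*" runs through to the end
# of the text and the match swallows "Acacia Avenue" as well.
dotall_match = re.search(PATTERN, TEXT, re.DOTALL)

# MULTILINE: "^" and "$" match at each line break, so lines are tested
# one by one and only the road number is reported.
per_line = re.findall(PATTERN, TEXT, re.MULTILINE)

print(default_match)
print(dotall_match.group(0) if dotall_match else None)
print(per_line)
```

With the dot allowed to cross line breaks, the whole text counts as one match starting at "A246", which would explain why "Acacia Avenue" appears to match inside a list but not on its own.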

 

 

And the solution is:

 

 

 

