Solved

RegEx help - separate road numbers from road names


Badge +7

I've written a regular expression for StringSearcher to separate features with road names (e.g. "Acacia Avenue") from features with road numbers (e.g. "A246", "B3276", "A3(M)"). Road numbers start with a single letter followed by a number of 1 to 4 or more digits, and are sometimes suffixed with further characters such as "(M)".

However, my original RegEx mistakenly treats values like "Service Road 2" and "14th Avenue" as road numbers when they should be treated as road names. I've updated the RegEx to this: ^[a-zA-Z]{1}[\d]{1,}.{0,}$ but it's still not filtering correctly. If I put square brackets round the dot (any single character), it correctly gets "A246" but misses "A3(M)".


Best answer by ebygomm 17 October 2018, 13:49


15 replies

Userlevel 4

How about

^\w\d{1,3}[\(\w*\)]?$
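A note on the syntax, as a minimal Python sketch. Python's `re` module is an assumption here and FME's regex engine may differ in details. Square brackets define a character class, so `[\(\w*\)]?` matches at most one character out of `(`, a word character, `*` or `)`, rather than an optional "(M)" suffix. The grouped variant below is hypothetical and shown only for comparison; the sample values come from the question.

```python
import re

# The suggested pattern: the bracketed part is a character class,
# so it can consume at most ONE trailing character.
CLASS_PATTERN = re.compile(r"^\w\d{1,3}[\(\w*\)]?$")

# Hypothetical variant for comparison: parentheses form a group,
# so a whole "(M)"-style suffix is optional as a unit.
GROUP_PATTERN = re.compile(r"^\w\d{1,3}(\(\w*\))?$")

for value in ["A3", "A246", "A3(M)"]:
    print(
        value,
        "class:", bool(CLASS_PATTERN.search(value)),
        "group:", bool(GROUP_PATTERN.search(value)),
    )
```

With the character class, "A3(M)" is rejected because three characters remain after the digits and the class can absorb only one of them.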

Badge +7

Brilliant.  Thank you so much :-)

 

Badge +7

Ah. It's not correctly filtering "A3" as a road number. A single letter followed by 2 or more digits is fine, though.

Update: after a bit more trial and error, this seems to work: ^\w[\(\w*\)]+$

Presumably the numbers count as part of the word at the start if there's no space after the letter...?

I think the answer may be to look for values without a space in them, but I'll have to check the data again.
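The guess about digits counting as part of the word holds in most regex flavours: `\w` covers letters, digits and underscore. A small Python sketch (Python's `re` is assumed as a stand-in; FME may differ) shows the side effect, namely that the pattern accepts any value made up only of word characters and parentheses.

```python
import re

pattern = re.compile(r"^\w[\(\w*\)]+$")

# \w covers letters, digits and underscore, so digits need no \d here.
print(bool(re.fullmatch(r"\w", "3")))  # True

# Road numbers match, but so does any single-word value.
for value in ["A3", "A3(M)", "Piccadily", "Acacia Avenue"]:
    print(value, bool(pattern.search(value)))
```

Only "Acacia Avenue" is rejected, because the space is not in the character class.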

 

Badge +7

D'oh.  There are single word road names - no spaces...

 

Userlevel 4
Ah, yes, I used the wrong quantifier for the parentheses. Replace the last "+" with a "?" and it should work. I've updated my answer accordingly.
Userlevel 4
Should work as well now. If not, can you give an example that doesn't match?
Badge +7
That matches the first 2 in the "Do match" list below but not the rest. If I change {1,3} to {1,6} or {1,} I get everything except the ones with "(M)" in them. However, I think this works: ^\w\d{1,}[\(\w*\)]{0,}$

 

I don't think the "B67856x" example should exist but if it does, it will be filtered correctly. There's still the possibility that the data will throw up other examples that RegEx doesn't process correctly but that's data for you...

 

 

Do match:

A3
A246
C765
D98765
A3(M)
A23524657(M)
B67856x

Don't match:

Piccadily
14th Avenue
Service Road 2
Acacia Avenue
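The two lists lend themselves to a quick automated check. The following is only a sketch in Python (an assumption; the thread itself uses FME's StringSearcher, whose engine may behave differently) that runs the pattern from this post, written with single backslashes, against every value separately.

```python
import re

# Pattern from the post above, written with single backslashes.
pattern = re.compile(r"^\w\d{1,}[\(\w*\)]{0,}$")

do_match = ["A3", "A246", "C765", "D98765", "A3(M)", "A23524657(M)", "B67856x"]
dont_match = ["Piccadily", "14th Avenue", "Service Road 2", "Acacia Avenue"]

# Each value is tested on its own, like one attribute value per feature.
unexpected_misses = [v for v in do_match if not pattern.search(v)]
unexpected_hits = [v for v in dont_match if pattern.search(v)]

print("missed:", unexpected_misses)
print("wrongly matched:", unexpected_hits)
```

Under Python's `re`, both lists come back empty for these samples. That says nothing about values outside the lists.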

 

Userlevel 1
Badge +21

I'd use the following to match only classified UK road numbers

^[A-Z][0-9]{1,5}.*
Userlevel 4

This one should work according to your specification:

^\w\d+((\(\w*\))|\w)?$

Matches:

A3
A246
C765
D98765
A3(M)
A23524657(M)
B67856x

Doesn't match:

Piccadily
14th Avenue
Service Road 2
Acacia Avenue
Userlevel 4

I'd use the following to match only classified UK road numbers

^[A-Z][0-9]{1,5}.*
Very nice. Matches exactly the same as my much more complex regex. Goes to show that local knowledge goes a long way!
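The "matches exactly the same" observation can be checked with a short Python sketch (Python's `re` is an assumption standing in for FME's engine). The last two values are hypothetical additions, not from the thread, included to show where the two patterns would part ways.

```python
import re

simple = re.compile(r"^[A-Z][0-9]{1,5}.*")
longer = re.compile(r"^\w\d+((\(\w*\))|\w)?$")

thread_samples = [
    "A3", "A246", "C765", "D98765", "A3(M)", "A23524657(M)", "B67856x",
    "Piccadily", "14th Avenue", "Service Road 2", "Acacia Avenue",
]
# Hypothetical values, not taken from the thread.
extra_samples = ["A3 north", "a3"]


def disagreements(values):
    """Return the values on which the two patterns give different verdicts."""
    return [v for v in values if bool(simple.search(v)) != bool(longer.search(v))]


print(disagreements(thread_samples))  # []
print(disagreements(extra_samples))
```

The shorter pattern accepts anything after the digits but insists on an uppercase first letter; the longer one allows any word character first but is strict about the suffix.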
Badge +7

I'd use the following to match only classified UK road numbers

^[A-Z][0-9]{1,5}.*
OK. I think that works, but I'm getting slightly confused by the Test String box in StringSearcher. Even if I put an end-of-line $ in my RegEx, it still matches Acacia Avenue in a list which includes values I do want to match. But Acacia Avenue on its own doesn't match, which is correct. It's a bit annoying, as I thought the carriage returns/line feeds in the Test String box would match the end-of-line $, but they don't seem to. That means I can't use a list for testing and have to delete and type all my test values separately.

 


Userlevel 4
I'm guessing that's a limitation of the test string box in FME: it doesn't seem to let ^ and $ match at line breaks. You may want to consider something more powerful when working on more complex expressions.

Personally I use RegexBuddy; it's fantastic, but you have to pay for it. I'm sure there are free alternatives, though.
Userlevel 2
Badge +17

I'd use the following to match only classified UK road numbers

^[A-Z][0-9]{1,5}.*
Try this one.

 

^[A-Z][0-9]{1,5}.*?$
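One possible reading of the lazy `.*?` (an assumption; the post does not explain it): if `$` can match at every line break and `.` can cross line breaks, greedy `.*` runs to the end of the whole text, while lazy `.*?` stops at the first line end. The Python flags `re.MULTILINE` and `re.DOTALL` imitate that situation below; whether the FME test box behaves this way is not confirmed here.

```python
import re

text = "A246\nAcacia Avenue\nA3(M)"
flags = re.MULTILINE | re.DOTALL

greedy = re.search(r"^[A-Z][0-9]{1,5}.*$", text, flags)
lazy = re.search(r"^[A-Z][0-9]{1,5}.*?$", text, flags)

# Greedy: one match that swallows every line, "Acacia Avenue" included.
print(repr(greedy.group(0)))

# Lazy: the match ends at the first place where $ is satisfied.
print(repr(lazy.group(0)))
```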

 

Userlevel 1
Badge +21
Yes, it's a limitation of the test string box. I tend to use Rubular for checking these sorts of things.

 

 

 

Edit: "limitation" probably isn't quite the right word. The FME test string box works correctly if you are sending it a block of text spread over multiple lines in a single attribute. But in this scenario, where your list actually represents multiple features, with each line being the value of an attribute in a different feature, it can be confusing.
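The distinction can be sketched in Python (an assumption; `re` stands in for FME's engine, and the test values are taken from the thread). A multi-line test block is one string, whereas each feature carries a single value, so splitting the block into lines before testing reproduces the per-feature behaviour.

```python
import re

pattern_text = r"^[A-Z][0-9]{1,5}.*$"
test_block = "A246\nAcacia Avenue\nA3(M)\nService Road 2"

# One attribute value per feature: test every line separately.
per_feature = {
    line: bool(re.search(pattern_text, line))
    for line in test_block.splitlines()
}
print(per_feature)

# Alternative: keep the block whole and let ^ and $ match at line breaks.
multiline_hits = re.findall(pattern_text, test_block, re.MULTILINE)
print(multiline_hits)
```

Either approach allows a whole list of test values to be checked in one go instead of typing them one at a time.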

 

Userlevel 2
Badge +17
No. In the Ruby regex specification, the metacharacter '.' (dot) doesn't match the newline character by default, unlike regex in FME. If you add the option 'm' to the regex, the result will be the same as with FME regex.
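Python's `re` module shares the Ruby default described here, which allows a quick illustration. Python is used only as a stand-in: `re.DOTALL` plays the role of Ruby's `m` option, and the two-line test value is made up for the example.

```python
import re

# Hypothetical value with a line break between "A3" and "(M)".
two_lines = "A3\n(M)"

# Default: '.' does not match the newline, so there is no match.
default_match = re.fullmatch(r"A3.\(M\)", two_lines)
print(default_match)  # None

# With DOTALL, '.' also matches the newline.
dotall_match = re.fullmatch(r"A3.\(M\)", two_lines, re.DOTALL)
print(bool(dotall_match))  # True
```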

 

 

And the solution is:

 

 

 
