I've written a regular expression for StringSearcher to separate features with road names (e.g. "Acacia Avenue") from features with road numbers (e.g. "A246", "B3276", "A3(M)"). Road numbers start with a single letter followed by a number of 1-4 or more digits, and are sometimes suffixed with further characters such as "(M)". However, my original RegEx mistakenly treats values like "Service Road 2" and "14th Avenue" as road numbers when they should be treated as road names. I've updated the RegEx to this: ^[a-zA-Z]{1}[\\d]{1,}.{0,}$ but it's still not filtering correctly. If I put square brackets round the dot (any single character), it correctly gets "A246" but misses "A3(M)".
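
The difference between the bare dot and the bracketed dot can be reproduced outside FME. The following is a minimal sketch in Python, whose `re` module is only a stand-in for whatever engine StringSearcher uses; the pattern is written with single backslashes, as Python raw strings expect, and the names `ANY_CHAR_TAIL` and `LITERAL_DOT_TAIL` are made up for illustration. Inside square brackets a dot loses its "any character" meaning and matches only a literal full stop, which would explain why "A3(M)" is missed.

```python
import re

# Assumption: Python's `re` stands in for FME's regex engine, which may differ.
ANY_CHAR_TAIL = re.compile(r"^[a-zA-Z]{1}[\d]{1,}.{0,}$")       # . = any character
LITERAL_DOT_TAIL = re.compile(r"^[a-zA-Z]{1}[\d]{1,}[.]{0,}$")  # [.] = literal full stop

samples = ["A246", "A3(M)", "Service Road 2", "14th Avenue"]

for value in samples:
    any_char = ANY_CHAR_TAIL.search(value) is not None
    literal_dot = LITERAL_DOT_TAIL.search(value) is not None
    print(f"{value!r}: any-char tail={any_char}, literal-dot tail={literal_dot}")
```

In this sketch both variants reject the two road names, so the remaining problem in the workspace may lie somewhere other than the pattern's logic; that is a guess rather than something the post confirms.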

RegEx help - separate road numbers from road names

Userlevel 4

How about

^\w\d{1,3}[\(\w*\)]?$
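
A quick way to see what this pattern accepts is to run it against a few values. The sketch below uses Python's `re` module as a stand-in for FME (an assumption; the engines may differ), and the helper name `matches` is made up. Two things stand out: the square brackets form a character class, so `[\(\w*\)]?` allows at most one extra character rather than a whole "(M)" suffix, and `\w` also accepts digits, so an all-digit value can pass.

```python
import re

# Assumption: Python's `re` stands in for FME's regex engine.
PATTERN = re.compile(r"^\w\d{1,3}[\(\w*\)]?$")

def matches(value: str) -> bool:
    """Return True if the whole value satisfies the anchored pattern."""
    return PATTERN.search(value) is not None

samples = ["A3", "A246", "A3(", "A3(M)", "D98765", "1234"]
results = {value: matches(value) for value in samples}

for value, ok in results.items():
    print(f"{value!r}: {ok}")
```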

tim_wood
Author
Contributor
311 replies
5 years ago
17 October 2018

Brilliant. Thank you so much :-)

tim_wood
Author
Contributor
311 replies
5 years ago
17 October 2018

Ah. It's not correctly filtering "A3" as a road number. A single letter followed by 2 or more digits is fine though.

Update: done a bit more trial and error and this seems to work: ^\w[$\w*$]+$

Presumably the numbers count as part of the word at the start if there's no space after the letter...?

I think the answer may be to look for values without a space in but I'll have to check the data again.

tim_wood
Author
Contributor
311 replies
5 years ago
17 October 2018

D'oh. There are single word road names - no spaces...

Userlevel 4

Ah, yes, I used the wrong quantifier for the parentheses. Replace the last "+" with a "?" and it should work. I've updated my answer accordingly.

Userlevel 4

Should work as well now. If not, can you give an example that doesn't match?

tim_wood
Author
Contributor
311 replies
5 years ago
17 October 2018

That matches the first 2 in the "Do match" list below but not the rest. If I change {1,3} to {1,6} or {1,} I get everything except the ones with "(M)" in. However, I think this works: ^\\w\\d{1,}[\$\\w*\$]{0,}$

I don't think the "B67856x" example should exist but if it does, it will be filtered correctly. There's still the possibility that the data will throw up other examples that RegEx doesn't process correctly but that's data for you...

Do match:

A3

A246

C765

D98765

A3(M)

A23524657(M)

B67856x

Don't match:

Piccadily

14th Avenue

Service Road 2

Acacia Avenue

Userlevel 1

ebygomm
Contributor
3079 replies
5 years ago
17 October 2018
Best Answer

I'd use the following to match only classified UK road numbers

^[A-Z][0-9]{1,5}.*

Userlevel 4

This one should work according to your specification:

^\w\d+((\(\w*\))|\w)?$

Matches:

A3

A246

C765

D98765

A3(M)

A23524657(M)

B67856x

Doesn't match:

Piccadily

14th Avenue

Service Road 2

Acacia Avenue
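
The two lists above can be turned into a small regression check, which makes it easier to retest whenever the pattern changes. This is a minimal sketch in Python, used here only as a stand-in for FME's regex engine (an assumption); the names `ROAD_NUMBER`, `is_road_number` and `unexpected` are made up for illustration.

```python
import re

# Assumption: Python's `re` stands in for FME's regex engine.
ROAD_NUMBER = re.compile(r"^\w\d+((\(\w*\))|\w)?$")

SHOULD_MATCH = ["A3", "A246", "C765", "D98765", "A3(M)", "A23524657(M)", "B67856x"]
SHOULD_NOT_MATCH = ["Piccadily", "14th Avenue", "Service Road 2", "Acacia Avenue"]

def is_road_number(value: str) -> bool:
    """Return True if the value looks like a road number under this pattern."""
    return ROAD_NUMBER.search(value) is not None

# Collect every value whose result disagrees with the list it belongs to.
unexpected = [v for v in SHOULD_MATCH if not is_road_number(v)]
unexpected += [v for v in SHOULD_NOT_MATCH if is_road_number(v)]

print("Unexpected results:", unexpected)
```

An empty list means the pattern agrees with both lists; any value that appears is one to investigate.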

Userlevel 4

Very nice. Matches exactly the same as my much more complex regex. Goes to show that local knowledge goes a long way!
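
The two patterns agree on the sample values in this thread, but they are not equivalent in general. The sketch below, again using Python's `re` as a stand-in for FME (an assumption), compares them on a few extra values invented for illustration; `LOOSE`, `STRICT` and `compare` are made-up names. The shorter pattern accepts anything after the digits (so the `{1,5}` upper bound has no practical effect) but insists on an upper-case letter first, while the longer one is end-anchored but lets `\w` accept lower-case letters and digits in the first position.

```python
import re

# Assumption: Python's `re` stands in for FME's regex engine.
LOOSE = re.compile(r"^[A-Z][0-9]{1,5}.*")
STRICT = re.compile(r"^\w\d+((\(\w*\))|\w)?$")

def compare(value: str) -> tuple[bool, bool]:
    """Return (loose result, strict result) for one value."""
    return (LOOSE.search(value) is not None, STRICT.search(value) is not None)

# Values from the thread plus three hypothetical ones that separate the patterns.
samples = ["A3", "A3(M)", "Acacia Avenue", "A3 bypass", "a3", "33"]
differences = [v for v in samples if compare(v)[0] != compare(v)[1]]

for value in samples:
    print(f"{value!r}: loose={compare(value)[0]}, strict={compare(value)[1]}")
```

Whether these differences matter depends on what the data actually contains.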

tim_wood
Author
Contributor
311 replies
5 years ago
17 October 2018

OK. I think that works, but I'm getting slightly confused by the Test String box in StringSearcher. Even if I put an end-of-line $ in my RegEx, it still matches Acacia Avenue in a list which includes values I do want to match. But Acacia Avenue on its own doesn't match, which is correct. It's a bit annoying as I thought my carriage returns/line feeds in the Test String box would match the end-of-line $, but they don't seem to. That means I can't use a list for testing and have to delete and type all my test values separately.

Userlevel 4

I'm guessing that's a limitation of the test string box in FME: it doesn't seem to let ^ and $ match at line breaks. You may want to consider something more powerful when working on more complex expressions.

Personally I use RegexBuddy. It's fantastic, but you have to pay for it. I'm sure there are free alternatives, though.

Userlevel 2

takashi
Contributor
7538 replies
5 years ago
18 October 2018

Try this one.

^[A-Z][0-9]{1,5}.*?$
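
One possible reading of why the lazy `.*?` helps on a multi-line test string is sketched below in Python. It assumes that the test box lets `.` match line breaks and lets `^` and `$` match at line boundaries; Python's `re.DOTALL` and `re.MULTILINE` flags are used to imitate that, and this is an assumption about FME rather than confirmed behaviour. The test string is made up from values in this thread.

```python
import re

# Assumption: these flags imitate how the FME test box treats a multi-line string.
FLAGS = re.DOTALL | re.MULTILINE

GREEDY = re.compile(r"^[A-Z][0-9]{1,5}.*$", FLAGS)
LAZY = re.compile(r"^[A-Z][0-9]{1,5}.*?$", FLAGS)

test_string = "A246\nAcacia Avenue\nB3276"

# Greedy .* runs across the line breaks to the very end of the string.
greedy_matches = [m.group(0) for m in GREEDY.finditer(test_string)]

# Lazy .*? stops at the first place where $ can match, i.e. the end of the line.
lazy_matches = [m.group(0) for m in LAZY.finditer(test_string)]

print("greedy:", greedy_matches)
print("lazy:  ", lazy_matches)
```

Under these assumptions the greedy version swallows "Acacia Avenue" as part of one long match, while the lazy version reports only the two road numbers.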

Userlevel 1

ebygomm
Contributor
3079 replies
5 years ago
18 October 2018

Yes, it's a limitation of the test string box. I tend to use Rubular for checking these sorts of things.

Edit: "limitation" probably isn't quite the right word. The FME test string box is working correctly if you were sending it a block of text spread over multiple lines in a single attribute. But in this scenario, where your list actually represents multiple feature types, with each line being the value of an attribute in a different feature, it can be confusing.
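
Outside FME, the "one line per feature" situation can be imitated by splitting the list and testing each value on its own, so that `^` and `$` always refer to the start and end of a single value. This is a minimal sketch in Python (a stand-in for FME, which is an assumption); `test_values` and `classify` are made-up names.

```python
import re

# Assumption: Python's `re` stands in for FME's regex engine.
ROAD_NUMBER = re.compile(r"^[A-Z][0-9]{1,5}.*$")

test_values = """A3
A246
A3(M)
Acacia Avenue
Service Road 2
14th Avenue"""

def classify(block: str) -> dict[str, bool]:
    """Test each line separately, as if each were one feature's attribute value."""
    return {line: ROAD_NUMBER.search(line) is not None for line in block.splitlines()}

results = classify(test_values)
for value, is_number in results.items():
    print(f"{value!r}: {'road number' if is_number else 'road name'}")
```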

Userlevel 2

takashi
Contributor
7538 replies
5 years ago
18 October 2018

No, in the Ruby regex specification, the metacharacter '.' (dot) doesn't match the newline character by default, unlike regex in FME. If you add the 'm' option to the regex, the result will be the same as with FME's regex.

And the solution is:
