Solved

Filter out records matching a certain format

  • 7 September 2022
  • 7 replies
  • 7 views

Badge +7

Hello community,

 

I have a street dataset with 2139 records, with each one being a line (for those in UK LAs, note that it is the LSG). It includes a field called ‘street or numbered street’.

 

For 2040 of the records the value in the ‘street or numbered street’ field is a named street (e.g.: Acacia Avenue). For 28 of the records the value in the ‘street or numbered street’ field is an A road (e.g.: A312). For 14 of the records the value in the ‘street or numbered street’ field is an B road (e.g.: B204). For 57 of the records the value in the ‘street or numbered street field is an C road (e.g.: C5840044).

 

I want to filter out the three separate road types: A roads, B roads and C roads. I want to ignore the 2040 named street records.

 

What’s the best way to do this? A filter? Query via the Text Editor within the Writer? What would the code be?

icon

Best answer by ctredinnick 7 September 2022, 11:44

View original

7 replies

Userlevel 3
Badge +16

If you want to separate out the A B and C roads, use a TestFilter, with regex to match on the street name.

^B[0-9]+ will match on a string that starts with B followed by one or more numbers.

imageIf you want test for A B and C together, then that regex would be ^[ABC][0-9]+

Badge +7

If you want to separate out the A B and C roads, use a TestFilter, with regex to match on the street name.

^B[0-9]+ will match on a string that starts with B followed by one or more numbers.

imageIf you want test for A B and C together, then that regex would be ^[ABC][0-9]+

Thanks for that. I've just run a initial quick test on the A roads and it's looking good so far. I'll need a closer look to confirm, and run the Bs and Cs.

 

By the way, is the [0-9] statement stating how many potential characters to look for/include?

Userlevel 4

If you want to separate out the A B and C roads, use a TestFilter, with regex to match on the street name.

^B[0-9]+ will match on a string that starts with B followed by one or more numbers.

imageIf you want test for A B and C together, then that regex would be ^[ABC][0-9]+

Agree. You could possibly simplify the configuration by avoiding the Regex and using "Begins with" as the operator. For larger datasets (not really the case here) it's probably also faster. But I'm nitpicking here :-)

Badge +7

If you want to separate out the A B and C roads, use a TestFilter, with regex to match on the street name.

^B[0-9]+ will match on a string that starts with B followed by one or more numbers.

imageIf you want test for A B and C together, then that regex would be ^[ABC][0-9]+

That has worked perfectly for me. Thank you.

 

And it's something I can apply to other areas to in the future.

 

Thanks again.

Badge +7

Agree. You could possibly simplify the configuration by avoiding the Regex and using "Begins with" as the operator. For larger datasets (not really the case here) it's probably also faster. But I'm nitpicking here :-)

Thanks for that , David. The Regex worked well for me. I'm very happy with the results.

 

Just wondering, would the 'begins with' suggestion also pick out those named streets that happen to begin with A (or B or C), such as Acacia Avenue? I wouldn't want it to do so.

Userlevel 4

Agree. You could possibly simplify the configuration by avoiding the Regex and using "Begins with" as the operator. For larger datasets (not really the case here) it's probably also faster. But I'm nitpicking here :-)

Indeed, you'll have to use the Regex solution if you have data like Acacia Avenue that you don't want to filter out.

Userlevel 1
Badge +10

Agree. You could possibly simplify the configuration by avoiding the Regex and using "Begins with" as the operator. For larger datasets (not really the case here) it's probably also faster. But I'm nitpicking here :-)

Yes, begins with wouldn't work for you here as it would match examples such as Acacia Avenue.

The regex given looks for any strings which start with a letter and are immediately followed by any amount of numbers (this is what the plus sign means)

If you just want all the numbered roads and don't have a requirement to filter them into A,B and C you could just use

^[A-C][0-9]+

Just be aware that there are some non standard numbered roads about, e.g. A1(M) - the above regex would still produce a match for this but there may be others to watch out for.

Reply