Question

string variation problem


gisbradokla
Enthusiast

I process CAD files that have the same information in them but the information is formatted differently.

I need to find each variation and process it accordingly.

I am trying to split an attribute with the AttributeSplitter. I have also tried the FME Hub RegxAttributeSplitter with the pattern (A-[0-9]+ ).

The problem I have is that in one case the string will split on \n, but in another there is no line break.

(As far as I can tell, the AutoCAD text box is set to a specific width and the text just rolls to the next line with no break or even a space.)

So if I split on a line feed, it works on one dataset, but on another dataset it just puts the entire string in one piece with no space. Are there any suggestions as to how to accomplish this?

3 replies

hkingsbury
  • Celebrity
  • March 25, 2021

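Assuming it's always in the format of

(A-\d{4}) <any number of whitespaces> (.*)

you could do something like this:

(A-\d{4})\s*(.*)

This will match all of the following (with case-insensitive matching, since the pattern uses an uppercase A and the examples a lowercase a):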

a-3570<space>Reeves county
 
a-3570Reeves county
 
a-3570<space><newline>
Reeves county
 
a-3570<newline>
Reeves county

You'll then have two groups, the first with the code and the second with the string. You can then manipulate them to your liking 😊
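
For checking the pattern outside FME, here is a minimal Python sketch. It runs the regex above against the four variants listed in this reply. The helper name `split_code` is hypothetical, and `re.IGNORECASE` is an assumption added so that the lowercase `a-` samples match the uppercase `A` in the pattern.

```python
import re

# Pattern from the reply: code, optional whitespace (including newlines), then the rest.
# re.IGNORECASE is an assumption so the lowercase samples match the uppercase "A".
PATTERN = re.compile(r"(A-\d{4})\s*(.*)", re.IGNORECASE)


def split_code(text):
    """Return (code, remainder), or None when the text does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# The four variants from the reply: space, nothing, space + newline, newline.
samples = [
    "a-3570 Reeves county",
    "a-3570Reeves county",
    "a-3570 \nReeves county",
    "a-3570\nReeves county",
]

results = [split_code(s) for s in samples]
for sample, result in zip(samples, results):
    print(repr(sample), "->", result)
```

Note that `.` does not match a newline by default, so the second group stops at the next line break. If the remaining text can span several lines, `re.DOTALL` would be needed.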


gisbradokla
  • Author
  • Enthusiast
  • March 26, 2021
Ah, that is helpful. But since \d can occur one or more times, I had been using +. Is this right?

(A-\d+)\s*(.*)

It does not work.

Also, since this one file was in the \d{4} format, I tried that as well, but it doesn't do the trick either. I am not sure whether there is some type of line feed that the regex isn't seeing, or whether I just can't find the right one. I have tried \n and \r to no avail.
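
One way to check whether an unexpected separator is hiding in the text is to list every character that is not a letter, digit, or hyphen, together with its code point. The sketch below is Python rather than an FME transformer. `show_hidden` is a hypothetical helper, and the sample strings are made up for illustration, not taken from the CAD files.

```python
import unicodedata


def show_hidden(text):
    """List (index, code point, Unicode name) for each character that is
    not an ASCII letter, digit, or hyphen."""
    found = []
    for index, char in enumerate(text):
        if char.isascii() and (char.isalnum() or char == "-"):
            continue
        # Control characters such as \n and \r have no Unicode name,
        # so a placeholder is supplied as the default.
        name = unicodedata.name(char, "UNNAMED CONTROL")
        found.append((index, f"U+{ord(char):04X}", name))
    return found


# Made-up samples: a line feed, a carriage return plus line feed, a no-break space.
for sample in ["A-3570\nReeves", "A-3570\r\nReeves", "A-3570\u00a0Reeves"]:
    print(repr(sample), show_hidden(sample))
```

If the list comes back empty between the code and the name, there is no separator at all and the split has to rely on the pattern itself. AutoCAD multiline text can also encode a paragraph break as the literal two characters `\P`. Whether that reaches the attribute depends on how the file is read, so it is only a possibility to check; a backslash would show up in this output as REVERSE SOLIDUS.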

Thanks @hkingsbury​ 

 


hkingsbury
  • Celebrity
  • March 28, 2021

\d matches a single digit (0-9). Following it with a + matches one or more digits; following it with {4} matches exactly 4 digits.
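
The difference between the two quantifiers can be seen in a short Python sketch. The codes in `text` are invented for illustration.

```python
import re

# Invented sample codes with 1, 2, 4, and 6 digits.
text = "A-7 A-42 A-3570 A-123456"

# \d+ : one or more digits, so every code matches in full.
one_or_more = re.findall(r"A-\d+", text)

# \d{4} : exactly four digits; shorter codes fail and longer ones are cut at four.
exactly_four = re.findall(r"A-\d{4}", text)

print(one_or_more)
print(exactly_four)
```

With this input, `\d+` captures all four codes, while `\d{4}` skips the one- and two-digit codes and truncates the six-digit one to `A-1234`.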

 

Are you able to share an extract of your data?

