Question

string variation problem

  • 25 March 2021
  • 3 replies
  • 4 views

Badge +9

I process CAD files that have the same information in them but the information is formatted differently.

I need to find each variation and process it accordingly.

I am trying to split an attribute with AttributeSplitter. I have also tried the FME Hub RegxAttributeSplitter with the pattern (A-[0-9]+ ).

The problem I have is that in one case the string will split on \n, but in another there is no line break.

(As far as I can tell, the AutoCAD text box is set to a specific width and the text just rolls to the next line with no break or even a space.)

So if I split on LF, it works on one dataset, but on another dataset the entire string comes through in one piece with no space. Are there any suggestions as to how to accomplish this?


3 replies

Userlevel 5
Badge +29

Assuming it's always in the format

(A-\d{4}) <any number of whitespaces> (.*)

you could do something like this:

(A-\d{4})\s*(.*)

This will match all of the following (with case-insensitive matching, since the examples use a lowercase a):

a-3570<space>Reeves county
 
a-3570Reeves county
 
a-3570<space><newline>
Reeves county
 
a-3570<newline>
Reeves county

You'll then have two groups: the first with the code and the second with the string. You can then manipulate them to your liking 😊
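To check the pattern outside FME, here is a minimal Python sketch that runs it against the four variants listed above. The sample strings and the helper name split_code_and_name are made up for illustration, and the case-insensitive flag is an assumption so that a lowercase "a-" also matches.

```python
import re

# Same pattern as above. IGNORECASE is an assumption so that both
# "A-3570" and "a-3570" are accepted.
PATTERN = re.compile(r"(A-\d{4})\s*(.*)", re.IGNORECASE)

# The four variants: space, nothing, space + newline, newline only.
samples = [
    "a-3570 Reeves county",
    "a-3570Reeves county",
    "a-3570 \nReeves county",
    "a-3570\nReeves county",
]

def split_code_and_name(text):
    """Return (code, name), or None if the text does not match."""
    m = PATTERN.match(text)
    if m is None:
        return None
    return m.group(1), m.group(2)

results = [split_code_and_name(s) for s in samples]
for r in results:
    print(r)
```

The \s* part swallows any mix of spaces and line breaks between the code and the name, including none at all, which is why all four variants end up with the same two groups.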

Badge +9

Ah, that is helpful. But since the number of digits can be one or more, I had been using +, so is this right?

(A-\d+)\s*(.*) does not work.

Also, since this one file was in the \d{4} format, I tried that, but it doesn't do the trick either. I am not sure whether there is some type of line feed that the regex isn't seeing or I just can't find the right one. I have tried \n and \r to no avail.

Thanks @hkingsbury​ 

 

Userlevel 5
Badge +29

\d matches a single digit (0-9). Following it with + matches one or more digits; following it with {4} matches exactly 4 digits.
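A small Python sketch of the difference between the two quantifiers, using made-up codes rather than your data:

```python
import re

# Hypothetical codes with 1, 2, 4 and 5 digits.
codes = ["A-7", "A-42", "A-3570", "A-12345"]

# \d+ accepts any number of digits (at least one).
one_or_more = [c for c in codes if re.fullmatch(r"A-\d+", c)]

# \d{4} accepts exactly four digits.
exactly_four = [c for c in codes if re.fullmatch(r"A-\d{4}", c)]

print(one_or_more)
print(exactly_four)
```

So \d+ is the safer choice if the number of digits varies between files.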

 

Are you able to share an extract of your data?
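If sharing isn't possible, one way to see what is really sitting between the code and the name is to list every non-alphanumeric character with its code point. This is only a sketch in plain Python, not an FME feature; the sample value and the function name non_alnum_chars are hypothetical, so paste in a value copied from the real attribute.

```python
import unicodedata

def non_alnum_chars(text):
    """List (index, escaped char, code point, Unicode name) for every
    character that is not a letter or digit."""
    return [
        (i, repr(ch), f"U+{ord(ch):04X}", unicodedata.name(ch, "<control>"))
        for i, ch in enumerate(text)
        if not ch.isalnum()
    ]

# Hypothetical value; replace with the attribute value from the real data.
sample = "A-3570\r\nReeves county"

print(repr(sample))  # escaped view of the whole string
found = non_alnum_chars(sample)
for row in found:
    print(row)
```

If nothing shows up between the digits and the name, then there is no separator character in the data at all and the wrap is purely visual.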
