Solved

Capture a string using @SubstringRegularExpression

  • 10 April 2024

I am trying to populate multiple fields (codes) by parsing data from one incoming field (NAICS_CODE), using an AttributeManager transformer and @SubstringRegularExpression.

The incoming field, NAICS_CODE, can contain up to 30 separate codes. For example, in one record NAICS_CODE holds 10 separate codes.

Using an online regex editor, I came up with the following patterns, where the number in braces is the index (0-9) of the code to parse:
^\s*(?:\S+\s+){0}(\S+)   through   ^\s*(?:\S+\s+){9}(\S+)

When I test these patterns against the following data in the online regex editor:

236210-DBE/MBE/SBE  236220-DBE/MBE/SBE  238210-DBE/MBE/SBE  238990-DBE/MBE/SBE  335311-DBE/MBE/SBE  335312-DBE/MBE/SBE  335313-DBE/MBE/SBE  335314-DBE/MBE/SBE  335999-DBE/MBE/SBE  423610-DBE/MBE/SBE

The results are what I am looking for: each pattern returns the correct code for its index.

^\s*(?:\S+\s+){0}(\S+) returns 236210-DBE/MBE/SBE
THROUGH
^\s*(?:\S+\s+){9}(\S+) returns 423610-DBE/MBE/SBE
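
For anyone who wants to repeat the online-editor check offline, here is a minimal Python sketch of the same test. It uses Python's `re` module, which is not necessarily the regex engine FME uses, so it only confirms what the pattern does in Python, not how @SubstringRegularExpression behaves. The names `SAMPLE` and `nth_code` are made up for illustration.

```python
import re

# Sample NAICS_CODE value from the post: ten codes separated by two spaces.
SAMPLE = (
    "236210-DBE/MBE/SBE  236220-DBE/MBE/SBE  238210-DBE/MBE/SBE  "
    "238990-DBE/MBE/SBE  335311-DBE/MBE/SBE  335312-DBE/MBE/SBE  "
    "335313-DBE/MBE/SBE  335314-DBE/MBE/SBE  335999-DBE/MBE/SBE  "
    "423610-DBE/MBE/SBE"
)


def nth_code(text, n):
    """Return the n-th (0-based) whitespace-separated code, or None."""
    # Skip n "token + whitespace" groups, then capture the next token.
    pattern = r"^\s*(?:\S+\s+){%d}(\S+)" % n
    match = re.search(pattern, text)
    # group(1) is the parenthesised capture; group(0) would be the whole
    # match, which for n > 0 also includes the skipped codes.
    return match.group(1) if match else None


if __name__ == "__main__":
    for i in range(10):
        print(i, nth_code(SAMPLE, i))
```

In Python the wanted code is in capture group 1, while group 0 is the entire matched text. Whether FME numbers captures the same way is something to confirm in the FME documentation rather than assume from this sketch.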

However, when I try to set the value of each parsed field using the AttributeManager transformer, something in the expression must not be set up correctly. I read the documentation, and for each parsed field I am setting startIdx to 1, captureNum to 0, and matchNum to 0.


All alphabetic characters are uppercase. Two spaces separate consecutive codes; the first code begins at the start of the line.

When I run the workspace up to the point of conversion, the resulting field is <null>, not the string I am expecting.
I have only set up the first two fields, and they come up as <null>, while the other eight fields are blank.

The expressions I am using for the first two fields (0, 1) are:

@SubstringRegularExpression(@Value(NAICS_CODE),^\s*(?:\S+\s+){0}(\S+),1,0,0,caseSensitive=TRUE)

@SubstringRegularExpression(@Value(NAICS_CODE),^\s*(?:\S+\s+){1}(\S+),1,0,0,caseSensitive=TRUE)

I am open to any advice. 
Thanks in advance.

Best answer by robertgilley 10 April 2024, 21:05

2 replies


I have also tried @FindRegularExpression, which returns -1,
and @SubstringRegularExpression, which returns <null>.


From our “Making things more complicated than they need to be” department:
I was able to parse the codes as needed by using:

@GetWord(@Value(NAICS_CODE),0)
@GetWord(@Value(NAICS_CODE),1)
@GetWord(@Value(NAICS_CODE),2) …
@GetWord(@Value(NAICS_CODE),9)
 

Problem solved.
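
To illustrate why the simpler approach is enough for this data, here is a rough Python analogue. It assumes, based only on the usage above, that @GetWord splits on whitespace and counts words from 0; the helper name `get_word` and the choice to return None for an out-of-range index are assumptions of this sketch, not documented FME behaviour.

```python
# Sample NAICS_CODE value from the question: ten codes, two spaces apart.
SAMPLE = (
    "236210-DBE/MBE/SBE  236220-DBE/MBE/SBE  238210-DBE/MBE/SBE  "
    "238990-DBE/MBE/SBE  335311-DBE/MBE/SBE  335312-DBE/MBE/SBE  "
    "335313-DBE/MBE/SBE  335314-DBE/MBE/SBE  335999-DBE/MBE/SBE  "
    "423610-DBE/MBE/SBE"
)


def get_word(text, index):
    """Hypothetical stand-in for @GetWord: the index-th (0-based) word."""
    # str.split() with no arguments splits on any run of whitespace,
    # so the double spaces between codes do not produce empty words.
    words = text.split()
    if 0 <= index < len(words):
        return words[index]
    return None  # assumption: FME may return something else here


if __name__ == "__main__":
    for i in range(10):
        print(i, get_word(SAMPLE, i))
```

Because the codes themselves contain no whitespace, plain word splitting gives the same ten values the regular expressions were meant to capture.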
