Solved

One attribute to multiple using regEx

  • 16 October 2017
  • 7 replies
  • 29 views

Hello,

I am having difficulty finding something similar to this in the Knowledge Center so I thought I would ask.

I have an attribute whose value looks like this (it is a full line from a badly formatted text file).

Ticket No: 4653 Nearest Intersection: Chinton Street Seq No: 34

I need it split into a list of attribute names and a corresponding list of values.

Eg:

Attribute_1 = Ticket No

Value_1 = 4653

Attribute_2 = Nearest Intersection

Value_2 = Chinton Street

Attribute_3 = Seq No

Value_3 = 34

It doesn't have to be a "list" as long as I can separate those parts of the text. I am guessing there should be a RegEx way of doing this.

The attribute names "Ticket No", "Nearest Intersection" and "Seq No" will always remain the same with their values changing. I am trying to build a script to always separate them.

Any suggestions? Thank you!

Addition:

For the first one, if there was a way to extract between 'Ticket No:' and 'Nearest' that would be fine. I can work with the formatting of the value afterwards.

I have 19 lines with the same formatting problems, with 2-3 attributes per line. I am trying to avoid too many transformers per line. If I could use an AttributeCreator where I create the new attribute and its value is the substring between two known attribute names on either side, that would be great.
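To illustrate the "substring between two known names" idea, here is a minimal sketch in plain Python rather than an FME transformer. The helper name `between` is hypothetical, and the sample line is the one from the question. It assumes the labels appear literally in the text, so they are escaped before being placed in the pattern.

```python
import re

def between(text, left, right=None):
    """Return the text between two known labels, or None if they are not found.

    If `right` is None, everything after `left` up to the end of the text
    is returned (useful for the last value on a line).
    """
    # Escape the labels so characters such as ':' are matched literally.
    end = re.escape(right) if right is not None else r"$"
    # The lazy group (.*?) stops at the first occurrence of the right label;
    # the surrounding \s* trims whitespace around the value.
    pattern = re.escape(left) + r"\s*(.*?)\s*" + end
    match = re.search(pattern, text)
    return match.group(1) if match else None

line = "Ticket No: 4653 Nearest Intersection: Chinton Street Seq No: 34"
print(between(line, "Ticket No:", "Nearest Intersection:"))  # 4653
print(between(line, "Nearest Intersection:", "Seq No:"))     # Chinton Street
print(between(line, "Seq No:"))                              # 34
```

The same lazy pattern shape should carry over to a regex-capable transformer, but that is an assumption worth checking against the regex engine FME uses.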


Best answer by fariyafarhad 16 October 2017, 21:21


7 replies


Here is my first thought. On the StringConcatenator, I put in 4 elements from the list to account for 4 separate 'words' in the intersection name, e.g. Huntsville Browns Ferry Road. You can add more list elements there if you think you would have longer names.


Hello @cartoscro

 

I appreciate your answer. That seems like a good way to do it. What I have though are many different lines with the same formatting issue. Using so many transformers per line may be a bit too much. I am trying to see if there is an easier Regex way of doing it.
@fariyafarhad There likely is a cleaner way to do it with Regex. FME is very efficient in its text string processing, so I have always defaulted to using multiple transformers.

 

Slightly smaller workflow:

 

@cartoscro I like the shorter workflow, though I think I stumbled upon an answer myself. Thanks!

 

I decided to use a StringSearcher.

So for that one line, I set up three StringSearchers in series. If the string is matched, the transformer automatically puts the matching text in an attribute, and I specified the attribute name. The first StringSearcher is set up this way.

This means one transformer per attribute. If anyone has a better suggestion, it is still welcome :)


Hi @fariyafarhad, why not use the StringSearcher with this regex?

^Ticket No\s*:\s*(.*)\s+Nearest Intersection\s*:\s*(.*)\s+Seq No\s*:\s*(.*)$
And set the Subexpression Matches List Name (e.g. _sub). You can then rename "_sub{0}.part" to "Ticket No", and so on.
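
To see what the three capture groups pick up, here is a rough sketch that runs the same regex through Python's `re` module on the sample line from the question. This is plain Python, not the StringSearcher itself; the function name `split_line` is hypothetical, and small differences between regex engines are possible.

```python
import re

# Same pattern as above: one capture group per value.
PATTERN = re.compile(
    r"^Ticket No\s*:\s*(.*)\s+Nearest Intersection\s*:\s*(.*)\s+Seq No\s*:\s*(.*)$"
)

# Target attribute names, in the same order as the capture groups.
NAMES = ("Ticket No", "Nearest Intersection", "Seq No")

def split_line(line):
    """Return a dict of attribute name -> value, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(NAMES, match.groups()))

line = "Ticket No: 4653 Nearest Intersection: Chinton Street Seq No: 34"
print(split_line(line))
# {'Ticket No': '4653', 'Nearest Intersection': 'Chinton Street', 'Seq No': '34'}
```

Because the label names are fixed, a single pattern per line layout handles all three values at once, which keeps it to one transformer per line rather than one per attribute.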
