Question

Replacing dashes as bullets with sequential numbers


Badge +1

Hi all,

So I do have a solution, which I'm including in this post, but I just don't think it's elegant enough. It seems to me that it can/should be done with fewer transformers. Python can probably help solve this, but I just don't have the chops, at least as of now :) And I would still like to do it all with transformers if possible.

The problem I'm trying to solve is as follows: I have string data (multi-line text) that sometimes starts with a dash character used as a bullet. I need to have these dashes replaced by sequential numbers, mostly because an import into a database system fails with dashes. Here's a before:

and after:

The StringReplacer transformer comes very close (see the top part of ""). The nice thing about this transformer is that it allows the user to pick multiple attributes, which is a requirement for me as there are multiple attributes I need to fix for each feature. However, the Replacement Text setting is limiting in my case: as far as I can tell, it cannot produce sequential numbers within the same attribute, because that would mean assigning a list, which isn't possible.

So my next try was to use a series of transformers to split each multi-line text into separate lines, do a RegEx search for each one to find instances of lines starting with a dash, then count those instances and use the count attribute to do the dash replacement. I also wrapped this in a custom transformer, but the problems I'm facing are:

  1. It doesn't seem possible to make changes to input parameters directly within a custom transformer, hence the need for additional attribute cleanup after the custom transformer. As you can see from the "Multiple" example, you have to add a custom transformer, followed by an AttributeManager, for each attribute you want to clean up. This is the "unelegant" part I would love to find a solution to.
  2. The custom transformer contains various other transformers that do not allow multiple attributes to be selected: AttributeSplitter & StringSearcher.
  3. I tried to use an AttributeExploder to "flatten" the attribute list, filter unwanted features (unwanted attributes become features so you filter those out), then use a single instance of the custom transformer. The problem though is: how do you re-assemble them dynamically into the original features with the original attributes after the text cleanup?

#3 can be solved (somewhat) as you can see in "", but I'm trying to create a dynamic workflow that does not require one to know all attributes that need to be fixed. Basically I want to look for this condition within all/select attributes and make the adjustment, rather than manually specifying each one and having to "hard-code" for each attribute. Any help would be greatly appreciated!
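
To make the reassembly question in #3 concrete, here is a rough Python sketch of the explode / fix / regroup idea. It is not an FME workspace: the dict-based features, the `fid` key and all function names are invented for illustration. Each attribute becomes a (feature id, attribute name, value) row, every row passes through one fixer, and the rows are grouped back by feature id, so no attribute name is hard-coded.

```python
import re
from itertools import count


def number_bullets(value):
    """Replace a dash at the start of each line with 1., 2., 3., ..."""
    if not isinstance(value, str):
        return value  # leave non-text values untouched
    counter = count(1)
    return re.sub(r"^-", lambda m: f"{next(counter)}.", value, flags=re.MULTILINE)


def explode(features, id_key="fid"):
    """Turn every attribute into a (feature id, name, value) row."""
    return [
        (f[id_key], name, value)
        for f in features
        for name, value in f.items()
        if name != id_key
    ]


def reassemble(rows, id_key="fid"):
    """Group the rows back into one dict per feature id."""
    rebuilt = {}
    for fid, name, value in rows:
        rebuilt.setdefault(fid, {id_key: fid})[name] = value
    return list(rebuilt.values())


# Hypothetical sample features
features = [
    {"fid": 1, "notes": "- first\n- second", "scope": "no bullets here"},
    {"fid": 2, "notes": "- only one"},
]
fixed_rows = [(fid, name, number_bullets(v)) for fid, name, v in explode(features)]
result = reassemble(fixed_rows)
```

The key point is that each row carries the feature id and the attribute name with it, which is what makes the regrouping step possible without knowing the attribute names in advance.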


15 replies

Userlevel 1
Badge +10

The replacement text can be an attribute value, so you could use a counter to create a bullet-point text attribute and use that in the StringReplacer.

Badge +1

Thanks, it is correct that an attribute can be used. However per my first part of the post, "Replacement Text" would need a list attribute, which is not possible as far as I can tell :) This is the reason I went with the second part of my post and built a custom transformer, which breaks down the string to separate lines, counts the bullet occurrences and replaces them, then re-assembles the lines back to the original strings.
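
The limitation can be illustrated outside FME with a small Python sketch (the sample text and variable names are made up). A single replacement value, which is all one attribute can hold, gets reused for every match, so every bullet ends up with the same number:

```python
import re

text = "- apples\n- pears\n- plums"

# One fixed replacement value is applied to every line that starts with a dash.
bullet_text = "1."
replaced = re.sub(r"^-", bullet_text, text, flags=re.MULTILINE)
print(replaced)
```

Here all three lines come out starting with `1.`, which is why a per-match value (a list, or a counter evaluated for each match) is needed.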

Userlevel 1
Badge +10

Ah, I've read the question properly now :-)

I'd probably resort to Python for this, because although I think it's probably possible in FME, I think it would be more complicated.
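
A minimal sketch of what the Python route could look like, assuming the text is available as a plain string. The function name and sample text are hypothetical, and nothing here is FME-specific. The replacement is a function, so each match gets the next number, and only a dash at the start of a line (optionally indented) is treated as a bullet:

```python
import re
from itertools import count


def number_dash_bullets(text):
    """Replace a leading dash on each line with a running number.

    Dashes elsewhere in a line are left alone.
    """
    counter = count(1)

    def replace(match):
        indent = match.group(1)  # keep any leading spaces or tabs
        return f"{indent}{next(counter)}."

    return re.sub(r"^([ \t]*)-", replace, text, flags=re.MULTILINE)


sample = "Scope:\n- remove wall-mounted units\n- patch and paint\nDone"
numbered = number_dash_bullets(sample)
print(numbered)
```

The counter is created inside the function, so numbering restarts at 1 for every text value that is passed in.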

Badge +2

Hi @dbaldacchino,

I would split the multi-line text into separate lines with the AttributeSplitter, using (NewLine) as the delimiter, before using the StringReplacer, provided that you can read the text in one go as a single feature. Here is a screenshot of the sample workspace.

Badge +1

And that confirms my hunch :( I will have to start looking at Python eventually, even though the learning curve is going to be a tad steep since I'm not going to have enough work volume to keep my pumps primed :D

Badge +1

Hi there, thanks for your answer :) That is similar to what I'm doing in the custom transformer (image below). The problem as identified in my lengthy post above (#2) is that the AttributeSplitter does not allow you to select multiple attributes. This is what prompted me to try the explode method (shown in the "Single" sample file) but then I don't think you can dynamically reassemble the data. I want to avoid "hard-coding" the attributes as shown in the AttributeManager within this sample file.

Badge +2

After replacing the string, to reconstruct the original text I would build a list with the ListBuilder transformer, followed by a ListConcatenator, which would combine all the lines into a single text attribute. In the ListConcatenator you are asked to choose the separator to put between the list elements; just use a NewLine.
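
As a rough analogy only, the same split / replace / rejoin flow looks like this in Python. The comments map each step loosely onto the transformers mentioned above; the function name and sample text are invented:

```python
def renumber_lines(text):
    """Split into lines, number the ones starting with a dash, rejoin."""
    lines = text.split("\n")        # roughly the AttributeSplitter step
    fixed = []
    n = 0
    for line in lines:
        if line.startswith("-"):    # per-line test and replacement
            n += 1
            line = f"{n}." + line[1:]
        fixed.append(line)
    return "\n".join(fixed)         # roughly ListBuilder + ListConcatenator


text = "Items:\n- one\n- two"
rebuilt = renumber_lines(text)
```

Joining with the same newline character used for the split returns the text in its original layout, with only the leading dashes changed.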

Userlevel 2
Badge +17

Another approach. Split the attribute by dash, explode the resulting list, add the element index as prefix to the second or later elements, then concatenate the elements. See the attached workspace example to learn more.

dash-bullet-fixer-example-multiple-2.fmw (FME 2019.1.1)
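
The workspace itself is not reproduced here. As a hedged illustration of the described idea only (split on the dash, prefix the second and later parts with their index, concatenate), a Python sketch with invented names might look like this:

```python
def number_by_splitting(text):
    """Split on every dash and prefix parts 1, 2, 3, ... with their index."""
    parts = text.split("-")
    numbered = [parts[0]] + [
        f"{i}.{part}" for i, part in enumerate(parts[1:], start=1)
    ]
    return "".join(numbered)


clean = number_by_splitting("Intro\n- a\n- b")

# Every dash is treated as a bullet, including one in the middle of a line.
mixed = number_by_splitting("- pre-cast\n- b")
```

With this approach the text must not contain any dash other than the bullets, since a split on the dash character cannot tell the two apart.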

Badge +1

Thanks a lot @takashi, this looks promising. I'll be studying your approach to the custom transformer later today and will post back :)

Userlevel 1
Badge +10

Something like this (although I think there's probably a better way of writing the Python):

dash-bullet-fixer-example-python.fmw
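
The Python inside the attached workspace is not shown in the thread. Purely as a sketch of how such a script could handle several attributes at once: `DictFeature` below is a stand-in written so the example runs outside FME, with `getAttribute` / `setAttribute` named after the corresponding methods on FME's Python feature object. The function name and attribute names are hypothetical.

```python
import re
from itertools import count


class DictFeature:
    """Stand-in for a feature, only so this sketch runs outside FME."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def getAttribute(self, name):
        return self._attributes.get(name)

    def setAttribute(self, name, value):
        self._attributes[name] = value


def fix_attributes(feature, attribute_names):
    """Renumber dash bullets in each named attribute, restarting at 1 per attribute."""
    for name in attribute_names:
        value = feature.getAttribute(name)
        if not isinstance(value, str):
            continue  # missing or non-text attribute: leave untouched
        counter = count(1)
        fixed = re.sub(
            r"^-", lambda m: f"{next(counter)}.", value, flags=re.MULTILINE
        )
        feature.setAttribute(name, fixed)


feature = DictFeature({"notes": "- a\n- b", "scope": "x\n- y", "id": 7})
fix_attributes(feature, ["notes", "scope", "missing"])
```

Because the attribute names are passed in as a list, the same function covers one attribute or many without any per-attribute setup.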

Badge +1

Hi again @takashi, I think your solution provides an answer or two to some of my questions regarding custom transformers and their setup. I'm going to try to refine my first attempt based on things I'm observing in your solution. Unfortunately I cannot use this as is, because it assumes that dashes always need to be replaced, which is not my requirement (only dashes at the beginning of a line are "bullets"; all others are to remain untouched). Will post again once I make more progress. Thanks.

Badge +1

OK, so here's an updated example based on what I learned from @takashi's example. I changed very little here, but thanks to his workspace I have an answer to #1 and thus a more elegant solution. The updated custom transformer (below) is much easier to use: hook it up, select the attribute you want to fix and...done; add more instances for other attributes. I would still prefer to find a way to multi-select attributes and wrap all the logic in one custom transformer, but if that's not possible and I don't have to use Python, I'll be a happy camper :) Thanks again!!

PS: The solution to #1 was to not check any attributes in the Input and uncheck all in Output. A PublishedParameter was added (Type: Attribute Name) and used throughout the custom transformer definition. An AttributeCreator was used to overwrite the input attribute with the fixed text, which then comes out of the Output.

Badge +1

Thanks, I am doing something very similar to re-assemble the paragraphs from the individual lines, but through an Aggregator transformer instead. The issue with having to do this separately for each attribute doesn't seem to be solvable per #2 in my original post.

Userlevel 2
Badge +17

I didn't assume that a dash could occur in the middle of a text line. I upgraded my solution.

dash-bullet-fixer-example-multiple-3.fmw (FME 2019.1.1)

Badge +1

Thanks again @takashi, there's some nice regex goodness in here too :) Very much appreciated!

EDIT: I never used the @Count() function before in the context of an attribute value, but this really makes things even more compact. Very nice!
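
For comparison, a rough Python analogue of a named running counter. This is not FME's @Count() itself, and the key format is invented for the example. Each key gets its own independent sequence, so numbering can restart for every feature/attribute combination:

```python
import re
from collections import defaultdict
from itertools import count

# One independent counter per key, created on first use.
counters = defaultdict(lambda: count(1))


def next_number(key):
    """Return the next value of the counter named `key`."""
    return next(counters[key])


def number_bullets(text, key):
    """Number dash-led lines using the counter named `key`."""
    return re.sub(
        r"^-", lambda m: f"{next_number(key)}.", text, flags=re.MULTILINE
    )


a = number_bullets("- x\n- y", key="feature-1/notes")
b = number_bullets("- z", key="feature-2/notes")
# Reusing a key continues its sequence instead of restarting.
c = number_bullets("- w", key="feature-1/notes")
```

The third call shows why the key has to be unique per text value: a shared key carries the count over from one value to the next.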
