
Replacing dashes as bullets with sequential numbers


dbaldacchino

Hi all,

So I do have a solution, which I'm including in this post, but I just don't think it's elegant enough. It seems to me that it can (and should) be done with fewer transformers. Python could probably help solve this, but I just don't have the chops, at least not yet :) And I would still like to do it all with transformers if possible.

The problem I'm trying to solve is as follows: I have string data (multi-line text) in which some lines start with a dash character used as a bullet. I need to replace these dashes with sequential numbers, mainly because an import into a database system fails on the dashes. Here's a before:

and after:

The StringReplacer transformer comes very close (see the top part of ""). The nice thing about this transformer is that it lets you pick multiple attributes, which is a requirement for me, as there are multiple attributes I need to fix on each feature. However, the Replacement Text setting is limiting in my case: as far as I can tell, it cannot produce sequential numbers within the same attribute, because that would mean assigning a list, which isn't possible.
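For readers following along outside FME, here is a minimal Python sketch of that limitation. Python's `re.sub` stands in for the StringReplacer in regex mode, and the sample text is made up: a single replacement value, whether typed in or taken from an attribute, is applied to every match, so every bullet ends up with the same number.

```python
import re

# Hypothetical sample: one multi-line attribute value with three dash bullets.
text = "Intro\n- first\n- second\n- third"

# One fixed replacement value for the whole text: every line-leading dash
# ((?m) makes ^ match at the start of each line) becomes the same "1.".
constant = re.sub(r"(?m)^-", "1.", text)
# constant == "Intro\n1. first\n1. second\n1. third"
```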

So my next try was to use a series of transformers to split each multi-line text into separate lines, do a RegEx search for each one to find instances of lines starting with a dash, then count those instances and use the count attribute to do the dash replacement. I also wrapped this in a custom transformer, but the problems I'm facing are:

  1. It doesn't seem possible to make changes to input parameters directly within a custom transformer, hence the need for additional attribute cleanup after the custom transformer. As you can see from the "Multiple" example, you have to add a custom transformer, followed by an AttributeManager, for each attribute you want to clean up. This is the inelegant part I would love to find a solution to.
  2. The custom transformer contains other transformers that do not allow multiple attributes to be selected: the AttributeSplitter and the StringSearcher.
  3. I tried to use an AttributeExploder to "flatten" the attribute list, filter unwanted features (unwanted attributes become features so you filter those out), then use a single instance of the custom transformer. The problem though is: how do you re-assemble them dynamically into the original features with the original attributes after the text cleanup?

#3 can be solved (somewhat), as you can see in "", but I'm trying to create a dynamic workflow that does not require knowing in advance all the attributes that need to be fixed. Basically, I want to look for this condition within all (or selected) attributes and make the adjustment, rather than manually specifying each one and having to "hard-code" every attribute. Any help would be greatly appreciated!

15 replies

ebygomm
Influencer
  • August 13, 2019

The replacement text can be an attribute value, so you could use a Counter to create a bullet-point text attribute and use that in the StringReplacer.


dbaldacchino
ebygomm wrote:

The replacement text can be an attribute value, so you could use a Counter to create a bullet-point text attribute and use that in the StringReplacer.

Thanks, it is correct that an attribute can be used. However, per the first part of my post, "Replacement Text" would need a list attribute, which is not possible as far as I can tell :) This is why I went with the second part of my post and built a custom transformer, which breaks the string down into separate lines, counts the bullet occurrences and replaces them, then reassembles the lines into the original strings.


ebygomm
Influencer
  • August 13, 2019
dbaldacchino wrote:

Thanks, it is correct that an attribute can be used. However, per the first part of my post, "Replacement Text" would need a list attribute, which is not possible as far as I can tell :) This is why I went with the second part of my post and built a custom transformer, which breaks the string down into separate lines, counts the bullet occurrences and replaces them, then reassembles the lines into the original strings.

Ah, I've read the question properly now :-)

I'd probably resort to Python for this, because although I think it's probably possible in FME, I think it would be more complicated.


mygis
Supporter
  • August 13, 2019

Hi @dbaldacchino,

I would split the multi-line text into separate lines with the AttributeSplitter, using (NewLine) as the delimiter, before applying the StringReplacer, provided that you can read the entire text in one go as a single feature. Here is a screenshot of the sample workspace.
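The split, replace and rejoin idea can be sketched in plain Python. This is an illustration, not the workspace from the screenshot: `number_bullets` is a hypothetical name, and the `1. ` numbering format and restarting the count for each text value are assumptions.

```python
import re

# A bullet is a dash at the very start of a line, plus any spaces/tabs after it.
BULLET = re.compile(r"^-[ \t]*")

def number_bullets(text: str) -> str:
    """Replace line-leading dashes with 1., 2., 3., ... within one text value."""
    count = 0
    out = []
    for line in text.split("\n"):      # like AttributeSplitter on (NewLine)
        m = BULLET.match(line)
        if m:                          # dashes later in the line are left alone
            count += 1
            line = f"{count}. {line[m.end():]}"
        out.append(line)
    return "\n".join(out)              # like rejoining the lines with NewLine
```

For example, `number_bullets("Tasks:\n- pour slab\n- well-known fix\n- cure")` returns `"Tasks:\n1. pour slab\n2. well-known fix\n3. cure"`; the hyphen in "well-known" is untouched because only the start of each line is tested.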


dbaldacchino
ebygomm wrote:

Ah, I've read the question properly now :-)

I'd probably resort to Python for this, because although I think it's probably possible in FME, I think it would be more complicated.

And that confirms my hunch :( I will have to start looking at Python eventually, even though the learning curve is going to be a tad steep since I'm not going to have enough work volume to keep my pumps primed :D


dbaldacchino
mygis wrote:

Hi @dbaldacchino,

I would split the multi-line text into separate lines with the AttributeSplitter, using (NewLine) as the delimiter, before applying the StringReplacer, provided that you can read the entire text in one go as a single feature. Here is a screenshot of the sample workspace.

Hi there, thanks for your answer :) That is similar to what I'm doing in the custom transformer (image below). The problem as identified in my lengthy post above (#2) is that the AttributeSplitter does not allow you to select multiple attributes. This is what prompted me to try the explode method (shown in the "Single" sample file) but then I don't think you can dynamically reassemble the data. I want to avoid "hard-coding" the attributes as shown in the AttributeManager within this sample file.


mygis
Supporter
  • August 13, 2019
dbaldacchino wrote:

Hi there, thanks for your answer :) That is similar to what I'm doing in the custom transformer (image below). The problem as identified in my lengthy post above (#2) is that the AttributeSplitter does not allow you to select multiple attributes. This is what prompted me to try the explode method (shown in the "Single" sample file) but then I don't think you can dynamically reassemble the data. I want to avoid "hard-coding" the attributes as shown in the AttributeManager within this sample file.

After replacing the strings, to reconstruct the original text I would create a list using the ListBuilder transformer, followed by a ListConcatenator, which would combine all the lines into a single attribute as one text. In the ListConcatenator you will be asked to choose the delimiter used to separate the list elements; just use a NewLine.


takashi
Influencer
  • August 13, 2019

Another approach. Split the attribute by dash, explode the resulting list, add the element index as prefix to the second or later elements, then concatenate the elements. See the attached workspace example to learn more.

dash-bullet-fixer-example-multiple-2.fmw (FME 2019.1.1)
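A rough Python analogue of the split-by-dash approach, assuming a `1.` prefix format (the attached workspace may format the numbers differently): after splitting on `-`, element 0 holds the text before the first dash, so the list index of each later element can serve directly as its bullet number.

```python
def split_by_dash(text: str) -> str:
    """Split on '-', prefix elements 1, 2, ... with their index, then concatenate."""
    parts = text.split("-")            # like AttributeSplitter with "-" as delimiter
    # parts[0] is whatever precedes the first dash and keeps no prefix.
    numbered = (f"{i}.{part}" for i, part in enumerate(parts[1:], start=1))
    return parts[0] + "".join(numbered)
```

For example, `split_by_dash("Intro\n- a\n- b")` gives `"Intro\n1. a\n2. b"`. Because every dash is treated as a bullet, `split_by_dash("- well-known")` gives `"1. well2.known"`.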


dbaldacchino
takashi wrote:

Another approach. Split the attribute by dash, explode the resulting list, add the element index as prefix to the second or later elements, then concatenate the elements. See the attached workspace example to learn more.

dash-bullet-fixer-example-multiple-2.fmw (FME 2019.1.1)

Thanks a lot @takashi, this looks promising. I'll be studying your approach to the custom transformer later today and will post back :)


ebygomm
Influencer
  • August 13, 2019
dbaldacchino wrote:

And that confirms my hunch :( I will have to start looking at Python eventually, even though the learning curve is going to be a tad steep since I'm not going to have enough work volume to keep my pumps primed :D

Something like this (although I think there's probably a better way of writing the Python):

dash-bullet-fixer-example-python.fmw
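The attached workspace isn't reproduced here, so the following is only a sketch of what a Python route could look like. A plain dict stands in for a feature's attributes; inside a PythonCaller the same loop would read and write values with `feature.getAttribute()` and `feature.setAttribute()`. The function names, the `1. ` format and the rule "fix every string attribute unless a list of names is given" are assumptions.

```python
import itertools
import re

# Dash at the start of any line: re.MULTILINE makes ^ match after every "\n".
# [ \t]* (not \s*) so a bare "-" line never swallows the following newline.
BULLET = re.compile(r"^-[ \t]*", re.MULTILINE)

def number_bullets(text: str) -> str:
    counter = itertools.count(1)       # restarts at 1 for each text value
    return BULLET.sub(lambda m: f"{next(counter)}. ", text)

def fix_attributes(attrs: dict, names=None) -> dict:
    """Return a copy with bullets numbered in every string attribute,
    or only in the attributes listed in `names`."""
    fixed = dict(attrs)
    for name in (attrs if names is None else names):
        value = attrs.get(name)
        if isinstance(value, str):
            fixed[name] = number_bullets(value)
    return fixed
```

With `feature = {"id": 7, "scope": "- survey\n- design", "notes": "See well-known issue\n- check"}`, calling `fix_attributes(feature)` returns `{"id": 7, "scope": "1. survey\n2. design", "notes": "See well-known issue\n1. check"}`, with no attribute names hard-coded; passing `names=["scope"]` restricts the fix to that attribute.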


dbaldacchino
takashi wrote:

Another approach. Split the attribute by dash, explode the resulting list, add the element index as prefix to the second or later elements, then concatenate the elements. See the attached workspace example to learn more.

dash-bullet-fixer-example-multiple-2.fmw (FME 2019.1.1)

Hi again @takashi, I think your solution answers one or two of my questions regarding custom transformers and their setup. I'm going to try to refine my first attempt based on what I'm seeing in your solution. Unfortunately, I cannot use it as is, because it assumes that every dash needs to be replaced, which is not my requirement (only dashes at the beginning of a line are "bullets"; all others must remain untouched). Will post again once I make more progress. Thanks.


dbaldacchino
takashi wrote:

Another approach. Split the attribute by dash, explode the resulting list, add the element index as prefix to the second or later elements, then concatenate the elements. See the attached workspace example to learn more.

dash-bullet-fixer-example-multiple-2.fmw (FME 2019.1.1)

OK, so here's an updated example based on what I learned from @takashi's example. I changed very little here, but thanks to his workspace I have an answer to #1 and thus a more elegant solution. The updated custom transformer (below) is much easier to use: hook it up, select the attribute you want to fix, and... done; add another instance for each additional attribute. I would still prefer to find a way to multi-select attributes and wrap all the logic in one custom transformer, but if that's not possible and I don't have to use Python, I'll be a happy camper :) Thanks again!!

PS: The solution to #1 was to not check any attributes in the Input and uncheck all in Output. A PublishedParameter was added (Type: Attribute Name) and used throughout the custom transformer definition. An AttributeCreator was used to overwrite the input attribute with the fixed text, which then comes out of the Output.


dbaldacchino
mygis wrote:

After replacing the strings, to reconstruct the original text I would create a list using the ListBuilder transformer, followed by a ListConcatenator, which would combine all the lines into a single attribute as one text. In the ListConcatenator you will be asked to choose the delimiter used to separate the list elements; just use a NewLine.

Thanks, I am doing something very similar to re-assemble the paragraphs from the individual lines, but through an Aggregator transformer instead. The issue with having to do this separately for each attribute doesn't seem to be solvable per #2 in my original post.
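The dynamic reassembly asked about in #3 comes down to carrying the feature ID and the attribute name along with every line, then grouping on both. Below is a hypothetical Python sketch of that idea, a rough analogue of exploding and splitting followed by an Aggregator grouped by feature and attribute. A list of dicts stands in for the features, the function names are made up, and the `1. ` numbering format is an assumption.

```python
from collections import defaultdict

def explode(features, names):
    """One record per (feature, attribute, line), like exploding then splitting."""
    for fid, attrs in enumerate(features):
        for name in names:
            for line_no, line in enumerate(str(attrs[name]).split("\n")):
                yield fid, name, line_no, line

def number_records(records):
    """Number line-leading dashes, with a separate counter per (feature, attribute)."""
    counts = defaultdict(int)
    for fid, name, line_no, line in records:
        if line.startswith("-"):
            counts[fid, name] += 1
            line = f"{counts[fid, name]}. {line[1:].lstrip(' ')}"
        yield fid, name, line_no, line

def reassemble(features, records):
    """Group by (feature, attribute) and rejoin lines in order, like an Aggregator."""
    grouped = defaultdict(list)
    for fid, name, line_no, line in records:
        grouped[fid, name].append((line_no, line))
    rebuilt = [dict(attrs) for attrs in features]
    for (fid, name), lines in grouped.items():
        rebuilt[fid][name] = "\n".join(text for _, text in sorted(lines))
    return rebuilt
```

For example, with `features = [{"a": "- x\n- y", "b": "plain"}, {"a": "z", "b": "- w"}]`, calling `reassemble(features, number_records(explode(features, ["a", "b"])))` returns `[{"a": "1. x\n2. y", "b": "plain"}, {"a": "z", "b": "1. w"}]`. The `names` list can be any selection, because the grouping keys, not hard-coded attribute names, decide where each line goes back.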


takashi
Influencer
  • August 14, 2019
takashi wrote:

Another approach. Split the attribute by dash, explode the resulting list, add the element index as prefix to the second or later elements, then concatenate the elements. See the attached workspace example to learn more.

dash-bullet-fixer-example-multiple-2.fmw (FME 2019.1.1)

I didn't assume that a dash could occur in the middle of a text line. I upgraded my solution.

dash-bullet-fixer-example-multiple-3.fmw (FME 2019.1.1)


dbaldacchino
takashi wrote:

I didn't assume that a dash could occur in the middle of a text line. I upgraded my solution.

dash-bullet-fixer-example-multiple-3.fmw (FME 2019.1.1)

Thanks again @takashi, there's some nice regex goodness in here too :) Very much appreciated!

EDIT: I had never used the @Count() function in the context of an attribute value before, but it makes things even more compact. Very nice!

