I wanted to split some long strings (loaded from a text file) into shorter fields of fewer than 255 characters, and chose 100 as a reasonable line length. The SubstringExtractor transformer works well, but it takes a whole chain of them. I thought I could use the new integrated approach in the AttributeCreator transformer to combine them into one, creating bound0, bound1, ... like this:

bound0 = @Substring(@Value(bound),0,100)
bound1 = @Substring(@Value(bound),100,100)
... etc.

However, TCL tries to evaluate the resulting string as an expression, and it has trouble with the word "and":

AttributeCreator: Failed to evaluate TCL expression: and along the
AttributeCreator: TCL Error Message: invalid bareword "and"
in expression "and along the";
should be "$and" or "{and}" or "and(...)" or ...

I don't want it evaluated.

I cannot see a way of escaping the substring.

I have tried enclosing the expression in quotes:

\\"@Substring(...)\\"

but that is either too late or not effective. (I had to use a backslash, not a forward slash, so that the @function would still be evaluated.)
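
For comparison, here is a plain-Python sketch of the fixed-offset split that the bound0, bound1, ... expressions perform, assuming (as the expressions above suggest) that the third @Substring argument is a length, so field n starts at n*100. It runs outside FME; the helper name fixed_split and the sample text are purely illustrative. Python slices are plain strings, so nothing is evaluated as an expression, but the sketch also shows the weakness of fixed offsets: words and numbers are cut at arbitrary points.

```python
def fixed_split(text, width=100):
    """Cut text into consecutive chunks of at most `width` characters.

    Chunk n covers characters n*width .. n*width + width - 1, mirroring
    bound<n> = @Substring(@Value(bound), n*width, width).
    """
    return [text[i:i + width] for i in range(0, len(text), width)]


# Hypothetical sample text, with a small width so the cuts are easy to see
sample = "Total area 12345 square metres and along the river"
for n, chunk in enumerate(fixed_split(sample, 12)):
    print("bound%d = %r" % (n, chunk))
```

With a width of 12, bound0 is "Total area 1" and bound1 is "2345 square ", so the number 12345 ends up split across two fields.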

Is there a more elegant way of splitting a string into short sections that fit a smaller field? Ideally I would like it split at word boundaries, both for readability and so that numbers are not split across lines/fields. I would also like it to be less hard-coded, for both the maximum line length and the number of fields.

I have tried WordWrap from the store, but I cannot get either it or its demo workspace to work. It asks if I want to update it, and then either crashes or shows a blank workspace.

WordWrap may be better than splitting into fields, and it would be very promising if it worked. However, WordWrap produces no output attribute: "wrapped" does not appear.
Hi,

I don't think there is an ideal solution other than scripting. Here is an example Python script:

import fmeobjects

def splitText(feature):
    text = feature.getAttribute('text_line_data')
    if not text:
        return
    count = 0
    line = ''
    for word in text.split():
        # Write out the current field if adding this word would exceed 100 chars
        if line and 100 < len(line) + 1 + len(word):
            feature.setAttribute('bound%d' % count, line)
            count = count + 1
            line = ''
        if len(line) == 0:
            line = word
        else:
            line = '%s%s%s' % (line, ' ', word)
    # Write out whatever is left as the final field
    if 0 < len(line):
        feature.setAttribute('bound%d' % count, line)
        count = count + 1
    feature.setAttribute('number_of_fields', count)

Try calling this script through a PythonCaller transformer.
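
To try the function without FME, one option is a small stand-in for the feature object. StubFeature below is hypothetical and implements only getAttribute and setAttribute, the two calls splitText uses; in a workspace, the PythonCaller passes a real fmeobjects.FMEFeature instead. The width argument is also a hypothetical addition for testing, replacing the hard-coded 100.

```python
class StubFeature:
    """Hypothetical stand-in for fmeobjects.FMEFeature (attributes only)."""

    def __init__(self, attributes):
        self._attrs = dict(attributes)

    def getAttribute(self, name):
        # Return None for a missing attribute
        return self._attrs.get(name)

    def setAttribute(self, name, value):
        self._attrs[name] = value


def splitText(feature, width=100):
    """Greedy word wrap of 'text_line_data' into bound0, bound1, ... attributes."""
    text = feature.getAttribute('text_line_data')
    if not text:
        return
    count = 0
    line = ''
    for word in text.split():
        # Flush the current line if appending " word" would exceed the width
        if line and len(line) + 1 + len(word) > width:
            feature.setAttribute('bound%d' % count, line)
            count += 1
            line = ''
        line = word if not line else line + ' ' + word
    # Flush the final partial line
    if line:
        feature.setAttribute('bound%d' % count, line)
        count += 1
    feature.setAttribute('number_of_fields', count)


f = StubFeature({'text_line_data': 'the quick brown fox jumps over the lazy dog'})
splitText(f, width=15)
for n in range(f.getAttribute('number_of_fields')):
    print(f.getAttribute('bound%d' % n))
```

With a width of 15 this yields three fields: "the quick brown", "fox jumps over" and "the lazy dog".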

 

Hope this helps.

 

 

Takashi

 


Thanks Takashi, that worked well for some fixed widths and output fields.

 

It has inspired me to use more Python in my workspaces.

 

 

A bit more work is required to make it general enough to replace WordWrap from the store. I did try to read in a published parameter to avoid hard-coding the width, but without success. I would then have to set the output widths as well, and dynamically add the required number of fields. All possible with enough incentive.

 

 

I did rewrite it, but it does the same thing:

import fmeobjects

def splitText(feature):
    ''' split paragraph at words into fields of at most field_width chars'''
    text = feature.getAttribute("bound")
    field_width = 100 # I wish! getParameter("MaxWidth")
    lstWord = text.split() if text else []
    if not lstWord:
        return # nothing to split
    lstFld = []
    fld = lstWord.pop(0)
    while lstWord:
        part = lstWord.pop(0)
        # + 1 for the space that joins the two words
        if len(fld) + 1 + len(part) <= field_width:
            fld = fld + " " + part
        else: # start new field
            lstFld.append(fld)
            fld = part
    lstFld.append(fld) # final remainder
    count = len(lstFld)
    for n in range(count):
        feature.setAttribute("bound%d" % n, lstFld[n])
    feature.setAttribute("splits", count)


Each parameter's name and value are stored internally in a dictionary named FME_MacroValues, so you can get a parameter's value through that dictionary, for example:

    field_width = int(FME_MacroValues['MaxWidth'])

Note that parameter values are always stored as character strings, so if a value has to be treated as a number in the script, you need to convert it with a cast such as int(...).
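
If the parameter might be missing or not a valid number, a slightly more defensive lookup can fall back to a default. This is a sketch: the helper get_int_parameter is hypothetical, and the FME_MacroValues dictionary is faked here so the snippet runs outside FME (inside a PythonCaller, the real dictionary is already available).

```python
# Hypothetical stand-in; inside FME this dictionary is provided for you
FME_MacroValues = {'MaxWidth': '100'}


def get_int_parameter(name, default):
    """Return a published parameter as an int, or `default` if missing/invalid."""
    value = FME_MacroValues.get(name)  # None if the parameter does not exist
    try:
        return int(value)  # parameter values are stored as strings
    except (TypeError, ValueError):
        # TypeError: value is None; ValueError: value is not a valid integer
        return default


field_width = get_int_parameter('MaxWidth', 100)
print(field_width)
```

dict.get returns None for a missing key, and int(None) raises TypeError, which is why both exception types are caught.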

 

  Takashi
Hi,

 

 

No need to reinvent the wheel with Python; there is already an excellent module that does word wrapping ;-)

 

 

-----

 

import fmeobjects
import textwrap

def TextWrapper(feature):
    text = feature.getAttribute("text_line_data")
    maxLength = int(FME_MacroValues['MaxWidth'])
    parts = textwrap.wrap(text, maxLength)
    feature.setAttribute('number_of_parts', len(parts))
    if parts:
        feature.setAttribute('parts{}', parts)

 

-----

 

Expose the attributes "parts{}" and "number_of_parts".

 

 

This outputs all parts to the list "parts{}", which I find much cleaner than using named attributes.

 

 

I also think this solution is much more Pythonic.
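
textwrap.wrap also accepts the keyword options of textwrap.TextWrapper, which may help with the original wish not to split numbers across fields. By default (break_long_words=True) a token longer than the width is cut so that no line exceeds the width; with break_long_words=False the token is kept whole and that line is allowed to run over. The sample text and the width of 10 are only for illustration; in the workspace the width would come from the MaxWidth parameter.

```python
import textwrap

text = "Parcel 1234567890123 lies along the river"  # illustrative sample
width = 10

# Default: a token longer than `width` is broken so no line exceeds `width`
broken = textwrap.wrap(text, width)
# Keep long tokens such as IDs intact; those lines may exceed `width`
kept = textwrap.wrap(text, width, break_long_words=False)

print(broken)
print(kept)
```

Setting break_on_hyphens=False as well stops textwrap from breaking lines after hyphens, which can matter for hyphenated reference numbers.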

 

 

David
David, that's Pythonic! I've re-invented the wheel. Thanks for the textwrap module.

 

 

Takashi
