Skip to main content
Solved

AttributeCreator and long list of attributes to expose


Forum|alt.badge.img

I use AttributeCreator to prepare a list of columns (A, B, C, D, etc..) for an Excel writer. My problem is that the list is 4000 items long. I have to manually expose every one of the column labels, which is not feasible. I have looked up and down and did not find a solution. Is there a way with PythonCaller to label a feature as exposed?

Best answer by takashi

In this case, replacing more transformers with Python codes could increase the performance drastically. For example:

0684Q00000ArJMvQAN.png

# PythonCaller Script Example
import fmeobjects

# Get Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
    alpha = ''
    while 0 <= index:
        alpha = chr(index % 26 + 65) + alpha
        index = index // 26 - 1
    return alpha

class FeatureProcessor(object):
    def __init__(self):
        self.column_index = 0
        self.is_target = lambda attr: \
            not '(1900/01/00)' in attr \
            and not attr.startswith('fme_') \
            and not attr.startswith('csv_')
            
    def input(self, feature):
        # Collect target attribute names.
        attribs = [a for a in feature.getAllAttributeNames() if self.is_target(a)]
        
        # First column (column index = 0) values are attribute names.
        # Assume the first incoming feature has all target attributes.
        if self.column_index == 0:
            # Create and output a schema feature at first.
            numColumns = int(FME_MacroValues['NUM'])
            schema = fmeobjects.FMEFeature()
            schema.setAttribute('fme_feature_type_name', \
                FME_MacroValues['DestExcelSheetName'])
            schema.setAttribute('fme_schema_handling''schema_only')
            for i in range(numColumns):
                schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
                schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
            self.pyoutput(schema)
            
            item = fmeobjects.FMEFeature()
            for attr in attribs:
                item.setAttribute('_attr_name', attr)
                item.setAttribute('A', attr)
                self.pyoutput(item)
            self.column_index = 1
        
        # Explode the incoming feature with the target attributes.
        item = fmeobjects.FMEFeature()
        columnName = xlsxColumnName(self.column_index)
        for attr in attribs:
            item.setAttribute('_attr_name', attr)
            item.setAttribute(columnName, feature.getAttribute(attr))
            self.pyoutput(item)
        self.column_index += 1

There still is a room to improve the performance.

To be continued (maybe).

View original
Did this help you find an answer to your question?

10 replies

Forum|alt.badge.img
  • Author
  • August 30, 2017
And just to expand a bit on my question... With PythonCaller, I can add the Excel column name attribute to the individual features, essentially duplicating what AttributeCreator does. But this new attribute does not get exposed, thus my question.

 

 

 


danilo_fme
Evangelist
Forum|alt.badge.img+41
  • Evangelist
  • August 30, 2017

Hi @fjcharp,

Could you share us your Workspace template ( .FMWT )?

Thanks,

Danilo


Forum|alt.badge.img
  • Author
  • August 30, 2017
danilo_fme wrote:

Hi @fjcharp,

Could you share us your Workspace template ( .FMWT )?

Thanks,

Danilo

 

I certainly can. moe.zip

 

The model reads in a table that I need to pivot and write to Excel. The steps are self-explanatory. The reader included here reads a small subset of the complete set (for testing purposes), but the subset is still large enough to require more columns than the ones I manually exposed.

 

You can ignore the PythonCaller component: it is work in progress that simply allowed me to examine the features output by AttributeCreator.

 

So the problem is: manually exposing an excessive number of attributes in AttributeCreator is painful and not practical. So my thought was: why don't I create the Excel column name attribute in the lists features (emulating what AttributeCreator does) and then expose these newly created attributes? That's where I hit the proverbial wall :-)

 

Frederic

takashi
Contributor
Forum|alt.badge.img+17
  • Contributor
  • August 31, 2017

Hi @fjcharp, I think it would be easier to rename the attribute to destination Excel column name before aggregating the item features for each original CSV field name, rather than once creating a list then renaming its elements.

FME can rename the attribute to destination Excel column name and also configure destination feature type schema automatically, if you implement your workspace as a dynamic workflow. I think the method described in this article would be suitable to this case.

Dynamic Workflows: Destination Schema is Derived from a Schema Feature

Additionally, the NumToAlphaConverter custom transformer from FME Hub can be used effectively to generate a series of Excel column names based on a sequence of numbers.See the attached workspace example to learn more: moe-2.zip (FME 2017.0.1.1)Hope this helps.

danilo_fme
Evangelist
Forum|alt.badge.img+41
  • Evangelist
  • August 31, 2017
takashi wrote:

Hi @fjcharp, I think it would be easier to rename the attribute to destination Excel column name before aggregating the item features for each original CSV field name, rather than once creating a list then renaming its elements.

FME can rename the attribute to destination Excel column name and also configure destination feature type schema automatically, if you implement your workspace as a dynamic workflow. I think the method described in this article would be suitable to this case.

Dynamic Workflows: Destination Schema is Derived from a Schema Feature

Additionally, the NumToAlphaConverter custom transformer from FME Hub can be used effectively to generate a series of Excel column names based on a sequence of numbers.See the attached workspace example to learn more: moe-2.zip (FME 2017.0.1.1)Hope this helps.
Hi @fjcharp, the @takashi did give you a good solution your case using Dynamic Workflow.

 

Congratulations @takashi

Forum|alt.badge.img
  • Author
  • August 31, 2017

Takashi,

Wow... Thank you so much for taking the time. The modified template is quite the tutorial. I have learned a lot from it.

Frederic


takashi
Contributor
Forum|alt.badge.img+17
  • Contributor
  • Best Answer
  • September 1, 2017

In this case, replacing more transformers with Python codes could increase the performance drastically. For example:

0684Q00000ArJMvQAN.png

# PythonCaller Script Example
import fmeobjects

# Get Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
    alpha = ''
    while 0 <= index:
        alpha = chr(index % 26 + 65) + alpha
        index = index // 26 - 1
    return alpha

class FeatureProcessor(object):
    def __init__(self):
        self.column_index = 0
        self.is_target = lambda attr: \
            not '(1900/01/00)' in attr \
            and not attr.startswith('fme_') \
            and not attr.startswith('csv_')
            
    def input(self, feature):
        # Collect target attribute names.
        attribs = [a for a in feature.getAllAttributeNames() if self.is_target(a)]
        
        # First column (column index = 0) values are attribute names.
        # Assume the first incoming feature has all target attributes.
        if self.column_index == 0:
            # Create and output a schema feature at first.
            numColumns = int(FME_MacroValues['NUM'])
            schema = fmeobjects.FMEFeature()
            schema.setAttribute('fme_feature_type_name', \
                FME_MacroValues['DestExcelSheetName'])
            schema.setAttribute('fme_schema_handling''schema_only')
            for i in range(numColumns):
                schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
                schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
            self.pyoutput(schema)
            
            item = fmeobjects.FMEFeature()
            for attr in attribs:
                item.setAttribute('_attr_name', attr)
                item.setAttribute('A', attr)
                self.pyoutput(item)
            self.column_index = 1
        
        # Explode the incoming feature with the target attributes.
        item = fmeobjects.FMEFeature()
        columnName = xlsxColumnName(self.column_index)
        for attr in attribs:
            item.setAttribute('_attr_name', attr)
            item.setAttribute(columnName, feature.getAttribute(attr))
            self.pyoutput(item)
        self.column_index += 1

There still is a room to improve the performance.

To be continued (maybe).


Forum|alt.badge.img
  • Author
  • September 1, 2017
danilo_fme wrote:

Hi @fjcharp,

Could you share us your Workspace template ( .FMWT )?

Thanks,

Danilo

???????????

 

Frederic

 

 


Forum|alt.badge.img
  • Author
  • September 1, 2017
takashi wrote:

In this case, replacing more transformers with Python codes could increase the performance drastically. For example:

0684Q00000ArJMvQAN.png

# PythonCaller Script Example
import fmeobjects

# Get Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
    alpha = ''
    while 0 <= index:
        alpha = chr(index % 26 + 65) + alpha
        index = index // 26 - 1
    return alpha

class FeatureProcessor(object):
    def __init__(self):
        self.column_index = 0
        self.is_target = lambda attr: \
            not '(1900/01/00)' in attr \
            and not attr.startswith('fme_') \
            and not attr.startswith('csv_')
            
    def input(self, feature):
        # Collect target attribute names.
        attribs = [a for a in feature.getAllAttributeNames() if self.is_target(a)]
        
        # First column (column index = 0) values are attribute names.
        # Assume the first incoming feature has all target attributes.
        if self.column_index == 0:
            # Create and output a schema feature at first.
            numColumns = int(FME_MacroValues['NUM'])
            schema = fmeobjects.FMEFeature()
            schema.setAttribute('fme_feature_type_name', \
                FME_MacroValues['DestExcelSheetName'])
            schema.setAttribute('fme_schema_handling''schema_only')
            for i in range(numColumns):
                schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
                schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
            self.pyoutput(schema)
            
            item = fmeobjects.FMEFeature()
            for attr in attribs:
                item.setAttribute('_attr_name', attr)
                item.setAttribute('A', attr)
                self.pyoutput(item)
            self.column_index = 1
        
        # Explode the incoming feature with the target attributes.
        item = fmeobjects.FMEFeature()
        columnName = xlsxColumnName(self.column_index)
        for attr in attribs:
            item.setAttribute('_attr_name', attr)
            item.setAttribute(columnName, feature.getAttribute(attr))
            self.pyoutput(item)
        self.column_index += 1

There still is a room to improve the performance.

To be continued (maybe).

Takashi,

 

 

Hmmm... This is a good seed for a more general-purpose pivot function. I had done something similar, but not being sufficiently experienced with schemas, I took the easy way out and wrote the output directly to a CSV file. I had also let the Aggregator create a list of columns items before invoking my Python Caller. The code is very rough... I was under the gun to get something out, so I slapped something together as fast as I could. I am going to spend some time now to clean this up and integrate your approach.

 

 

???????????,

 

Frederic 

 

 


takashi
Contributor
Forum|alt.badge.img+17
  • Contributor
  • September 2, 2017

???????? ^^

Further, the CSV reader and all transformers in the previous example can be replaced with a single PythonCreator, and it has increased the performance much.

0684Q00000ArJo8QAF.png

# PythonCreator Script Example
import fmeobjects, csv
class FeatureCreator(object):
    def close(self):
        # Return True if the CSV field is NOT a target to be processed.
        notTarget = lambda field: \
            '(1900/01/00)' in field
            
        # Return Excel Column Name (A,B,C,...) corresponding to a column index.
        # index: 0-based column index
        def xlsxColumnName(index):
            name = ''
            while 0 <= index:
                name = chr(index % 26 + 65) + name
                index = index // 26 - 1
            return name

        # Create and output a schema feature.
        schema = fmeobjects.FMEFeature()
        attrToValue = {
            'fme_feature_type_name' : FME_MacroValues['DestExcelSheetName'],
            'fme_schema_handling' : 'schema_only',
        }
        for attr, value in attrToValue.items():
            schema.setAttribute(attr, value)
        for i in range(int(FME_MacroValues['NUM'])):
            schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
            schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
        self.pyoutput(schema)
        
        # Get cell values from source CSV table and save them into a dictionary.
        # key   = 0-based index of source field (destination row)
        # value = dictionary {destination column name : cell value}
        data = {}
        with open(FME_MacroValues['SourceDataset_CSV2'], 'r'as csvfile:
            nontargets = None
            for i, row in enumerate(csv.reader(csvfile)):
                # Collect non-target keys from the first row (i.e. field names).
                if i == 0:
                    nontargets = {k for k, v in enumerate(row) if notTarget(v)}
                c = xlsxColumnName(i)
                for k, v in enumerate(row):
                    if k not in nontargets:
                        data.setdefault(k, {})[c] = v
                    
        # Create and output data features for each row in destination Excel table.
        for k in sorted(data.keys()):
            feature = fmeobjects.FMEFeature()
            for c, v in data[k].items():
                feature.setAttribute(c, v)
            self.pyoutput(feature)

Dynamic Schema Configuration in the Excel Writer Feature Type

If you don't need to write the field names (A,B,C...) into the first row,

  • set 'No' to the Output Field Names parameter,
  • and
    set 'Yes' to the Use Attribute Names As Column Positions parameter.

0684Q00000ArJl5QAF.png


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings