Best Answer

AttributeCreator and long list of attributes to expose

Forum|Forum|8 years ago
August 30, 2017
10 replies
97 views

fjcharp

I use AttributeCreator to prepare a list of columns (A, B, C, D, etc..) for an Excel writer. My problem is that the list is 4000 items long. I have to manually expose every one of the column labels, which is not feasible. I have looked up and down and did not find a solution. Is there a way with PythonCaller to label a feature as exposed?

Best answer by takashi

In this case, replacing more transformers with Python codes could increase the performance drastically. For example:

# PythonCaller Script Example
import fmeobjects

# Get Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
    alpha = ''
    while 0 <= index:
        alpha = chr(index % 26 + 65) + alpha
        index = index // 26 - 1
    return alpha

class FeatureProcessor(object):
    def __init__(self):
        self.column_index = 0
        self.is_target = lambda attr: \
            not '(1900/01/00)' in attr \
            and not attr.startswith('fme_') \
            and not attr.startswith('csv_')
            
    def input(self, feature):
        # Collect target attribute names.
        attribs = [a for a in feature.getAllAttributeNames() if self.is_target(a)]
        
        # First column (column index = 0) values are attribute names.
        # Assume the first incoming feature has all target attributes.
        if self.column_index == 0:
            # Create and output a schema feature at first.
            numColumns = int(FME_MacroValues['NUM'])
            schema = fmeobjects.FMEFeature()
            schema.setAttribute('fme_feature_type_name', \
                FME_MacroValues['DestExcelSheetName'])
            schema.setAttribute('fme_schema_handling', 'schema_only')
            for i in range(numColumns):
                schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
                schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
            self.pyoutput(schema)
            
            item = fmeobjects.FMEFeature()
            for attr in attribs:
                item.setAttribute('_attr_name', attr)
                item.setAttribute('A', attr)
                self.pyoutput(item)
            self.column_index = 1
        
        # Explode the incoming feature with the target attributes.
        item = fmeobjects.FMEFeature()
        columnName = xlsxColumnName(self.column_index)
        for attr in attribs:
            item.setAttribute('_attr_name', attr)
            item.setAttribute(columnName, feature.getAttribute(attr))
            self.pyoutput(item)
        self.column_index += 1

There still is a room to improve the performance.

To be continued (maybe).

This post is closed to further activity.
It may be an old question, an answered question, an implemented idea, or a notification-only post.
Please check post dates before relying on any information in a question or answer.
For follow-up or related questions, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

fjcharp
Author
Forum|Forum|8 years ago
August 30, 2017

And just to expand a bit on my question... With PythonCaller, I can add the Excel column name attribute to the individual features, essentially duplicating what AttributeCreator does. But this new attribute does not get exposed, thus my question.

Upvote

+52

danilo_fme
Celebrity
Forum|Forum|8 years ago
August 30, 2017

Hi @fjcharp,

Could you share us your Workspace template ( .FMWT )?

Thanks,

Danilo

Partner Solutial Brazil - www.solutial.com.br

Upvote

fjcharp
Author
Forum|Forum|8 years ago
August 30, 2017

Hi @fjcharp,

Could you share us your Workspace template ( .FMWT )?

Thanks,

Danilo

I certainly can. moe.zip

The model reads in a table that I need to pivot and write to Excel. The steps are self-explanatory. The reader included here reads a small subset of the complete set (for testing purposes), but the subset is still large enough to require more columns than the ones I manually exposed.

You can ignore the PythonCaller component: it is work in progress that simply allowed me to examine the features output by AttributeCreator.

So the problem is: manually exposing an excessive number of attributes in AttributeCreator is painful and not practical. So my thought was: why don't I create the Excel column name attribute in the lists features (emulating what AttributeCreator does) and then expose these newly created attributes? That's where I hit the proverbial wall :-)

Frederic

Upvote

takashi
Forum|Forum|8 years ago
August 31, 2017

Hi @fjcharp, I think it would be easier to rename the attribute to destination Excel column name before aggregating the item features for each original CSV field name, rather than once creating a list then renaming its elements.

FME can rename the attribute to destination Excel column name and also configure destination feature type schema automatically, if you implement your workspace as a dynamic workflow. I think the method described in this article would be suitable to this case.

Dynamic Workflows: Destination Schema is Derived from a Schema Feature

Additionally, the NumToAlphaConverter custom transformer from FME Hub can be used effectively to generate a series of Excel column names based on a sequence of numbers.See the attached workspace example to learn more: moe-2.zip (FME 2017.0.1.1)Hope this helps.

Why not inspect features with Visual/Data Preview and Feature/Record Information before writing them into a destination dataset?

Upvote

+52

danilo_fme
Celebrity
Forum|Forum|8 years ago
August 31, 2017

Dynamic Workflows: Destination Schema is Derived from a Schema Feature

Hi @fjcharp, the @takashi did give you a good solution your case using Dynamic Workflow.

Congratulations @takashi

Partner Solutial Brazil - www.solutial.com.br

Upvote

fjcharp
Author
Forum|Forum|8 years ago
August 31, 2017

Takashi,

Wow... Thank you so much for taking the time. The modified template is quite the tutorial. I have learned a lot from it.

Frederic

Upvote

takashi
Best Answer
Forum|Forum|8 years ago
September 1, 2017

In this case, replacing more transformers with Python codes could increase the performance drastically. For example:

# PythonCaller Script Example
import fmeobjects

# Get Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
    alpha = ''
    while 0 <= index:
        alpha = chr(index % 26 + 65) + alpha
        index = index // 26 - 1
    return alpha

class FeatureProcessor(object):
    def __init__(self):
        self.column_index = 0
        self.is_target = lambda attr: \
            not '(1900/01/00)' in attr \
            and not attr.startswith('fme_') \
            and not attr.startswith('csv_')
            
    def input(self, feature):
        # Collect target attribute names.
        attribs = [a for a in feature.getAllAttributeNames() if self.is_target(a)]
        
        # First column (column index = 0) values are attribute names.
        # Assume the first incoming feature has all target attributes.
        if self.column_index == 0:
            # Create and output a schema feature at first.
            numColumns = int(FME_MacroValues['NUM'])
            schema = fmeobjects.FMEFeature()
            schema.setAttribute('fme_feature_type_name', \
                FME_MacroValues['DestExcelSheetName'])
            schema.setAttribute('fme_schema_handling', 'schema_only')
            for i in range(numColumns):
                schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
                schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
            self.pyoutput(schema)
            
            item = fmeobjects.FMEFeature()
            for attr in attribs:
                item.setAttribute('_attr_name', attr)
                item.setAttribute('A', attr)
                self.pyoutput(item)
            self.column_index = 1
        
        # Explode the incoming feature with the target attributes.
        item = fmeobjects.FMEFeature()
        columnName = xlsxColumnName(self.column_index)
        for attr in attribs:
            item.setAttribute('_attr_name', attr)
            item.setAttribute(columnName, feature.getAttribute(attr))
            self.pyoutput(item)
        self.column_index += 1

There still is a room to improve the performance.

To be continued (maybe).

Why not inspect features with Visual/Data Preview and Feature/Record Information before writing them into a destination dataset?

Upvote

fjcharp
Author
Forum|Forum|8 years ago
September 1, 2017

Hi @fjcharp,

Could you share us your Workspace template ( .FMWT )?

Thanks,

Danilo

???????????

Frederic

Upvote

fjcharp
Author
Forum|Forum|8 years ago
September 1, 2017

In this case, replacing more transformers with Python codes could increase the performance drastically. For example:

# PythonCaller Script Example
import fmeobjects

# Get Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
    alpha = ''
    while 0 <= index:
        alpha = chr(index % 26 + 65) + alpha
        index = index // 26 - 1
    return alpha

class FeatureProcessor(object):
    def __init__(self):
        self.column_index = 0
        self.is_target = lambda attr: \
            not '(1900/01/00)' in attr \
            and not attr.startswith('fme_') \
            and not attr.startswith('csv_')
            
    def input(self, feature):
        # Collect target attribute names.
        attribs = [a for a in feature.getAllAttributeNames() if self.is_target(a)]
        
        # First column (column index = 0) values are attribute names.
        # Assume the first incoming feature has all target attributes.
        if self.column_index == 0:
            # Create and output a schema feature at first.
            numColumns = int(FME_MacroValues['NUM'])
            schema = fmeobjects.FMEFeature()
            schema.setAttribute('fme_feature_type_name', \
                FME_MacroValues['DestExcelSheetName'])
            schema.setAttribute('fme_schema_handling', 'schema_only')
            for i in range(numColumns):
                schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
                schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
            self.pyoutput(schema)
            
            item = fmeobjects.FMEFeature()
            for attr in attribs:
                item.setAttribute('_attr_name', attr)
                item.setAttribute('A', attr)
                self.pyoutput(item)
            self.column_index = 1
        
        # Explode the incoming feature with the target attributes.
        item = fmeobjects.FMEFeature()
        columnName = xlsxColumnName(self.column_index)
        for attr in attribs:
            item.setAttribute('_attr_name', attr)
            item.setAttribute(columnName, feature.getAttribute(attr))
            self.pyoutput(item)
        self.column_index += 1

There still is a room to improve the performance.

To be continued (maybe).

Takashi,

Hmmm... This is a good seed for a more general-purpose pivot function. I had done something similar, but not being sufficiently experienced with schemas, I took the easy way out and wrote the output directly to a CSV file. I had also let the Aggregator create a list of columns items before invoking my Python Caller. The code is very rough... I was under the gun to get something out, so I slapped something together as fast as I could. I am going to spend some time now to clean this up and integrate your approach.

???????????,

Frederic

Upvote

takashi
Forum|Forum|8 years ago
September 2, 2017

???????? ^^

Further, the CSV reader and all transformers in the previous example can be replaced with a single PythonCreator, and it has increased the performance much.

# PythonCreator Script Example
import fmeobjects, csv
class FeatureCreator(object):
    def close(self):
        # Return True if the CSV field is NOT a target to be processed.
        notTarget = lambda field: \
            '(1900/01/00)' in field
            
        # Return Excel Column Name (A,B,C,...) corresponding to a column index.
        # index: 0-based column index
        def xlsxColumnName(index):
            name = ''
            while 0 <= index:
                name = chr(index % 26 + 65) + name
                index = index // 26 - 1
            return name

        # Create and output a schema feature.
        schema = fmeobjects.FMEFeature()
        attrToValue = {
            'fme_feature_type_name' : FME_MacroValues['DestExcelSheetName'],
            'fme_schema_handling' : 'schema_only',
        }
        for attr, value in attrToValue.items():
            schema.setAttribute(attr, value)
        for i in range(int(FME_MacroValues['NUM'])):
            schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
            schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
        self.pyoutput(schema)
        
        # Get cell values from source CSV table and save them into a dictionary.
        # key   = 0-based index of source field (destination row)
        # value = dictionary {destination column name : cell value}
        data = {}
        with open(FME_MacroValues['SourceDataset_CSV2'], 'r') as csvfile:
            nontargets = None
            for i, row in enumerate(csv.reader(csvfile)):
                # Collect non-target keys from the first row (i.e. field names).
                if i == 0:
                    nontargets = {k for k, v in enumerate(row) if notTarget(v)}
                c = xlsxColumnName(i)
                for k, v in enumerate(row):
                    if k not in nontargets:
                        data.setdefault(k, {})[c] = v
                    
        # Create and output data features for each row in destination Excel table.
        for k in sorted(data.keys()):
            feature = fmeobjects.FMEFeature()
            for c, v in data[k].items():
                feature.setAttribute(c, v)
            self.pyoutput(feature)

Dynamic Schema Configuration in the Excel Writer Feature Type

If you don't need to write the field names (A,B,C...) into the first row,

set 'No' to the Output Field Names parameter,
and
set 'Yes' to the Use Attribute Names As Column Positions parameter.

Why not inspect features with Visual/Data Preview and Feature/Record Information before writing them into a destination dataset?

Upvote

AttributeCreator and long list of attributes to expose

10 replies

Community Stats

Latest FME

Community Stats

Latest FME

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded