Solved

AttributeCreator and long list of attributes to expose


Badge

I use AttributeCreator to prepare a list of columns (A, B, C, D, etc..) for an Excel writer. My problem is that the list is 4000 items long. I have to manually expose every one of the column labels, which is not feasible. I have looked up and down and did not find a solution. Is there a way with PythonCaller to label a feature as exposed?

icon

Best answer by takashi 1 September 2017, 05:03

View original

10 replies

Badge
And just to expand a bit on my question... With PythonCaller, I can add the Excel column name attribute to the individual features, essentially duplicating what AttributeCreator does. But this new attribute does not get exposed, thus my question.

 

 

 

Userlevel 4
Badge +30

Hi @fjcharp,

Could you share us your Workspace template ( .FMWT )?

Thanks,

Danilo

Badge

Hi @fjcharp,

Could you share us your Workspace template ( .FMWT )?

Thanks,

Danilo

 

I certainly can. moe.zip

 

The model reads in a table that I need to pivot and write to Excel. The steps are self-explanatory. The reader included here reads a small subset of the complete set (for testing purposes), but the subset is still large enough to require more columns than the ones I manually exposed.

 

You can ignore the PythonCaller component: it is work in progress that simply allowed me to examine the features output by AttributeCreator.

 

So the problem is: manually exposing an excessive number of attributes in AttributeCreator is painful and not practical. So my thought was: why don't I create the Excel column name attribute in the lists features (emulating what AttributeCreator does) and then expose these newly created attributes? That's where I hit the proverbial wall :-)

 

Frederic
Userlevel 2
Badge +17

Hi @fjcharp, I think it would be easier to rename the attribute to destination Excel column name before aggregating the item features for each original CSV field name, rather than once creating a list then renaming its elements.

FME can rename the attribute to destination Excel column name and also configure destination feature type schema automatically, if you implement your workspace as a dynamic workflow. I think the method described in this article would be suitable to this case.

Dynamic Workflows: Destination Schema is Derived from a Schema Feature

Additionally, the NumToAlphaConverter custom transformer from FME Hub can be used effectively to generate a series of Excel column names based on a sequence of numbers.See the attached workspace example to learn more: moe-2.zip (FME 2017.0.1.1)Hope this helps.
Userlevel 4
Badge +30

Hi @fjcharp, I think it would be easier to rename the attribute to destination Excel column name before aggregating the item features for each original CSV field name, rather than once creating a list then renaming its elements.

FME can rename the attribute to destination Excel column name and also configure destination feature type schema automatically, if you implement your workspace as a dynamic workflow. I think the method described in this article would be suitable to this case.

Dynamic Workflows: Destination Schema is Derived from a Schema Feature

Additionally, the NumToAlphaConverter custom transformer from FME Hub can be used effectively to generate a series of Excel column names based on a sequence of numbers.See the attached workspace example to learn more: moe-2.zip (FME 2017.0.1.1)Hope this helps.
Hi @fjcharp, the @takashi did give you a good solution your case using Dynamic Workflow.

 

Congratulations @takashi
Badge

Takashi,

Wow... Thank you so much for taking the time. The modified template is quite the tutorial. I have learned a lot from it.

Frederic

Userlevel 2
Badge +17

In this case, replacing more transformers with Python codes could increase the performance drastically. For example:

0684Q00000ArJMvQAN.png

# PythonCaller Script Example
import fmeobjects

# Get Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
    alpha = ''
    while 0 <= index:
        alpha = chr(index % 26 + 65) + alpha
        index = index // 26 - 1
    return alpha

class FeatureProcessor(object):
    def __init__(self):
        self.column_index = 0
        self.is_target = lambda attr: \
            not '(1900/01/00)' in attr \
            and not attr.startswith('fme_') \
            and not attr.startswith('csv_')
            
    def input(self, feature):
        # Collect target attribute names.
        attribs = [a for a in feature.getAllAttributeNames() if self.is_target(a)]
        
        # First column (column index = 0) values are attribute names.
        # Assume the first incoming feature has all target attributes.
        if self.column_index == 0:
            # Create and output a schema feature at first.
            numColumns = int(FME_MacroValues['NUM'])
            schema = fmeobjects.FMEFeature()
            schema.setAttribute('fme_feature_type_name', \
                FME_MacroValues['DestExcelSheetName'])
            schema.setAttribute('fme_schema_handling', 'schema_only')
            for i in range(numColumns):
                schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
                schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
            self.pyoutput(schema)
            
            item = fmeobjects.FMEFeature()
            for attr in attribs:
                item.setAttribute('_attr_name', attr)
                item.setAttribute('A', attr)
                self.pyoutput(item)
            self.column_index = 1
        
        # Explode the incoming feature with the target attributes.
        item = fmeobjects.FMEFeature()
        columnName = xlsxColumnName(self.column_index)
        for attr in attribs:
            item.setAttribute('_attr_name', attr)
            item.setAttribute(columnName, feature.getAttribute(attr))
            self.pyoutput(item)
        self.column_index += 1

There still is a room to improve the performance.

To be continued (maybe).

Badge

Hi @fjcharp,

Could you share us your Workspace template ( .FMWT )?

Thanks,

Danilo

???????????

 

Frederic

 

 

Badge

In this case, replacing more transformers with Python codes could increase the performance drastically. For example:

0684Q00000ArJMvQAN.png

# PythonCaller Script Example
import fmeobjects

# Get Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
    alpha = ''
    while 0 <= index:
        alpha = chr(index % 26 + 65) + alpha
        index = index // 26 - 1
    return alpha

class FeatureProcessor(object):
    def __init__(self):
        self.column_index = 0
        self.is_target = lambda attr: \
            not '(1900/01/00)' in attr \
            and not attr.startswith('fme_') \
            and not attr.startswith('csv_')
            
    def input(self, feature):
        # Collect target attribute names.
        attribs = [a for a in feature.getAllAttributeNames() if self.is_target(a)]
        
        # First column (column index = 0) values are attribute names.
        # Assume the first incoming feature has all target attributes.
        if self.column_index == 0:
            # Create and output a schema feature at first.
            numColumns = int(FME_MacroValues['NUM'])
            schema = fmeobjects.FMEFeature()
            schema.setAttribute('fme_feature_type_name', \
                FME_MacroValues['DestExcelSheetName'])
            schema.setAttribute('fme_schema_handling', 'schema_only')
            for i in range(numColumns):
                schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
                schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
            self.pyoutput(schema)
            
            item = fmeobjects.FMEFeature()
            for attr in attribs:
                item.setAttribute('_attr_name', attr)
                item.setAttribute('A', attr)
                self.pyoutput(item)
            self.column_index = 1
        
        # Explode the incoming feature with the target attributes.
        item = fmeobjects.FMEFeature()
        columnName = xlsxColumnName(self.column_index)
        for attr in attribs:
            item.setAttribute('_attr_name', attr)
            item.setAttribute(columnName, feature.getAttribute(attr))
            self.pyoutput(item)
        self.column_index += 1

There still is a room to improve the performance.

To be continued (maybe).

Takashi,

 

 

Hmmm... This is a good seed for a more general-purpose pivot function. I had done something similar, but not being sufficiently experienced with schemas, I took the easy way out and wrote the output directly to a CSV file. I had also let the Aggregator create a list of columns items before invoking my Python Caller. The code is very rough... I was under the gun to get something out, so I slapped something together as fast as I could. I am going to spend some time now to clean this up and integrate your approach.

 

 

???????????,

 

Frederic 

 

 

Userlevel 2
Badge +17

???????? ^^

Further, the CSV reader and all transformers in the previous example can be replaced with a single PythonCreator, and it has increased the performance much.

0684Q00000ArJo8QAF.png

# PythonCreator Script Example
import fmeobjects, csv
class FeatureCreator(object):
    def close(self):
        # Return True if the CSV field is NOT a target to be processed.
        notTarget = lambda field: \
            '(1900/01/00)' in field
            
        # Return Excel Column Name (A,B,C,...) corresponding to a column index.
        # index: 0-based column index
        def xlsxColumnName(index):
            name = ''
            while 0 <= index:
                name = chr(index % 26 + 65) + name
                index = index // 26 - 1
            return name

        # Create and output a schema feature.
        schema = fmeobjects.FMEFeature()
        attrToValue = {
            'fme_feature_type_name' : FME_MacroValues['DestExcelSheetName'],
            'fme_schema_handling' : 'schema_only',
        }
        for attr, value in attrToValue.items():
            schema.setAttribute(attr, value)
        for i in range(int(FME_MacroValues['NUM'])):
            schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
            schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
        self.pyoutput(schema)
        
        # Get cell values from source CSV table and save them into a dictionary.
        # key   = 0-based index of source field (destination row)
        # value = dictionary {destination column name : cell value}
        data = {}
        with open(FME_MacroValues['SourceDataset_CSV2'], 'r') as csvfile:
            nontargets = None
            for i, row in enumerate(csv.reader(csvfile)):
                # Collect non-target keys from the first row (i.e. field names).
                if i == 0:
                    nontargets = {k for k, v in enumerate(row) if notTarget(v)}
                c = xlsxColumnName(i)
                for k, v in enumerate(row):
                    if k not in nontargets:
                        data.setdefault(k, {})[c] = v
                    
        # Create and output data features for each row in destination Excel table.
        for k in sorted(data.keys()):
            feature = fmeobjects.FMEFeature()
            for c, v in data[k].items():
                feature.setAttribute(c, v)
            self.pyoutput(feature)

Dynamic Schema Configuration in the Excel Writer Feature Type

If you don't need to write the field names (A,B,C...) into the first row,

  • set 'No' to the Output Field Names parameter,
  • and
    set 'Yes' to the Use Attribute Names As Column Positions parameter.

0684Q00000ArJl5QAF.png

Reply