I use AttributeCreator to prepare a list of columns (A, B, C, D, etc..) for an Excel writer. My problem is that the list is 4000 items long. I have to manually expose every one of the column labels, which is not feasible. I have looked up and down and did not find a solution. Is there a way with PythonCaller to label a feature as exposed?
Hi @fjcharp,
Could you share us your Workspace template ( .FMWT )?
Thanks,
Danilo
Hi @fjcharp,
Could you share us your Workspace template ( .FMWT )?
Thanks,
Danilo
I certainly can. moe.zip
The model reads in a table that I need to pivot and write to Excel. The steps are self-explanatory. The reader included here reads a small subset of the complete set (for testing purposes), but the subset is still large enough to require more columns than the ones I manually exposed.
You can ignore the PythonCaller component: it is work in progress that simply allowed me to examine the features output by AttributeCreator.
So the problem is: manually exposing an excessive number of attributes in AttributeCreator is painful and not practical. So my thought was: why don't I create the Excel column name attribute in the lists features (emulating what AttributeCreator does) and then expose these newly created attributes? That's where I hit the proverbial wall :-)
Frederic
Hi @fjcharp, I think it would be easier to rename the attribute to destination Excel column name before aggregating the item features for each original CSV field name, rather than once creating a list then renaming its elements.
FME can rename the attribute to destination Excel column name and also configure destination feature type schema automatically, if you implement your workspace as a dynamic workflow. I think the method described in this article would be suitable to this case.
Dynamic Workflows: Destination Schema is Derived from a Schema Feature
Additionally, the NumToAlphaConverter custom transformer from FME Hub can be used effectively to generate a series of Excel column names based on a sequence of numbers.See the attached workspace example to learn more: moe-2.zip (FME 2017.0.1.1)Hope this helps.Hi @fjcharp, I think it would be easier to rename the attribute to destination Excel column name before aggregating the item features for each original CSV field name, rather than once creating a list then renaming its elements.
FME can rename the attribute to destination Excel column name and also configure destination feature type schema automatically, if you implement your workspace as a dynamic workflow. I think the method described in this article would be suitable to this case.
Dynamic Workflows: Destination Schema is Derived from a Schema Feature
Additionally, the NumToAlphaConverter custom transformer from FME Hub can be used effectively to generate a series of Excel column names based on a sequence of numbers.See the attached workspace example to learn more: moe-2.zip (FME 2017.0.1.1)Hope this helps.Congratulations @takashi
Takashi,
Wow... Thank you so much for taking the time. The modified template is quite the tutorial. I have learned a lot from it.
Frederic
In this case, replacing more transformers with Python codes could increase the performance drastically. For example:
# PythonCaller Script Example
import fmeobjects
# Get Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
alpha = ''
while 0 <= index:
alpha = chr(index % 26 + 65) + alpha
index = index // 26 - 1
return alpha
class FeatureProcessor(object):
def __init__(self):
self.column_index = 0
self.is_target = lambda attr: \
not '(1900/01/00)' in attr \
and not attr.startswith('fme_') \
and not attr.startswith('csv_')
def input(self, feature):
# Collect target attribute names.
attribs = [a for a in feature.getAllAttributeNames() if self.is_target(a)]
# First column (column index = 0) values are attribute names.
# Assume the first incoming feature has all target attributes.
if self.column_index == 0:
# Create and output a schema feature at first.
numColumns = int(FME_MacroValues['NUM'])
schema = fmeobjects.FMEFeature()
schema.setAttribute('fme_feature_type_name', \
FME_MacroValues['DestExcelSheetName'])
schema.setAttribute('fme_schema_handling', 'schema_only')
for i in range(numColumns):
schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
self.pyoutput(schema)
item = fmeobjects.FMEFeature()
for attr in attribs:
item.setAttribute('_attr_name', attr)
item.setAttribute('A', attr)
self.pyoutput(item)
self.column_index = 1
# Explode the incoming feature with the target attributes.
item = fmeobjects.FMEFeature()
columnName = xlsxColumnName(self.column_index)
for attr in attribs:
item.setAttribute('_attr_name', attr)
item.setAttribute(columnName, feature.getAttribute(attr))
self.pyoutput(item)
self.column_index += 1
There still is a room to improve the performance.
To be continued (maybe).
Hi @fjcharp,
Could you share us your Workspace template ( .FMWT )?
Thanks,
Danilo
Frederic
In this case, replacing more transformers with Python codes could increase the performance drastically. For example:
# PythonCaller Script Example
import fmeobjects
# Get Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
alpha = ''
while 0 <= index:
alpha = chr(index % 26 + 65) + alpha
index = index // 26 - 1
return alpha
class FeatureProcessor(object):
def __init__(self):
self.column_index = 0
self.is_target = lambda attr: \
not '(1900/01/00)' in attr \
and not attr.startswith('fme_') \
and not attr.startswith('csv_')
def input(self, feature):
# Collect target attribute names.
attribs = [a for a in feature.getAllAttributeNames() if self.is_target(a)]
# First column (column index = 0) values are attribute names.
# Assume the first incoming feature has all target attributes.
if self.column_index == 0:
# Create and output a schema feature at first.
numColumns = int(FME_MacroValues['NUM'])
schema = fmeobjects.FMEFeature()
schema.setAttribute('fme_feature_type_name', \
FME_MacroValues['DestExcelSheetName'])
schema.setAttribute('fme_schema_handling', 'schema_only')
for i in range(numColumns):
schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
self.pyoutput(schema)
item = fmeobjects.FMEFeature()
for attr in attribs:
item.setAttribute('_attr_name', attr)
item.setAttribute('A', attr)
self.pyoutput(item)
self.column_index = 1
# Explode the incoming feature with the target attributes.
item = fmeobjects.FMEFeature()
columnName = xlsxColumnName(self.column_index)
for attr in attribs:
item.setAttribute('_attr_name', attr)
item.setAttribute(columnName, feature.getAttribute(attr))
self.pyoutput(item)
self.column_index += 1
There still is a room to improve the performance.
To be continued (maybe).
Hmmm... This is a good seed for a more general-purpose pivot function. I had done something similar, but not being sufficiently experienced with schemas, I took the easy way out and wrote the output directly to a CSV file. I had also let the Aggregator create a list of columns items before invoking my Python Caller. The code is very rough... I was under the gun to get something out, so I slapped something together as fast as I could. I am going to spend some time now to clean this up and integrate your approach.
???????????,
Frederic
???????? ^^
Further, the CSV reader and all transformers in the previous example can be replaced with a single PythonCreator, and it has increased the performance much.
# PythonCreator Script Example
import fmeobjects, csv
class FeatureCreator(object):
def close(self):
# Return True if the CSV field is NOT a target to be processed.
notTarget = lambda field: \
'(1900/01/00)' in field
# Return Excel Column Name (A,B,C,...) corresponding to a column index.
# index: 0-based column index
def xlsxColumnName(index):
name = ''
while 0 <= index:
name = chr(index % 26 + 65) + name
index = index // 26 - 1
return name
# Create and output a schema feature.
schema = fmeobjects.FMEFeature()
attrToValue = {
'fme_feature_type_name' : FME_MacroValues['DestExcelSheetName'],
'fme_schema_handling' : 'schema_only',
}
for attr, value in attrToValue.items():
schema.setAttribute(attr, value)
for i in range(int(FME_MacroValues['NUM'])):
schema.setAttribute('attribute{%d}.name' % i, xlsxColumnName(i))
schema.setAttribute('attribute{%d}.fme_data_type' % i, 'fme_real64')
self.pyoutput(schema)
# Get cell values from source CSV table and save them into a dictionary.
# key = 0-based index of source field (destination row)
# value = dictionary {destination column name : cell value}
data = {}
with open(FME_MacroValues['SourceDataset_CSV2'], 'r') as csvfile:
nontargets = None
for i, row in enumerate(csv.reader(csvfile)):
# Collect non-target keys from the first row (i.e. field names).
if i == 0:
nontargets = {k for k, v in enumerate(row) if notTarget(v)}
c = xlsxColumnName(i)
for k, v in enumerate(row):
if k not in nontargets:
data.setdefault(k, {})[c] = v
# Create and output data features for each row in destination Excel table.
for k in sorted(data.keys()):
feature = fmeobjects.FMEFeature()
for c, v in data[k].items():
feature.setAttribute(c, v)
self.pyoutput(feature)
Dynamic Schema Configuration in the Excel Writer Feature Type
If you don't need to write the field names (A,B,C...) into the first row,
- set 'No' to the Output Field Names parameter,
- and
set 'Yes' to the Use Attribute Names As Column Positions parameter.