Hi @rrdlpl, 4000 seconds is pretty slow! Instead of an AttributeValidator you could try using a DuplicateFilter. You might find this more performant, as I think it works in a different way. Would you be able to share the workspace and the original log file with us?
Using the DuplicateFilter may not give you exactly what you're after; however, it should give you a starting place and remove the blockage.
Hello rrdlpl,
Have you tried concatenating the attributes that need to be unique into one temporary attribute, and then validating on that temporary attribute?
I think this alone could already improve the performance of your FME workspace.
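The idea can be sketched in plain Python (the field names and the helper below are just illustrative, not from the original workspace; in FME you would typically build the key with a StringConcatenator or an AttributeManager and feed it to the uniqueness check):

```python
# Sketch: combine the fields that must be unique together into one
# temporary key attribute, so a single test covers all of them.
def build_unique_key(record, key_fields, sep='|'):
    """Concatenate the given fields of a record into one key string."""
    return sep.join(str(record.get(f, '')) for f in key_fields)

row = {'STUDENT_ID': '123', 'GRADE_CODE': '05', 'GENDER': 'F'}
key = build_unique_key(row, ['STUDENT_ID', 'GRADE_CODE', 'GENDER'])
# key == '123|05|F' -- validate or filter duplicates on this one attribute
```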
Hi @rrdlpl great to hear that this is now taking less time. Are you now getting the result you expect?
IMHO, 17 minutes for simple validations on only that number of records is too slow.
I did some testing to see what was taking so long in the PrepareFields (AttributeManager) transformer.
I used FME 2017.1 and profiled each of the following:
1- Disabled everything past the DuplicateFilter
2- Removed attribute trimming in PrepareFields
3- Added attribute trimming in the DataMapping PythonCaller, using a set for the list of attributes to trim
At this point, I just tried to identify how much time was taken by specific tasks in PrepareFields:
4- Removed the regex in PrepareFields
5- Removed the two date-related attribute creations in PrepareFields
6- Removed the conditional date creation in PrepareFields
At this point, the only things remaining in PrepareFields are the ReplaceString and UpperCase calls.
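As a side note on step 3, the reason a set helps there: `x in some_set` is an average O(1) hash lookup, while `x in some_list` scans the whole list, O(n). With a couple of dozen attribute names checked for every field of every feature, that difference adds up. A minimal illustration (shortened attribute list):

```python
# List vs. set for repeated membership tests.
attributes_to_trim_list = ['ATTENDING_SCHOOL_CODE', 'STUDENT_ID', 'GRADE_CODE']
attributes_to_trim = set(attributes_to_trim_list)  # same names, constant-time lookup

assert 'STUDENT_ID' in attributes_to_trim       # hash lookup, O(1) on average
assert 'NOT_A_FIELD' not in attributes_to_trim  # a list would scan every entry
```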
Step  Timer Context              User CPU Time
1     FileReader_PrepareFields   284.499024
1     f_4 (PythonCaller)         111.447114
2     FileReader_PrepareFields   223.611833
2     f_4 (PythonCaller)         112.33632
3     FileReader_PrepareFields   222.379425
3     f_4 (PythonCaller)         117.624754
4     FileReader_PrepareFields   220.866216
4     f_4 (PythonCaller)         118.935162
5     FileReader_PrepareFields   99.87184
5     f_4 (PythonCaller)         115.48754
6     FileReader_PrepareFields   40.029857
6     f_4 (PythonCaller)         115.955543
Using dedicated transformers like the AttributeTrimmer is a first step, but you look comfortable with Python, and if top performance is your goal it may be worth doing the date handling in Python as well.
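For example, the date conversion could be done inside the PythonCaller with the standard library. This is only a hedged sketch: the input format `%m/%d/%Y` and the helper name are assumptions, since the actual source date format isn't shown in the thread.

```python
from datetime import datetime

# Hypothetical sketch: parse a source date string and emit an
# FME-style YYYYMMDD value, instead of building it with
# expression-based transformers in PrepareFields.
def to_fme_date(value, fmt='%m/%d/%Y'):
    """Convert e.g. '01/31/2017' to '20170131'; empty string on failure."""
    try:
        return datetime.strptime(value, fmt).strftime('%Y%m%d')
    except (ValueError, TypeError):
        return ''
```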
Latest version of the Python code:
import fme
import fmeobjects
import json

# Template Function interface:
# When using this function, make sure its name is set as the value of
# the 'Class or Function to Process Features' transformer parameter
def processFeature(feature):
    pass

# Template Class Interface:
# When using this class, make sure its name is set as the value of
# the 'Class or Function to Process Features' transformer parameter
class FeatureProcessor(object):
    def __init__(self):
        self.dataMapping = json.loads(fme.macroValues['DataMappingJSON'])
        self.loadFields()

    def loadFields(self):
        self.customFieldsAttributes = {}
        self.sourceAttributes = {}
        for field in self.dataMapping:
            source = field['sourceColumn']
            destiny = field['id']
            if field['isCustom']:
                self.customFieldsAttributes[source] = destiny
            else:
                self.sourceAttributes[source] = destiny
        # A set gives constant-time membership tests in input() below
        self.attributesToTrim = {'ATTENDING_SCHOOL_CODE', 'ZONED_SCHOOL_CODE', 'ENROLLED_SCHOOL_CODE', 'SUMMER_SCHOOL_CODE', 'STUDENT_ID', 'GRADE_CODE', 'GENDER', 'ETHNICITY_CODE', 'DATE_OF_BIRTH', 'STREET', 'SPECIAL_ED', 'WHEEL_CHAIR', 'SAFETY_VEST_HARNESS', 'SEATBELT', 'CAR_SEAT', 'BOOSTER_SEAT', 'WALKER', 'TINTED_WINDOWS', 'AIR_COND', 'AIDE_REQ', 'NURSE', 'OXYGEN', 'SERVICE_ANIMAL'}

    def input(self, feature):
        newFeature = fmeobjects.FMEFeature()
        customs = []
        # Read all column names of the incoming feature
        incomingColumnNames = feature.getAllAttributeNames()
        for sourceField in incomingColumnNames:
            # Trim fields
            if sourceField in self.attributesToTrim:
                feature.setAttribute(sourceField, feature.getAttribute(sourceField).strip())
            cellValue = feature.getAttribute(sourceField)
            # Custom fields are collected into an array to convert to JSON later
            if sourceField in self.customFieldsAttributes:
                destiny = self.customFieldsAttributes[sourceField]  # map the sourceColumn to id
                customs.append({'key': destiny, 'value': cellValue})
            elif sourceField in self.sourceAttributes:
                destiny = self.sourceAttributes[sourceField]  # map the sourceColumn to id
                newFeature.setAttribute(destiny, cellValue)
        newFeature.setAttribute('custom_fields_json', json.dumps(customs))
        newFeature.setAttribute('csv_line_number', feature.getAttribute('csv_line_number'))  # csv_line_number comes with each feature after reading the file
        self.pyoutput(newFeature)

    def close(self):
        pass
@larry This is great! Very interesting to see the effect of removing the Date stuff.