
Hello folks,

I have been presented with an interesting issue.
I have a CSV that contains multiple headers, each followed by the data rows that belong to it.
 

#header 1 and values
observationId,broid,_response_body_new,_http_status_code
id_01682157639,GLD000000036023,"""BRO-ID"",""bronhouder"",,""kwaliteitsregime"",""datum eerste meting"",""datum recentste meting""
GLD000000036023"",""50200097"",,""IMBRO"",""(2005-11-27, JJJJ-MM-DD)"",""(2025-04-10, JJJJ-MM-DD)""
,,,,,

#header 2 and values
""put BRO-ID"",""put buisnummer"",,""monitoringnet BRO-ID"",,
GMW000000074173"",""2"",,,,
,,,,,

#header 3 and values
observatie ID"",""start observatieperiode"",""eind observatieperiode"",""observatietype"",""mate beoordeling"",""observatieproces ID""
id_OMO_95135"",""(2024-07-18, JJJJ-MM-DD)"",""(2025-04-10, JJJJ-MM-DD)"",""reguliereMeting"",""volledigBeoordeeld"",""id_OP_95135""
,,,,,

#header 4 and values
""tijdstip meting"",""waterstand"",""status kwaliteitscontrole"",""censuurreden"",""censuurlimietwaarde"",""interpolatietype""
2024-07-18T12:00:00+02:00"",""30.602"",""goedgekeurd"",,,""discontinu""

,,,,,

,,,,,

 

Headers 3 and 4 (which are also related to each other) then repeat throughout the file.
How can I extract all these headers and their values into chunks in FME?
A single CSV with one header is no issue for me to get data out of. Maybe I am missing something obvious?
Please guide me.
I have linked a sample CSV for better understanding.
 

  1. first we need to read the CSV as a Text File;
  2. then with a Counter we assign a unique id to each row;
  3. then we distinguish header rows from non-header rows by defining an attribute named "isHeader" and using a regex;

     

  4. we use a PythonCaller to assign a block-id to each group of lines following a header;
  5. afterwards you can separate the rows based on the block-id and do whatever you intend to do.

    PythonCaller:

    import fme
    import fmeobjects


    class FeatureProcessor(object):
        """Template Class Interface:
        When using this class, make sure its name is set as the value of the 'Class to Process Features'
        transformer parameter.
        """

        def __init__(self):
            """Base constructor for class members."""
            # Running block number; stays 0 until the first header row is seen.
            self.block_id = 0

        def has_support_for(self, support_type: int):
            """This method is called by FME to determine if the PythonCaller supports Bulk mode,
            which allows for significant performance gains when processing large numbers of features.
            Bulk mode cannot always be supported.
            More information available in transformer help.
            """
            return support_type == fmeobjects.FME_SUPPORT_FEATURE_TABLE_SHIM

        def input(self, feature: fmeobjects.FMEFeature):
            # 'isHeader' is the attribute set in step 3 ('Yes' for header rows).
            is_header = feature.getAttribute('isHeader')
            # Every header row starts a new block.
            if is_header == 'Yes':
                self.block_id += 1

            # Header and data rows of the same block share one block_id.
            feature.setAttribute('block_id', self.block_id)
            self.pyoutput(feature)

        def close(self):
            """This method is called once all the FME Features have been processed from input()."""
            pass

        def process_group(self):
            """This method is called by FME for each group when group processing mode is enabled.
            This implementation should reset any instance variables used for the next group.
            Bulk mode should not be enabled when using group processing.
            More information available in transformer help.
            """
            pass
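
For anyone who wants to try the idea outside FME first, here is a minimal standalone sketch of steps 3 to 5 in plain Python. It is not the workspace from this thread: the header rule in `looks_like_header` (letters but no digits) is only an assumption for illustration, since the actual regex is not shown above, and the helper names and the cleaned-up sample rows are hypothetical. The real file has doubled quotes that would probably need cleaning before parsing.

```python
import csv
import re
from itertools import groupby


def looks_like_header(line):
    """Hypothetical header rule: the line has letters but no digits."""
    return bool(re.search(r"[A-Za-z]", line)) and not re.search(r"\d", line)


def assign_block_ids(lines, is_header=looks_like_header):
    """Tag every line with a block id that increases at each header row."""
    block_id = 0
    tagged = []
    for line in lines:
        if is_header(line):
            block_id += 1
        tagged.append((block_id, line))
    return tagged


def split_into_chunks(tagged):
    """Turn tagged lines into (header_fields, data_rows) pairs, one per block."""
    chunks = []
    for block_id, group in groupby(tagged, key=lambda pair: pair[0]):
        if block_id == 0:
            continue  # lines that appear before the first header
        # Parse the block as CSV and drop separator rows such as ',,,,,'.
        rows = [
            row
            for row in csv.reader(line for _, line in group)
            if any(field.strip() for field in row)
        ]
        chunks.append((rows[0], rows[1:]))
    return chunks


# Simplified sample rows (quotes cleaned up by hand for this sketch).
sample = [
    '"put BRO-ID","put buisnummer"',
    'GMW000000074173,2',
    ',,,,,',
    '"tijdstip meting","waterstand"',
    '2024-07-18T12:00:00+02:00,30.602',
]

tagged = assign_block_ids(sample)
chunks = split_into_chunks(tagged)
```

Passing a different `is_header` function lets you swap in whatever rule matches your own file without touching the grouping logic.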


Nice idea. You can also easily do this without Python.

  1. Read the CSV as a Text File.
  2. Define the attribute "isHeader" as @mramezani suggests, and an attribute "GroupID".
  3. Use an AttributeManager with Adjacent Feature Attributes enabled, looking at 1 prior feature. Set the default value for substitution to 0.
  4. Set a Conditional Value in "GroupID". When "isHeader" = "Yes", add 1 to the value of "GroupID" of the prior feature. When "isHeader" = "No", use the value of "GroupID" of the prior feature.

I'm not at a computer running FME at the moment, so you will have to look up the details yourself.
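
As far as I can tell, the conditional value in step 4 amounts to a running count of the header rows seen so far. A small Python sketch of that logic (not FME itself; the `is_header_flags` list is made-up example input):

```python
from itertools import accumulate

# Hypothetical "isHeader" values, one per row, in file order.
is_header_flags = ["No", "Yes", "No", "No", "Yes", "No"]

# GroupID = prior GroupID + 1 on a header row, otherwise the prior GroupID.
# Rows before the first header keep the substitution default of 0.
group_ids = list(accumulate(1 if flag == "Yes" else 0 for flag in is_header_flags))
```

Here `group_ids` is `[0, 1, 1, 1, 2, 2]`, so the rows can be grouped on that value just like on the block-id from the PythonCaller.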



Thanks @mramezani, this gives me a good direction to solve the issue.



@geomancer I will check this as well; for now, @mramezani's method solves it.


Hi ​@goldstein ,

As another option, you can use a global variable to define a "Block ID" for each header row and propagate it to the subsequent data rows. This is an old workflow design from before the Enable Adjacent Feature Attributes option was introduced in AttributeCreator/AttributeManager, but I think it is still effective in some cases.

 


Ah yes, the curious construction where the VariableRetriever is used before the VariableSetter.


Oh, this workflow is better.