Solved

Splitting a multi-header CSV into chunks

August 4, 2025

goldstein
Contributor

Hello Folks,

I have been presented with an interesting issue.
I have a CSV which consists of multiple headers, each followed by the data rows that belong to it.

#header 1 and values
observationId,broid,_response_body_new,_http_status_code
id_01682157639,GLD000000036023,"""BRO-ID"",""bronhouder"",,""kwaliteitsregime"",""datum eerste meting"",""datum recentste meting""
GLD000000036023"",""50200097"",,""IMBRO"",""(2005-11-27, JJJJ-MM-DD)"",""(2025-04-10, JJJJ-MM-DD)""
,,,,,

#header 2 and values
""put BRO-ID"",""put buisnummer"",,""monitoringnet BRO-ID"",,
GMW000000074173"",""2"",,,,
,,,,,

#header 3 and values
observatie ID"",""start observatieperiode"",""eind observatieperiode"",""observatietype"",""mate beoordeling"",""observatieproces ID""
id_OMO_95135"",""(2024-07-18, JJJJ-MM-DD)"",""(2025-04-10, JJJJ-MM-DD)"",""reguliereMeting"",""volledigBeoordeeld"",""id_OP_95135""
,,,,,

#header 4 and values
""tijdstip meting"",""waterstand"",""status kwaliteitscontrole"",""censuurreden"",""censuurlimietwaarde"",""interpolatietype""
2024-07-18T12:00:00+02:00"",""30.602"",""goedgekeurd"",,,""discontinu""

…

,,,,,

and then headers 3 and 4 (which are related to each other as well) repeat throughout the file.
How can I extract all these headers and their values into chunks in FME?
A single CSV with one header is no issue for me to get data out of. Maybe I am missing something obvious?
Please guide me.
I have linked a sample CSV for better understanding.

Best answer by takashi

Oh, this workflow is better.


mramezani
Participant
August 4, 2025

First we need to read the CSV as a Text File;
then, with a Counter, we assign a unique ID to each row;
then we distinguish header rows from non-header rows by defining an attribute named "isHeader" and setting it with a regex;
we use a PythonCaller to assign a block_id to each group of lines following a header;
afterwards you can separate the rows based on the block_id and do whatever you intend to do.
PythonCaller:

import fme
import fmeobjects


class FeatureProcessor(object):
    """Template Class Interface:
    When using this class, make sure its name is set as the value of the 'Class to Process Features'
    transformer parameter.
    """

    def __init__(self):
        """Base constructor for class members."""
        # Running block number; incremented each time a header row arrives.
        self.block_id = 0

    def has_support_for(self, support_type: int):
        """This method is called by FME to determine if the PythonCaller supports Bulk mode,
        which allows for significant performance gains when processing large numbers of features.
        Bulk mode cannot always be supported.
        More information available in transformer help.
        """
        return support_type == fmeobjects.FME_SUPPORT_FEATURE_TABLE_SHIM

    def input(self, feature: fmeobjects.FMEFeature):
        is_header = feature.getAttribute('isHeader')
        if is_header == 'Yes':
            # A header row starts a new block.
            self.block_id += 1

        # Header rows and data rows alike receive the current block number.
        feature.setAttribute('block_id', self.block_id)
        self.pyoutput(feature)

    def close(self):
        """This method is called once all the FME Features have been processed from input()."""
        pass

    def process_group(self):
        """This method is called by FME for each group when group processing mode is enabled.
        This implementation should reset any instance variables used for the next group.
        Bulk mode should not be enabled when using group processing.
        More information available in transformer help.
        """
        pass
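
To try the block logic outside FME, the same counter can be sketched in plain Python. This is only a minimal sketch and not part of the workspace above. The header rule (a header is the first line of the file, or the first line after a row made up only of commas) is an assumption based on the sample in the question, and `assign_block_ids` is a hypothetical helper name.

```python
import re

# Assumption: a blank row or a row made up only of commas (e.g. ",,,,,")
# separates one block from the next.
SEPARATOR = re.compile(r"^,*$")


def assign_block_ids(lines):
    """Return (block_id, is_header, line) tuples, skipping separator rows."""
    block_id = 0
    expect_header = True  # the first real line of the file starts block 1
    result = []
    for line in lines:
        text = line.strip()
        if SEPARATOR.match(text):
            # Separator row: the next real line is treated as a header.
            expect_header = True
            continue
        if expect_header:
            block_id += 1
        result.append((block_id, expect_header, text))
        expect_header = False
    return result


# Made-up lines, much simpler than the real file.
sample = [
    "a,b",
    "1,2",
    ",,",
    "c,d",
    "3,4",
    "5,6",
]
rows = assign_block_ids(sample)
```

Here `rows` holds one tuple per kept line, so the rows of a block can be selected by their first element. If the real file marks headers differently, only the separator test needs to change.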


geomancer
Evangelist
August 4, 2025

Nice idea. You can also easily do this without Python.

Read the CSV as a Text File.
Define the attribute "isHeader" as @mramezani suggests, and an attribute "GroupID".
Use an AttributeManager with Adjacent Feature Attributes enabled, looking at 1 prior feature. Set the default value for substitution to 0.
Set a Conditional Value in "GroupID": when "isHeader" = "Yes", add 1 to the "GroupID" value of the prior feature; when "isHeader" = "No", use the "GroupID" value of the prior feature.

I'm not at a computer running FME at the moment, so you will have to look up the details yourself.
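
The conditional value described above amounts to a running total of header rows. A minimal Python sketch of that idea, for illustration only: it is not FME configuration, and the `flags` list and variable names are made up.

```python
from itertools import accumulate, groupby

# Hypothetical isHeader values for six consecutive text lines.
flags = ["Yes", "No", "No", "Yes", "No", "Yes"]

# GroupID = prior GroupID + 1 on a header row, else the prior GroupID.
# The initial value 0 plays the role of the substitution default.
steps = (1 if flag == "Yes" else 0 for flag in flags)
group_ids = list(accumulate(steps, initial=0))[1:]

# Collect the row positions that share a GroupID into chunks.
chunks = {
    gid: [index for index, _ in members]
    for gid, members in groupby(enumerate(group_ids), key=lambda pair: pair[1])
}
```

The `initial` argument of `accumulate` needs Python 3.8 or later.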


goldstein
Author
Contributor
August 4, 2025

Thanks @mramezani it does give me a good direction to solve this issue.


goldstein
Author
Contributor
August 4, 2025

@geomancer I will check this as well; for now @mramezani's method does solve it.


takashi
August 5, 2025

Hi @goldstein ,

Another way: you can also use a global variable to define a "Block ID" for each header row and propagate it to the subsequent data rows. This is an older workflow design from before the Enable Adjacent Feature Attributes option was introduced into AttributeCreator/AttributeManager, but I think it is still effective in some cases.
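
The set-then-retrieve pattern can be sketched in plain Python as follows. This is an illustration under assumptions, not the workspace from this answer: the rows are taken to be already flagged as header or data, `current` stands in for the global variable, and carrying the header's field names along with the Block ID is an addition made here so that each data row can be turned into a record. The sample values are simplified from the question (doubled quotes removed).

```python
import csv


def rows_to_records(flagged_lines):
    """flagged_lines: iterable of (is_header, line) pairs in file order."""
    # Stand-in for the global variable: current Block ID and field names.
    current = {"block_id": 0, "fields": []}
    records = []
    for is_header, line in flagged_lines:
        values = next(csv.reader([line]))
        if is_header:
            # Header row: set the variable for the rows that follow.
            current = {"block_id": current["block_id"] + 1, "fields": values}
        else:
            # Data row: retrieve the variable and pair names with values.
            record = dict(zip(current["fields"], values))
            record["block_id"] = current["block_id"]
            records.append(record)
    return records


records = rows_to_records([
    (True, "tijdstip meting,waterstand"),
    (False, "2024-07-18T12:00:00+02:00,30.602"),
    (True, "put BRO-ID,put buisnummer"),
    (False, "GMW000000074173,2"),
])
```

As in the FME version, this depends on rows arriving in file order, because a data row reads whatever the most recent header row stored.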