Question

Attribute Header Mapping

  • 9 April 2015
  • 8 replies
  • 61 views

Badge +1
I have a number of large CSV's which dont have headers for the collumns. I then have another CSV with the headers / attribute names. 

 

I want to be able to map the atribute names from the header csv to the incoming CSV's with the data in? 

 

I suspect there is a simple way of doing it but i can't find one. 

 

Any help will be appreciated. 

8 replies

Badge +3
You can use a schema mapper for this.

 

Add the columnnumber from the csv with the values to the header-csv. (if they are in the same order then its less effort)

 

Use this as a schemamap.

 

Use a exposer with the same haeder-csv as source.

 

 

Copy attribute from exposer to writer.

 

 
Userlevel 4
Hi,

 

 

lots of ways to do this. Here are a couple, both assuming that the format of the CSV is constant
  1. Read the CSV file with only the data. The columns will appear with names like "col0", "col1", etc. Then use an AttributeRenamer.
  2. Make a dummy CSV file by concatenating the header and the data file. Create the reader using this dummy file, then point the reader to the data-only file. Just make sure to set "Number of lines to skip" on the reader to 0, since you no longer have the field names on the first row.
David
Badge +3
indeed many.

 

 

read the files as txt files. Expose the format attribute row_number and.

 

Make header file 1st row number.

 

 

Renumber the row-number attribute from headers by row_number = row_number +1

 

Send both files to a sorter, ascending. And output as single file.
Badge +7

I have the same requirement and have been trying to implement some of the suggestions above.

My data has 77 fields so I've been trying to load in the headings from a CSV rather than having to re-type them. I've created 2 field map CSVs, both containing the old and new field names. One has them in rows and the other in columns.

I've tried the Import button on the AttributeManager but it either doesn't do anything (field names are not renamed) or it hangs on "Reading Dataset" after I select the CSV with the field map in. Has anyone used the Import button?

I've also tried the SchemaMapper but this has not been successful either. I've used my field map CSV where there is a column containing the old field names (col0, col1, etc) and a column containing the new field names. The SchemaMapper action is Map Attributes. However when I inspect the output, the attributes are still called col0, col1, col2, etc with a _schemamapperid column at the end. Am I doing something wrong here?

Apart from using the AttributeManager and manually typing the field names in, the only thing I have got to work so far is to read the header file first then the data, then send both to the same Writer to write a new CSV and un-tick Output Field Names on First Line. However this has the disadvantage of having to write a CSV before you can do any more processing.

Userlevel 4

I have the same requirement and have been trying to implement some of the suggestions above.

My data has 77 fields so I've been trying to load in the headings from a CSV rather than having to re-type them. I've created 2 field map CSVs, both containing the old and new field names. One has them in rows and the other in columns.

I've tried the Import button on the AttributeManager but it either doesn't do anything (field names are not renamed) or it hangs on "Reading Dataset" after I select the CSV with the field map in. Has anyone used the Import button?

I've also tried the SchemaMapper but this has not been successful either. I've used my field map CSV where there is a column containing the old field names (col0, col1, etc) and a column containing the new field names. The SchemaMapper action is Map Attributes. However when I inspect the output, the attributes are still called col0, col1, col2, etc with a _schemamapperid column at the end. Am I doing something wrong here?

Apart from using the AttributeManager and manually typing the field names in, the only thing I have got to work so far is to read the header file first then the data, then send both to the same Writer to write a new CSV and un-tick Output Field Names on First Line. However this has the disadvantage of having to write a CSV before you can do any more processing.

If this isn't something you're doing too often, why not just manually concatenate the line containing the field names with the file contatining the data before creating the CSV reader?
Badge +3

I had the same question, which is why I got to this page, and @david_r's second bullet point in the comments gave me the answer. Actually, you don't even need to do a concatenate of your header file and one of your data files:

  1. Create a CSV Reader and point it to your header file.
  2. In the Parameters, specify Field Names Line = 1 and Data Start Line = 2
  3. Select OK in the Parameters window, but not in the Add Reader window. This will have defined your attribute names.
  4. In the Add Reader window, amend the Dataset and point it to your CSV data file collection (e.g. "c:\\myfolder\\*.zip\\**\\*.csv") and then press the OK button.
  5. In the Navigator pane where all your Readers and Writers are listed, open up the Parameters for your CSV Reader. Now set Field Names Line to <not set> (i.e. clear it) and Data Start Line = 1

(Note: I realise that the original post is almost 3 years old, so it may well be that this functionality was not yet available in 2015)

Badge +6

I had the same question, which is why I got to this page, and @david_r's second bullet point in the comments gave me the answer. Actually, you don't even need to do a concatenate of your header file and one of your data files:

  1. Create a CSV Reader and point it to your header file.
  2. In the Parameters, specify Field Names Line = 1 and Data Start Line = 2
  3. Select OK in the Parameters window, but not in the Add Reader window. This will have defined your attribute names.
  4. In the Add Reader window, amend the Dataset and point it to your CSV data file collection (e.g. "c:\\myfolder\\*.zip\\**\\*.csv") and then press the OK button.
  5. In the Navigator pane where all your Readers and Writers are listed, open up the Parameters for your CSV Reader. Now set Field Names Line to <not set> (i.e. clear it) and Data Start Line = 1

(Note: I realise that the original post is almost 3 years old, so it may well be that this functionality was not yet available in 2015)

@arnold_bijlsma THANK YOU for this very helpful step by step!!! This was a life saver for me!

 

 

Since more time has past and I'm working in FME 2019.2 it looks like some things have changed. I'll go ahead and post the modified steps I used for the next poor soul.

Here's what I found to work:

 

 

  1. Create a CSV Reader and point it to your header file.
  2. In the Parameters, specify Field Names Line = 1 and Data Start Line = 2
  3. Select OK in the Parameters window. This will have defined your attribute names.
  4. Select OK in the Add Reader window.

Now that the reader has been added to your workspace:

1. Right click the new reader in the left Navigation pane and select "Edit 'xxxx' Parameters" to change its properties.

2. In the Edit window, change the following parameters:

  • Source CSV File - amend the Dataset and point it to your CSV data file collection (e.g. "c:\\myfolder\\*.zip\\**\\*.csv")
  • Fields Parameter - Change the Field Names Line to <none> (i.e. clear it) and Data Start Line = 1

THANK YOU AGAIN FOR YOUR HELP!

Badge +6

@arnold_bijlsma THANK YOU for this very helpful step by step!!! This was a life saver for me!

 

 

Since more time has past and I'm working in FME 2019.2 it looks like some things have changed. I'll go ahead and post the modified steps I used for the next poor soul.

Here's what I found to work:

 

 

  1. Create a CSV Reader and point it to your header file.
  2. In the Parameters, specify Field Names Line = 1 and Data Start Line = 2
  3. Select OK in the Parameters window. This will have defined your attribute names.
  4. Select OK in the Add Reader window.

Now that the reader has been added to your workspace:

1. Right click the new reader in the left Navigation pane and select "Edit 'xxxx' Parameters" to change its properties.

2. In the Edit window, change the following parameters:

  • Source CSV File - amend the Dataset and point it to your CSV data file collection (e.g. "c:\\myfolder\\*.zip\\**\\*.csv")
  • Fields Parameter - Change the Field Names Line to <none> (i.e. clear it) and Data Start Line = 1

THANK YOU AGAIN FOR YOUR HELP!

Adding a comment on another use case/workflow when working with CSV files that do not have headers.

A project required mapping FCC antenna data, however the FCC data was broken out into dozens of tables. Due to the number of distinct tables involved I found that the above steps were a bit too tedious. I found it useful to take a few steps to prepare the data prior to being read into FME:

  1. I found it more efficient to pull their table header documentation (from their data dictionary) into an Excel file.
  2. In Excel use the TEXTJOIN function to concatenate the headers into a single cell in a comma (or pipe) delineated format.
  3. Copy the delineated headers from Excel and paste them as the first line in the .csv files.

This eliminated a step when setting up the CSV reader in FME.

If anyone is interested in FCC antennae data, it took a bit of leg work to find the information on the FCC website so if anyone has interest, here are some links:

NOTE: FCC databases are downloaded as tables in .DAT files, these are a pipe deliminated file format which FME reads as a "Aircom ENTERPRISE Map Data/ASSET Data" format by default; however this results in an error, the CSV format needs to be manually selected.

Reply