Skip to main content

Hi everyone,

I have a series of regularly produced Excel files, each containing a number of sheets. The number and name of the sheets in each file isn't fixed.

I can read data from all the sheets no problem by keeping the reader parameters blank - but annoyingly for these files the header starts on row 2 and in some cases junk is written into row 1. As such, FME picks up attribute names from row 1, so for one sheet the attribute for column B might end up named"995" and for the next sheet it might be "376", so I end up with a lot of attributes where there should be one - ideally just called "B".

Is there any way to force the Excel reader to always use column letter as the attribute name regardless of how many sheets are in a file and what they are called?

Hi @gtrathen

As mentioned in the Excel reader documentation, setting the Field Names Row value to 0 (or leaving it blank) will set column names to column letters.

If the column Field Names Row value is set to zero (0), then the column names will be set to the Excel column letters A, B, C, etc. This can be useful if there are no field names, or if they are not consistent.

I hope this information helps.


Hi @gtrathen

As mentioned in the Excel reader documentation, setting the Field Names Row value to 0 (or leaving it blank) will set column names to column letters.

If the column Field Names Row value is set to zero (0), then the column names will be set to the Excel column letters A, B, C, etc. This can be useful if there are no field names, or if they are not consistent.

I hope this information helps.

Hi @debbiatsafe thanks for the reply! The field names row is defined per named sheet in the file as far as I can tell - so if I set the value to 0 and run it for a series of sheets it works for the first file (and any successive file which has a sheet named the same thing) but for any other sheets that weren't in the original definition it doesn't apply. So in the image below - where the sheets are CP4, CP5, CP6 in the file used for definition of parameters, the next might be CP1, CP2, CP3 - and for those sheets which weren't covered by the parameters, it reverts to using row 2 for the field names.

Is there a way to tell the reader "whatever sheets you encounter in the file - the Field Names Row will be 0"?

 

 


Hi @debbiatsafe thanks for the reply! The field names row is defined per named sheet in the file as far as I can tell - so if I set the value to 0 and run it for a series of sheets it works for the first file (and any successive file which has a sheet named the same thing) but for any other sheets that weren't in the original definition it doesn't apply. So in the image below - where the sheets are CP4, CP5, CP6 in the file used for definition of parameters, the next might be CP1, CP2, CP3 - and for those sheets which weren't covered by the parameters, it reverts to using row 2 for the field names.

Is there a way to tell the reader "whatever sheets you encounter in the file - the Field Names Row will be 0"?

 

 

Hi @gtrathen

Try setting the Additional File Settings in the Multiple File Parameters section of the Excel reader parameter to 'Use Current Settings'. By default, this parameter is set to 'Use FME Default Settings'.


Hi @gtrathen

Try setting the Additional File Settings in the Multiple File Parameters section of the Excel reader parameter to 'Use Current Settings'. By default, this parameter is set to 'Use FME Default Settings'.

Hi @debbiatsafe, I get the same outcome with that toggled on - if you'll excuse my crude MS paint editing:


Hi @debbiatsafe, I get the same outcome with that toggled on - if you'll excuse my crude MS paint editing:

I wasn't able to replicate the behaviour you are seeing using 2020.0 (build 20200). What version of FME are you using? In addition, are you able to share your workspace and some sample data?


I wasn't able to replicate the behaviour you are seeing using 2020.0 (build 20200). What version of FME are you using? In addition, are you able to share your workspace and some sample data?

Thanks again for taking a look at this.

I'm using 2019.1.3.1 (build 19643). I've attached a stripped down workspace with a creator and two feature readers (dir/file path and then Excel) in the configuration I was using. I've also attached two sample files which are representative of the input files I am dealing with.

For me - the attributes for Sample_1 come out as B,C,D,E,F,G,H but for Sample_2 they come out as B,C,3,4,5,G,9 - taking the numeric values from the first row.

excel.fmwSample_1.xlsxSample_2.xlsx


Thanks again for taking a look at this.

I'm using 2019.1.3.1 (build 19643). I've attached a stripped down workspace with a creator and two feature readers (dir/file path and then Excel) in the configuration I was using. I've also attached two sample files which are representative of the input files I am dealing with.

For me - the attributes for Sample_1 come out as B,C,D,E,F,G,H but for Sample_2 they come out as B,C,3,4,5,G,9 - taking the numeric values from the first row.

excel.fmwSample_1.xlsxSample_2.xlsx

Hi @gtrathen

Thank you for providing the data and workspace. I believe you are seeing different behaviour due to using a FeatureReader--I was using a regular reader.

I haven't been able to get the behaviour of the regular Excel reader within a FeatureReader in regards to the Additional File Settings parameter. I will file a bug report for this issue.

As a workaround, would you consider using the Excel reader instead? I see you're using a PATH FeatureReader to filter for Excel files. This can also be done within the Excel reader itself using the Select Multiple Folders/Files option.


Hi @gtrathen

Thank you for providing the data and workspace. I believe you are seeing different behaviour due to using a FeatureReader--I was using a regular reader.

I haven't been able to get the behaviour of the regular Excel reader within a FeatureReader in regards to the Additional File Settings parameter. I will file a bug report for this issue.

As a workaround, would you consider using the Excel reader instead? I see you're using a PATH FeatureReader to filter for Excel files. This can also be done within the Excel reader itself using the Select Multiple Folders/Files option.

Hi @debbiatsafe - apologies I should have mentioned I was using FeatureReaders in the original question.

Using Excel readers should do the trick if they don't suffer the same issue, but I also found a workaround that is suitable in this specific instance while I was playing around.

Because the numeric attribute names that sneak in are always sequential the attributes can be sorted and then renamed with a PythonCaller modified from another question here.

Resulting in:

The output is slightly misleading as "XL_A" may not actually be column A but it is at least consistent across every file and sheet that the FeatureReader encounters. If the column names could not be sorted then the above could not be guaranteed to work.

Again, thank you very much for all your help!


Hi @debbiatsafe - apologies I should have mentioned I was using FeatureReaders in the original question.

Using Excel readers should do the trick if they don't suffer the same issue, but I also found a workaround that is suitable in this specific instance while I was playing around.

Because the numeric attribute names that sneak in are always sequential the attributes can be sorted and then renamed with a PythonCaller modified from another question here.

Resulting in:

The output is slightly misleading as "XL_A" may not actually be column A but it is at least consistent across every file and sheet that the FeatureReader encounters. If the column names could not be sorted then the above could not be guaranteed to work.

Again, thank you very much for all your help!

thank you @gtrathen for sharing this solution/workaround


Hi @debbiatsafe thanks for the reply! The field names row is defined per named sheet in the file as far as I can tell - so if I set the value to 0 and run it for a series of sheets it works for the first file (and any successive file which has a sheet named the same thing) but for any other sheets that weren't in the original definition it doesn't apply. So in the image below - where the sheets are CP4, CP5, CP6 in the file used for definition of parameters, the next might be CP1, CP2, CP3 - and for those sheets which weren't covered by the parameters, it reverts to using row 2 for the field names.

Is there a way to tell the reader "whatever sheets you encounter in the file - the Field Names Row will be 0"?

 

 

@debbiatsafe​ I'm in a similar position -- is the bug still outstanding? I'm using 2021.1 and am also finding it impossible to read inconsistently-formatted Excels with a FeatureReader without ending up with a lot of junk attribute names.


Hi @debbiatsafe thanks for the reply! The field names row is defined per named sheet in the file as far as I can tell - so if I set the value to 0 and run it for a series of sheets it works for the first file (and any successive file which has a sheet named the same thing) but for any other sheets that weren't in the original definition it doesn't apply. So in the image below - where the sheets are CP4, CP5, CP6 in the file used for definition of parameters, the next might be CP1, CP2, CP3 - and for those sheets which weren't covered by the parameters, it reverts to using row 2 for the field names.

Is there a way to tell the reader "whatever sheets you encounter in the file - the Field Names Row will be 0"?

 

 

Hi @mtaftferguson​ 

Our development team has evaluated the issue and is planning on adding an option that allows you to set whether automatic or manual schemas are generated (similar to CSV). This should hopefully resolve the issue reported.

 

This work hasn't been implemented yet but is on our team's radar.


Hi @debbiatsafe thanks for the reply! The field names row is defined per named sheet in the file as far as I can tell - so if I set the value to 0 and run it for a series of sheets it works for the first file (and any successive file which has a sheet named the same thing) but for any other sheets that weren't in the original definition it doesn't apply. So in the image below - where the sheets are CP4, CP5, CP6 in the file used for definition of parameters, the next might be CP1, CP2, CP3 - and for those sheets which weren't covered by the parameters, it reverts to using row 2 for the field names.

Is there a way to tell the reader "whatever sheets you encounter in the file - the Field Names Row will be 0"?

 

 

Thank you for the update! I was able to adapt the above renaming script to rename the "junk" attributes in my data so that they were at least exposable, which was sufficient in this case but it will be nicer to have that option.


Reply