Skip to main content
Solved

Dealing with Excel where attribute schema is consistent but number of 'read me' header rows is not


p_c_20
Contributor
Forum|alt.badge.img+8

I have a lot of excel spreadsheets that all have the same attribute schema. However, there are a varying number of header lines that precede the data in each. 

In FeatureReaders & Readers, we define the start point for data by line number. Given the consistency of the attribute schema and the inconsistency of the header spiel, I’m trying to think of a way to only read in the ssheets from the start of the data/attribute names, i.e. dynamically, defined by the first attribute name rather than the line number (I guess). 

Any ideas?

Best answer by geomancer

You could do something like this:

  1. When reading the Excel file, under Parameters set Field Names Row and Cell Range to empty. FME will name your columns A, B, etc. Also expose fme_basename
  2. Add a line number using a Counter
  3. Filter out the feature where column A starts with your fme_basename
  4. Add an attribute to this feature, set it to the line number, remove the other attributes
  5. Merge this feaure to all existing features
  6. Select only the features where the line number is greater then or equal to the line number from step 4

 

View original
Did this help you find an answer to your question?

7 replies

jkr_wrk
Influencer
Forum|alt.badge.img+28
  • June 19, 2024

You could use a generic featurereader to read the excel files. Then test for the first feature that has your header title in attribute A. And then use the xlsx_row_id to as setting for header in your featurereader.

Or…

If your column names are fixed just rename all letters to the correct names. Or figure out how to do this dynamic based on the feature with the headers, merging this to all other features and renaming according.


p_c_20
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • June 20, 2024

Not sure I was clear and coherent in my explanation. There’s a whole load of non-attribute text at the top of the spreadsheets, and this (for some reason - not sure what) varies in length. Thus, the start line for the attribute names varies. Could be line 19, 20, 21 etc, etc. I’ve dummied up the structure here: 

Field name A, Field name B etc are consistent from file to file. What isn’t consistent is the number of lines of ‘blah’ that precede the field name row. It also handily has a line between the attribute names and the data. 😕

Another peculiarity of the software creating this output is that Field A only has one cell populated - the first -and this value is the same as the FME basename for the file, albeit with a suffix. This occurs nowhere else in the spreadsheet. So thinking about it, that is what I’d want the reader to look for as the start of the data. (I’m going to overwrite all Field Name A values with the suffixed basename later on.) 


 


jkr_wrk
Influencer
Forum|alt.badge.img+28
  • June 20, 2024

Did you try the Generic Reader / FeatureReader with Generic format?


p_c_20
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • June 20, 2024
jkr_wrk wrote:

Did you try the Generic Reader / FeatureReader with Generic format?

I did. Problem is that in A1, you have a value that is file-specific (includes a datetime). So you get this: 

 

...so yes, I can find the FME basename feature using a tester, but FME won’t let me enter ‘A’ - I have to use the text value FME believes to be the column name. 

Any way around this? 


geomancer
Evangelist
Forum|alt.badge.img+46
  • Evangelist
  • Best Answer
  • June 20, 2024

You could do something like this:

  1. When reading the Excel file, under Parameters set Field Names Row and Cell Range to empty. FME will name your columns A, B, etc. Also expose fme_basename
  2. Add a line number using a Counter
  3. Filter out the feature where column A starts with your fme_basename
  4. Add an attribute to this feature, set it to the line number, remove the other attributes
  5. Merge this feaure to all existing features
  6. Select only the features where the line number is greater then or equal to the line number from step 4

 


p_c_20
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • June 27, 2024

This worked very well. Just needed to set up group processing in the Counter. 

(Was briefly confused as the FirstDataLine values weren’t matching the spreadsheets, but it was because FME is ignoring blank rows. )

Much obliged


geomancer
Evangelist
Forum|alt.badge.img+46
  • Evangelist
  • June 28, 2024

You're welcome, thanks for the feedback.

I'm glad to know it works.


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings