I have a lot of excel spreadsheets that all have the same attribute schema. However, there are a varying number of header lines that precede the data in each.
In FeatureReaders & Readers, we define the start point for data by line number. Given the consistency of the attribute schema and the inconsistency of the header spiel, I’m trying to think of a way to only read in the ssheets from the start of the data/attribute names, i.e. dynamically, defined by the first attribute name rather than the line number (I guess).
Any ideas?
You could use a generic featurereader to read the excel files. Then test for the first feature that has your header title in attribute A. And then use the xlsx_row_id to as setting for header in your featurereader.
Or…
If your column names are fixed just rename all letters to the correct names. Or figure out how to do this dynamic based on the feature with the headers, merging this to all other features and renaming according.
Not sure I was clear and coherent in my explanation. There’s a whole load of non-attribute text at the top of the spreadsheets, and this (for some reason - not sure what) varies in length. Thus, the start line for the attribute names varies. Could be line 19, 20, 21 etc, etc. I’ve dummied up the structure here:
Field name A, Field name B etc are consistent from file to file. What isn’t consistent is the number of lines of ‘blah’ that precede the field name row. It also handily has a line between the attribute names and the data.
Another peculiarity of the software creating this output is that Field A only has one cell populated - the first -and this value is the same as the FME basename for the file, albeit with a suffix. This occurs nowhere else in the spreadsheet. So thinking about it, that is what I’d want the reader to look for as the start of the data. (I’m going to overwrite all Field Name A values with the suffixed basename later on.)
Did you try the Generic Reader / FeatureReader with Generic format?
Did you try the Generic Reader / FeatureReader with Generic format?
I did. Problem is that in A1, you have a value that is file-specific (includes a datetime). So you get this:
...so yes, I can find the FME basename feature using a tester, but FME won’t let me enter ‘A’ - I have to use the text value FME believes to be the column name.
Any way around this?
You could do something like this:
- When reading the Excel file, under Parameters set Field Names Row and Cell Range to empty. FME will name your columns A, B, etc. Also expose fme_basename
- Add a line number using a Counter
- Filter out the feature where column A starts with your fme_basename
- Add an attribute to this feature, set it to the line number, remove the other attributes
- Merge this feaure to all existing features
- Select only the features where the line number is greater then or equal to the line number from step 4
This worked very well. Just needed to set up group processing in the Counter.
(Was briefly confused as the FirstDataLine values weren’t matching the spreadsheets, but it was because FME is ignoring blank rows. )
Much obliged
You're welcome, thanks for the feedback.
I'm glad to know it works.