Solved

Dealing with Excel where attribute schema is consistent but number of 'read me' header rows is not

1 year ago
June 19, 2024
7 replies
84 views

p_c_20
Supporter
32 replies

I have a lot of excel spreadsheets that all have the same attribute schema. However, there are a varying number of header lines that precede the data in each.

In FeatureReaders & Readers, we define the start point for data by line number. Given the consistency of the attribute schema and the inconsistency of the header spiel, I’m trying to think of a way to only read in the ssheets from the start of the data/attribute names, i.e. dynamically, defined by the first attribute name rather than the line number (I guess).

Any ideas?

Best answer by geomancer

You could do something like this:

When reading the Excel file, under Parameters set Field Names Row and Cell Range to empty. FME will name your columns A, B, etc. Also expose fme_basename
Add a line number using a Counter
Filter out the feature where column A starts with your fme_basename
Add an attribute to this feature, set it to the line number, remove the other attributes
Merge this feaure to all existing features
Select only the features where the line number is greater then or equal to the line number from step 4

View original

Did this help you find an answer to your question?

+29

jkr_wrk
387 replies
1 year ago
June 19, 2024

You could use a generic featurereader to read the excel files. Then test for the first feature that has your header title in attribute A. And then use the xlsx_row_id to as setting for header in your featurereader.

Or…

If your column names are fixed just rename all letters to the correct names. Or figure out how to do this dynamic based on the feature with the headers, merging this to all other features and renaming according.

💡 Did you know... The FeatureWriter is more intuitive. 😏

p_c_20
Author
Supporter
32 replies
1 year ago
June 20, 2024

Not sure I was clear and coherent in my explanation. There’s a whole load of non-attribute text at the top of the spreadsheets, and this (for some reason - not sure what) varies in length. Thus, the start line for the attribute names varies. Could be line 19, 20, 21 etc, etc. I’ve dummied up the structure here:

Field name A, Field name B etc are consistent from file to file. What isn’t consistent is the number of lines of ‘blah’ that precede the field name row. It also handily has a line between the attribute names and the data. 😕

Another peculiarity of the software creating this output is that Field A only has one cell populated - the first -and this value is the same as the FME basename for the file, albeit with a suffix. This occurs nowhere else in the spreadsheet. So thinking about it, that is what I’d want the reader to look for as the start of the data. (I’m going to overwrite all Field Name A values with the suffixed basename later on.)

+29

jkr_wrk
387 replies
1 year ago
June 20, 2024

Did you try the Generic Reader / FeatureReader with Generic format?

💡 Did you know... The FeatureWriter is more intuitive. 😏

p_c_20
Author
Supporter
32 replies
1 year ago
June 20, 2024

jkr_wrk wrote:

Did you try the Generic Reader / FeatureReader with Generic format?

I did. Problem is that in A1, you have a value that is file-specific (includes a datetime). So you get this:

...so yes, I can find the FME basename feature using a tester, but FME won’t let me enter ‘A’ - I have to use the text value FME believes to be the column name.

Any way around this?

+51

geomancer
Evangelist
899 replies
Best Answer
1 year ago
June 20, 2024

You could do something like this:

When reading the Excel file, under Parameters set Field Names Row and Cell Range to empty. FME will name your columns A, B, etc. Also expose fme_basename
Add a line number using a Counter
Filter out the feature where column A starts with your fme_basename
Add an attribute to this feature, set it to the line number, remove the other attributes
Merge this feaure to all existing features
Select only the features where the line number is greater then or equal to the line number from step 4

p_c_20
Author
Supporter
32 replies
1 year ago
June 27, 2024

This worked very well. Just needed to set up group processing in the Counter.

(Was briefly confused as the FirstDataLine values weren’t matching the spreadsheets, but it was because FME is ignoring blank rows. )

Much obliged

+51

geomancer
Evangelist
899 replies
1 year ago
June 28, 2024

You're welcome, thanks for the feedback.

I'm glad to know it works.

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Dealing with Excel where attribute schema is consistent but number of 'read me' header rows is not