Question

Dynamic workflow excel reader trying to read first row as headers

  • 8 November 2018
  • 2 replies
  • 34 views

Badge +1

Hi,

I am trying to read a number of excel files. Some of these files have a sheet called content. In cell A1 there is a name which is different for each file with the content sheet. On the second row there are attribute headers, and from the third row each row contains data.

The problem I have is that the first file I read from a directory does not have the content sheet. So I can not specify start row 0 when reading attributes. The result I now get is that the first file with a content sheet is read but the first attribute has the name of the value in cell A1. Every file coming after this tries to read the same attribute name as the first one which is missing in this next file.

Does somebody know a solution other than removing the first row from content sheet.


2 replies

Userlevel 4
Badge +25

I think there are two issues here.

Firstly when you create the workspace, you select all files (say file1, file2, file3). Because you select file1 first, FME doesn't show in the parameters any way to set the column info for sheet "content".

The answer to that is, when you create the workspace, to simply to select a file that has the "content" sheet. Then it becomes available to set up. When you run the workspace then you can pick all of the files to process.

The second problem is that some content sheets have this extra line. I'm looking into this. It's very strange but it seems to work fine for me... but messes up the other sheets.

The more I think about this, the more I think this is going to be impossible to handle. The only solutions would be to either remove that first row as you suggest, or to set up named ranges where the range excludes that first row where applicable.

Userlevel 4
Badge +25

So... I've been investigating this a bit more. And even as I'm writing this I'm still trying to figure out if what FME does is correct.

Anyway, I think the problem is because your translation is dynamic. When you read an Excel spreadsheet in a dynamic way, the first sheet defines the line on which attribute names are read; even when it's different sheets in the same file.

For example, say I have File 1: Sheet 1 (Attr Names on Line 2), and Sheet 2 (Attr Names on Line 1)

With dynamic reading, FME reads sheet 1 correctly, but for sheet 2 looks for the attribute names on Line 2, and so reads the data wrong. I'm not altogether certain whether this is correct or not, but I'll investigate.

Still, what I wonder in your scenario is whether dynamic reading is necessary at all. I don't see anything in your question as to why you couldn't make the reader feature types static. As long as you know in advance what the sheet names are, then there is no problem. It's only when you might suddenly find an Excel file has an extra, unexpected sheet, that there is a problem.

In static mode you wouldn't get the same problems because each sheet would keep its own definition, and you could even have a different reader for files with the extra row on the Content sheet.

You might want the writer to be dynamic, but that's OK. Having a static reader doesn't mean you can't have a dynamic writer. All a dynamic reader does is let feature types pass regardless of whether they existed or not in the workspace. You can still do that by making one of your feature types dynamic; sort of a catch all. You'd do that by clicking the Merge Filter option in a single feature type dialog. But as long as the rest of the tables - including Content - had their own static type, all should be OK I think.

So maybe give that a try and let us know how it works out.

Reply