Question

retrieve data from form xls but not columnar

  • 10 January 2019
  • 7 replies
  • 13 views

Badge

I have a standardized xls form that does not use columns. It is designed as a field form and has all cells locked except where the field personnel are to put data. This is more like an OCR type of conversion than excel. all the values I need are in specific (sometimes merged) cells. So there is not a column which contains a value.

I am able to retrieve all the data and can see it all in inspector just not format I need.

Each file is a separate event. Each sheet is a separate feature. The data for each feature is always in the same cells. I need to output each feature as a row. I have attached an image showing 1 sheet.


7 replies

Userlevel 2
Badge +16

Using the Excel reader, you have the option not to have Field names at the first row.

In that case the attributes will be listed as col0, col1 etc.

Using the column and xls_row_id attribute (needs to be exposed) you can define the cell.

Then the AttributeManager can be used to get the vales per cell and copy to the right output attributes.

Looking at the SchemaMapper transformer could be useful too (instead of the AttributeManager).

Hope this helps.

Badge

Using the Excel reader, you have the option not to have Field names at the first row.

In that case the attributes will be listed as col0, col1 etc.

Using the column and xls_row_id attribute (needs to be exposed) you can define the cell.

Then the AttributeManager can be used to get the vales per cell and copy to the right output attributes.

Looking at the SchemaMapper transformer could be useful too (instead of the AttributeManager).

Hope this helps.

Thanks, I have exposed the row_id. I have been trying feature merger. I will go with attribute manager and schemamapper to to see if I can GBH.

Badge

Dig file example.xlsx

I have multiple of these. I am using the filename/dataset (which translates from sheet name) to group each feature.

Badge +2

@kidsmake6until2 Thanks for attaching the sample data.

Brute force is the only approach that I can see, but you should be able to extract most of what you need. I've attached a workspace (2018) that illustrates some ideas. All of this assumes that your sheets for SLEEVE (1/2) , COATINGS (1/2) are exactly the same!!!

When you add the Excel reader, and you open the Parameters dialog: If FME doesn't find any useful names in the "Field Names Row" (or you set that to be ,blank>) then it'll default to the Excel column values (A,B,C,... - in older FME versions that was col1, col2 etc). Select two of the sheets, i.e. SLEEVE(2), COATING (2), and un-select the others. OK on the parameters dialog to add the Excel reader.

On each feature type , i.e. SLEEVE(2), COATING (2), open the properties and check Merge Feature Type and set a Merge Filter: SLEEVE* & COATING*. So now all SLEEVE sheets will be read through a single feature type. Also, expose the Format Attribute xlsx_row_id.

Add an AttributeFilter with the xlsx_row_id numbers that are of interest. For each row, add an AttributeCreator and set the attribute name and the attribute value as the column value (A,B,C...). For merged cells, you use the column of the first cell, so IN Date column is B

Aggregator to bring all the rows together into a single feature.

Tedious but doable.

I don't think FME will be able to dig out the values in strip chart as these are in text boxes.

Workspace (2018): Excel Dig File read.fmw

P.S. There is a problem reading your sheet COATINGS(1) which I'm investigating. COATINGS(2) is fine.

 

Badge

@kidsmake6until2 Thanks for attaching the sample data.

Brute force is the only approach that I can see, but you should be able to extract most of what you need. I've attached a workspace (2018) that illustrates some ideas. All of this assumes that your sheets for SLEEVE (1/2) , COATINGS (1/2) are exactly the same!!!

When you add the Excel reader, and you open the Parameters dialog: If FME doesn't find any useful names in the "Field Names Row" (or you set that to be ,blank>) then it'll default to the Excel column values (A,B,C,... - in older FME versions that was col1, col2 etc). Select two of the sheets, i.e. SLEEVE(2), COATING (2), and un-select the others. OK on the parameters dialog to add the Excel reader.

On each feature type , i.e. SLEEVE(2), COATING (2), open the properties and check Merge Feature Type and set a Merge Filter: SLEEVE* & COATING*. So now all SLEEVE sheets will be read through a single feature type. Also, expose the Format Attribute xlsx_row_id.

Add an AttributeFilter with the xlsx_row_id numbers that are of interest. For each row, add an AttributeCreator and set the attribute name and the attribute value as the column value (A,B,C...). For merged cells, you use the column of the first cell, so IN Date column is B

Aggregator to bring all the rows together into a single feature.

Tedious but doable.

I don't think FME will be able to dig out the values in strip chart as these are in text boxes.

Workspace (2018): Excel Dig File read.fmw

P.S. There is a problem reading your sheet COATINGS(1) which I'm investigating. COATINGS(2) is fine.

 

Thank you Mark,

That got me to the output I need. Now I have much more work to do on it. When you sent that I was working on getting the column/row formatted to an attribute that would allow me to reference them as they are in the file so if a form changed in the future I could update. however I couldn't get it to recognize the column attribute and do anything with it. (confusing). I tried both a concatenator and an attribute creator. (see workspace attached) But after thinking about it would I need to add each sheet name (unless I waited until after I had all the data pulled to a feature) to be able to identify each data value? perhaps your solution is the best way.

I am running with it now anyway. Thoughts?xlsxr2none3.fmw

Badge +2

@kidsmake6until2 Thanks for attaching the sample data.

Brute force is the only approach that I can see, but you should be able to extract most of what you need. I've attached a workspace (2018) that illustrates some ideas. All of this assumes that your sheets for SLEEVE (1/2) , COATINGS (1/2) are exactly the same!!!

When you add the Excel reader, and you open the Parameters dialog: If FME doesn't find any useful names in the "Field Names Row" (or you set that to be ,blank>) then it'll default to the Excel column values (A,B,C,... - in older FME versions that was col1, col2 etc). Select two of the sheets, i.e. SLEEVE(2), COATING (2), and un-select the others. OK on the parameters dialog to add the Excel reader.

On each feature type , i.e. SLEEVE(2), COATING (2), open the properties and check Merge Feature Type and set a Merge Filter: SLEEVE* & COATING*. So now all SLEEVE sheets will be read through a single feature type. Also, expose the Format Attribute xlsx_row_id.

Add an AttributeFilter with the xlsx_row_id numbers that are of interest. For each row, add an AttributeCreator and set the attribute name and the attribute value as the column value (A,B,C...). For merged cells, you use the column of the first cell, so IN Date column is B

Aggregator to bring all the rows together into a single feature.

Tedious but doable.

I don't think FME will be able to dig out the values in strip chart as these are in text boxes.

Workspace (2018): Excel Dig File read.fmw

P.S. There is a problem reading your sheet COATINGS(1) which I'm investigating. COATINGS(2) is fine.

 

@kidsmake6until2The reason the COATING (1) sheet is not read is because the sheet name has a leading space - ' COATING (1)' If you remove the leading space the sheet will be read fine.

Badge +4

Mark,

I like to get help on the similar Excel file which includes reading header data and detailed records within the same Excel sheet. Thanks.

Reply