Skip to main content

I'm having some difficulties with an translation of an Excel file that contains a some sort of layered structure and needs to be transformed to a sort of folder/path attribute structure in Excel. The input is as followed:

Input

The first problem I encounter is the use of indentation in Excel, it looks like FME does nog recognize this kind of setting/formatting.

indention

Input_inspector

This issue makes it a lot harder to transform the layered structure of attribute "Naam" to a path structure with an "folder" and "name" attribute. As shown below:

Output I already have a solution separating the steps and merging them back when the structure is OK. I can do this by splitting the steps using the Sampler with a Group By number with sample rate (N) 1 and sample type first N with no randomization. All steps will be in the NotSampled area and can be merges back later based on the number getting the structure from the Sampled part of the flow.

 

So is there a way to use the indention or anther way to merge these rows into path and name attributes shown in the output example?

So you are trying to do the reverse of what was happening here?

https://community.safe.com/s/question/0D54Q000080hc7jSAA/translating-excel-table-structure-to-layered-kind-of-structure-in-excel

If you just need to get to a stage where you have the indents and can work from there, you can just use adjacent attribute handling to work out the indents and then a stringpadder to get something like this

Capture

Although not 100% sure this will work for all your data, although it works for the sample


So you are trying to do the reverse of what was happening here?

https://community.safe.com/s/question/0D54Q000080hc7jSAA/translating-excel-table-structure-to-layered-kind-of-structure-in-excel

If you just need to get to a stage where you have the indents and can work from there, you can just use adjacent attribute handling to work out the indents and then a stringpadder to get something like this

Capture

Although not 100% sure this will work for all your data, although it works for the sample

Yes correct. But i can not find out how to show the indents. How did you manage to get the indent attribute there?


Yes correct. But i can not find out how to show the indents. How did you manage to get the indent attribute there?

I used an attributecreator to work out the indents based on adjacent row types and then reinserted them. But i don't think this is the solution, it works for your sample data, but it wouldn't work if once you get onto Sub folder 2.

 

In the past when I've needed to read information in an excel file that is not brought through into FME I've had to use a process to unzip the excel file and read the information from the relevant xml file. The indents are show in the style.xml file, but it looks quite complex to match them up with the data.


I found the information in the Excel file with a bit of tweaking. The excel file can be turned into a zip by renaming it and then unpacked to show the real xml information of the document. In the subfolder "xl\worksheets" is a document named "sheet1.xml" that contains the values of the indentation in the following format. If i could merge this somehow it would be possible to get the structure of the B row back.

 

As a example is value from cell B15 below. It has a value 15 (linking it to the word "Task 1"), and s="4" is the indention. If there is a way to extract the B line and connect them to excel it could be workable.

      <c r="B15" s="4" t="s">
        <v>15</v>

The whole file:

 


I found the information in the Excel file with a bit of tweaking. The excel file can be turned into a zip by renaming it and then unpacked to show the real xml information of the document. In the subfolder "xl\worksheets" is a document named "sheet1.xml" that contains the values of the indentation in the following format. If i could merge this somehow it would be possible to get the structure of the B row back.

 

As a example is value from cell B15 below. It has a value 15 (linking it to the word "Task 1"), and s="4" is the indention. If there is a way to extract the B line and connect them to excel it could be workable.

      <c r="B15" s="4" t="s">
        <v>15</v>

The whole file:

 

If you read the xml with an xml reader with elements to match as worksheet/sheetData/row/c you should get some rows of data, you're only interested in the cells in column B so test that c.r begins with B, c.s contains the indent, use a stringsearcher and some regex m0-9]+ to extract the number from this cell which is the row number, you can then merge this onto the excel data by the rownumber, and use a stringpadder to apply the indents

Capture


If you read the xml with an xml reader with elements to match as worksheet/sheetData/row/c you should get some rows of data, you're only interested in the cells in column B so test that c.r begins with B, c.s contains the indent, use a stringsearcher and some regex [0-9]+ to extract the number from this cell which is the row number, you can then merge this onto the excel data by the rownumber, and use a stringpadder to apply the indents

Capture

Thanks for the help, you need a extra step to finish it. First you need to join the styles (left join c.s with _count) with the worksheet and than join the result (only the number of c.r with xlsx_row_id) with your Excel file. The end result looks like this:Add indent


It was in the end far easier than expected, it can be added with the initial writer options. Just a checkbox and than a tester and string splitter got me the correct number😃

6e36e0f8-19a0-442c-b331-baec32f08801


It was in the end far easier than expected, it can be added with the initial writer options. Just a checkbox and than a tester and string splitter got me the correct number😃

6e36e0f8-19a0-442c-b331-baec32f08801

Good stuff, I'd actually looked at the formatting options in the reader, but couldn't see the indent data. I don't know how I missed it, as it seems quite obvious when I've checked again


Reply