Skip to main content
Solved

How to combine multiple rows of one attribute into two attributes based on Excel indention

  • September 14, 2020
  • 8 replies
  • 117 views

jeroen
Contributor
Forum|alt.badge.img+8

I'm having some difficulties with an translation of an Excel file that contains a some sort of layered structure and needs to be transformed to a sort of folder/path attribute structure in Excel. The input is as followed:

Input

The first problem I encounter is the use of indentation in Excel, it looks like FME does nog recognize this kind of setting/formatting.

indention

Input_inspector

This issue makes it a lot harder to transform the layered structure of attribute "Naam" to a path structure with an "folder" and "name" attribute. As shown below:

Output I already have a solution separating the steps and merging them back when the structure is OK. I can do this by splitting the steps using the Sampler with a Group By number with sample rate (N) 1 and sample type first N with no randomization. All steps will be in the NotSampled area and can be merges back later based on the number getting the structure from the Sampled part of the flow.

 

So is there a way to use the indention or anther way to merge these rows into path and name attributes shown in the output example?

Best answer by jeroen

It was in the end far easier than expected, it can be added with the initial writer options. Just a checkbox and than a tester and string splitter got me the correct number😃

6e36e0f8-19a0-442c-b331-baec32f08801

View original
Did this help you find an answer to your question?

8 replies

ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • September 14, 2020

So you are trying to do the reverse of what was happening here?

https://community.safe.com/s/question/0D54Q000080hc7jSAA/translating-excel-table-structure-to-layered-kind-of-structure-in-excel

If you just need to get to a stage where you have the indents and can work from there, you can just use adjacent attribute handling to work out the indents and then a stringpadder to get something like this

Capture

Although not 100% sure this will work for all your data, although it works for the sample


jeroen
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • September 15, 2020
ebygomm wrote:

So you are trying to do the reverse of what was happening here?

https://community.safe.com/s/question/0D54Q000080hc7jSAA/translating-excel-table-structure-to-layered-kind-of-structure-in-excel

If you just need to get to a stage where you have the indents and can work from there, you can just use adjacent attribute handling to work out the indents and then a stringpadder to get something like this

Capture

Although not 100% sure this will work for all your data, although it works for the sample

Yes correct. But i can not find out how to show the indents. How did you manage to get the indent attribute there?


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • September 15, 2020
jeroen wrote:

Yes correct. But i can not find out how to show the indents. How did you manage to get the indent attribute there?

I used an attributecreator to work out the indents based on adjacent row types and then reinserted them. But i don't think this is the solution, it works for your sample data, but it wouldn't work if once you get onto Sub folder 2.

 

In the past when I've needed to read information in an excel file that is not brought through into FME I've had to use a process to unzip the excel file and read the information from the relevant xml file. The indents are show in the style.xml file, but it looks quite complex to match them up with the data.


jeroen
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • September 15, 2020

I found the information in the Excel file with a bit of tweaking. The excel file can be turned into a zip by renaming it and then unpacked to show the real xml information of the document. In the subfolder "xl\worksheets" is a document named "sheet1.xml" that contains the values of the indentation in the following format. If i could merge this somehow it would be possible to get the structure of the B row back.

 

As a example is value from cell B15 below. It has a value 15 (linking it to the word "Task 1"), and s="4" is the indention. If there is a way to extract the B line and connect them to excel it could be workable.

      <c r="B15" s="4" t="s">
        <v>15</v>

The whole file:

 


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • September 15, 2020
jeroen wrote:

I found the information in the Excel file with a bit of tweaking. The excel file can be turned into a zip by renaming it and then unpacked to show the real xml information of the document. In the subfolder "xl\worksheets" is a document named "sheet1.xml" that contains the values of the indentation in the following format. If i could merge this somehow it would be possible to get the structure of the B row back.

 

As a example is value from cell B15 below. It has a value 15 (linking it to the word "Task 1"), and s="4" is the indention. If there is a way to extract the B line and connect them to excel it could be workable.

      <c r="B15" s="4" t="s">
        <v>15</v>

The whole file:

 

If you read the xml with an xml reader with elements to match as worksheet/sheetData/row/c you should get some rows of data, you're only interested in the cells in column B so test that c.r begins with B, c.s contains the indent, use a stringsearcher and some regex [0-9]+ to extract the number from this cell which is the row number, you can then merge this onto the excel data by the rownumber, and use a stringpadder to apply the indents

Capture


jeroen
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • September 15, 2020
ebygomm wrote:

If you read the xml with an xml reader with elements to match as worksheet/sheetData/row/c you should get some rows of data, you're only interested in the cells in column B so test that c.r begins with B, c.s contains the indent, use a stringsearcher and some regex [0-9]+ to extract the number from this cell which is the row number, you can then merge this onto the excel data by the rownumber, and use a stringpadder to apply the indents

Capture

Thanks for the help, you need a extra step to finish it. First you need to join the styles (left join c.s with _count) with the worksheet and than join the result (only the number of c.r with xlsx_row_id) with your Excel file. The end result looks like this:Add indent


jeroen
Contributor
Forum|alt.badge.img+8
  • Author
  • Contributor
  • Best Answer
  • September 16, 2020

It was in the end far easier than expected, it can be added with the initial writer options. Just a checkbox and than a tester and string splitter got me the correct number😃

6e36e0f8-19a0-442c-b331-baec32f08801


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • September 16, 2020
jeroen wrote:

It was in the end far easier than expected, it can be added with the initial writer options. Just a checkbox and than a tester and string splitter got me the correct number😃

6e36e0f8-19a0-442c-b331-baec32f08801

Good stuff, I'd actually looked at the formatting options in the reader, but couldn't see the indent data. I don't know how I missed it, as it seems quite obvious when I've checked again


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings