Skip to main content

I’ve got a weird problem where an Excel reader is reading an entire file correctly, except for one column where it is only getting part of the data. 

When I decode the XML of the Excel file (change extension to .zip and expand), I can look at sheet1.xml and I see the following data:

<c s="12" t="inlineStr" r="AK10">
<is>
<r>
<rPr>
<b val="0"></b>
<i val="0"></i>
<strike val="0"></strike>
<u val="none"></u>
<sz val="10"></sz>
<color theme="1"></color>
<rFont val="Courier New"></rFont>
</rPr>
<t xml:space="preserve"> </t>
</r>
<r>
<rPr>
<b val="0"></b>
<i val="0"></i>
<strike val="0"></strike>
<u val="none"></u>
<sz val="10"></sz>
<color rgb="FF333333"></color>
<rFont val="Arial"></rFont>
</rPr>
<t xml:space="preserve">Yes</t>
</r>
</is>
</c>

This looks like two bits of text, one with several spaces formatted as Courier, then the word “Yes” formatted as Arial. 

FME is reading the first part of the text but not the second part and there doesn’t seem to be any way to convince it to extract the whole text!

I am not 100% sure of the source of the data so there is a chance that it wasn’t created by Excel, but it does seem to be a valid XML representation. 

This issue is happening for FME version 2024.2.1.

The data in the Sheet1.xml is just the styling i think, not the data itself. I’ve seen this before when someone has made a custom format that contains what looks like data, however it’s actually part of the formatting

If I click on that cell, I don’t see the value of yes in the highlighted yellow box. If i then read the file in with FME i don’t see that yes value either.

If you need to access, you should be able to see the values you show in the xml by ticking read formatting in the xl reader 

You will then have to extract the information you need from the formatting.


I think that the contents of the cells will often be in SharedStrings.xml but I think they can be in Sheet1.xml instead. The lack of anything in SharedStrings.xml is what made me wonder if the file wasn’t actually created by Excel.

The values are in cell and aren’t just formatting. That weird behaviour is what got me looking at the XML in the first place. FME has generally been pretty reliable so any files that it can’t extract need further investigation.

 

I had already tried checking all the check boxes above (formulas, comments, hyperlinks and formatting). They didn’t bring anything useful back at all. In fact, the formatting said that the cell was using Courier New, when only part of the text is Courier New and the remainer is Arial.

I did have some success using some Python packages (openpyxl and something else). They both extracted the correct data without any issues. I’d rather not be using a Python Caller unless I really have to but I can’t see any other way at this point.


I tried to reproduce but Excel gives an error if I paste your example over my AK10 cell and open in Excel.

Don’t know enough to correctly recreate what you have and agree that it could be made with another application then Excel. You could try to find and replace Yes with something else and then back to Yes to force Excel to use its own standards?

But that requires knowing the error exists and manipulating before reading with FME. So not a reliable long-run solution. If you could find out what application made the Excel it could raise the emergency to fix it by Safe Software.


Thanks for that info. I have tried the same thing by putting the XML into a blank sheet and rezipping. I also got an error.

That prompted me to remove all the actual data from my source spreadsheet and create a custom Excel file that still shows this issue. This just has two columns and two rows. The problematic data is in cell C2.

There may be other issues with this file because I have just yanked a whole lot of XML out of the sheet1.xml file, so this is just an illustration. It does open without errors for me in Excel.

I have heard that the file was perhaps created by something in R but I can’t be sure about that.


I can reproduce your issue in 2021 and 2024. But the issue is gone in 2025.1.0.0

 


I can reproduce your issue in 2021 and 2024. But the issue is gone in 2025.1.0.0

 

Thanks for that. Good to know it’ll be resolved sometime in my future. We’re currently upgrading from 2021 to 2024.2. Getting to version 2025.1 might be a year away. Looks like it might have to be a Python solution for now.