Hi everyone,

I’m hopeful somebody here will have seen this behaviour before and can help me to understand what’s going on. I’m getting an error when trying to decode some Base64 documents that have been read out of a system but I’m not sure why and I’m not sure I can easily create a sample workflow to demonstrate it, as I’ll explain.

The task is to extract specific documents from an API connected to a document management system. I’ve been able to create a workflow that will identify the required document IDs and then extract the documents into feature attributes for me.

The document data that is returned to me is in Base64 format. When I paste the output data into an online converter as a test, it is converted into a usable image (in the case that I tried). From this I assume that all other documents will work similarly.

When I pipe that output into a BinaryDecoder and ask it to decode the Base64 attribute, it throws an error complaining about the input.

When I copy the Base64 code and add it into an AttributeCreator and pipe that into a BinaryDecoder, the process works and generates for me an output that I believe can then be saved to disc as the document in question.

When I open the Base64 fields to examine them, the version that works has obvious LF characters at the end of each line of text. The version that fails (but that was the source for the working example) does not show these characters in the editor.

Please let me know whether I’ve explained this clearly enough. I will try to provide screen shots to illustrate what I’m trying to describe.

Has anyone seen behaviour like this with the BinaryDecoder before and, if so, how did you resolve it?

Many thanks for any thoughts you have that help me understand where this is going wrong.

 

Hello @simeon,

Which version of FME are you using?

If you are using a 2024 build that is older than build 24188, there is a bug in the BinaryDecoder affecting decoding of Base64 data in attributes that are not encoded as text. The fix for this issue is included in FME 2024.0.1 and newer.

You can test if this is the same bug by using an AttributeEncoder (destination encoding: Unicode 8-bit, incoming attribute: use bytes, replace invalid characters: no) on the attribute(s) containing Base64 data before the BinaryDecoder.


I once had an API that did not end its Base64 output with the correct number of = characters. They were omitted. I think I checked whether the length of the Base64 string was the right size and then appended the correct number of Base64 padding characters to the string before sending it to the decoder.

I think it would be very strange if your working Base64 string contained newline/linefeed characters. I would expect that to break things, not fix them.
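For what it’s worth, whether line breaks break or fix decoding depends on the decoder. As a minimal sketch outside FME, here is how Python’s standard `base64` module behaves. This says nothing definite about how the BinaryDecoder treats line breaks; the sample payload is made up for illustration.

```python
import base64
import binascii

# Hypothetical payload standing in for a document's bytes.
payload = bytes(range(256))

# encodebytes() wraps its output with a newline after every 76 characters,
# as MIME-style Base64 does; b64encode() produces one unbroken line.
wrapped = base64.encodebytes(payload).decode("ascii")
unwrapped = base64.b64encode(payload).decode("ascii")

# By default b64decode() discards characters outside the Base64 alphabet,
# so the wrapped and unwrapped forms decode to the same bytes.
lenient_wrapped = base64.b64decode(wrapped)
lenient_unwrapped = base64.b64decode(unwrapped)

# With validate=True the same wrapped text is rejected.
try:
    base64.b64decode(wrapped, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True
```

So a lenient decoder accepts both versions, while a strict one only accepts the version without line breaks.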

 

Edit: Some explanation about the issue I encountered:

https://stackoverflow.com/questions/6916805/why-does-a-base64-encoded-string-have-an-sign-at-the-end

 

As mentioned there, some software will pad the string to a multiple of 4 characters by itself. FME did not, and maybe still does not, handle this automagically. So I used a StringLengthCalculator and fmod() to work out how many equals signs I had to add.
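The padding calculation can be sketched roughly like this in Python. It is not the FME workflow itself, just the same arithmetic; `pad_base64` is a hypothetical helper name, and it assumes the only thing wrong with the input is missing `=` characters (plus possibly stray whitespace).

```python
import base64

def pad_base64(text: str) -> str:
    """Return text with '=' appended until its length is a multiple of 4."""
    # Drop spaces and line breaks so they do not distort the length check.
    stripped = "".join(text.split())
    remainder = len(stripped) % 4
    if remainder == 1:
        # No amount of padding makes this length valid; data is truncated.
        raise ValueError("Base64 length modulo 4 is 1; input looks truncated")
    # -len % 4 gives 0, 1 or 2 missing padding characters here.
    return stripped + "=" * (-len(stripped) % 4)

unpadded = "aGVsbG8"            # Base64 for b"hello" with its '=' removed
padded = pad_base64(unpadded)   # "aGVsbG8="
decoded = base64.b64decode(padded)
```

In FME terms, the length would come from the StringLengthCalculator and the modulo from fmod(), with the result deciding how many `=` characters to append.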

 


Thanks for the suggestions, I’ve managed to make some progress on this today, so I thought I’d better outline what I’ve done in case it’s of use to anyone else going through the same pain.

@debbiatsafe - Sadly we’re still at 2022.1.1.0. The problem you described may be present there but I couldn’t be certain. When I tried using the AttributeEncoder as suggested, it merely rendered the first character of the Base64 string into the output.

@jkr_wrk - The business with the ‘LF’ characters was just describing what I could see in the FME text editor after I copied the base64 field and pasted it into there as a sample to test the decoder. I’ve no idea whether it was making any difference or, if it did, why.

What worked for me?

I’d had a similar problem earlier in my processing, trying to extract the list of documents associated with a ListExploder (see

for details)

I decided to go back and investigate similar methods of extracting the data from my XML field.

I used 2 XMLFragmenters - one to extract the name of the file extension and a second one to extract the base64 information. This second fragmenter gave me the document details as an XML fragment.

I then used the XMLXQueryExtractor on this to extract the base64 information into a separate text field - rather than relying on the answer provided by the fragmenter. This was then sent to the BinaryDecoder where it was decoded into a form that could be written to disk as the required file and then checked against the original system.
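For anyone who wants to check the same idea outside FME, here is a rough Python sketch of the approach: pull the Base64 text out of the XML as plain text, then decode it. The element names (`document`, `extension`, `content`) and the sample payload are invented for illustration; the real API response will use different names.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical response fragment; real element names will differ.
SAMPLE_XML = """<document>
  <extension>txt</extension>
  <content>
    aGVsbG8g
    d29ybGQ=
  </content>
</document>"""

def extract_document(xml_text: str) -> tuple[str, bytes]:
    """Return (file extension, decoded bytes) from one document fragment."""
    root = ET.fromstring(xml_text)
    extension = root.findtext("extension", default="").strip()
    raw = root.findtext("content", default="")
    # Remove indentation and line breaks that surround the Base64 text.
    cleaned = "".join(raw.split())
    # validate=True makes the decoder reject any other unexpected character.
    return extension, base64.b64decode(cleaned, validate=True)

extension, data = extract_document(SAMPLE_XML)
```

The decoded bytes could then be written to a file named with the extracted extension, which mirrors what the writer does after the BinaryDecoder.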

(See

for some of the info I used when figuring out how to do this. There are other similarly helpful threads here that show up with a little searching.)

I’ve now got a workflow that appears to handle any of the documents that have been uploaded into our system and can extract them into a folder structure of our own choosing on request.

Many thanks again for the responses, they helped guide me in the right sort of direction, even if they couldn’t provide a specific answer to the problem faced. At least they showed that there could be a different way of approaching the problem.

