Question

How to extract PDF attachment from ArcGIS Online and upload to online location with HTTPCaller?

  • 12 November 2020
  • 9 replies
  • 73 views

Badge +8

I am trying to extract a PDF attachment from an ArcGIS Online feature layer and write it to another online program using the HTTPCaller. (I can do so with CSV attachments.) I can see the PDF is being read by FME Desktop (2020.1.2.1 - build 20624); it's in the arcgisonline_attachment{0}.data attribute, but I'm not able to write the PDF contents to the other program. Other forum posts indicate I should use the BinaryDecoder first, but it won't decode using either option, Base64 or HEX. I get an "Invalid Base64 character '%'" or "Invalid HEX character '%'" error. I also tried skipping the decoder and writing the contents of the arcgisonline_attachment{0}.data attribute to the output location directly, but got the following error: Received HTTP response header: 'HTTP/1.1 400 Bad Request: Invalid .pdf file.'. What do I need to do to get the contents of the PDF into the proper format so I can upload it?
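
A note on the decoder error, offered as a likely explanation and not something confirmed in this thread: every PDF file starts with the bytes `%PDF-`, and `%` is not part of the Base64 or hexadecimal alphabets. An "Invalid Base64 character '%'" message is therefore what one would expect if the attribute already holds the raw PDF bytes instead of an encoded string. The sketch below shows the same failure in plain Python; the helper name `classify_payload` is hypothetical and nothing here is FME API.

```python
import base64
import binascii


def classify_payload(data: bytes) -> str:
    """Guess whether `data` is a raw PDF, Base64 text of a PDF, or something else."""
    if data.startswith(b"%PDF-"):
        return "raw-pdf"
    try:
        # validate=True rejects any character outside the Base64 alphabet
        decoded = base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return "unknown"
    return "base64-pdf" if decoded.startswith(b"%PDF-") else "unknown"


raw = b"%PDF-1.7\n% minimal stand-in for real PDF bytes\n"
encoded = base64.b64encode(raw)

print(classify_payload(raw))
print(classify_payload(encoded))

try:
    base64.b64decode(raw, validate=True)
except binascii.Error as exc:
    print("decoder rejected raw bytes:", exc)
```

If the attribute classifies as a raw PDF, no decoding step should be needed before writing or uploading it.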


9 replies

Badge +9

Hi @aaron, I did some testing with this today. It took a fair bit of trial and error to work out how to get the results from the ArcGIS Online Reader into a usable encoding, but I did manage it. Here's what I did:

  • Used a BinaryEncoder first to convert from the original Binary format to Base64
  • Then, I connected a BinaryDecoder to that to go from Base64 to System Default (fme-system) encoding (I tested on a Mac, so I suspect 'system default' might mean something different on different operating systems)
  • From there, I was able to write the PDF directly to my file system using an AttributeFileWriter

 

I haven't tried using the output from the BinaryDecoder directly in an HTTPCaller though, so I'm not completely sure if that will work. If it doesn't, one option might be to write the PDF to a temp folder first, then upload it from there via the HTTPCaller.

 

Hope this helps!

Badge +8

Hi @lauraatsafe​ , thanks for testing. I was able to get it to work without the BinaryDecoder or BinaryEncoder. All I needed to do was set the Target File Encoding parameter to Binary (fme-binary) in the AttributeFileWriter rather than use System Default (fme-system). I also tried your method and that worked too so nice to know there are options. Now on to the HTTPCaller!

 

I think I may have discovered a bug in both the BinaryEncoder and BinaryDecoder. If I go with the default where "Encode to Different Destination Attributes" is checked and enter an attribute name (or go with the default name) in the "Destination Attribute(s)" text box, that attribute name does not appear in the next transformer so I can't select it. Does this happen with you? I am using FME Desktop 64-bit b20624.

Badge +9

Hi @aaron​, Ah! I had a feeling I might have overcomplicated things in my solution! The AttributeFileWriter option you used is definitely a more elegant solution for this.

 

As for the bug in the BinaryEncoder/Decoder, I'm seeing the same behaviour in build 20806 (2020.2). I'll report that internally to get it fixed. The destination attribute should be exposed automatically. Thanks for reporting this issue!

 

UPDATE: This issue is being tracked internally as FMEENGINE-59756

Badge +8

Hi @lauraatsafe​ , I have hit a roadblock trying to upload my ArcGIS Online PDF to another online program using the HTTPCaller. I keep getting an "Invalid .pdf file" response. I think the issue lies with the way the HTTPCaller is encoding the content because if I "Upload From File" (saved to laptop first) it works but "Specify Upload Body" (directly from ArcGIS Online) does not. It's worth noting I can save the PDF from ArcGIS Online using the AttributeFileWriter, which lets me set "Target File Encoding" to "Binary (fme-binary)" while the HTTPCaller gives me no such option. Is there any way I can set the encoding to "Binary (fme-binary)" in the HTTPCaller? Or maybe another option? I could, as you mention above, save the file first and then upload it, but it's not an ideal workflow so I'm hoping for a better solution.

Badge +9

Hi @aaron​ , Unfortunately, we don't have a way to set the encoding to Binary (fme-binary) directly in the HTTPCaller. But, I had some success with using my BinaryEncoder/BinaryDecoder trick from my first response to change the attribute containing the PDF binary from fme-binary to something compatible with Binary (application/octet-stream) in the HTTPCaller.

 

For this, I just fed my arcgisonline_attachment{0}.data into a BinaryEncoder with Encoding Type = HEX and Encode to Different Destination Attributes unchecked (to avoid the bug where the new attributes aren't exposed). Then I passed that newly encoded attribute into a BinaryDecoder with Encoding Type = HEX, Decode to Different Attribute unchecked and the Character Encoding for Output Data set to System Default. In the HTTPCaller, I then set the Content Type for that data to Binary (application/octet-stream) and the API I tested against was then able to recognize the PDF file.
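
For reference, a hex encode followed by a hex decode is a lossless round trip at the byte level, so any change to the data would have to come from the character-encoding step applied afterwards. The plain-Python sketch below illustrates that point only; it does not reproduce the BinaryEncoder or BinaryDecoder internals, and the sample bytes are made up.

```python
import binascii

# Made-up stand-in for the attachment bytes, including values above 0x7F.
original = b"%PDF-1.7\n" + bytes([0x00, 0x7F, 0x80, 0xE2, 0xFF])

hex_text = binascii.hexlify(original)      # ASCII text, two characters per byte
restored = binascii.unhexlify(hex_text)    # back to the original bytes

print(len(original), len(hex_text))
print(restored == original)
```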

Badge +8

Hi @lauraatsafe, using the settings above I was able to upload the attachment, but it comes out corrupt when I download it from the destination online program. (The status code returned from the HTTPCaller was 200.) What's odd is that I can write the AGO attachment to a file first and then upload the file with the HTTPCaller, and it works fine. I did notice that the upload that comes out corrupt is about 2.9 MB, while it's only 1.9 MB when uploaded from file. That leads me to believe the same data is being written differently by the HTTPCaller when using Specify Upload Body versus Upload From File. Just to make sure I understand clearly, you were able to upload your AGO attachment and it was fine (i.e., not corrupt) on the other end?
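
The size difference is a useful clue. One possible explanation, which is an assumption and not something confirmed in this thread, is that the upload body is treated as text: if each byte is read as a Latin-1 character and the result is re-encoded as UTF-8, every byte from 0x80 to 0xFF turns into two bytes. For data whose bytes are spread evenly over all 256 values, that inflates the payload by a factor of 1.5, which is close to the 1.9 MB versus 2.9 MB reported here. The sketch below demonstrates the effect in plain Python with synthetic data.

```python
# Synthetic binary payload with every byte value equally represented.
payload = bytes(range(256)) * 1000

# Misinterpret the bytes as Latin-1 text, then re-encode that text as UTF-8.
as_text = payload.decode("latin-1")
re_encoded = as_text.encode("utf-8")

ratio = len(re_encoded) / len(payload)
print(len(payload), len(re_encoded), ratio)

# The damage can be undone only by a receiver that knows to reverse it.
recovered = re_encoded.decode("utf-8").encode("latin-1")
print(recovered == payload)
```

Real PDFs consist largely of compressed streams, so a ratio near 1.5 is plausible for them, but the exact figure will vary by file.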

Badge +9

Hi @aaron, ah, yes, sorry! I see the same problem too. My original testing consisted of just making sure that the POST request was successful when uploading the PDF, then checking that the file was there. When I download and open that PDF, it is corrupt and the file size isn't right (same as what you see). I'll do a bit more digging to see if there's a way to get this to work, but I'm starting to suspect that saving the file first and then uploading it might be the best option here.

Badge +8

Thanks for confirming the corrupt file, @lauraatsafe​. I agree saving the file before uploading may be necessary for now. I'm going to pursue that work-around but let me know if you discover anything of note. I appreciate your help!
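
A minimal sketch of the save-then-upload workaround in plain Python, for example inside a PythonCaller, might look like the following. The function names, the URL, and the choice of a raw `application/octet-stream` POST are assumptions for illustration; the target API in this thread is not named, and it may expect multipart form data or authentication headers instead. The request is built but not sent.

```python
import os
import tempfile
import urllib.request


def save_attachment(data: bytes, suffix: str = ".pdf") -> str:
    """Write raw attachment bytes to a temp file in binary mode; return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


def build_upload_request(url: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a POST whose body is the file's exact bytes."""
    with open(path, "rb") as handle:
        body = handle.read()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


pdf_bytes = b"%PDF-1.7\n" + bytes(range(256))      # stand-in for the attachment
pdf_path = save_attachment(pdf_bytes)
request = build_upload_request("https://example.com/upload", pdf_path)  # placeholder URL
print(request.get_method(), len(request.data))
os.remove(pdf_path)                                 # clean up the temp file
# urllib.request.urlopen(request) would send it once the URL and headers are real.
```

Reading and writing in binary mode (`"rb"`, `"wb"`) keeps the bytes untouched from attribute to request body, which is the property the Upload From File path appears to preserve.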

Badge +2

Old thread, I know, but I came across this issue too (writing corrupt PDFs from an attribute retrieved using the HTTPCaller). I ended up using Python in a PythonCaller to create the file directly from the attribute:

        # Requires "import base64" and "import os" at the top of the script
        # Base64-encoded PDF string and the target path, read from the feature
        data = feature.getAttribute('Stream')
        outputFilePath = feature.getAttribute('OutputFilePath')

        # Decode the Base64 string into raw bytes
        decoded_data = base64.b64decode(data)

        # Create the output directory if one is given and does not exist yet
        directory = os.path.dirname(outputFilePath)
        if directory:
            os.makedirs(directory, exist_ok=True)

        # Write the bytes in binary mode ('wb') so they are not re-encoded
        with open(outputFilePath, 'wb') as f:
            f.write(decoded_data)
        self.pyoutput(feature)

 
