Question

Parquet File not Readable over s3 because of slashes

3 months ago
November 27, 2024
3 replies
66 views

mccunee
Contributor
8 replies

I have many Parquet files stored in an S3 bucket. When I use the S3 connector to download a Parquet file, I can read it successfully, but passing the S3 URI (“s3://”) raises an error:

PARQUET reader: PARQUET reader: Failed to open file 's3:\my-bucket-s3-test\fme-test\test.parquet' for reading. Please ensure that the file exists and you have sufficient privileges to read it
Failed to obtain any schemas from reader 'PARQUET' from 1 datasets. This may be due to invalid datasets or format accessibility issues due to licensing, dependencies, or module loading. See logfile for more information

When I try to pass the HTTPS URL instead:

PARQUET reader: PARQUET reader: Failed to open file 'HTTPS:\s3.us-west-2.amazonaws.com\my-bucket-s3-test\fme-test\test.parquet' for reading. Please ensure that the file exists and you have sufficient privileges to read it

When I use the FeatureReader, it converts forward slashes to backslashes regardless of whether I specify a web connection. How do I work around this?
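The mangled paths in both error messages match what Windows-style path normalization does to a URI. The sketch below is only an illustration using Python's standard `ntpath` module, under the assumption that the dataset string is being treated as a local file path. It says nothing confirmed about FME's internals, and the helper name `as_windows_path` is hypothetical.

```python
import ntpath

def as_windows_path(uri):
    # Assumption: the URI is handled as if it were a local Windows path.
    # ntpath applies Windows path rules on any operating system:
    # "/" becomes "\" and repeated separators collapse into one.
    return ntpath.normpath(uri)

for uri in (
    "s3://my-bucket-s3-test/fme-test/test.parquet",
    "HTTPS://s3.us-west-2.amazonaws.com/my-bucket-s3-test/fme-test/test.parquet",
):
    print(as_windows_path(uri))
```

The printed strings have the same shape as the paths quoted in the error messages, which suggests the URI scheme is not being recognized before the path is normalized.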

Context:

There will be thousands of files in this bucket. I am using “list” to generate features from the pathnames, then using automations to process the files in this bucket. I do not want to download all of these files, given the memory strain, and I need to use attributes to pass the URIs to the next part of the process.
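When URIs travel as attributes, each one can be split into a bucket and an object key before it is handed to a download step. A minimal sketch in plain Python, not FME-specific; the helper name `split_s3_uri` is hypothetical:

```python
from urllib.parse import urlsplit

def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Caveat: keys containing '?' or '#' would be cut off by urlsplit,
    # so this sketch assumes plain path-like keys.
    return parts.netloc, parts.path.lstrip("/")

bucket, key = split_s3_uri("s3://my-bucket-s3-test/fme-test/test.parquet")
print(bucket, key)
```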

hkingsbury
Celebrity
1410 replies
3 months ago
November 28, 2024

My understanding of how s3 works in FME is that you do need to download each file before being able to use it in a reader.

The approach I'd take in this scenario is:

Get a list of all the files in the bucket
Explode them to individual features
In a custom transformer (set to group by the file name)
- Download the file
- read it
- delete the downloaded file
- perform any required analysis/processing
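Outside of FME, the same download, read, delete, process loop can be sketched in Python so that only one file sits on disk at a time. This is an illustration of the pattern, not the custom transformer itself; `process_one_at_a_time` and its `download`, `read`, and `process` arguments are hypothetical placeholders supplied by the caller.

```python
import os
import tempfile

def process_one_at_a_time(keys, download, read, process):
    """Download, read, delete, then process each file in turn.

    download(key, local_path), read(local_path) and process(data) are
    caller-supplied callables (placeholders in this sketch).
    """
    results = []
    for key in keys:
        fd, local_path = tempfile.mkstemp(suffix=".parquet")
        os.close(fd)  # the download callable reopens the path itself
        try:
            download(key, local_path)
            data = read(local_path)
        finally:
            os.remove(local_path)  # delete the temp copy even if reading fails
        results.append(process(data))
    return results
```

If boto3 and pyarrow are available, `download` could wrap `boto3.client("s3").download_file(bucket, key, local_path)` and `read` could be `pyarrow.parquet.read_table`. Disk use stays at one temporary file, though each table is still held in memory until `process` returns.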

+2

mccunee
Author
Contributor
8 replies
3 months ago
December 3, 2024

OK, thank you @hkingsbury, the clarification is much appreciated. I did figure something out, but it still downloads the Parquet file to temp (which slightly undermines the purpose of cloud-optimized formats). For the Apache Parquet format, the drop-down arrow has a "browse web - select from S3" option. This reformats the URL into a form that Safe's reader can read, which isn't your typical S3 URL. I can then modify it with a text editor to use different inputs. This is kind of clunky, however, and the FeatureReader UI makes it really tough to modify S3 URIs. If any developers read this, improvements to the S3 browsing and parsing would be much appreciated.

+50

hkingsbury
Celebrity
1410 replies
3 months ago
December 4, 2024

It would be worth submitting this as an idea: https://community.safe.com/ideas
