Released

Hadoop

Related products: FME Form

fmelizard
Safer
Support reading from and writing to HDFS, Hive, and other Hadoop-ecosystem stores -- share your ideas in the comments.
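For reference, Hive access at the API level typically means talking to HiveServer2. A minimal sketch of what that looks like from Python, assuming the third-party PyHive package (the host, port, and table names below are placeholders, not anything FME-specific):

```python
# Hedged sketch: querying Hive over HiveServer2 with the third-party PyHive
# package. Host, port, and table names are placeholders.
from pyhive import hive

conn = hive.connect(host="hive-server.example.com", port=10000)
cursor = conn.cursor()
cursor.execute("SELECT road_id, name FROM roads LIMIT 10")
for row in cursor.fetchall():
    print(row)
conn.close()
```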
This post is closed to further activity.
It may be a question with a best answer, an implemented idea, or just a post needing no comment.
If you have a follow-up or related question, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

7 replies

  • May 27, 2015
Reading and writing Avro and Parquet file formats natively would be very useful for any integration work. The same goes for direct HDFS access, so those files could be stored directly on the cluster. /Mats :smiley_cat:
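To make the ask concrete, here is roughly what native Parquet-over-HDFS access looks like in Python today with the third-party pyarrow package (a hedged sketch: the namenode address and paths are placeholders, libhdfs must be available on the machine, and Avro would need a separate package such as fastavro):

```python
# Sketch: reading and writing Parquet directly on HDFS with pyarrow.
# The namenode host/port and file paths are placeholders.
import pyarrow.parquet as pq
from pyarrow import fs

hdfs = fs.HadoopFileSystem(host="namenode.example.com", port=8020)

# Read a Parquet file straight off the cluster...
table = pq.read_table("/data/roads.parquet", filesystem=hdfs)

# ...and write the (possibly transformed) result back to HDFS.
pq.write_table(table, "/data/roads_out.parquet", filesystem=hdfs)
```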

davideagle
Contributor
  • January 7, 2016

Just yesterday we had a request from a customer to be able to write to HDFS. We'll post more details as we learn more.


I can confirm this would be very useful for working with Hadoop HDFS or Spark. It is time (or perhaps overdue) to enter the Big Data world; the Amazon cloud alone is not sufficient. We are benchmarking ETL tools, and this is one of our criteria.


fmelizard
Safer
  • Author
  • November 29, 2016
Safe PR#60154

fmelizard
Safer
  • Author
  • August 22, 2017

This idea is a bit broad right now, and I'd suggest splitting the related Hadoop requests out into their own ideas. But HDFS read/write is now in the FME 2018 betas via the HDFSConnector transformer. Give it a spin via http://www.safe.com/beta and let us know what you think.
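For anyone who can't run the beta yet, a common stopgap is to hit WebHDFS from a PythonCaller. A minimal sketch, assuming the third-party hdfs (WebHDFS) package; the namenode URL and file path are placeholders, and this is not the HDFSConnector's own API:

```python
# Hedged stopgap sketch: reading a file over WebHDFS with the third-party
# "hdfs" package, e.g. from inside an FME PythonCaller. The namenode URL and
# file path are placeholders; this is not how the HDFSConnector itself works.
from hdfs import InsecureClient

client = InsecureClient("http://namenode.example.com:50070")

# Stream a file down from the cluster and inspect it.
with client.read("/data/roads.csv", encoding="utf-8") as reader:
    content = reader.read()
print(f"roads.csv is {len(content)} characters long")
```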


  • August 28, 2017

This would be very challenging, but your workflows are extremely similar to what one would like to build in Spark/Hadoop. I think it would be amazing to be able to run huge transformations (millions, billions, even trillions of records) natively in Spark/Hadoop, using the FME GUI to design the workflow and FME Server to kick off and manage the Spark/Hadoop jobs.

That is, each reader/writer could read/write HDFS exactly as it does local files today (for common spatial, XLS, etc. formats), in addition to supporting the more Hadoop-specific file types (Map, Sequence, Avro, etc.). Each transformer could then become a step in the Spark/Hadoop workflow, as in the sketch below. (There is a performance hit, but one can run Python directly in Spark/Hadoop, and it seems like Python is what backs quite a bit of FME. Java/Scala would be preferable, but Python would get the job done in most cases, and performance-critical parts, such as joins, could be optimized natively.)
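A hedged sketch of that shape in plain PySpark (not an FME API; the paths, column names, and app name are placeholders):

```python
# Sketch: an FME-style read -> transform -> write pipeline expressed as a
# Spark job. Paths, column names, and the app name are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("fme-style-pipeline").getOrCreate()

# "Reader": pull a Parquet dataset from HDFS.
df = spark.read.parquet("hdfs:///data/roads.parquet")

# "Transformers": a filter plus an attribute calculation, distributed by Spark.
df = (df.filter(F.col("length_m") > 100)
        .withColumn("length_km", F.col("length_m") / 1000))

# "Writer": persist the result back to the cluster.
df.write.mode("overwrite").parquet("hdfs:///data/roads_over_100m.parquet")

spark.stop()
```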

I know of at least one large company that would buy FME if it supported Hadoop in this way... (I realize this goes well beyond the mapping space, but I've seen a company spend millions of dollars trying to recreate what FME does on top of Hadoop. I've used Ab Initio, DataStage, and Pentaho, and none compare to the user-friendliness of FME. They are all too complex; if they focused on input / simple translations / output like FME does, they would be radically better. And if you want something more complex, string multiple "workspaces" together.)

Probably tl;dr, but just some observations from having been on multiple sides of this business.



