Released

Hadoop

Related products: FME Form

fmelizard
Safer
Support reading from and writing to HDFS, Hive, and other Hadoop-ecosystem stores -- share your ideas in the comments.
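For reference, Hive access at the API level typically means talking to HiveServer2. A minimal sketch of what that looks like from Python, assuming the third-party PyHive package (the host, port, and table names below are placeholders, not anything FME-specific):

```python
# Hedged sketch: querying Hive over HiveServer2 with the third-party PyHive
# package. Host, port, and table names are placeholders.
from pyhive import hive

conn = hive.connect(host="hive-server.example.com", port=10000)
cursor = conn.cursor()
cursor.execute("SELECT road_id, name FROM roads LIMIT 10")
for row in cursor.fetchall():
    print(row)
conn.close()
```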
This post is closed to further activity.
It may be a question with a best answer, an implemented idea, or just a post needing no comment.
If you have a follow-up or related question, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

7 replies

  • May 27, 2015
Reading and writing Avro and Parquet file formats natively would be very useful for any integration work. The same goes for direct HDFS access, so those files could be stored directly on the cluster. /Mats :smiley_cat:
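To make the ask concrete, here is roughly what native Parquet-over-HDFS access looks like in Python today with the third-party pyarrow package (a hedged sketch: the namenode address and paths are placeholders, libhdfs must be available on the machine, and Avro would need a separate package such as fastavro):

```python
# Sketch: reading and writing Parquet directly on HDFS with pyarrow.
# The namenode host/port and file paths are placeholders.
import pyarrow.parquet as pq
from pyarrow import fs

hdfs = fs.HadoopFileSystem(host="namenode.example.com", port=8020)

# Read a Parquet file straight off the cluster...
table = pq.read_table("/data/roads.parquet", filesystem=hdfs)

# ...and write the (possibly transformed) result back to HDFS.
pq.write_table(table, "/data/roads_out.parquet", filesystem=hdfs)
```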

davideagle
Contributor
  • January 7, 2016

Just yesterday we had a request from a customer to be able to write to HDFS. We'll post more details as we learn more.


I can confirm this would be very useful for working with Hadoop HDFS or Spark. It is time (or perhaps overdue) to enter the Big Data world; the Amazon cloud alone is not sufficient. We are benchmarking ETL tools, and this is one of our criteria.


fmelizard
Safer
  • Author
  • November 29, 2016
Safe PR#60154

fmelizard
Safer
  • Author
  • August 22, 2017

This idea is a bit broad right now, and I'd suggest splitting the related Hadoop requests out into their own ideas. But HDFS read/write is now in the FME 2018 betas via the HDFSConnector transformer. Give it a spin via http://www.safe.com/beta and let us know what you think.
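For anyone who can't run the beta yet, a common stopgap is to hit WebHDFS from a PythonCaller. A minimal sketch, assuming the third-party hdfs (WebHDFS) package; the namenode URL and file path are placeholders, and this is not the HDFSConnector's own API:

```python
# Hedged stopgap sketch: reading a file over WebHDFS with the third-party
# "hdfs" package, e.g. from inside an FME PythonCaller. The namenode URL and
# file path are placeholders; this is not how the HDFSConnector itself works.
from hdfs import InsecureClient

client = InsecureClient("http://namenode.example.com:50070")

# Stream a file down from the cluster and inspect it.
with client.read("/data/roads.csv", encoding="utf-8") as reader:
    content = reader.read()
print(f"roads.csv is {len(content)} characters long")
```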


  • August 28, 2017

This would be very challenging, but your workflows are extremely similar to what one would like to build in Spark/Hadoop. I think it would be amazing to be able to run huge transformations (millions, billions, even trillions of records) natively in Spark/Hadoop, using the FME GUI to design the workflow and FME Server to kick off and manage the Spark/Hadoop jobs.

That is, each reader/writer could read/write HDFS exactly as it does local files today (for common spatial, XLS, etc. formats), in addition to supporting the more Hadoop-specific file types (Map, Sequence, Avro, etc.). Each transformer could then become a step in the Spark/Hadoop workflow, as in the sketch below. (There is a performance hit, but one can run Python directly in Spark/Hadoop, and it seems like Python is what backs quite a bit of FME. Java/Scala would be preferable, but Python would get the job done in most cases, and performance-critical parts, such as joins, could be optimized natively.)
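A hedged sketch of that shape in plain PySpark (not an FME API; the paths, column names, and app name are placeholders):

```python
# Sketch: an FME-style read -> transform -> write pipeline expressed as a
# Spark job. Paths, column names, and the app name are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("fme-style-pipeline").getOrCreate()

# "Reader": pull a Parquet dataset from HDFS.
df = spark.read.parquet("hdfs:///data/roads.parquet")

# "Transformers": a filter plus an attribute calculation, distributed by Spark.
df = (df.filter(F.col("length_m") > 100)
        .withColumn("length_km", F.col("length_m") / 1000))

# "Writer": persist the result back to the cluster.
df.write.mode("overwrite").parquet("hdfs:///data/roads_over_100m.parquet")

spark.stop()
```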

I know of at least one large company that would buy FME if it supported Hadoop in this way... (I realize this goes well beyond the mapping space, but I've seen a company spend millions of dollars trying to recreate what FME does on top of Hadoop. I've used Ab Initio, DataStage, and Pentaho, and none compare to the user-friendliness of FME. They are all too complex; if they focused on input / simple translations / output like FME does, they would be radically better. And if you want something more complex, string multiple "workspaces" together.)

Probably tl;dr, but just some observations from having been on multiple sides of this business.



