Solved

FME How to read multiple CSV files from many folders and write to GDP tables by folder

  • 15 November 2016
  • 2 replies
  • 90 views

I have 13000+ CSV files, all containing around 13000 rows. These CSV files are arranged in 146 different folders. All CSV files have the same schema with ; as separator. These CSV files contain about 175 million rows of data combined. The folders are named as seen in the picture below (for example ...\\CSV_folders\\5785xxx)

CSV files are named like these:

I already tried to copy all CSV files to same directory and run them to one GDB with FME. This went well... but resulted too heavy ESRI GDB that takes forever to search through. (175 million rows, around 20GB)

Now I would need to make things a bit differently. I would like to work with FME to create one ESRI Geodatabase. It should contain tables named like these folders containing CSV files. A table in the GDB should have all the CSV files loaded in the table, that are inside the original folder. This would split my 175 million rows to 146 tables, making it a bit faster to search trough with Arcmap.

So my problem is, how to make FME read CSV files from all of the folders and write a GDB with tables named like the folder, each table containing the same information the CSV files have in the original folder combined. I'm not familiar with PostGIS etc. (and it would not work with ArcGIS), so this needs to be done with some kind of ESRI GDB workaround..

icon

Best answer by gifupack 16 November 2016, 07:28

View original

2 replies

Badge +16

Hi @gifupack, I would look into using the fme format attributes (fme_dataset, fme_feature_type) for defining the gdb output tables and for using in a fanout on the gdb writer (to produce multiple tables based on an attribute name or combination)

https://knowledge.safe.com/articles/565/fanout-1.html

https://knowledge.safe.com/articles/192/whats-the-difference-between-user-attributes-and-f.html

A tip on reading the CSV data, have a look at the Point Cloud XYZ Reader.

Hope this helps.

I got an answer to my question from nicname Fezter at GIS stackexchange website. I copy-pasted it to here, because I think the answer was great, with clear explanation.  

 

This is a pretty straightforward exercise in FME. You'll need to do the following:

  1. Load your CSVs using a dynamic reader (Single Merged Feature Type) and point to the whole folder where your CSVs are stored. Use the advanced browser to select the whole folder: 0684Q00000ArJxyQAF.png

    Ensure you select search subfolders: 0684Q00000ArJy3QAF.png

  2. In the reader, expose the fme_dataset attribute by right clicking on the reader and clicking on Properties. Then go to the Format Attributes tab: 0684Q00000ArJkwQAF.png
  3. In your workbench, add a FilenamePartExtractor transformer and point it to the fme_dataset attribute. The field you need is 
    _dirname
    .

    0684Q00000ArJwIQAV.png

    1. Finally, in your writer, you'll want to set your Table name to an expression which includes your 
      _dirname
       from the FilenamePartExtractor. The reason I did this was because your folder names start with a number and feature classes in a file geodatabase cannot begin with a number. Also note that if you're using the ArcObjects File Geodatabase writer, you will set your geometry to 
      geodb_table
      0684Q00000ArJevQAF.pngIf you're using the API writer then you will set your geometry to 
      geodb_no_geom
      0684Q00000ArJqvQAF.png

You can see that I had folders within the CSV folder called 1, 2 and 3. They wrote to the GeoDatabase as tables called Table_1, Table_2, and Table_3:

0684Q00000ArJpiQAF.png

Reply