Skip to main content
Question

Superwide CSV file


I've got a CSV file that came from a flood modellnig package. I want to geocode and map the water levels at each point (calculation cell centroid) at a series of time slots.

 

 

My problem is this: the file has 1 row for each time slot and 1 column for each calculation cell centroid (about 500,000 columns in all).

 

 

When I created a CSV reader for this file FME 2013 just sat for hour after hour and never got as far as displaying the reader on my workbench. Presumably it couldn't cope with trying to generate the schema/list of attributes.

 

 

Any ideas for processing this file? The only thing I can think of is writing a utility to transpose the file first.

 

 

Thanks,

 

Ian

15 replies

Userlevel 5
Hi Ian,

 

 

that is an impressive number of columns! Somebody should tip the CSV producer of data normalization ;-)

 

 

My first suggestion would be to check if the Python CSV module is able to handle it (documentation here). If so, you could read the CSV within a PythonCreator.

 

 

Although, as much as possible, you may want to do some intial processing on the lines of the CSV before you create the features, I'm not sure how well FME handles features with half-a-million attributes...

 

 

Good luck!

 

 

David
Userlevel 5
Badge +13
Hi Ian,

 

 

Its a long shot but maybe the Column Aligned Text (CAT) reader can be of use?
Userlevel 1
Badge +24
Hi Ian!

 

1. Have you tried to use the SchemaReader as input and see if that succeeds?

 

2. Have you tried to copy 2 lines (one with header and one with data) to another separate file and tried do use the CSV-reader on this smaller subset instead?

 

 

Does the file open in for instance Notepad++? (http://notepad-plus-plus.org/)
Badge
What's the content like, you might be able to treat the file as a raster!
Sigtill,

 

 

The file will open in ordinary Notepad, the first few columns of the first few rows look like this:

 

 

Time,Seconds,"528004.22,182214.89","528024.89,182206.78","528047.72,182200.51",

 

01/01/2000 00:00:00,                   0,     0.00000,     0.00000,     0.00000,

 

01/01/2000 00:00:20,                  20,     0.00004,     0.00004,     0.00004,

 

 

The part in quotes on the first row are geographic coordinates:  X,Y pairs in British National Grid format.

 

 

Peter_s,

 

 

I'd be intrigued to hear how my file could be treated as a raster!

 

 

Regards,

 

Ian
Userlevel 5
You might benefit from a pivot of your CSV data. The problem is that your data is much too big for Excel to do it for you (max no. of columns is about 16k).

 

 

If you're somewhat procifient in Python, there are some elegant solutions to be found here: http://stackoverflow.com/questions/4869189/how-to-pivot-data-in-a-csv-file

 

 

David
Userlevel 1
Badge +24
Ian, can you upload that small example file with 500 000 columns and 2 rows?, easier to test on real data :)
Userlevel 1
Badge +24
Also note that the new FME2013 SP1 BETA has a line regarding this:

 

 CSV Reader: Improved performance for large numbers of attributes (PR#43479)
 http://downloads.safe.com/fme/beta/whatsnew_sp.txt

 

 

Perhaps just try the new beta?

 

 

http://www.safe.com/support/support-resources/fme-downloads/beta/
Thanks for all your suggestions.

 

 

It's not easy for me to get hold of betas - company locks everything down.

 

 

What I've done so far is downloaded / adapted a python script that pivots the rows and columns in my CSV and then pointed my workspace at that.

 

 

Next job is to integrate the python into my workspace...

 

 

If anyone has a link or reference on how I could treat my CSV as a raster, I'd still be very interested in that.

 

 

Thanks again,

 

Ian
Userlevel 1
Badge +24
Hm, not nice to work with that kind of IT-department :( As a side note you can run both the BETA of FME and the Release-version of FME and 32 and 64 bit on the same computer with no issues. 

 

 

Regarding the python script and Raster I can not help. However if you can upload that small example file with 500 000 columns and 2 rows I can test it to see if its working on the newest beta.

 

 

Good luck!
SigTill,

 

 

Here is the shortened file, with two header rows and one row of data (all zeros!)

 

 

https://www.dropbox.com/s/dxltnjwkq3te7xk/Firstline.csv

 

 

Regards,

 

Ian
I mean one header row - I was thinking of the pivoted version.
Userlevel 5
FYI, it is possible to use the Python CSV module to read your huge file, the trick is to override the default field size limit. This works with your posted example file and prints out the first column (incl. the header) of each line:

 

 

----------

 

import csv   filename = r"d:\\temp\\firstline.csv" csv.field_size_limit(1000000000) # Override default limit   with open(filename, 'rb') as csvfile:     linereader = csv.reader(csvfile, delimiter=',', quotechar='"')     for n, row in enumerate(linereader):         if n == 0: # header             point3X, point3Y = row[3].split(',')             print "Point X=%.2f Y=%.2f" % (float(point3X), float(point3Y))         else: # data             print "Value is: %.2f" % float(row[3].strip()) ----------

 

  Feel free to test this with your complete input file and tell us how it works. For the small test file provided the above gives me the following output:

 

 

Point X=528024.89 Y=182206.78 Value is: 0.00

 

From here on it should be easy going to incorporate this into a PythonCreator in your workspace. There is a nice example to get you started in the PythonCreator documentation.

 

 

David

 

 

Userlevel 5
Sorry, typo: the script below prints out the FOURTH column, not the first...
Badge
Having seen the data, I realize that raster is probably not the way to go, I would try restructuring it as a separate pointcloud for each timestamp instead and read those files using the PointCloud XYZ Reader.

 

Heres a python script (with minimal error handling...) to wrap it out to .xyz:

 

 

https://dl.dropbox.com/u/2195684/FMETraining/Firstline.csv.py

 

 

good luck!

 

 

/Peter

Reply