Question

Superwide CSV file

12 years ago
March 11, 2013
15 replies
36 views

shepthedog
6 replies

I've got a CSV file that came from a flood modellnig package. I want to geocode and map the water levels at each point (calculation cell centroid) at a series of time slots.

My problem is this: the file has 1 row for each time slot and 1 column for each calculation cell centroid (about 500,000 columns in all).

When I created a CSV reader for this file FME 2013 just sat for hour after hour and never got as far as displaying the reader on my workbench. Presumably it couldn't cope with trying to generate the schema/list of attributes.

Any ideas for processing this file? The only thing I can think of is writing a utility to transpose the file first.

Thanks,

Ian

david_r
8349 replies
12 years ago
March 11, 2013

Hi Ian,

that is an impressive number of columns! Somebody should tip the CSV producer of data normalization ;-)

My first suggestion would be to check if the Python CSV module is able to handle it (documentation here). If so, you could read the CSV within a PythonCreator.

Although, as much as possible, you may want to do some intial processing on the lines of the CSV before you create the features, I'm not sure how well FME handles features with half-a-million attributes...

Good luck!

David

+18

fmelizard
Safer
3725 replies
12 years ago
March 11, 2013

Hi Ian,

Its a long shot but maybe the Column Aligned Text (CAT) reader can be of use?

+24

sigtill
Supporter
956 replies
12 years ago
March 11, 2013

Hi Ian!

1. Have you tried to use the SchemaReader as input and see if that succeeds?

2. Have you tried to copy 2 lines (one with header and one with data) to another separate file and tried do use the CSV-reader on this smaller subset instead?

Does the file open in for instance Notepad++? (http://notepad-plus-plus.org/)

The real SigTill

peter_s
Contributor
19 replies
12 years ago
March 11, 2013

What's the content like, you might be able to treat the file as a raster!

shepthedog
Author
6 replies
12 years ago
March 11, 2013

Sigtill,

The file will open in ordinary Notepad, the first few columns of the first few rows look like this:

Time,Seconds,"528004.22,182214.89","528024.89,182206.78","528047.72,182200.51",

01/01/2000 00:00:00, 0, 0.00000, 0.00000, 0.00000,

01/01/2000 00:00:20, 20, 0.00004, 0.00004, 0.00004,

The part in quotes on the first row are geographic coordinates: X,Y pairs in British National Grid format.

Peter_s,

I'd be intrigued to hear how my file could be treated as a raster!

Regards,

Ian

david_r
8349 replies
12 years ago
March 11, 2013

You might benefit from a pivot of your CSV data. The problem is that your data is much too big for Excel to do it for you (max no. of columns is about 16k).

If you're somewhat procifient in Python, there are some elegant solutions to be found here: http://stackoverflow.com/questions/4869189/how-to-pivot-data-in-a-csv-file

David

+24

sigtill
Supporter
956 replies
12 years ago
March 11, 2013

Ian, can you upload that small example file with 500 000 columns and 2 rows?, easier to test on real data :)

The real SigTill

+24

sigtill
Supporter
956 replies
12 years ago
March 11, 2013

Also note that the new FME2013 SP1 BETA has a line regarding this:

 CSV Reader: Improved performance for large numbers of attributes (PR#43479)

http://downloads.safe.com/fme/beta/whatsnew_sp.txt

Perhaps just try the new beta?

http://www.safe.com/support/support-resources/fme-downloads/beta/

The real SigTill

shepthedog
Author
6 replies
12 years ago
March 12, 2013

Thanks for all your suggestions.

It's not easy for me to get hold of betas - company locks everything down.

What I've done so far is downloaded / adapted a python script that pivots the rows and columns in my CSV and then pointed my workspace at that.

Next job is to integrate the python into my workspace...

If anyone has a link or reference on how I could treat my CSV as a raster, I'd still be very interested in that.

Thanks again,

Ian

+24

sigtill
Supporter
956 replies
12 years ago
March 12, 2013

Hm, not nice to work with that kind of IT-department :( As a side note you can run both the BETA of FME and the Release-version of FME and 32 and 64 bit on the same computer with no issues.

Regarding the python script and Raster I can not help. However if you can upload that small example file with 500 000 columns and 2 rows I can test it to see if its working on the newest beta.

Good luck!

The real SigTill

shepthedog
Author
6 replies
12 years ago
March 12, 2013

SigTill,

Here is the shortened file, with two header rows and one row of data (all zeros!)

https://www.dropbox.com/s/dxltnjwkq3te7xk/Firstline.csv

Regards,

Ian

shepthedog
Author
6 replies
12 years ago
March 12, 2013

I mean one header row - I was thinking of the pivoted version.

david_r
8349 replies
12 years ago
March 12, 2013

FYI, it is possible to use the Python CSV module to read your huge file, the trick is to override the default field size limit. This works with your posted example file and prints out the first column (incl. the header) of each line:

----------

import csv filename = r"d:\\temp\\firstline.csv" csv.field_size_limit(1000000000) # Override default limit with open(filename, 'rb') as csvfile: linereader = csv.reader(csvfile, delimiter=',', quotechar='"') for n, row in enumerate(linereader): if n == 0: # header point3X, point3Y = row[3].split(',') print "Point X=%.2f Y=%.2f" % (float(point3X), float(point3Y)) else: # data print "Value is: %.2f" % float(row[3].strip()) ----------

Feel free to test this with your complete input file and tell us how it works. For the small test file provided the above gives me the following output:

Point X=528024.89 Y=182206.78 Value is: 0.00

From here on it should be easy going to incorporate this into a PythonCreator in your workspace. There is a nice example to get you started in the PythonCreator documentation.

David

david_r
8349 replies
12 years ago
March 12, 2013

Sorry, typo: the script below prints out the FOURTH column, not the first...

peter_s
Contributor
19 replies
12 years ago
March 15, 2013

Having seen the data, I realize that raster is probably not the way to go, I would try restructuring it as a separate pointcloud for each timestamp instead and read those files using the PointCloud XYZ Reader.

Heres a python script (with minimal error handling...) to wrap it out to .xyz:

https://dl.dropbox.com/u/2195684/FMETraining/Firstline.csv.py

good luck!

/Peter

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Superwide CSV file