Hi Ian,
that is an impressive number of columns! Somebody should tip the CSV producer of
data normalization ;-)
My first suggestion would be to check if the Python CSV module is able to handle it (
documentation here). If so, you could read the CSV within a PythonCreator.
Although, as much as possible, you may want to do some intial processing on the lines of the CSV before you create the features, I'm not sure how well FME handles features with half-a-million attributes...
Good luck!
David
Hi Ian,
Its a long shot but maybe the Column Aligned Text (CAT) reader can be of use?
Hi Ian!
1. Have you tried to use the SchemaReader as input and see if that succeeds?
2. Have you tried to copy 2 lines (one with header and one with data) to another separate file and tried do use the CSV-reader on this smaller subset instead?
Does the file open in for instance Notepad++? (
http://notepad-plus-plus.org/)
What's the content like, you might be able to treat the file as a raster!
Sigtill,
The file will open in ordinary Notepad, the first few columns of the first few rows look like this:
Time,Seconds,"528004.22,182214.89","528024.89,182206.78","528047.72,182200.51",
01/01/2000 00:00:00, 0, 0.00000, 0.00000, 0.00000,
01/01/2000 00:00:20, 20, 0.00004, 0.00004, 0.00004,
The part in quotes on the first row are geographic coordinates: X,Y pairs in British National Grid format.
Peter_s,
I'd be intrigued to hear how my file could be treated as a raster!
Regards,
Ian
You might benefit from a pivot of your CSV data. The problem is that your data is much too big for Excel to do it for you (max no. of columns is about 16k).
If you're somewhat procifient in Python, there are some elegant solutions to be found here:
http://stackoverflow.com/questions/4869189/how-to-pivot-data-in-a-csv-file
David
Ian, can you upload that small example file with 500 000 columns and 2 rows?, easier to test on real data
Also note that the new FME2013 SP1 BETA has a line regarding this:
CSV Reader: Improved performance for large numbers of attributes (PR#43479)
http://downloads.safe.com/fme/beta/whatsnew_sp.txt
Perhaps just try the new beta?
http://www.safe.com/support/support-resources/fme-downloads/beta/
Thanks for all your suggestions.
It's not easy for me to get hold of betas - company locks everything down.
What I've done so far is downloaded / adapted a python script that pivots the rows and columns in my CSV and then pointed my workspace at that.
Next job is to integrate the python into my workspace...
If anyone has a link or reference on how I could treat my CSV as a raster, I'd still be very interested in that.
Thanks again,
Ian
Hm, not nice to work with that kind of IT-department
As a side note you can run both the BETA of FME and the Release-version of FME and 32 and 64 bit on the same computer with no issues.
Regarding the python script and Raster I can not help. However if you can upload that small example file with 500 000 columns and 2 rows I can test it to see if its working on the newest beta.
Good luck!
SigTill,
Here is the shortened file, with two header rows and one row of data (all zeros!)
https://www.dropbox.com/s/dxltnjwkq3te7xk/Firstline.csv
Regards,
Ian
I mean one header row - I was thinking of the pivoted version.
FYI, it is possible to use the Python CSV module to read your huge file, the trick is to override the default field size limit. This works with your posted example file and prints out the first column (incl. the header) of each line:
----------
import csv filename = r"d:\\temp\\firstline.csv" csv.field_size_limit(1000000000) # Override default limit with open(filename, 'rb') as csvfile: linereader = csv.reader(csvfile, delimiter=',', quotechar='"') for n, row in enumerate(linereader): if n == 0: # header point3X, point3Y = rown3].split(',') print "Point X=%.2f Y=%.2f" % (float(point3X), float(point3Y)) else: # data print "Value is: %.2f" % float(rowo3].strip()) ----------
Feel free to test this with your complete input file and tell us how it works. For the small test file provided the above gives me the following output:
Point X=528024.89 Y=182206.78 Value is: 0.00
From here on it should be easy going to incorporate this into a PythonCreator in your workspace. There is a nice example to get you started in the PythonCreator documentation.
David
Sorry, typo: the script below prints out the FOURTH column, not the first...
Having seen the data, I realize that raster is probably not the way to go, I would try restructuring it as a separate pointcloud for each timestamp instead and read those files using the PointCloud XYZ Reader.
Heres a python script (with minimal error handling...) to wrap it out to .xyz:
https://dl.dropbox.com/u/2195684/FMETraining/Firstline.csv.py
good luck!
/Peter