Skip to main content
Hello again!

 

 

I have a set of csv file pairs called e.g.

 

 

1111_AB.csv

 

1111_AB_type47.csv

 

 

1112_AB.csv

 

1112_AB_type47.csv

 

 

I need to manipulate the contents of the *_type47 files and then append them, minus the header line, to the foot of the master file.

 

 

In my workspace I've got a CSV reader --> FeatureTypeExtractor (to write the file name to an attribute) --> StringReplacer (to replace the _type47.csv with an empty string) --> AttributeCreator (to update the value of an attribute for each feature from 47 to 52).

 

 

Once updated I'd like to write the features to the foot of the master CSV files (based on the value in my FeatureTypeExtractor) in EXACTLY the same order and format as the source.

 

 

What's the best strategy for this? I'm a bit stuck...

 

 

Thanks,

 

 

Matt

 

 

Hi,

 

 

You could use the csv_line_number. Expose this attribute on the format attribute tab.

 

 

Add the last csv_line_number to the linenumbers from the manipulated file.

 

This, after u have skipped (or removed) the headerline(s) of the manipulated file.

 

Route both to a sorter, sort and write. (this last step might not be needed, but to make sure..)

 


Hi Matt,

 

 

I would use the fanout option of the CSV writer feature type. You can control reading order for source datasets in Navigator window. That is, upper writer in Navigator will be run first.

 

 

 

Takashi
When doing it that way you may want to skip header, using the csv reader parameter panel.
Thanks Gio and Takashi for your rapid reponses. I've been working on another problem for the last couple of days so have only just checked back.

 

 

I've attempted the approach Takashi has provided in his screenshot. The problem I have is that the 1111_AB.csv and 1111_AB_type47.csv have records of different schemas so I receive datatype mismatch messages such as

 

 

Attribute of type decimal has an illegal value of '2014-04-30'. Value must be < 1000 or > -1000. Value set to missing value

 

String value `2014-04-30' contains invalid characters and could not be converted into a float

 

String value `%0' contains invalid characters and could not be converted into a float

 

String value `%0' contains invalid characters and could not be converted into a float

 

 

when I run the workspace.

 

 

How do I accommodate this?

 


If it's allowed that the fields contain mixed type data, specify "text" to types for the fields in the CSV writer feature type. Otherwise, use the AttributeClassifier to determine data type, and modify Failed values appropriately.
I ended up using two separate workbenches. 1 to perform the manipulation of attributes in the *type47.csv files and then a workspace with text line readers and writers with the Read Whole File at Once parameter set to Yes to put the files together in exactly the way they're formatted in the source files.

Reply