
I am trying to append 5 CSV files together. I also have to order their position in the final file based on the name of an attribute in each file. For example, file01.csv has a field called abcd, file02.csv has a field called abcde, and file03.csv has a field called abcdef. I want to merge all the CSV files so that the contents of file01.csv appear in lines 0-10, file02.csv in lines 11-20, and file03.csv in lines 21-30.

The files can be merged by using the StringConcatenator and then writing only the _concatenated field to another CSV file. What I cannot do is control the order in which the files are written into the final CSV file. Due to the contents of the CSV files, after the StringConcatenator runs, all the fields are written into a single attribute. The name of that attribute is the concatenation of all the field names in the original file. The files that should appear at the top of the final CSV file also have the shortest attribute name. So I am thinking that if I could access the attribute name and use the length of that string as the sort criterion, the shortest would be at the top and the longest at the bottom. Hence my question: how can I expose the name of an attribute and use it for further processing?

You can get the attribute name from a schema reader, from the schema port of the FeatureReader transformer, or in Python using the getAllAttributeNames method of the feature.
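For the Python route, a minimal sketch of what a PythonCaller-style function might look like is below. It assumes `feature` behaves like an fmeobjects feature, i.e. it offers `getAllAttributeNames()` and `setAttribute()`. The helper names, the `_name_length` attribute and the list of prefixes treated as format attributes are made up for illustration, so adjust them to your workspace.

```python
# Sketch only: `feature` is assumed to be an fmeobjects feature (any object
# with getAllAttributeNames() and setAttribute() will work here).
# The prefixes below are an assumption about which names are format attributes.

def user_attribute_names(feature, hidden_prefixes=("fme_", "csv_", "multi_reader_")):
    """Return the attribute names on the feature, skipping format attributes."""
    return [
        name
        for name in feature.getAllAttributeNames()
        if not name.startswith(hidden_prefixes)
    ]


def add_name_length_key(feature, key_attr="_name_length"):
    """Store the length of the longest user attribute name on the feature."""
    names = [n for n in user_attribute_names(feature) if n != key_attr]
    feature.setAttribute(key_attr, max((len(n) for n in names), default=0))
    return feature
```

Once the hypothetical `_name_length` attribute is on every feature, it can be exposed and used as a numeric sort key downstream.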


To expose the names of attributes that are attached to the features, you can use the AttributeExposer transformer. The CSV reader has an option to skip the first line and use its contents as field names. I assume you used that option on the reader.


This is an interesting one for sure. I think we need to understand a bit more of the problem before we can provide some additional guidance.

In particular, do you know at the time you make the workspace what the field names are for all the CSV files you're working with? Or do you need to make a workspace that will work with any set of CSVs someone may throw at you, whose column names you don't know?

If it is the former situation and you know the field names (schemas) ahead of time, then I can imagine one workflow to explore.

If you don't know the field names, then another (more complex) workflow would be needed.

Lastly, AttributeExploder potentially could be your friend here...but I won't make that promise.

Maybe provide us with a set of input files (faked if necessary) and desired output and we can puzzle from that...


Use a schema reader and a path reader to build a list of attributes, and from that a file_attributename table.

You can use a merger on the filename to link schema and path.
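Outside of FME, the same file_attributename idea can be sketched in plain Python with the standard csv module. This is only an illustration of the logic, not the workspace itself: the function names are invented, and it assumes the first row of each CSV holds the field names.

```python
import csv
from pathlib import Path


def file_attribute_table(paths):
    """Return (filename, attribute_name) rows, one per column in each CSV header."""
    rows = []
    for path in paths:
        with open(path, newline="") as handle:
            # Assumption: the first row of every file is the header row.
            header = next(csv.reader(handle), [])
        rows.extend((Path(path).name, name) for name in header)
    return rows


def order_by_name_length(table):
    """Order filenames by the total length of their attribute names, shortest first."""
    totals = {}
    for filename, name in table:
        totals[filename] = totals.get(filename, 0) + len(name)
    return sorted(totals, key=totals.get)
```

Calling `order_by_name_length(file_attribute_table(paths))` gives the order in which the files would be appended. Files with an empty header do not show up in the result.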

I found workspaces posted here in 2014 about exactly the same question.

I vote for stack-exchange like point rewardsystem ...;)


Basically you need to sort the features, for which a numeric value works best. So assuming you know the attribute names in advance, put down an AttributeCreator or AttributeManager. Create an attribute called SortKey and use conditional mode: if attribute abcd exists then SortKey=1, else if attribute abcde exists then SortKey=2, and so on.

Then sort data by SortKey.
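The conditional logic can be written out in a few lines of Python to make it concrete. This is a sketch under assumptions: records are plain dicts standing in for features, and the key 3 for abcdef is my own continuation of the pattern using the field name from the question.

```python
# Known attribute names mapped to a numeric sort key.
# abcd and abcde follow the rule described above; abcdef=3 is an assumed extension.
SORT_KEYS = {"abcd": 1, "abcde": 2, "abcdef": 3}


def sort_key(record, default=len(SORT_KEYS) + 1):
    """Return the key of the first known attribute that exists on the record."""
    for name, key in SORT_KEYS.items():
        if name in record:
            return key
    # Records with none of the known attributes go to the end.
    return default


def sort_records(records):
    """Sort records by SortKey; ties keep their original order."""
    return sorted(records, key=sort_key)
```

Because `sorted` is stable, rows that came from the same file keep their original relative order.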

Does that help?


@robertdbuckley

Here is a way to do it (dragging and dropping the file didn't work for me, so here are pics).

Greets.

