Question

Joining multiple CSV files

5 years ago
February 4, 2020
5 replies
364 views

dmeaney
3 replies

CSVJoiner2.fmw

I have many CSV files in a folder that I want to join together based on attribute in first column of every file. Each file has about 300 records and 50 attributes. I want my output excel sheet to have 300 rows with all attributes from all the input files, joined by the code in "Col0"

I've tried DatabaseJoiner and my output has all the input column names, but only the values from the first input table. See attached two CSV files and workspace.

+18

erik_jan
Contributor
2181 replies
5 years ago
February 4, 2020

If all CSV files have the same attribute schema, you could use 2 text readers.

The first reader reads all lines and write them to a new text file.

The second reader reads all other text files, line by line, and skips the first line (header), writing the lines to the same text output file.

At the end you will have one big text (csv) file.

Hope this helps.

markatsafe
1891 replies
5 years ago
February 4, 2020

@dmeaney I split the CSV file you attached into smaller parts so I could simulate multiple source CSV files you're trying to process.

It would be simple if you had unique column names in all your CSV data. If you did, then FeatureMerger would solve it for you.

@erik_jan is on the right track with using text_line reader. But in addition, I think you need to join the records and concatenate them so I've chosen Aggregator to do that. The workspace looks like this:

I've attached the example workspace (2019.2) along with the sample data I created: csvjoiner.fmwt

markatsafe
1891 replies
5 years ago
February 4, 2020

markatsafe wrote:

@dmeaney I split the CSV file you attached into smaller parts so I could simulate multiple source CSV files you're trying to process.

It would be simple if you had unique column names in all your CSV data. If you did, then FeatureMerger would solve it for you.

I've attached the example workspace (2019.2) along with the sample data I created: csvjoiner.fmwt

@dmeaney I notice that the last two lines in your file are what appears to be a header record and then a report line. If you could get the header record to the header of your file then the FeatureMerger would be all you'd need.

"GEO_ID","NAME","B01001_001E","B01001_001M","B01001_002E","B01001_002M","B01001_003E", ...

"id","Geographic Area Name","Estimate!!Total","Margin of Error!!Total", ....

dmeaney
Author
3 replies
5 years ago
February 5, 2020

markatsafe wrote:

@dmeaney I split the CSV file you attached into smaller parts so I could simulate multiple source CSV files you're trying to process.

It would be simple if you had unique column names in all your CSV data. If you did, then FeatureMerger would solve it for you.

I've attached the example workspace (2019.2) along with the sample data I created: csvjoiner.fmwt

Thanks. I'll give it a shot.

FYI - I had sorted my CSVs as part of another approach. We weren't certain that they were all in the same order by ID so we sorted. Unfortunately, the 2 header rows ended up at the bottom. Looks like I'll want to put them back manually (20 times), then try FeatureMerger.

dmeaney
Author
3 replies
5 years ago
February 5, 2020

markatsafe wrote:

"GEO_ID","NAME","B01001_001E","B01001_001M","B01001_002E","B01001_002M","B01001_003E", ...

"id","Geographic Area Name","Estimate!!Total","Margin of Error!!Total", ....

geography.csvACSDT5Y2013.B01003_data_with_overlays_2020-01-06T114007.csvACSDT5Y2013.B01001_data_with_overlays_2020-01-06T113401.csvACSDT5Y2013.B03002_data_with_overlays_2020-01-06T115027.csvACSDT5Y2013.B08141_data_with_overlays_2020-01-06T125604.csv

ACSDT5Y2013.B03002_data_with_overlays_2020-01-06T115027.csv

csvjoiner2.fmw

I moved my column headers to the top of each CSV file and then tried the FeatureMerger tool. I used a simplified geography.csv as the Supplier (just a few fields, including the GEO_ID used for joining), and then another reader for all the other CSVs as the Requestor. I've attached three of those here

In the output, I am getting all the field names and values correctly outputting, but it is not merging on the GEO_ID field, so I'm getting 8,484 lines instead of 303.

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Joining multiple CSV files