Question

splitting records with multiple delimiters?

  • 10 January 2017
  • 8 replies
  • 9 views

Badge +10

Hello,

I have a CSV dataset that is mostly records like this, with a "|" delimiter

2185996|SP213604|1|C|P|UK|||||857||||GYMPIE|RD||CHERMSIDE|BRISBANE CITY|1000|-27.38682393|153.03160112|PC|GDA94

However, some of the records also have a "," delimiter

2388577|SP246762|6|C|P|UK||||WHELLER ON THE PARK|950||||GYMPIE|RD||CHERMSIDE|BRISBANE CITY|1000|-27.38253041,-27.380533|153.02547625,153.027633|PC,BC|GDA94

This is a dataset of street addresses and property points. Most points are "parcel centre" or PC. But some also have a "building centre" BC. These records have two lats, two longs and two types, separated by commas.

Is there a transformer in FME that will split each of these records into two records?


8 replies

Userlevel 6
Badge +31

You could use the AttributeSplitter to solve this. "Splits a selected attribute into a list attribute. Each item in the list will contain a single token split from the list." Then you can duplicate the record and overwrite the coordinate columns with input from the list.

01-csv-ffs.fmwt

You could / should expand this with checks.

 

- Do the lon lat lists contain the same number of elements within the record. (ListElementCounter, Tester)

 

- Are the new lon lat values numeric? (Tester, Type is Numeric.)

Workbench and data added as template.

Userlevel 4

If you have a lot of different separators, you can simplify your workspace a bit by using the StringPairReplacer to homogenize all the separators into a single one before the AttributeSplitter.

Badge

If you have a lot of different separators, you can simplify your workspace a bit by using the StringPairReplacer to homogenize all the separators into a single one before the AttributeSplitter.

I was about to give the same advise but right before typing my answer, I saw your comment appear. Damn you're fast ;)

 

 

Userlevel 4
I was about to give the same advise but right before typing my answer, I saw your comment appear. Damn you're fast ;)

 

 

Can't sleep around here! ;-)

 

Badge +16

I would also opt for unifying the delimiters and then using a single AttributeSplitter.

Userlevel 6
Badge +31

If you have a lot of different separators, you can simplify your workspace a bit by using the StringPairReplacer to homogenize all the separators into a single one before the AttributeSplitter.

I'm learning every day so please be gentle. I would like to know how you would solve this? The rows with multiple separators will get more list elements / columns. I can't get my head around it how to do it other than manual assign the correct column names?
Userlevel 4
I'm learning every day so please be gentle. I would like to know how you would solve this? The rows with multiple separators will get more list elements / columns. I can't get my head around it how to do it other than manual assign the correct column names?
I just had another look at your data. If your CSV contains column names on the first row, I think the easiest would be to read it using the CSV reader, then split up any "double" values (e.g. coordinate pairs) using the AttributeSplitter.
Userlevel 2
Badge +17

I think the MultiAttributeSplitter from the FME Hub can be used effectively in conjunction with the CSV reader.

Result from the two "pipe separated values" sample lines. The second sample line has been separated into two records- one for PC, another for BC.

Reply