Skip to main content
Solved

How to find a count of values in a field having a separator ?


Forum|alt.badge.img

I have spent hours re-engineering my data but couldn’t find a best resolve to automate data to provide statistics. Thinking, I could have saved myself a lot of trouble with a simple transformer with set logic using FME.

 

 

I have a range of attributes containing values such as "Class1, Class1a, Class3, Class4a". values are picked up from a drop down list by the surveyors, that sits in an attributed separated by comma (,).

 

I need to count the number of items in each attribute, assuming that they were separated by commas. So, the transformer would need to return the value 4 for my example above and for various other attributes using the same logic.

I need to use the logic "If the cell is blank, value= 0, if the cell is not blank, but does not contain a comma, value=1, otherwise, value= (number of commas +1) i.e. read the string, if there are two commas in the string, then there must be three values, and so on.

 

 

Logic is clear of what I'm trying to achieve but, I'm struggling on the bit that counts the number of commas found in the string.

Anyone got an idea how and which FME transformers can be applied ?

 

All help will be gratefully appreciated.

Best answer by david_r

Try a StringSearcher with the following regex which is based on searching for all the substrings that does NOT contain a comma:

([^,]+)

Also define a list name for the matches, e.g.

0684Q00000ArLKmQAN.png

You can then use a ListElementCounter to find the number of comma-separated values in your input string.

If you need this functionality in several places, it's fairly easy to pack it into a custom transformer.

View original
Did this help you find an answer to your question?

8 replies

david_r
Evangelist
  • Best Answer
  • April 9, 2018

Try a StringSearcher with the following regex which is based on searching for all the substrings that does NOT contain a comma:

([^,]+)

Also define a list name for the matches, e.g.

0684Q00000ArLKmQAN.png

You can then use a ListElementCounter to find the number of comma-separated values in your input string.

If you need this functionality in several places, it's fairly easy to pack it into a custom transformer.


jdh
Contributor
Forum|alt.badge.img+28
  • Contributor
  • April 9, 2018

Could you use an attributeSplitter (separator: comma) to create a list and then a listElementCounter to get the number of items in the list?


Forum|alt.badge.img
  • Author
  • April 11, 2018

Thanks David. Magic worked well.

So much valuable. I appreciate


Forum|alt.badge.img
  • Author
  • April 11, 2018

Yes, that's the way to go. Search the strings and store the matched items. Then use the stored items as input to ListElementCounter.

This made my life easier.


takashi
Contributor
Forum|alt.badge.img+19
  • Contributor
  • April 14, 2018

I agree that it's a good approach to split the string into individual items then count the number of them. 

Alternatively, you can also replace each item (including trailing delimiter) with a single character (e.g. 'x') then get the length (i.e. number of characters) of the resulting string ('xxx...'), which you can consider as the number of items. For example, this expression returns the number of comma-separated items in the value of 'Building Classification' attribute.

@StringLength(@ReplaceRegEx(@Value(Building Classification),".+?(,|$)",x))

tim_wood
Contributor
Forum|alt.badge.img+8
  • Contributor
  • April 8, 2020
david_r wrote:

Try a StringSearcher with the following regex which is based on searching for all the substrings that does NOT contain a comma:

([^,]+)

Also define a list name for the matches, e.g.

0684Q00000ArLKmQAN.png

You can then use a ListElementCounter to find the number of comma-separated values in your input string.

If you need this functionality in several places, it's fairly easy to pack it into a custom transformer.

This is brilliant.  Exactly what I was looking for!

I have Postcode data with an attribute that has either YES or NO in it.  I'm aggregating data by Postcode Sector, then I want to report how many YESs and how many NOs there are in that Sector.  I spent hours and got nowhere putting the values into a List.  But concatenating the values in the Aggregator then using this answer did the job in 5 minutes!


ebygomm
Influencer
Forum|alt.badge.img+31
  • Influencer
  • April 8, 2020
tim_wood wrote:

This is brilliant. Exactly what I was looking for!

I have Postcode data with an attribute that has either YES or NO in it. I'm aggregating data by Postcode Sector, then I want to report how many YESs and how many NOs there are in that Sector. I spent hours and got nowhere putting the values into a List. But concatenating the values in the Aggregator then using this answer did the job in 5 minutes!

If you want to count occurrences of values in a list, the ListHistogrammer is what you want


tim_wood
Contributor
Forum|alt.badge.img+8
  • Contributor
  • April 8, 2020
ebygomm wrote:

If you want to count occurrences of values in a list, the ListHistogrammer is what you want

I tried that, having found the suggestion in a different question. But I couldn't get it to work. I just got blank values. I could have persevered but this worked instantly:

ListHistogrammer may do the counting better, but you've still got to know which element of the _histogram List NO and YES are in to be able to extract the count. And if you get other values in the data, these could upset your _histogram List, for example:

_histogram{0}.value = 'FISH'

_histogram{0}.count = 3

_histogram{1}.value = 'NO'

_histogram{1}.count = 2

_histogram{2}.value = 'YES'

_histogram{2}.count = 1

If it was only YES and NO in the data, I could probably sort the List, then assume element 0 was NO and element 1 was YES, but FISH puts a spanner in the works. If I have to test the value of each _histogram element to see if it's NO or YES, I will have to work out how many elements there are and loop through the List, testing each one.

The benefit of david_r's solution is that you only count the things you want to count, and don't have to worry about rogue values. That's also the downside though. The sum of the counts may not equal the total number of aggregated features. For example, if I aggregate 10 features but one of them has FISH instead of YES or NO, I will end up with say 4 YESs and 5 NOs, which is one short of the total. But the data could be validated upstream of this part of the process.


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings