Question

Attribute Manipulation: Splitting a list at discontinuities

  • 27 February 2017
  • 8 replies
  • 1 view

Badge

How do I retrieve the lower and upper bounds of each consecutive run in a discontinuous list of (alphanumeric) values?

Example of list: {5, 6, 7, 8, 10a, 11, 15, 16x, 17, 18, 19, 20, 25}

The extraction I'm looking for would be: 5-8, 10a-11, 15-20, 25

Thank you


8 replies

Userlevel 4

I'm sure this can be done using transformers, but I suspect it'll be rather convoluted. Here's a possible solution using a PythonCaller:

import fmeobjects
from itertools import groupby
from operator import itemgetter

def str2int(s):
    # Keep only the digit characters, e.g. '10a' -> 10
    return int(''.join([x for x in s if x.isdigit()]))

def split_list_at_bounds(feature):
    data = feature.getAttribute('values')
    if data:
        # Parse "{5, 6, 10a}" into ['5', '6', '10a']
        data = [x.strip() for x in data.replace('{', '').replace('}', '').split(',')]
        bounds = []
        # Within a consecutive run, (index - numeric value) stays constant,
        # so it can serve as the grouping key
        for k, g in groupby(enumerate(data), lambda pair: pair[0] - str2int(pair[1])):
            bounds.append(list(map(itemgetter(1), g)))
        for n, boundary in enumerate(bounds):
            if len(boundary) == 1:
                # A run of one value is reported as-is
                s = boundary[0]
            else:
                s = '-'.join([boundary[0], boundary[-1]])
            feature.setAttribute('boundary{%s}' % n, s)

Assuming the input attribute values = "{5, 6, 7, 8, 10a, 11, 15, 16x, 17, 18, 19, 20, 25}", it will output a list boundary{} with the elements 5-8, 10a-11, 15-20 and 25:

0684Q00000ArLSTQA3.png
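
For anyone who wants to try the grouping logic outside FME first, here is a minimal standalone sketch of the same idea. The names numeric_part and run_bounds are made up for this illustration and are not part of the PythonCaller script or of fmeobjects; the input is assumed to be already split into strings.

```python
from itertools import groupby

def numeric_part(s):
    # Same idea as str2int() in the post: keep only the digit characters
    return int(''.join(ch for ch in s if ch.isdigit()))

def run_bounds(values):
    """Collapse consecutive runs into 'first-last' strings (hypothetical helper)."""
    runs = []
    # Inside a consecutive run both the index and the numeric value grow by 1,
    # so (index - value) is constant; it changes only at a discontinuity.
    key = lambda pair: pair[0] - numeric_part(pair[1])
    for _, group in groupby(enumerate(values), key=key):
        members = [v for _, v in group]
        if len(members) == 1:
            runs.append(members[0])
        else:
            runs.append(members[0] + '-' + members[-1])
    return runs

values = ['5', '6', '7', '8', '10a', '11', '15', '16x',
          '17', '18', '19', '20', '25']
print(run_bounds(values))  # ['5-8', '10a-11', '15-20', '25']
```

Note that this only works if the values are already sorted in ascending order, as in the example list.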

Note that str2int() strips away all non-digits before the grouping takes place, so be careful if you have values like 12x3: it will be interpreted as 123 and not 12.
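
If values like 12x3 can occur, one possible workaround is to read only the leading digits instead of all digits. The following is just a sketch under that assumption; leading_int is a hypothetical name, and whether 12x3 should really count as 12 depends on your data.

```python
import re

def leading_int(s):
    # Hypothetical alternative to str2int(): use only the digits at the
    # start of the string, so '12x3' gives 12 rather than 123.
    match = re.match(r'\s*(\d+)', s)
    if match is None:
        # Assumption: a value without leading digits is treated as an error
        raise ValueError('no leading digits in %r' % s)
    return int(match.group(1))

print(leading_int('12x3'))  # 12
print(leading_int('10a'))   # 10
```

It could be used in place of str2int() in the groupby key; the rest of the script would stay the same.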

Userlevel 1
Badge +21

Yep, convoluted!

attribute-manipulation.fmw

Userlevel 2
Badge +17

A geometric approach could also be an alternative.

Badge

I admire the flexibility of this tool and the ingenuity of its users.

Thank you

Badge

I need a little time to test all these solutions. Thank you

Userlevel 4

Performance-wise, I would expect the Python solution to be the fastest by quite a margin, followed by the solution from @egomm. The suggestion from @takashi is really cool and a great demonstration of the flexibility of FME, but I suspect it will be relatively slow if you have a lot of data.

Userlevel 4

A geometric approach could also be an alternative.

Very cool.
Userlevel 2
Badge +17

The geometric approach is interesting and demonstrates the flexibility of FME, but its performance is not good, as @david_r pointed out. If performance is critical, I would not recommend using the geometric method in a production workspace; consider adopting the Python solution instead.
