Question

For every group of points, keep the point with the associated attribute that has smallest value

10 years ago
November 8, 2014
8 replies
103 views

makt
Contributor
42 replies

I have a shapefile with points, each with two attributes: class, and rank (a decimal number). There are many points, 10 classes, and a range of rank numbers.

I would like my output to keep only one point per class: the point with the lowest associated rank.

At first I thought the Aggregator would solve my problem entirely. However, while it allows grouping by class, I'm not sure how I would then select the point in each group with the lowest value (it seems aimed at taking the average or the sum).

takashi
7726 replies
10 years ago
November 8, 2014

Hi,

A combination of the Sorter and the DuplicateRemover does that.

Sort the points by "rank" ascending, and select first points for each "class" (DuplicateRemover, Group By: class).
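For readers who want to see the logic outside FME, here is a minimal plain-Python sketch of the same idea: sort ascending by rank, then let only the first point of each class through. The sample points, the `id` field and the function name `keep_lowest_rank` are made up for illustration and are not part of the workspace.

```python
# Hypothetical sample data: each point has an id, a class and a decimal rank.
points = [
    {"id": 1, "class": "A", "rank": 2.5},
    {"id": 2, "class": "B", "rank": 0.7},
    {"id": 3, "class": "A", "rank": 1.2},
    {"id": 4, "class": "B", "rank": 3.0},
]


def keep_lowest_rank(points):
    """Return one point per class: the one with the lowest rank."""
    seen = set()
    kept = []
    # Sorter step: order the points by rank, ascending.
    for p in sorted(points, key=lambda p: p["rank"]):
        # DuplicateRemover step: only the first point seen per class passes.
        if p["class"] not in seen:
            seen.add(p["class"])
            kept.append(p)
    return kept


result = keep_lowest_rank(points)
```

Because the whole list is sorted first, this sketch has the same weakness mentioned for the Sorter: all points must be held at once.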

I think this is a general approach, but the Sorter may be inefficient and the DuplicateRemover may consume a lot of memory when there are very many points. How many points are there?

If there are millions of points, it may be worth considering the InlineQuerier with an SQL statement like the following.

Assume that the input port is named "Point" and that the type of "rank" is defined as "integer" or "float" in the parameter settings.

-----
select * from [Point] as a
inner join (select class, min(rank) as minRank from [Point] group by class) as b
on a.class = b.class and a.rank = b.minRank
-----
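To try the query outside FME, a small sketch using Python's built-in sqlite3 module as a stand-in for the InlineQuerier might look like this. The table contents and the `id` column are invented for the example, and the column names are double-quoted only as a precaution. Note that the join keeps every point tied for the minimum rank in its class, so a class can return more than one row.

```python
import sqlite3

# In-memory table standing in for the "Point" input port (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Point (id INTEGER, "class" TEXT, "rank" REAL)')
conn.executemany(
    "INSERT INTO Point VALUES (?, ?, ?)",
    [(1, "A", 2.5), (2, "B", 0.7), (3, "A", 1.2), (4, "B", 3.0), (5, "B", 0.7)],
)

# Same shape as the statement above: join each point to its class minimum.
query = """
SELECT a.id, a."class", a."rank"
FROM [Point] AS a
INNER JOIN (
    SELECT "class", MIN("rank") AS minRank FROM [Point] GROUP BY "class"
) AS b
ON a."class" = b."class" AND a."rank" = b.minRank
ORDER BY a.id
"""
rows = conn.execute(query).fetchall()
conn.close()
```

In this sample, points 2 and 5 share the lowest rank in class B, so both come back. If exactly one point per class is required, a tie-breaking step is still needed afterwards.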

Takashi

howard_l
44 replies
10 years ago
November 8, 2014

HiYa,

Have you seen the StatisticsCalculator ???

It could be really useful for your purposes.

It creates some very useful attributes (e.g. _sum, _avg, _max, _min) from a dataset, optionally within any group of interest, e.g. class in your particular case.

.........................................

If you have millions of points, it's possible you can occasionally run into memory issues with FME (Takashi hints at this).

In this case, another possible approach is to extract your attributes to a tabular format (e.g. an MS Access mdb) and then create a query (view) in the database program outside of FME to achieve the same result, i.e. group on class and take the MIN (or SUM, AVG, etc.) of the field of interest. You could then bring the resulting query / view back into FME if required.

.............................................

Just a few ideas

Hope this helps

Howard L'

takashi
7726 replies
10 years ago
November 8, 2014

Inspired by the StatisticsCalculator: Python.

If you can discard all the points except the minimum rank points, a PythonCaller with this script could be much more efficient. It only keeps one point per class in memory.

-----
# Python Script Example
class MinRankCollector(object):
    def __init__(self):
        # This dictionary collects the minimum rank point for each class.
        # Key: class, Value: (rank, feature)
        self.points = {}

    def input(self, feature):
        cls = feature.getAttribute('class')
        # "rank" is a decimal number, so parse it as float;
        # int() would truncate or fail on values such as 1.5.
        rnk = float(feature.getAttribute('rank'))
        if cls not in self.points or rnk < self.points[cls][0]:
            self.points[cls] = (rnk, feature)

    def close(self):
        # Output the minimum rank point features.
        for rnk, feature in self.points.values():
            self.pyoutput(feature)
-----


gio
Contributor
2252 replies
10 years ago
November 10, 2014

StatisticsCalculator.

Analyse "rank" for the minimum (_min), grouped by "class".

Keep (at least) the id and use the Summary output.

....

makt
Author
Contributor
42 replies
10 years ago
November 12, 2014

Hi All,

Thanks for the tips. I ended up using Takashi's Sorter+DuplicateRemover method (I don't have a lot of data in this case, so memory was not an issue).

I also tried the StatisticsCalculator method, but could not reassemble the Summary output of the tool as points rather than a table. If I used a CoordinateExtractor before the StatisticsCalculator, I could not reattach the corresponding X,Y data (since this data was not "summarized"). On the other hand, taking the Complete output from the StatisticsCalculator would work, but only if I also put it through a Sorter and a DuplicateRemover.

I'm probably missing something about the StatisticsCalculator's functionality, as I have never used it. I would be interested to know the best way of reassembling data spatially from the Summary output of the StatisticsCalculator. It might be useful in the future.

takashi
7726 replies
10 years ago
November 12, 2014

A possible way is: add a FeatureMerger; send all the original points to the Requestor port; send the Summary features to the Supplier port; then merge the Supplier features onto the Requestor features, joining "rank" (Requestor) to "rank._min" (Supplier) and grouping by "class".

You can get the minimum rank points from the Merged port.

This method is conceptually similar to the InlineQuerier approach I posted before.
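As a rough illustration of what this does, here is a plain-Python sketch under stated assumptions: the first loop plays the role of the StatisticsCalculator summary (one minimum per class), and the second plays the role of the FeatureMerger, splitting the original points into "merged" and "not merged" lists. The sample data and the function name `split_by_min_rank` are hypothetical.

```python
# Hypothetical sample data, in the order the points arrive.
points = [
    {"id": 1, "class": "A", "rank": 2.5},
    {"id": 2, "class": "B", "rank": 0.7},
    {"id": 3, "class": "A", "rank": 1.2},
    {"id": 4, "class": "B", "rank": 3.0},
]


def split_by_min_rank(points):
    """Split points into (merged, not_merged) by comparing to the class minimum."""
    # Summary step: the minimum rank for each class.
    min_rank = {}
    for p in points:
        cls = p["class"]
        if cls not in min_rank or p["rank"] < min_rank[cls]:
            min_rank[cls] = p["rank"]

    # Merge step: a point "merges" when its rank equals its class minimum.
    merged, not_merged = [], []
    for p in points:
        if p["rank"] == min_rank[p["class"]]:
            merged.append(p)
        else:
            not_merged.append(p)
    return merged, not_merged


merged, not_merged = split_by_min_rank(points)
```

Unlike the sort-based approach, both lists keep the original input order, and the points that did not match remain available for further processing. Points tied for the minimum in a class all end up in the merged list.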

There is more than one way to skin a cat ;)

makt
Author
Contributor
42 replies
10 years ago
November 12, 2014

Great to know - now the Statistics Calculator method works like a charm too!

takashi
7726 replies
10 years ago
November 12, 2014

The StatisticsCalculator+FeatureMerger method is useful when you need to preserve the order of the input features and/or use the other features (NotMerged) in subsequent processing.
