Hi,
A combination of the Sorter and the DuplicateRemover does that.
Sort the points by "rank" ascending, then keep the first point for each "class" (DuplicateRemover, Group By: class).
I think this is a general approach, but the Sorter may be inefficient and the DuplicateRemover may consume a lot of memory when there are very many points. How many points are there?
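For readers who think in code, the Sorter + DuplicateRemover logic can be sketched in plain Python. This is only an analogue of what the two transformers do, not FME API code; the sample points and the `first_per_class` helper are made up for illustration.

```python
# Plain-Python analogue of Sorter + DuplicateRemover (not FME API code).
# The sample points and field values are invented for illustration.
points = [
    {"id": 1, "class": "A", "rank": 3},
    {"id": 2, "class": "A", "rank": 1},
    {"id": 3, "class": "B", "rank": 2},
    {"id": 4, "class": "B", "rank": 5},
]


def first_per_class(features):
    # Sorter step: order the features by rank, ascending.
    ordered = sorted(features, key=lambda f: f["rank"])
    # DuplicateRemover step: keep only the first feature seen per class.
    seen = set()
    unique = []
    for f in ordered:
        if f["class"] not in seen:
            seen.add(f["class"])
            unique.append(f)
    return unique


result = first_per_class(points)
print([f["id"] for f in result])  # ids of the minimum-rank point per class
```

Note that the sort needs every point at once and the "seen" set grows with the number of classes, which is roughly where the memory concern comes from.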
If there are millions of points, it may be worth considering the InlineQuerier with an SQL statement such as the one below.
Assume that the input port is named "Point" and that the type of "rank" is set to "integer" or "float" in the parameter settings.
-----
select * from [Point] as a
inner join (select class, min(rank) as minRank from [Point] group by class) as b
on a.class = b.class and a.rank = b.minRank
-----
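If you want to check what this kind of query returns before wiring it into a workspace, a rough test with Python's built-in sqlite3 module could look like the sketch below. The table contents are made up, and the select list is narrowed to three columns just to keep the output readable. One thing it shows: when two points in a class share the minimum rank, the join returns both of them.

```python
import sqlite3

# In-memory database with a hypothetical "Point" table (sample rows invented).
conn = sqlite3.connect(":memory:")
conn.execute("create table Point (id integer, class text, rank integer)")
conn.executemany(
    "insert into Point values (?, ?, ?)",
    [(1, "A", 3), (2, "A", 1), (3, "B", 2), (4, "B", 5), (5, "B", 2)],
)

# Same join as in the post; only the select list and ordering are added.
rows = conn.execute(
    """
    select a.id, a.class, a.rank from [Point] as a
    inner join (select class, min(rank) as minRank from [Point] group by class) as b
    on a.class = b.class and a.rank = b.minRank
    order by a.id
    """
).fetchall()
conn.close()

print(rows)  # points 3 and 5 tie for the minimum rank in class B
```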
Takashi
HiYa,
Have you seen the StatisticsCalculator ???
It could be really useful for your purposes.
It creates some very useful fields eg _sum _avg _max _min etc from a dataset within any group of interest if required eg Class in your particular case.
.........................................
If you have millions of points, it's possible you can occasionally run into memory issues with FME (Takashi hints at this).
In this case, another possible approach is to convert / extract your attributes to a tabular format eg MS Access mdb etc and then create a query (view) in the database program outside of FME to achieve the same result ie group on Class and SUM or AVG on field of interest. You could then bring the resulting query / view back into FME if required.
.............................................
Just a few ideas
Hope this helps
Howard L'
Inspired by the StatisticsCalculator suggestion, here is a Python approach.
If you can discard all the points except the minimum rank points, a PythonCaller with this script could be much more efficient. It only holds one point per class in memory.
-----
# Python Script Example
class MinRankCollector(object):
    def __init__(self):
        # This dictionary collects minimum rank points for each class.
        # Key: class, Value: (rank, feature)
        self.points = {}

    def input(self, feature):
        cls = feature.getAttribute('class')
        rnk = int(feature.getAttribute('rank'))
        # Keep the feature if its class is new or its rank is lower than the stored one.
        if cls not in self.points or rnk < self.points[cls][0]:
            self.points[cls] = (rnk, feature)

    def close(self):
        # Output minimum rank point features.
        for rnk, feature in self.points.values():
            self.pyoutput(feature)
-----
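To try the collector logic outside FME, something like the following stand-alone sketch might help. `FakeFeature` is a hypothetical stand-in that only mimics `getAttribute`, and `pyoutput` is stubbed to record features in a list, since the real method only exists inside a PythonCaller. The sample attribute values are invented.

```python
class FakeFeature(object):
    """Hypothetical stand-in for an FME feature; only getAttribute is mimicked."""

    def __init__(self, **attrs):
        self.attrs = attrs

    def getAttribute(self, name):
        return self.attrs.get(name)


class MinRankCollector(object):
    def __init__(self):
        # Key: class, Value: (rank, feature)
        self.points = {}
        # Test-only list standing in for the PythonCaller output port.
        self.emitted = []

    def input(self, feature):
        cls = feature.getAttribute('class')
        rnk = int(feature.getAttribute('rank'))
        if cls not in self.points or rnk < self.points[cls][0]:
            self.points[cls] = (rnk, feature)

    def close(self):
        for rnk, feature in self.points.values():
            self.pyoutput(feature)

    def pyoutput(self, feature):
        # Stub: inside a PythonCaller this would send the feature downstream.
        self.emitted.append(feature)


collector = MinRankCollector()
samples = [(1, 'A', 3), (2, 'A', 1), (3, 'B', 2), (4, 'B', 5)]
for fid, cls, rank in samples:
    collector.input(FakeFeature(id=fid, **{'class': cls, 'rank': rank}))
collector.close()

emitted_ids = sorted(f.getAttribute('id') for f in collector.emitted)
print(emitted_ids)
```

Only one feature per class is ever stored, so memory use depends on the number of classes rather than the number of points. If several points share the minimum rank, only the first one seen is kept.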
StatisticsCalculator.
Analyse "rank" for the minimum (_min), grouped by "class".
Keep (at least) the id and use the Summary output.
....
Hi All,
Thanks for the tips. I ended up using Takashi's Sorter+DuplicateRemover method (I don't have a lot of data in this case, so memory was not an issue).
I also tried the StatisticsCalculator method, but could not reassemble the summary output of the tool as points rather than a table. If I used a CoordinateExtractor before the StatisticsCalculator, I could not reattach the corresponding X,Y data (since this data was not "summarized"). On the other hand, taking the Complete output from the StatisticsCalculator would work, but only if I also put it through a Sorter and a DuplicateRemover.
I am probably missing something about the StatisticsCalculator functionality, as I have never used it. I would be interested to know the best way of reassembling data spatially from the summary output of the StatisticsCalculator; it might be useful in the future.
A possible way is: add a FeatureMerger; send all the original points to the Requestor port; send the Summary feature to the Supplier port; merge the Summary (Supplier) to Requestor features joining on "rank" (Requestor) to "rank._min" (Supplier), grouping by "class".
You can get the minimum rank points from the Merged port.
This method is conceptually similar to the InlineQuerier approach I posted before.
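As a rough illustration of the summary-then-merge idea in plain Python (again an analogue, not FME API code): first compute the minimum rank per class, then split the original points into those that match their class minimum and those that do not. The function names and sample data are hypothetical.

```python
# Sample points (invented); points 3 and 5 share the minimum rank in class B.
points = [
    {"id": 1, "class": "A", "rank": 3},
    {"id": 2, "class": "A", "rank": 1},
    {"id": 3, "class": "B", "rank": 2},
    {"id": 4, "class": "B", "rank": 5},
    {"id": 5, "class": "B", "rank": 2},
]


def min_rank_by_class(features):
    # Summary step: one minimum rank value per class.
    summary = {}
    for f in features:
        cls = f["class"]
        if cls not in summary or f["rank"] < summary[cls]:
            summary[cls] = f["rank"]
    return summary


def merge(features, summary):
    # Merge step: a point is "merged" when its rank equals its class minimum.
    merged, not_merged = [], []
    for f in features:
        if f["rank"] == summary[f["class"]]:
            merged.append(f)
        else:
            not_merged.append(f)
    return merged, not_merged


summary = min_rank_by_class(points)
merged, not_merged = merge(points, summary)
print([f["id"] for f in merged], [f["id"] for f in not_merged])
```

In this sketch the original input order is preserved in both lists, and the points that are not at the minimum remain available for further processing.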
There is more than one way to skin a cat
Great to know - now the Statistics Calculator method works like a charm too!
The StatisticsCalculator+FeatureMerger method is useful when you need to preserve the order of the input features and/or use the other features (NotMerged) in subsequent processing.