Solved

Find highest value (most recent year) for otherwise duplicate records


Badge

I have a MapInfo table with attributes Species, GridRef, Year (such as 2020, 1999..). There may be many records where the Species and GridRef are the same, but I want to filter and retain only those with the most recent Year - so only the most recent record for a Species at each GridRef. There can also be many where Species, GridRef and Year are the same, but I just want one of those with the most recent year (it wouldn't matter which).

I think by using a combination of Sorter and Sampler I should be able to sort by Name, GridRef, Year and group by these in the Sampler - so it should pick the first record in each group? I tried this but still seem to be getting duplicate Species and GridRefs in the result.

I'd be very grateful if someone could at least point me to the right transformer. or give me details about the parameters I should be using in the Sampler.

Many thanks, and keep safe,

Jane

icon

Best answer by ebygomm 4 May 2020, 19:36

View original

4 replies

Userlevel 1
Badge +21

If you want to keep only the latest year then you need to drop the Year from the group by. At the moment your group by will be returning one record for each unique combination of name, gridref and year. You only want one record for each unique combination of name and gridref, the sorter if sorted by year in descending order will ensure that it is the most recent one

Badge +2

@janem it's tricky in FME to pass a value (latest year) from one feature to another. But you're on the right track with Sorter and Sampler. First, perhaps try DuplicateFilter to drop the Duplicate Name & GridRef records using Year, Name & GridRef as the key. Then Sampler to shave off the first feature. Now you can use some conditional tests in AttributeCreator and the enable Adjacent Feature Attributes (see the section: Advanced Example: Using Multiple Feature Attribute Support) to compare the year on the previous feature with the current features year.

The workspace looks something like this:

and the adjacent feature handling:

I think there might be simpler logic for the conditional but can't get there ...

I've attached a example workspace & data (FME 2020.0): latestdate.fmwt

Badge +2

Hi @janem,

How about using a StatisicsCalculator to find the max value and a Tester to filter out any dates that do not equal the max date? Assuming you are using numeric date, the StatisticsCalculator could be used to find the max value - or for each group if desired using the Group By parameter in the StatisticsCalculator. Then you could simply use a Tester to filter for feature where the date attribute equals the date.max attribute that was produced by the StatisticsCalculator.

Input:

 

Output:

Hope that helps!

Find_most_recent.fmwt

Badge

Thank you all very much for your time. In the end I could adjust my workspace using your suggestions - I used a StringFormatter to convert the Char Year field to Integer, then Sorter to sort by Latin_Name, Grid_Ref and Year, then the Sampler, but grouping only by Grid_Ref and Latin_Name (I think adding Year was my main problem before) as below.

Many thanks again for your help,

Jane

 

Reply