Question

Compressing a file with semi-duplicates based upon longest cell value

  • 13 November 2018
  • 3 replies
  • 3 views


Hi all,

I would like to know how I can simplify some work that I do now in Excel with a very complex formula.

What I want to do is to "compress" a table. I made an example above (not the actual kind of data, but the idea is simple). Code 1 is leading, and is a code for a certain spell. Value A and Value B are pieces of information that belong to Code 1, but some of that information is fragmented and/or absent in some rows.

What I want is to end up with one table like this one:

That way, I can just use the DuplicateFilter to keep one feature per Code 1 (any one, since they would then be identical) and get the compressed data set I want, including all information.

 

Does anyone have a suggestion? In Excel, I simply took the longest answer for every cell, per Code 1, but I would like to do this in FME since I assume it will be much faster for larger datasets. Besides that, my Excel formula had to be six statements long to cover the right range for Code 1, which consumes a lot of computation time.
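(For clarity, that "longest value per cell, per Code 1" logic boils down to something like this minimal Python sketch; the column names and the sample rows are only assumptions for illustration.)

    # Sketch: for each Code 1, keep the longest value seen in each column.
    rows = [
        {"Code 1": "X1", "Value A": "foo",    "Value B": ""},
        {"Code 1": "X1", "Value A": "foobar", "Value B": "12"},
        {"Code 1": "X2", "Value A": "",       "Value B": "7"},
    ]

    compressed = {}
    for row in rows:
        best = compressed.setdefault(row["Code 1"], {"Value A": "", "Value B": ""})
        for col in ("Value A", "Value B"):
            value = row[col] or ""
            if len(value) > len(best[col]):
                best[col] = value

    print(compressed)
    # {'X1': {'Value A': 'foobar', 'Value B': '12'}, 'X2': {'Value A': '', 'Value B': '7'}}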

 

Regards,

 

Martin


3 replies

Userlevel 4

One solution could be to use an Aggregator with Group By set to "Code 1" and the "Generate List" option activated, adding the attributes "Value A" and "Value B" to the list.

You will then have one output feature for each distinct value of "Code 1", and on each feature you will have a list containing all the different values of "Value A" and "Value B" for that group.

To find the value for "Value B" you can use a ListSorter (sort numeric descending) followed by a ListIndexer to get the top value (index 0).

For "Value A" you can do something similar, depending on what you need.

At the end you can use a Deaggregator with conflict resolution set to "Use original attribute value".
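(In plain Python terms, the list handling that this Aggregator/ListSorter/ListIndexer chain performs for one "Code 1" group looks roughly like the sketch below; the sample values, and treating empty entries as 0 in the numeric sort, are assumptions.)

    # One "Code 1" group: the Aggregator builds a list of the "Value B" values.
    value_b_list = ["3", "", "12", "7"]

    # ListSorter, numeric descending (empty entries treated as 0 here, an assumption).
    sorted_b = sorted(value_b_list, key=lambda v: float(v) if v else 0.0, reverse=True)

    # ListIndexer, index 0: the top value becomes the attribute on the output feature.
    value_b = sorted_b[0]
    print(value_b)  # "12"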

Userlevel 1
Badge +21

Broadly similar to the above, presuming Value B is a 1-to-1 mapping for each Code 1:

  • Use a NullAttributeMapper to make sure features with an empty Value B have that attribute removed.
  • Calculate the string length of Value A.
  • Aggregate on Code 1, merging attributes and building a list of Value A with its corresponding length.
  • Sort the list by that length, then use the first list value as Value A.
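(The "sort by length, take the first" step, in Python terms, with sample values assumed:)

    # One "Code 1" group: list of "Value A" built by the Aggregator.
    value_a_list = ["ab", "", "abcde", "abc"]

    # Sort by string length, longest first, and keep the top entry.
    value_a = sorted(value_a_list, key=len, reverse=True)[0]
    print(value_a)  # "abcde"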

Compressing_a_file.fmwt

Userlevel 4
Badge +25

It depends on the amount of data, but one speed-up would be to add a DuplicateFilter to filter out features that are complete duplicates of one another, before you carry out the suggested procedures.

As I understand it, you're going to be removing duplicates anyway, so you might as well get rid of some of them here. The solutions to your main issue all involve sorting, and removing duplicates now will reduce the amount of sorting that needs to take place.
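(In Python terms, that pre-step is just exact-duplicate removal before the heavier grouping and sorting; the sample rows are assumed.)

    # Drop rows that are exact duplicates before any grouping/sorting.
    rows = [("X1", "ab", "12"), ("X1", "ab", "12"), ("X1", "abcde", "")]
    unique_rows = list(dict.fromkeys(rows))  # preserves order, removes exact duplicates
    print(unique_rows)  # [('X1', 'ab', '12'), ('X1', 'abcde', '')]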
