Question

Collapse (text) records where contiguous

  • 27 February 2014
  • 15 replies
  • 5 views

Badge
Hi,

 

 

Can anyone help me devise a method using FME to collapse a series of text records with no associate geometry, assuming that 2 values are equal and 2 values are contiguous. I have included some example data below, data is arrange such that the Animal and Colour values must match the preceding row and the preceding End value must match the Start value.

 

 

I have a very nasty script for excel which may be possibly be adapted, but I would really like to do this with FME.

 

 

I have tried to do some of this within a loop but got a bit lost. Could it be down using transformers or a Python script which can be included in a bench? Ideally any method needs to be adaptable as I would like to use it for other data sets with differing field names and more or less matches required.

 

 

 

Animal, Text

 

Colour, Text

 

Start, Int

 

End, Int

 

 

Input Data Format

 

 

Animal,Colour,Start,End

 

Dog,Blue,0,1

 

Dog,Blue,1,5

 

Dog,Blue,5,10

 

Dog,Red,31,35

 

Dog,Red,35,56

 

Cat,Purple,101,102

 

Cat,Purple,103,106

 

Cat,Purple,106,111

 

 

Required Output Data Format

 

 

Animal,Colour,Start,End

 

Dog,Blue,0,10

 

Dog,Red,31,56

 

Cat,Purple,101,102

 

Cat,Purple,103,111

 

 

 

Thanks in Advance,

 

 

Rob

15 replies

Userlevel 2
Badge +17
Hi Rob,

 

 

There are several ways. One is:

 

 

1) Aggregator

 

Group By: Animal, Colour

 

Keep Input Attributes: No

 

List Name: _list

 

 

2) ListIndexer

 

List Attribute: _list{}

 

List Index: 0

 

Demoted Attribute Prefix: first_

 

 

3) ListIndexer_2

 

List Attribute: _list{}

 

List Index: -1    (-1 indicates the last element in the list)

 

Demoted Attribute Prefix: last_

 

 

4) AttributeRenamer

 

Old Attribute  |  New Attribute

 

first_Start  |  Start

 

last_End  |  End

 

 

Takashi
Userlevel 2
Badge +17
# re-post. correction for typos...

 

 

Another approach. Maybe this is simpler and more general.

 

 

1) StatisticsCalculator (Summary)

 

Group By: Animal, Colour

 

Attributes to Analyze: Start, End

 

 

2) AttributeRenamer

 

Old Attribute  |  New Attribute

 

Start._min  |  Start

 

End._max  |  End
Badge
Hi Takashi,

 

 

As always thanks for the help. I have implemented the first solution using the Aggregator and the List Indexers and it has nearly worked. However, in the example above some successive records may match in terms of Cat and Purple, but there may be gaps in terms of successive  End and Start values (which I want to retain). When I used the min and max (0 and -1) indexes the gap is effectively closed or ignored. Do you think there is a way to account for this?

 

 

Cat,Purple,101,102 (gap for next start)

 

Cat,Purple,103,111

 

 

Regards,

 

 

Rob

 

 

 

Userlevel 2
Badge +17
Ah, I missed that condition.

 

How about creating a group ID by comparing Start of current row to End of previous row?

 

 

1) AttributeCreator

 

Attribute Name: GroupID

 

Value: <any>

 

Just create the attribute name. It can be replaced with the AttributeExposer.

 

 

2) AttributeCreator_2

 

Check Multiple Feature Attribute Support option.

 

Number of Prior Features: 1

 

If Attribute is Missing, Null or Empty: Use Other Value

 

Attribute Replacement Value: 0

 

-----

 

Attribute Name: GroupID

 

Value: 2 Possible Values (Conditional Value setting)

 

If @Value(Start) = @Value(feature[-1].End) Then feature[-1].GroupID

 

Else @Evaluate(@Value(feature[-1].GroupID) + 1)

 

 

3) StatisticsCalculator (Summary)

 

Group By: Animal, Colour, GroupID

 

Attributes to Analyze: Start, End

 

 

4) AttributeRenamer

 

Old Attribute  |  New Attribute

 

Start._min  |  Start

 

End._max  |  End
Badge
Hi Takashi,

 

 

Thanks for the revised settings, I managed to complete step 1 and then got stuck.

 

 

For info I am using FME2013.

 

 

I started on step 2 but could not find the check Multiple Feature Att support and the other settings i.e/ # prior features. (is this avaiable in 2013 or am is it user error (most likely!).

 

 

Also I tried to enter the conditional value setting but did not manage to enter teh setting in correctly.

 

 

Sorry for being a pain but could you  include a screen shot/s of this bit. when i have done it once I will be ok.

 

 

Regards,

 

 

Rob

 

 

Userlevel 2
Badge +17
Multiple Feature Attribute Support was added in FME 2013 SP2 (or SP1?).

 

Well, replace two AttributeCreators with a PythonCaller.

 

Attributes to Expose: GroupID

 

-----

 

import fmeobjects

 

class GroupIdSetter(object):

 

    def __init__(self):

 

        self.groupId = 0

 

        self.prevEnd = 0

 

 

    def input(self, feature):

 

        start = int(feature.getAttribute('Start'))

 

        if start != self.prevEnd:

 

            self.groupId += 1

 

        feature.setAttribute('GroupID', self.groupId)

 

        self.prevEnd = int(feature.getAttribute('End'))

 

        self.pyoutput(feature)

 

 

    def close(self):

 

        pass

 

-----
Badge +3
Hi,

 

 

I made a little workspace doing this.

 

 

Il put it on chatter
Badge +3
Uploaded on chatter,

 

 

its a tiny piece of tcl, and mostly transformers, in fme 2013.
Badge +3
..ok what it does.

 

 

listbuilder on col0 an dcol1 (Animal and Color)

 

2 listconcatenators. 1 on col2 and 1 on col3 (start, end).

 

 

Then i use a little tcl to remove the intersection:

 

In tcl caller

 

 

 

Tcl Expression: listcompare @Value(Start) @Value(End)

 

Source Code:

 

proc listcompare {a b} {

 

 

  set a [split $a ","]

 

  set b [split $b ","]

 

  set a_no_intersect {}

 

  set b_no_intersect {}

 

  foreach i $a {

 

    if {[lsearch -exact $b $i]==-1} {

 

      lappend a_no_intersect $i}

 

       }

 

    foreach i $b {

 

    if {[lsearch -exact $a $i]==-1} {

 

      lappend b_no_intersect  $i}

 

       }   

 

 

 

  FME_SetAttribute Start_ $a_no_intersect

 

  FME_SetAttribute End_ $b_no_intersect

 

 

}

 

 

 

 

Attributesto expose: .Start_ End_

 

 

 

then 2 attributesplitters on Start_  and End_,

 

listelementcounter on Start_ (or End_),

 

attributecreator: index 0

 

custom transformer to read out the 2 attributes:
  •   2 listindexers on Start_ and End_ with Listindex : Index
  • test on index.
  • Tester:

     

    the trick here is to attach the passed and failed from the tester to output.

     

    failed and raising index you must attach to loop to input.
  • so the tester fail ports gopes to output and loop to input. If u dont it will only output the last element in the list.
matcher : on col0 trough 3. (because the loop proces dups)

 

 

 

i put it on chatter anyways..

 

 

haves funs!

 

 

 

Gio

 

Userlevel 2
Badge +17
I also created a workspace example with FME 2012 SP4.

 

https://safecommunity.force.com/068a000000456Dq

 

Badge
Hi Takashi,

 

 

Thanks very much for the worked examples I will look at these over the weekend.

 

 

I checked and I do have 2013 SP1.

 

 

Also i tried to set up the pythoncaller with teh script last night and kept getting an error SYMBOL_NAME requires arguement; there was clearly something that I had missed. However the worked exmaple you have sent will be great to work through and i am sure that i will be able to follow the steps.

 

 

Thanks again for taking the time to construct this for me.

 

 

On a related note I reworked a workbench a few weeks ago and used SQL to preform a join between query which you helped with, this had previously been based on feature merge and then tests. i implemented the SQL version it took 6 minutes instead of the FM version whcih took 2Hrs and 5mins. A significant improvement.

 

 

Best wishes,

 

 

Rob

 

 

 

 

 

 
Badge
Hi Gio,

 

 

Thanks very much for sending the TCL and list concat solution via chatter, I will look at this over the weekend.

 

 

I love that fact that you can obtain the same result within FME but use completely differing ways of implementing a solution.

 

 

 

Best wishes,

 

 

Rob
Userlevel 2
Badge +17
Hi Rob,

 

 

I updated the example, added full Python edition. You can download ver.2 from the same URL.

 

 

I expected that the SQL would be faster, but couldn't imagine such a significant improvement. The information will be very helpful for me.

 

Thank you for the notification.

 

 

Have a nice weekend.

 

 

Takashi
Badge
Hi Takashi,

 

 

Many thanks for sending through v7 with all the additional methods for resolving this issue I will send some further reviewing the alternate methods, and select the most appropriate based on my service pack type.

 

 

I spent some time working through the 2 methods that you sent on Friday; due to my SP issues I implemented the variable based version, it worked a treat. Once I understood the process flow, I tested if I could add in another variable/condition which much be met.

 

 

(See below) I Included an extra variable (vPrevHat) and test condition (vPrevEnd = Start  AND vPrevHat = Hat) which much be met in order to collapse the record, it worked well.

 

 

 

Input Data Format

 

Animal,Colour,Start,End,Hat

 

Dog,Blue,0,1,Tophat

 

Dog,Blue,1,5,Tophat

 

Dog,Blue,5,10,Fedora

 

Dog,Red,31,35,Baseball Cap

 

Dog,Red,35,56,Baseball Cap

 

Cat,Purple,101,102,Fez

 

Cat,Purple,103,106,Fez

 

Cat,Purple,106,111,Fez

 

 

 

Output Data Format

 

Animal,Colour,Start,End

 

Dog,Blue,0,5,Tophat

 

Dog,Blue,5,10,Fedora

 

Dog,Red,31,56,Baseball Cap

 

Cat,Purple,101,102,Fez

 

Cat,Purple,103,111,Fez

 

 

 

Depending on the compatibility of the new methods with my SP I may change, it is good to see the number of approaches which can be taken to solve a problem.

 

 

As always thanks again.

 

 

Best wishes

 

 

Rob
Userlevel 2
Badge +17
Good to hear you got your own solution.

 

This is an interesting subject, was also a good material for my own self training. To Think of multiple approaches for the same goal is always a good training.

 

Cheers!

Reply