Solved

Insert a loop?

  • 19 December 2019

The attached workspace consists of two parts.

In part 1 I transform an Excel document that contains many records of clients and products. Some clients have had only one product, others might have had 10. The result is also an Excel document, in which each client has only one row. That row lists all the client's products in chronological order, starting with the first product and ending with the last.
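For readers who think in code, the part 1 reshaping can be sketched roughly as below. This is only an illustration in Python, not what the workspace does internally; the field names (`Product`, `Datum`), the `1_`, `2_` prefixes and the sample records are assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical input: one record per (client, product, date).
records = [
    ("C1", "wheelchair", date(2019, 3, 1)),
    ("C2", "walker", date(2019, 1, 15)),
    ("C1", "walker", date(2018, 6, 10)),
]

def one_row_per_client(rows):
    """Collect each client's products and number them in date order."""
    by_client = defaultdict(list)
    for client, product, when in rows:
        by_client[client].append((when, product))

    result = {}
    for client, items in by_client.items():
        row = {}
        # sorted() orders the (date, product) tuples by date first
        for n, (when, product) in enumerate(sorted(items), start=1):
            row[f"{n}_Product"] = product
            row[f"{n}_Datum"] = when
        result[client] = row
    return result

wide = one_row_per_client(records)
```

Each client ends up with a single row whose attribute names carry the position of the product in time.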

In part 2 some checks are done on the data. For example, some products of the same category may not be handed out within a short time period of each other.
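A check of that kind might look like the sketch below. The 30-day threshold, the category names and the sample history are made up for illustration; the real rules live in the workspace.

```python
from datetime import date, timedelta

def too_close(products, min_gap):
    """Return (category, previous_date, date) for same-category products
    handed out less than min_gap after the previous one."""
    last_seen = {}
    violations = []
    # Walk through the products in date order
    for category, when in sorted(products, key=lambda p: p[1]):
        previous = last_seen.get(category)
        if previous is not None and when - previous < min_gap:
            violations.append((category, previous, when))
        last_seen[category] = when
    return violations

# Hypothetical product history for one client: (category, date)
history = [
    ("aid", date(2019, 1, 1)),
    ("aid", date(2019, 1, 20)),
    ("loan", date(2019, 2, 1)),
    ("aid", date(2019, 6, 1)),
]
violations = too_close(history, timedelta(days=30))
```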

 

The workspace does the trick, but it is quite slow. I think things could be made more efficient using loops. I've done some research on the internet but haven't found a solution yet. Any help would be appreciated.


Best answer by mark2atsafe 19 December 2019, 18:09


I'm taking a look to see what might be improved, but I'm 99% sure the answer isn't loops! I think it's all the FeatureMergers and perhaps there is a way to avoid having so many of them. I'll check.


One quick suggestion: can you replace the whole bunch of Testers at the start of the second section with one TestFilter?

Currently all those multiple connections duplicate the data once per Tester transformer (i.e. 16 times). If you can replace them with a single TestFilter, then you are only working with one set of data. It all depends on whether those tests are mutually exclusive, or whether the same records might pass through multiple Testers (I don't think so, but I can't be 100% sure).
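To illustrate the difference in plain Python (a sketch of the idea, not of FME internals): with a TestFilter each feature is examined once and leaves through the first port whose test it passes, rather than being copied to every Tester. The `Categorie` attribute and the two tests are hypothetical.

```python
def route_first_match(feature, tests):
    """Return the name of the first test the feature passes, or None.

    Each feature is looked at once and leaves through a single port,
    so the data is not copied once per test.
    """
    for name, predicate in tests:
        if predicate(feature):
            return name
    return None

# Hypothetical tests; the real workspace has its own conditions.
tests = [
    ("category_A", lambda f: f["Categorie"] == "A"),
    ("category_B", lambda f: f["Categorie"] == "B"),
]
features = [{"Categorie": "B"}, {"Categorie": "A"}, {"Categorie": "C"}]
ports = [route_first_match(f, tests) for f in features]
```

Note the caveat: a record that would have passed two Testers comes out of only the first matching port here, which is why it matters whether the tests overlap.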

In case it helps, I modified the workspace and it is here: 32933-analysefinancien-mark1.fmw

That should save some time at least.


Another fix you could make is to move the DateTimeCalculator/Tester transformers to before the BulkAttributeRenamers.

Currently you are using the BulkAttributeRenamer on features and then filtering them out with a Tester. It would be more efficient to filter out that data first.

It might not be a huge difference - it depends on how many features are Failed - but it will be some help.
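As a rough Python analogy (the attribute name, the prefix and the 30-day test are assumptions), filtering first means the rename step only ever touches the features that survive:

```python
rename_calls = 0

def rename(feature, prefix):
    """Stand-in for a BulkAttributeRenamer: prefix every attribute name."""
    global rename_calls
    rename_calls += 1
    return {f"{prefix}{name}": value for name, value in feature.items()}

def passes(feature):
    # Hypothetical stand-in for the DateTimeCalculator/Tester check
    return feature["days_between"] >= 30

features = [{"days_between": 10}, {"days_between": 45}, {"days_between": 90}]

# Filter first, then rename: only the passing features are renamed.
kept = [rename(f, "1_") for f in features if passes(f)]
```

Here only two of the three features are renamed; the saving grows with the share of features that fail the test.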


OK, so the StatisticsCalculator and FeatureMergers...

As I understand it, the workspace finds the minimum (earliest) date and adds 1_ to the attribute name. Then it finds the new minimum date and adds 2_ to the attribute, etc.

You repeat this a fixed number of times. Maybe you don't know how many times are needed, so you add more than is necessary.

I can see why you would go to a loop, because then you just have one StatisticsCalculator and pass through it multiple times. However, loops are tricky (especially with group-based transformers) and I think there are simpler and more efficient ways.

For example:

  • Aggregator (Group by Datum)
  • Sorter (Sort by Datum)
  • Counter
  • Deaggregator (Deaggregate just the one level)
  • BulkAttributeRenamer (add @Value(_count) as the prefix)

In other words, sort the data by datum (date?) order and create a group ID for each different datum. Unfortunately there isn't a transformer to create a unique ID per group, so we have to do a bit of a workaround.
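The same idea in a few lines of Python, in case it makes the logic clearer. This is a sketch only: the sample features are invented, dates are ISO strings so that they sort correctly as text, and numbering starts at 1 to match the 1_, 2_ prefixes.

```python
from itertools import groupby

def number_by_datum(features):
    """Number features 1 for the earliest Datum, 2 for the next, and so on,
    then prefix every attribute name with that number."""
    ordered = sorted(features, key=lambda f: f["Datum"])           # Sorter
    groups = groupby(ordered, key=lambda f: f["Datum"])            # one group per Datum
    numbered = []
    for count, (_, group) in enumerate(groups, start=1):           # Counter
        for feature in group:                                      # back to single features
            numbered.append(
                {f"{count}_{name}": value for name, value in feature.items()}
            )
    return numbered

# Hypothetical features
features = [
    {"Datum": "2019-03-01", "Product": "B"},
    {"Datum": "2019-01-05", "Product": "A"},
    {"Datum": "2019-03-01", "Product": "C"},
]
result = number_by_datum(features)
```

Features that share a Datum get the same number, which is the "group ID" the workaround is after.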

Another possible sequence would be:

  • Sorter (Sort by Datum)
  • DuplicateFilter (filter by Datum)
  • DuplicateFilter:Unique -> Counter
  • FeatureMerger (DuplicateFilter:Duplicate > Requestor, Counter > Supplier) Join by Datum
  • BulkAttributeRenamer (add the count attribute as the prefix again)
    • You'll need to connect Merged, UsedSupplier, and UnusedSupplier ports to the BulkAttrRenamer
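This second sequence amounts to building a lookup from each distinct Datum to a number and joining it back onto every feature. A hedged Python sketch (sample data invented, ISO date strings assumed, numbering from 1 assumed):

```python
def number_by_lookup(features):
    """Number the distinct Datum values, then join the number back by Datum."""
    # Sorter + DuplicateFilter: the distinct Datum values in order
    unique_dates = sorted({f["Datum"] for f in features})
    # Counter: one number per distinct Datum
    count_for = {datum: n for n, datum in enumerate(unique_dates, start=1)}
    # FeatureMerger (join by Datum) + BulkAttributeRenamer (prefix)
    return [
        {f"{count_for[f['Datum']]}_{name}": value for name, value in f.items()}
        for f in features
    ]

# Hypothetical features
features = [
    {"Datum": "2019-03-01", "Product": "B"},
    {"Datum": "2019-01-05", "Product": "A"},
    {"Datum": "2019-03-01", "Product": "C"},
]
joined = number_by_lookup(features)
```

Unlike the first sequence, this version keeps the features in their original order; only the prefix depends on the date.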

Anyway, I hope this helps. It should be much quicker.


For your reference (and mine) there are two issues filed with our developers to create a simpler scenario for group ID numbers: FMEENGINE-48542 and FMEENGINE-48787. I added a link to this thread to 48542, so that the developers could see the need for this improvement.

Thanks for all the suggestions. I'll dive into it to see if I can make the workspace more efficient. I'll get back when I have results (probably sometime next week).

I think I've got the first part figured out, with two questions remaining. In the attached workspace I've managed to get the same results as in the first workspace, but much faster. My two remaining remarks are:

- There are still many FeatureMergers in use. I don't think this can be done any more efficiently, is that right?

- For my output I have to type all the column names manually. Because the BulkAttributeRenamer gives the attributes a prefix (1, 2, 3, etc.), FME does not automatically recognise the new attribute names. When I use automatic attribute definition it reads Clientnummer instead of 1Clientnummer, 2Clientnummer, 3Clientnummer, etc. How can I make FME recognise the new attribute names? This also matters for the second part of the workspace.

analysefinancien2.fmw

I've managed to solve the problem by inserting an AttributeExposer. The question is now resolved, thanks for the help.
