Question of the Week

This question was asked by @dpkonofa, who wonders how to divide up data inside a workspace when you don't know in advance what those divisions might be.

Q) Is there a transformer that will split a table based on the value of a field and then let me do different processing for each output? I looked around and saw a few mentions of a "Fanout" but that looks to happen on the writer.

It seems that the "AttributeFilter" needs to know the values in advance. In my case, there could potentially be thousands of values and I wouldn't know what those are ahead of time.

This workflow needs to filter records based on that value and then perform a function on each classification.

A) I would call this a "dynamic" process because we don't know the values of the data in advance.

We can certainly handle data dynamically on a writer (the Fanout is just one aspect of that) but how can we set up multiple outputs from a transformer without doing it manually before we run the workspace?

The answer is to think of it in a different way; not to split the data throughout the workspace, but to split it at each step, in each transformer.

The way you do that is - as @jdh points out - with a group-by parameter.

Can a Workspace Be Dynamic?

Inside a workspace, if we hard-code parameters, then the workspace is static. For example, I might place an ExpressionEvaluator transformer that multiplies ten by ten (10x10).

However, most transformers allow the substitution of hard-coded values by an attribute value, which makes the process more dynamic because we don't know the values in advance.

But Filter transformers are quite different. The condition is dynamic but the output ports aren't. You can't filter data into different streams without defining those streams in advance.
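
As a rough analogy in plain Python (not FME code), a filter with fixed output ports behaves something like the sketch below. The port names and the "<other>" catch-all are hypothetical, chosen only for illustration; the point is that any value not declared up front has no stream of its own.

```python
# Hypothetical port names, fixed when the workspace is authored.
PORTS = ("Downtown", "Kitsilano")

def static_filter(features, attribute, ports=PORTS):
    """Route each feature to a predeclared port; unknown values share one catch-all."""
    routed = {port: [] for port in ports}
    routed["<other>"] = []  # hypothetical catch-all for undeclared values
    for feature in features:
        value = feature.get(attribute)
        key = value if value in routed else "<other>"
        routed[key].append(feature)
    return routed

# Made-up sample features for illustration only.
parks = [
    {"NeighborhoodName": "Downtown"},
    {"NeighborhoodName": "Fairview"},  # not declared as a port
]
routed = static_filter(parks, "NeighborhoodName")
```

Here the Fairview feature lands in the catch-all, because nobody defined a stream for it before the run.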

So how else can we filter data dynamically? By forming it into groups in each transformer, using a group-by parameter.

A Simple Example

Let's prove the point with a simple example.

Here I calculate the average number of visitors per park. If I want to do that for each neighbourhood, how could I do that? Like this, with an AttributeFilter feeding a separate StatisticsCalculator for each neighbourhood?

That'll work, but it's not dynamic because the AttributeFilter output ports aren't flexible. Besides, duplicating the StatisticsCalculator again and again is definitely not good practice.

So instead we simply set the Group-By to NeighborhoodName:

It's just as if the data had been split up and processed separately. And it's dynamic because even if a new neighbourhood is added, it'll automatically form its own group.
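
For readers coming from a programming background, the effect of a group-by can be sketched in plain Python. This is only an analogy, not how FME implements it: NeighborhoodName comes from the example above, while the "Visitors" attribute name and the sample values are made up for illustration.

```python
from collections import defaultdict

def average_by_group(features, group_by, value_attr):
    """Average value_attr separately for each distinct value of group_by."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for feature in features:
        key = feature[group_by]
        sums[key] += feature[value_attr]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Made-up sample features; "Visitors" is a hypothetical attribute name.
parks = [
    {"NeighborhoodName": "Downtown", "Visitors": 100},
    {"NeighborhoodName": "Downtown", "Visitors": 200},
    {"NeighborhoodName": "Kitsilano", "Visitors": 80},
]

averages = average_by_group(parks, "NeighborhoodName", "Visitors")
```

Nothing in the function names a neighbourhood, so a feature with a previously unseen NeighborhoodName simply produces one more key in the result.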

What About Transformers Without a Group-By?

That's a natural question. If there is no group-by then how do I cause the features to be filtered into groups?

Well, from basic FME training we know there are feature-based and group-based transformers.

The group-based transformers are the ones with the group-by parameter, because they are the ones where the data has to be filtered into groups. Without that filtering, the data gets mushed together, as in the original form of the example above.

Feature-based transformers don't have a group-by parameter, but that's OK because the features don't interact with each other.

Let's take the example of calculating the average area of each park:

The AreaCalculator doesn't have a Group-By, but it doesn't need one. Each feature is measured separately, so there's no need to create separate groups. The results are the same anyway.
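
The same point as a plain-Python analogy (not FME code): a per-feature calculation gives identical results whether the features are processed all at once or one group at a time. The rectangular "parks", the attribute names other than NeighborhoodName, and the values are all made up for illustration.

```python
def add_area(features):
    """Feature-based step: each feature is measured on its own."""
    return [{**f, "area": f["width"] * f["height"]} for f in features]

# Made-up sample features, simplified to rectangles.
parks = [
    {"name": "A", "NeighborhoodName": "Downtown", "width": 10, "height": 20},
    {"name": "B", "NeighborhoodName": "Kitsilano", "width": 5, "height": 4},
    {"name": "C", "NeighborhoodName": "Downtown", "width": 3, "height": 3},
]

# Process everything in a single pass...
all_at_once = add_area(parks)

# ...or split into groups first and process each group separately.
one_group_at_a_time = []
for hood in ("Downtown", "Kitsilano"):
    members = [f for f in parks if f["NeighborhoodName"] == hood]
    one_group_at_a_time.extend(add_area(members))

def by_name(feature):
    return feature["name"]

# Only the order differs; the per-feature results are the same.
same = sorted(all_at_once, key=by_name) == sorted(one_group_at_a_time, key=by_name)
```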

Anything Else?

Yes. If you really, really want to process all of your data in separate streams, for a given set of transformers, then what you need to do is wrap them up inside a custom transformer.

If I wrap up the AreaCalculator and StatisticsCalculator (in the above example) inside a custom transformer, then I can expose a group-by option on the custom transformer:

This way the Group-By parameter becomes available on that custom transformer. So if I pick NeighborhoodName as the attribute to group by, each set of data is filtered separately through a separate "run" or "instance" of that transformer. So each group does pass through the AreaCalculator separately.

Additionally, on rare (very rare) occasions, you want to make a feature-based transformer operate on groups. For example, the HTMLLayouter doesn't have a group-by parameter, but you might want to make it work with groups. The PythonCaller is another example.

Wrapping it into a custom transformer gives those transformers grouping capabilities.
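
In programming terms, this is loosely like wrapping several steps in one function and calling that function once per group. The sketch below is a plain-Python analogy under that assumption, not a description of FME internals; `run_per_group` and `average_area` are hypothetical names, and the rectangular sample data is made up.

```python
from collections import defaultdict

def run_per_group(features, group_by, pipeline):
    """Split features by group_by, then run the whole pipeline once per group."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature[group_by]].append(feature)
    return {key: pipeline(members) for key, members in groups.items()}

def average_area(features):
    """Stand-in for the wrapped steps: measure each feature, then average."""
    areas = [f["width"] * f["height"] for f in features]
    return sum(areas) / len(areas)

# Made-up sample features, simplified to rectangles.
parks = [
    {"NeighborhoodName": "Downtown", "width": 10, "height": 20},
    {"NeighborhoodName": "Kitsilano", "width": 5, "height": 4},
    {"NeighborhoodName": "Downtown", "width": 3, "height": 3},
]

result = run_per_group(parks, "NeighborhoodName", average_area)
```

Because the grouping happens outside the pipeline, any step inside it runs per group, even a step that has no notion of groups itself.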

Summary

In short, I think this is one of the ways you have to change your thinking when you use FME, especially if you've come from a software developer background.

Features don't flow, and operations don't work, the way they do in a programming language. What I'd like to do in the near future is to create a software developer's guide to FME, to help folk transition.

What is the equivalent to a class or method in FME? How do you create loops or other conditional structures? The answer to most of those questions is, "you don't need to"; but it's hard to discover that without some helpful examples.

So keep an eye out for that, and let me know if that would be useful to you.

Other Notable Questions

Just a couple this week, because it's a bit late! Hey, it's Friday evening you know.

  • Remaking web connections on a new FME installation
    • Why, asks @reb, do I need to remake all those connections when I install a new FME? Won't they just get passed through in the workspace? Well, the answer is that if connections could be passed like that, then your database (or web) usernames and passwords would get included with your workspaces and could get passed on to almost anyone. This way they are secure.
    • You can - of course - also export the connections and re-import them. I showed how to do that in a recent video that you can find on YouTube.

  • ANSI character encoding in the Shapefile writer
    • Where, asks @ebygomm, did the ANSI encoding option go in the Shapefile writer, and what do I replace it with? Well, as far as I can tell, ANSI just meant "system encoding", so use either the known system encoding or your default one, and you should get the same results as before.

Give some love to https://knowledge.safe.com/content/idea/38585/group-by-parameter-on-the-pythoncaller.html for adding a group-by to the PythonCaller.


Can't wait for "A software developer's guide to FME"!! We've been struggling to get in-house code wranglers to change their thinking and embrace FME for certain tasks.

