Question of the Week: Splitting Data Dynamically Inside a Workspace

Question

This question was asked by @dpkonofa, who wonders how to divide data up inside a workspace when you don't know in advance what those divisions might be.

Q) Is there a transformer that will split a table based on the value of a field and then let me do different processing for each output? I looked around and saw a few mentions of a "Fanout" but that looks to happen on the writer. It seems that the "AttributeFilter" needs to know the values in advance. In my case, there could potentially be thousands of values and I wouldn't know what those are ahead of time. This workflow needs to filter records based on that value and then perform a function on each classification.

A) I would call this a "dynamic" process because we don't know the values of the data in advance. We can certainly handle data dynamically on a writer (the Fanout is just one aspect of that), but how can we set up multiple outputs from a transformer without doing it manually before we run the workspace?

The answer is to think of it in a different way: not to split the data throughout the workspace, but to split it at each step, in each transformer. The way you do that is, as @jdh points out, with a group-by parameter.

Can a Workspace Be Dynamic?

Inside a workspace, if we hard-code parameters, then the workspace is static. For example, I place an ExpressionEvaluator transformer to multiply ten by ten (10x10).

However, most transformers allow the substitution of hard-coded values by an attribute value, which makes the process more dynamic because we don't know the values in advance. But Filter transformers are quite different. The condition is dynamic but the output ports aren't. You can't filter data into different streams without defining those streams in advance.

So how else can we filter data dynamically?
By forming it into groups in each transformer, using a group-by parameter.

A Simple Example

If it's not going too slow, let's prove the point with a simple example.

Here I calculate the average number of visitors per park. If I want to do that for each neighbourhood, how could I do that? Like this, with an AttributeFilter port and a separate StatisticsCalculator for each neighbourhood?

That'll work, but it's not dynamic because the AttributeFilter output ports aren't flexible. Besides, duplicating the StatisticsCalculator again and again is definitely not good practice.

So instead we simply set the Group-By parameter on a single StatisticsCalculator to NeighborhoodName.

It's just as if the data had been split up and processed separately. And it's dynamic because even if a new neighbourhood is added, it'll automatically form its own group.

What About Transformers Without a Group-By?

That's a natural question. If there is no group-by, then how do I cause the features to be filtered into groups?

Well, from basic FME training we know there are feature-based and group-based transformers. The group-based transformers are the ones with the group-by parameter, because they are the ones where the data has to be filtered into groups. Without filtering, the data gets mushed together, like the above example in its original form.

Feature-based transformers don't have a group-by parameter, but that's OK because the features don't interact with each other. Let's take the example of calculating the average area of each park.

The AreaCalculator doesn't have a Group-By, but it doesn't need one. Each feature is measured separately, so there's no need to create separate groups. The results are the same anyway.

Anything Else?

Yes.
If you really, really want to process all of your data in separate streams, for a given set of transformers, then what you need to do is wrap them up inside a custom transformer.

If I wrap up the AreaCalculator and StatisticsCalculator (in the above example) inside a custom transformer, then I can expose a group-by option on the custom transformer.

This way the Group-By parameter becomes available on that custom transformer. So if I pick NeighborhoodName as the attribute to group by, each set of data is filtered separately through a separate "run" or "instance" of that transformer. So each group does pass through the AreaCalculator separately.

Additionally, on rare (very rare) occasions, you want to make a feature-based transformer operate on groups. For example, the HTMLLayouter doesn't have a group-by parameter, but you might want to make it work with groups. The PythonCaller is another example.

Wrapping it into a custom transformer gives those transformers grouping capabilities.

Summary

In short, I think this is one of the ways you have to change your thinking when you use FME, especially if you've come from a software developer background.

Features don't flow and operations don't work the same way as in a programming language. What I'd like to do in the near future is to create a software developer's guide to FME, to help folk transition. What is the equivalent to a class or method in FME? How do you create loops or other conditional structures? The answer to most of those questions is, "you don't need to"; but it's hard to discover that without some helpful examples. So keep an eye out for that, and let me know if that would be useful to you.

Other Notable Questions

Just a couple this week, because it's a bit late! Hey, it's Friday evening you know.

Remaking web connections on a new FME installation

Why, asks @reb, do I need to remake all those connections when I install a new FME? Won't they just get passed through in the workspace?
Well, the answer is that if connections could be passed like that, then your database (or web) usernames and passwords would get included with your workspaces and could get passed on to almost anyone. This way they are secure.

You can, of course, also export the connections and re-import them. I showed how to do that in a recent video that you can find on YouTube.

ANSI character encoding in the Shapefile writer

Where, asks @ebygomm, did the ANSI encoding option go in the Shapefile writer, and what do I replace it with? Well, as far as I can tell, ANSI just meant "system encoding", so use either the known system encoding or your default one, and you should get the same results as before.
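If you are not sure what the system encoding is on a particular machine, one quick way to check outside FME is a few lines of Python. This is only a sketch: it uses the standard library's locale module and reports what Python considers the preferred encoding, which on Windows is typically the ANSI code page. Whether that is exactly the encoding the Shapefile writer used for its old "ANSI" option is an assumption, not something confirmed here. The function name system_encoding is made up for illustration.

```python
import codecs
import locale


def system_encoding() -> str:
    """Return the encoding Python reports as the locale's preferred one."""
    # Passing False avoids changing the process-wide locale as a side effect.
    name = locale.getpreferredencoding(False)
    # Normalise to the codec's canonical name, for example 'cp1252' or 'utf-8'.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print(system_encoding())
```

Note that if Python is running in UTF-8 mode it may report utf-8 regardless of the operating system's code page, so treat the result as a hint rather than a definitive answer.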

jdh · Answer

Give some love to https://knowledge.safe.com/content/idea/38585/group-by-parameter-on-the-pythoncaller.html for adding a group-by to the PythonCaller.
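For readers coming from a programming background, here is a rough plain-Python analogy of what grouping by an attribute does. It is a sketch under stated assumptions, not FME code: the features are stand-in dictionaries rather than fmeobjects features, the class only borrows the input/close shape of a PythonCaller class, and the class name, attribute names (NeighborhoodName, Visitors) and sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


class GroupedAverage:
    """Collects values per group key, then reports one average per group."""

    def __init__(self, group_by, value_attr):
        self.group_by = group_by      # attribute whose value defines the group
        self.value_attr = value_attr  # attribute to average within each group
        self.groups = defaultdict(list)

    def input(self, feature):
        # No list of groups is declared up front: an unseen key
        # simply starts a new group.
        self.groups[feature[self.group_by]].append(feature[self.value_attr])

    def close(self):
        # Once all features have arrived, produce one result per group.
        return {key: mean(values) for key, values in self.groups.items()}


# Made-up sample data standing in for park features.
parks = [
    {"NeighborhoodName": "Downtown", "Visitors": 100},
    {"NeighborhoodName": "Downtown", "Visitors": 300},
    {"NeighborhoodName": "Kitsilano", "Visitors": 50},
]

calc = GroupedAverage("NeighborhoodName", "Visitors")
for park in parks:
    calc.input(park)
averages = calc.close()
print(averages)
```

A park in a neighbourhood that has never been seen before needs no code change; it just becomes another key, which is the property the article calls dynamic.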
