Skip to main content

Hi,

I don't have a specific problem that I'm trying to resolve, I'm just looking for opinions/discussion on alternatives. More of a "poll" on what types of solutions people might be currently using...

The SchemaMapper transformer can be very handy but it definitely has it's limitations and it feels like it's getting a bit long in the tooth. I don't think there have been any significant changes made to it in several years, so I was wondering if people are still using it, or what design patterns might be used as an alternative?

Some of the limitations of the existing SchemaMapper transformer that I'm aware of are:

  • No ability to parameterize input parameters, esp. the schema mapping file
  • No way to perform complex "and/or" conditional mapping
  • No way to use conditional operators other than "=" in mapping rules (e.g. <,>,!=, in, like, etc.)
  • No way to map if null/empty/missing

Aside from manually performing these mappings in the workspace with transformers, what other techniques are people using? Are Python scripts the main viable alternative? Are there some newer transformers that make light of this type of work?

Looking forward to hearing your thoughts!

Nic

I haven't used the SchemaMapper in a long while, mostly because it's not always as flexible as I'd like and I find it easier and faster to get the exact behavior I want using Python.

A pattern I've used successfully a few times have been a PythonCaller with a class method. Typically, I'll read the contents of a CSV or similar rules file in the __init__() method and assign it to an instance dict object. In the input() method I'll query the feature for attributes and apply rules using the instance dict.

This gets me very good performance and a lot of flexibility in e.g. using regex-based rules etc that aren't supported by the SchemaMapper.


I have used the Joiner transformer quite a bit. Join the information from the Excel/CSV file to the features and work from there to get the required output.

Probably not as fast as the Python solution, but it does not require Python experience (and that is not my strongest point).

Probably not as good with complex schema mapping either.


I have used the Joiner transformer quite a bit. Join the information from the Excel/CSV file to the features and work from there to get the required output.

Probably not as fast as the Python solution, but it does not require Python experience (and that is not my strongest point).

Probably not as good with complex schema mapping either.

Good suggestion. If you use the prefetch sql wisely and the mapping is relatively simple, I'm sure it can be just as fast as a Python solution.

I have used the Joiner transformer quite a bit. Join the information from the Excel/CSV file to the features and work from there to get the required output.

Probably not as fast as the Python solution, but it does not require Python experience (and that is not my strongest point).

Probably not as good with complex schema mapping either.

Thanks. As David mentioned, I think this can be a good approach if the mappings are relatively simple. The (Database)Joiner only allows for AND type conditions but it's still a potentially simpler and more flexible approach than the SchemaMapper.

 

 

pFor what it's worth, I'm not sure I like the new name DatabaseJoiner as it's not clear that this can be used to join to non-database formats like Excel.]

I haven't used the SchemaMapper in a long while, mostly because it's not always as flexible as I'd like and I find it easier and faster to get the exact behavior I want using Python.

A pattern I've used successfully a few times have been a PythonCaller with a class method. Typically, I'll read the contents of a CSV or similar rules file in the __init__() method and assign it to an instance dict object. In the input() method I'll query the feature for attributes and apply rules using the instance dict.

This gets me very good performance and a lot of flexibility in e.g. using regex-based rules etc that aren't supported by the SchemaMapper.

Thanks David. I've used both the Python and the SchemaMapper approach recently. I was actually surprised at how much I could get done with the SchemaMapper but it does have its limitations. The PythonCaller offers a lot more flexibility and I like your idea of being able to apply regex-based rules. Unfortunately, as erik_jan mentioned, not everyone is comfortable with Python.

 

 


Thanks. As David mentioned, I think this can be a good approach if the mappings are relatively simple. The (Database)Joiner only allows for AND type conditions but it's still a potentially simpler and more flexible approach than the SchemaMapper.

 

 

pFor what it's worth, I'm not sure I like the new name DatabaseJoiner as it's not clear that this can be used to join to non-database formats like Excel.]
ExternalJoiner? (We're gearing up to introduce a FeatureJoiner which will be a reworked FeatureMerger using SQL terminology and <hopefully> be an order or two of magnitude faster...with an eye on oneday removing the word Feature off the front of everything...)

 

 


I do like using the DatabaseJoiner for schema mappings -- that was the way years ago we used to do it. And the prefetch was created for that type of scenario. (Well, at for reading MGE data but I digress.) SchemaMapper is fancier and more powerful for sure, but at the expense of more complexity. We could take a deep breath and revisit the SchemaMapper interface and underlying code one more time but I'm not convinced it is time well spent -- unless we get enhancement suggestions that are quite simple to understand (like introducing more operators than implied equality -- < > regexp etc). Perhaps an Idea could be filed and then we can collect thoughts and votes over there?


ExternalJoiner? (We're gearing up to introduce a FeatureJoiner which will be a reworked FeatureMerger using SQL terminology and <hopefully> be an order or two of magnitude faster...with an eye on oneday removing the word Feature off the front of everything...)

 

 

Maybe DataJoiner or DataMerger? I like the idea of an updated/enhanced FeatureMerger.

 

 


I do like using the DatabaseJoiner for schema mappings -- that was the way years ago we used to do it. And the prefetch was created for that type of scenario. (Well, at for reading MGE data but I digress.) SchemaMapper is fancier and more powerful for sure, but at the expense of more complexity. We could take a deep breath and revisit the SchemaMapper interface and underlying code one more time but I'm not convinced it is time well spent -- unless we get enhancement suggestions that are quite simple to understand (like introducing more operators than implied equality -- < > regexp etc). Perhaps an Idea could be filed and then we can collect thoughts and votes over there?

I'm just in the initial stages of a thought process of how this could be enhanced. Unfortunately, I'm going to be off work (back in Canada) for the next month, so it might all be forgotten by the time I return but I'd like to continue digging into this... If I come up with something useful I'll let you know.

 

 


Thanks David. I've used both the Python and the SchemaMapper approach recently. I was actually surprised at how much I could get done with the SchemaMapper but it does have its limitations. The PythonCaller offers a lot more flexibility and I like your idea of being able to apply regex-based rules. Unfortunately, as erik_jan mentioned, not everyone is comfortable with Python.

 

 

I agree, Python isn't everybody's cup of tea and that's of course ok. But as they say, when you have a really big hammer, every problem tends to look like a nail 🙂

Hi @nic_ran​ , @erik_jan​  and @david_r​ !

 

We're revisiting the SchemaMapper over here, to see how we might be able to shave down it's long tooth. In digging into this transformer, it's evident there are a lot of questions about what exactly a new-n-improved SchemaMapper looks like. We're beginning to look at supporting simple functions and conditionals but still need to investigate/verify the following:

  • Which functions and conditionals would be used? What would be helpful?
  • Would these statements be specified in the external lookup table, the configuration fields, or apply to the incoming feature values themselves?
  • What kind of UI help would the author expect for writing these statements?

Any and all feedback welcomed on these above points, if you would be game to share! I would greatly appreciate it!

 

Cheers,

 

Annabelle


Hi @nic_ran​ , @erik_jan​  and @david_r​ !

 

We're revisiting the SchemaMapper over here, to see how we might be able to shave down it's long tooth. In digging into this transformer, it's evident there are a lot of questions about what exactly a new-n-improved SchemaMapper looks like. We're beginning to look at supporting simple functions and conditionals but still need to investigate/verify the following:

  • Which functions and conditionals would be used? What would be helpful?
  • Would these statements be specified in the external lookup table, the configuration fields, or apply to the incoming feature values themselves?
  • What kind of UI help would the author expect for writing these statements?

Any and all feedback welcomed on these above points, if you would be game to share! I would greatly appreciate it!

 

Cheers,

 

Annabelle

Hi Annabelle, it's been many years since I last used the SchemaMapper, but my primary reason for stopping to use it was that all the settings were hard-coded into the workspace, there was no easy way to use it in dynamic workspaces where some settings or conditionals would depend on the input. Also, the wizard interface was clunky to work with.

Finally I seem to remember that it was fairly slow for large-ish data volumes, but I could be wrong, so please take it with a grain of salt.

My typical use case would be for translating attribute names before output using dynamic schema. For example, when reading from a database you get the technical column names back, but often the users want something more "humanly readable" in their output file, so if they could maintain e.g. an Excel sheet with translations that would result in renaming both the schema feature and the actual data features, that would be amazing. As it is, I have to resort to Python to accomplish this.

Regarding the GUI, my pet peeve is when you just want to adjust a small setting in an existing transformer, but it then requires access to the full dataset before you can accept the changes (e.g. the FeatureReader). If that can be avoided, it would be very helpful.

I hope this helps.


Hi Annabelle,

like David, it's been a number of years since I've used the SchemaMapper and the last time I had a need for it I actually wrote my own version because the standard SchemaMapper couldn't do what I needed it to. My general opinion of the current version is that it's too complicated for the average user to bother with but not powerful enough for advanced users.

The main issues I had with it at the time I last used it I think I've covered in my original post asking for feedback, i.e. you can't do conditional mapping with multiple operators (I'd like to see capabilities similar to the Tester transformer), and you can't parameterise the input. Both of those shortcomings relate to the functionality of the mapping rules and limitations on what can actually be achieved in schema mapping with the transformer. There is also no way in the SchemaMapper to map features based on a spatial predicate test (e.g. spatial intersect, area, length, etc.).

I fully understand that a schema mapper with the above level of functionality significantly blurs the line between functionality that could be encapsulated in a rules database and what can be done directly in an FME workspace - and where do you draw the line - but it could certainly pave the way toward some very generic workspaces.

You do raise another good point, though. The interface is not easy to use or understand. I very much like the idea of an external database to hold the schema mapping information, as this can help keep the mapping logic editable without requiring FME knowledge (e.g. by a business analyst). I think there's definitely room for improvement in the existing interface in this regard.

It might also be worth considering splitting the SchemaMapper into 2 separate transformers:

  1. external database rules (as in the existing SchemaMapper)
  2. fully inline transformer that allows rules to be built within the workspace (ideally, making use of cached data values)

That's all I've got off the top of my head right now but I may add more detail later, once I've had a chance to review my old work on this.

Nic


Reply