Question

FME Best Practice Validation Project (You Can Help!)


Userlevel 4
Badge +25

Hey FME'ers. I haven't been online much recently (I've been busy updating the FME training materials for 2017), but I wanted to throw out a new idea that is part challenge, part crowdsourcing.

I came up with the idea of an FME-driven service for assessing a workspace against best practices. Just now I found out I wasn't the first to think of this, which shows there aren't any new ideas anymore!

Anyway, in my spare time I've been putting together a workspace using the FMW reader to test other workspaces for best practice. I've built about 20 different tests and can think of quite a few more. But I thought it would be great if we - the FME community - could work together to take this idea to completion.
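(For anyone curious what such a test boils down to outside of FME: a .fmw file is plain text, with the Workbench state serialized in '#!'-prefixed comment lines. Below is a minimal Python sketch of one test - a transformer count by type. The '<TRANSFORMER' marker and TYPE attribute are assumptions from eyeballing 2016-era files, not a documented format; the project itself does this properly with the FMW reader.)

    import re
    from collections import Counter

    def transformer_histogram(fmw_path):
        """Count transformers by type in a Workbench (.fmw) file.

        Assumes each transformer is serialized as a '#! <TRANSFORMER ...>'
        comment line carrying a TYPE="..." attribute - an assumption, so
        verify against a real .fmw file first.
        """
        counts = Counter()
        with open(fmw_path, encoding="utf-8", errors="replace") as f:
            for line in f:
                stripped = line.lstrip()
                if stripped.startswith("#!") and "<TRANSFORMER" in stripped:
                    match = re.search(r'TYPE="([^"]+)"', stripped)
                    if match:
                        counts[match.group(1)] += 1
        return counts

    for ttype, n in transformer_histogram("MyWorkspace.fmw").most_common():
        print(ttype, n)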

So, this is an open invite to take part in this project. I think there are a number of different ways you could contribute:

  • Develop a best practices test and add it to the project workspace
  • Do some testing on the project workspace to look for faults or enhancements
  • Improve the output style (it's currently fairly basic HTML)
  • Suggest some ideas for tests that we haven't yet thought of or implemented
  • Use the project to assess your own workspaces - and let us know what you think

I've shared all the files in a folder on Dropbox (update: now find it on GitHub instead). Anyone in the FME community is welcome to access this and use the contents however you like.

If you want to contribute a test, try to get a feel for the workspace style, pick a test that isn't done yet, and go for it. I haven't done any tests around transformers yet, so there is a lot still to do, and there may be reader/writer tests I haven't thought of. Preferably make a copy of the workspace, since we don't have proper revision control (yet). I'd like to keep it in 2016.1 or earlier for the moment, so no one has to install a beta version.

I'm also open to any and all other ideas about how to go about this, and how to collaborate on a project like this. As far as I know, there's never been a crowdsourced FME project before!

My end goal is to get this online and hosted in FME Cloud, so we can make a proper web service out of it. My idea is that everyone who contributes would get recognition on the web page (and a custom KnowledgeCentre badge of course)!

So, let me know what you think - and if you want to contribute then please do so.

Mark


50 replies

Badge +21

It would be great to put this up on GitHub, so that many people could suggest issues/enhancements and perhaps someone else could implement them in FME.

Badge +21

My contribution. Updated "1. Report Header" to add more information regarding the file itself.

bestpracticereportgenerator-norkart-sigbjørn.zip

Userlevel 4
Badge +25

Very interesting project Mark - here's a little contribution from me: a table with the transformer histogram. bestpracticereportgenerator-hm.zip

I haven't included @sigtill's contribution in mine; I think it would be best if we come up with a smart way of handling multiple contributions, and GitHub seems the best way to do that.

Some other ideas:

  • Include a list of links to the subsections, either before or immediately after the header, to facilitate navigation (see the sketch below this list).
  • Add a section that checks for startup and shutdown scripts.
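A minimal sketch of the navigation idea, assuming the report renders each section as an <h2> heading (the section names below are just examples):

    # Hypothetical section names - the real list would come from the report.
    sections = ["Workspace Properties", "Annotations", "Bookmarks", "Transformers"]

    def slug(name):
        """Turn a section name into an HTML anchor id."""
        return name.lower().replace(" ", "-")

    # Linked table of contents, placed right after the report header...
    toc = "<ul>\n" + "".join(
        '  <li><a href="#%s">%s</a></li>\n' % (slug(s), s) for s in sections
    ) + "</ul>"

    # ...and each section heading gets the matching id to jump to.
    headings = ['<h2 id="%s">%s</h2>' % (slug(s), s) for s in sections]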
Badge

Awesome idea as always. We've had this one come up as a possibility as well, I even made a little start for it. My ideas were slightly different from yours, though the checks look familiar.

One idea is to come up with an overall score on a scale from 0 to 100, where 100 is the best workspace ever. You can define a number of categories to award points in, with deductions if something is 'wrong'.

At the time I made a small list of categories, I'll post them here for inspiration.

  1. Size: A certain Safer says a workspace shouldn't have more than 10 transformers. Perhaps give 20 points at 20 or fewer transformers, down to 0 points at 40 or more.
  2. Commenting: A max of 20 points for comments and annotations, relative to the number of transformers. You already included this in your list.
  3. Flow: A max of 20 points for flow. Deductions for using something as evil as a Sorter or FeatureHolder.
  4. Usage: You can see the last run date in a workspace. If nobody is using it, it's possibly useless or wrong. Max 10 points.
  5. Properties: If the workspace properties (description, usage) are empty, it might not be clear what exactly the workspace does. Max 10 points.
  6. Up-to-date: Maximum 10 points for how up-to-date the transformers are.

This matches the existing list quite well, but it adds a valuation instead of just an error or a warning, so you can take a quick glance to determine which workspaces need work, then use the warn/error details to fix specific problems. A rough sketch of the scoring follows below.
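To make that concrete, here is a sketch of such a valuation using the weights from the list above; every cut-off the list doesn't specify (the 90-day usage window, 5 points per flow offender) is invented for illustration:

    def workspace_score(n_transformers, n_annotations, n_evil,
                        days_since_run, has_description, outdated_fraction):
        """Rough valuation following the six categories above.

        Note the listed categories sum to 90 points, which leaves room
        for more checks before hitting the 0-100 ceiling.
        """
        score = 0
        # 1. Size: 20 points at <=20 transformers, sliding down to 0 at 40+.
        score += max(0, min(20, 40 - n_transformers))
        # 2. Commenting: up to 20 points, relative to the transformer count.
        score += min(20, round(20.0 * n_annotations / max(1, n_transformers)))
        # 3. Flow: 20 points, minus 5 per Sorter/FeatureHolder-style offender.
        score += max(0, 20 - 5 * n_evil)
        # 4. Usage: 10 points if run recently (the 90-day window is arbitrary).
        score += 10 if days_since_run <= 90 else 0
        # 5. Properties: 10 points if description/usage are filled in.
        score += 10 if has_description else 0
        # 6. Up-to-date: up to 10 points for current transformer versions.
        score += round(10 * (1.0 - outdated_fraction))
        return score

    print(workspace_score(25, 10, 1, days_since_run=3,
                          has_description=True, outdated_fraction=0.2))  # 66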

A possible error check as well: is Redirect to Data Inspector enabled?

I'd like to join in, because it seems like a fun idea. Regarding organization: FME workspaces are very annoying to merge, which is why putting them in version control is such a pain. I'd suggest making one master version available, moderated by @Mark2AtSafe, plus a location where people can upload suggested changes that @Mark2AtSafe can manually merge back into master. This version definitely needs to be in version control, so I agree with @sigtill about putting it up on GitHub.

 

Badge

Great workbench. I have been testing it on some of my recent work and noticed it would be helpful to add href links to the FME help documentation.

In a few areas of the report this could be applied, for example:

  • Workspace Properties
  • Adding Annotations
  • Using Bookmarks

analysisreport.zip

Badge

Awesome idea as always. [...] FME workspaces are very annoying to merge, which is why putting them in version control is such a pain. [...] I agree with @sigtill about putting it up on GitHub.

 

Good point about the versioning of FME workspaces - I was hoping someone would comment on that. We version our projects using Bitbucket and work around the merging problem by keeping the master .fmw file at a designated location on a drive, which allows only one user to access/update the file at a time. That's probably not a feasible solution for crowdsourcing, though...

 

 

Off topic, but @Mark2AtSafe: is there any news about versioning/merging in FME 2017?

 

Badge

A few ideas for tests off the top of my head - not sure if it's possible to implement all of these :)

  • Check how long the workspace takes to open, its file size, or some other metric indicating that the project is too large
  • Possibility of losing features / lack of error trapping - for instance, a Tester with a connection only to its Passed port.
  • Warn about feature types that are read but never used, or readers that aren't used at all.
  • Information or warning where a WHERE clause could be applied on a reader instead (for instance, if there's a Tester right after a reader).
  • Test whether transformer names have been changed from the defaults (if not, could that be a sign of less-than-best practice?)
  • See if URLs respond when using the HTTPCaller (see the sketch below this list)
  • See if connections can be made to the databases and file locations referenced in the workspace
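For the URL test above, a minimal sketch that greps the workspace file for http(s) URLs and tries each one (a real test would extract the HTTPCaller parameters with the FMW reader instead):

    import re
    import urllib.request

    def check_urls(fmw_path, timeout=10):
        """Report whether each URL mentioned in an .fmw file responds."""
        # Naive grep of the workspace text for anything URL-shaped.
        with open(fmw_path, encoding="utf-8", errors="replace") as f:
            urls = set(re.findall(r'https?://[^\s"\'<>]+', f.read()))
        for url in sorted(urls):
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    print(url, "->", resp.status)
            except Exception as exc:
                print(url, "-> FAILED:", exc)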
Badge +21

It would be great if this ran right after you hit "Publish to FME Server" on the File menu of FME Workbench, so you could validate a workspace BEFORE it goes to FME Server. Or, more generally: add an option to run a custom workspace before "Publish to FME Server" that validates something in that particular .fmw - for instance, that all your database connections point to the right staging/dev/prod environment with the correct users, paths, etc. When uploading to multiple FME Servers (dev/stage/prod) this is easily forgotten!
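Until something like that exists in the product, a pre-publish check can be approximated by running the validation workspace from the FME command line first. A rough sketch - 'SourceDataset_FMW' is a guess at the published parameter the validator would expose for the workspace under test:

    import subprocess
    import sys

    def validate_then_publish(workspace,
                              validator="BestPracticeReportGenerator.fmw"):
        """Run the best-practice workspace over an .fmw before publishing it."""
        # fme runs a workspace from the command line; published
        # parameters are passed as --NAME value pairs.
        result = subprocess.run(
            ["fme", validator, "--SourceDataset_FMW", workspace])
        if result.returncode != 0:
            sys.exit("Validation failed - not publishing %s" % workspace)
        print("Validation passed; go ahead and publish", workspace)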

Badge

Very interesting project Mark - here's a little contribution from me: a table with the transformer histogram. [...]
W O W

 

I've just tested your provided workspace and it is looking great!

 

Although I couldn't resist testing your workspace using your workspace :p

Badge

For me the most important things are those that let a 'stranger' adapt your workspace as quickly as possible without making mistakes. These include:

1. Provide an annotation when using a special setting that might influence the flow (e.g. Suppliers First in the FeatureMerger).

2. For transformers that allow custom functionality (SQLExecutor / InlineQuerier / PythonCaller), provide comments in the code itself AND an annotation briefly stating the functionality.

3. Make sure the flow of the workspace is clear when fully zoomed out.

Badge +8

I don't know if there's already a tool out there, but maybe something that looks at the workspace log files we all too commonly ignore and raises warnings about tasks taking longer than 'normal' or some threshold. Brownie points for a way to compile performance stats from FME Server job logs ;)

For the way we're using FME Server, a workflow that runs every 10 minutes and is optimized by just 1 second saves almost 2 1/2 minutes a day (144 runs/day), or about 14 1/2 hours per year.
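A minimal sketch of that log check - the pipe-delimited layout and column positions are assumptions, so verify them against your own logs first:

    def slow_steps(log_path, threshold=5.0):
        """List log lines whose incremental CPU time exceeds a threshold.

        Assumes the pipe-delimited FME log layout in which the second and
        third columns are cumulative and incremental CPU seconds.
        """
        hits = []
        with open(log_path, encoding="utf-8", errors="replace") as f:
            for line in f:
                parts = line.split("|")
                if len(parts) < 4:
                    continue
                try:
                    delta = float(parts[2])
                except ValueError:
                    continue  # header or free-form line, skip it
                if delta >= threshold:
                    hits.append((delta, parts[-1].strip()))
        return sorted(hits, reverse=True)

    for seconds, message in slow_steps("translation.log")[:10]:
        print("%8.1f  %s" % (seconds, message))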

Badge +5

One idea is to come up with an overall score on a scale from 0 to 100 [...] 3. Flow: A max of 20 points for flow. Deductions for using something as evil as a Sorter or FeatureHolder. [...]

 

FeatureHolders aren't always evil - sometimes they're actually required! For example, if you're updating a database table that is also being read by a Joiner in the same workspace, all features must be finished with the Joiner before you attempt to write back to the table, or the translation will fail. Perhaps we need to add a check for that in the validation workspace. :)

 

 

Badge +1

I don't know if there's already a tool out there, but maybe something that looks at the workspace log files we all too commonly ignore [...] Brownie points for a way to compile performance stats from FME Server job logs ;)

@runneals - To automatically look for warnings and error messages in an FME log file, you might want to take a look at this custom transformer on the FME Hub: https://hub.safe.com/transformers/logfilescraper

 

Userlevel 4
Badge +25

Thanks for all these responses, folks. As you'll perhaps have noticed, I've been absent from the community of late, concentrating on the training updates. I'm hoping to have time to get back to this now.

In fact, I'll have to get back to it, because we're planning to use it as part of a workshop at the FME User Conference. Ryan Cragg and I are going to cover best practice, critique folks' workspaces live, and run them through this workspace.

So... I'll run through all these ideas and suggestions shortly and get back to you all.

Regards

Mark

Userlevel 4
Badge +25

It would be great to put this up on GitHub, so that many people could suggest issues/enhancements and perhaps someone else could implement them in FME.

@sigtill @redgeographics and others - I'm just wondering about the best way to set this up on GitHub. If there are commit conflicts, it's very hard to assess the differences using GitHub itself; you'd need to open the workspaces in FME and inspect them visually (writing a DIFF tool for FME workspaces - that sounds like another challenge!)

 

 

Anyway, I think one way to help minimize that - and to make the project more manageable - is to split off each of the sections into exported custom transformers. It becomes a little harder to 'install' into FME, but I think the other advantages outweigh that.

 

 

So I'm going to do that and see if I can get it into GitHub sometime today.

 

Userlevel 4
Badge +25

OK, the project is now in GitHub at: https://github.com/safesoftware/BestPractice

I made @sigtill and @redgeographics collaborators in the project. It's an open project so anyone else can also make changes and submit pull requests, but if you want direct access (without me having to approve your commits) then let me know and I will add you as a collaborator too (I trust you all and don't intend to do in-depth code reviews of everything that is committed!)

Everything is now split into custom transformers (which I think will really help) and we can even use FME's Custom Transformer versioning if we wanted.

I haven't implemented any of the other ideas or submissions from here yet, but I will soon (or you can do that yourself, now that you have access to the source repository).

Have fun ;-)

Badge

OK, the project is now in GitHub at: https://github.com/safesoftware/BestPractice [...]

Thanks for sharing.

 

A nice way of trying out Git and FME :)

 

 

Badge
@sigtill @redgeographics and others - I'm just wondering about the best way to set this up on GitHub. [...] (writing a DIFF tool for FME workspaces - that sounds like another challenge!)

Regarding the FMW DIFF tool challenge: we are working on a solution. Given a workspace and a new version of the same workspace, we generate a third workspace annotated with the differences.

Version 1: (screenshot)

Version 2 has been modified to add a Logger between the Creator and the PythonCaller, to modify the Python code, and to change the number of features created from 1 to 2. (screenshot)

Raw differences workspace: (screenshot)

Differences workspace, after moving apart the annotations that overlapped each other: (screenshot)

The annotations are in French because of the customer who requested this tool.

Green annotations: new items (two links and a Logger). Blue annotations: modified items (the Creator and the PythonCaller). Orange annotation: deleted items (one link).

Our current challenge is to display Python, SQL, and other large text differences in an external, pre-generated diff HTML file (like the differences in the Python code).

We will present this tool at the FME World Tour in Montréal and Québec City.
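As a crude first pass at the same problem, the transformer inventories of two .fmw files can be set-diffed in a few lines. This only spots added or removed transformer types - no renames, parameter edits, or connection changes - and the '#! <TRANSFORMER' marker is an assumption about the text serialization:

    import re

    def transformer_types(fmw_path):
        """Collect the set of transformer TYPEs used in an .fmw file."""
        found = set()
        with open(fmw_path, encoding="utf-8", errors="replace") as f:
            for line in f:
                line = line.lstrip()
                if line.startswith("#!") and "<TRANSFORMER" in line:
                    match = re.search(r'TYPE="([^"]+)"', line)
                    if match:
                        found.add(match.group(1))
        return found

    v1, v2 = transformer_types("version1.fmw"), transformer_types("version2.fmw")
    print("Added:  ", sorted(v2 - v1))
    print("Removed:", sorted(v1 - v2))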

 

Userlevel 4
Badge +25
Regarding the FMW DIFF tool challenge: we are working on a solution. Given a workspace and a new version of the same workspace, we generate a third workspace annotated with the differences. [...] We will present this tool at the FME World Tour in Montréal and Québec City.

 

Fantastic. I like that. I wish I could be there to see the presentation.

 

Badge
Fantastic. I like that. I wish I could be there to see the presentation.

 

I'll be at FME UC if you want a demo.

 

 

Userlevel 4
Badge +25

It would be great if this ran right after you hit "Publish to FME Server" on the File menu of FME Workbench, so you could validate a workspace BEFORE it goes to FME Server. Or, more generally: add an option to run a custom workspace before "Publish to FME Server" that validates something in that particular .fmw - for instance, that all your database connections point to the right staging/dev/prod environment with the correct users, paths, etc. When uploading to multiple FME Servers (dev/stage/prod) this is easily forgotten!

I just filed PR#76897 - a request for a Notification service that is a Repository Watcher. It would be like a Directory Watcher but would issue a notification if a workspace is published to a particular repository.

 

 

That way you could run a testing workspace in response - but you also get the ability to be notified in general when a workspace is published or updated.

 

 

Validating before publishing to Server - well, that would require incorporating this whole idea into Desktop functionality (and I just don't see that happening - at least, not as easily as a Repository Watcher might).
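In the meantime, a Repository Watcher can be approximated by polling the FME Server REST API for changed items. A sketch - the endpoint and the 'lastSaveDate' field are recalled from the v3 API rather than guaranteed, so check /fmerest/apidoc on your own server:

    import json
    import time
    import urllib.request

    def watch_repository(host, repo, token, interval=60):
        """Poor man's Repository Watcher: poll for new or updated workspaces."""
        url = "%s/fmerest/v3/repositories/%s/items" % (host, repo)
        headers = {"Authorization": "fmetoken token=%s" % token,
                   "Accept": "application/json"}
        seen = {}
        while True:
            req = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(req) as resp:
                items = json.load(resp).get("items", [])
            for item in items:
                name, stamp = item["name"], item.get("lastSaveDate")
                if name in seen and seen[name] != stamp:
                    print("Workspace published or updated:", name)
                    # ...submit the best-practice workspace against it here
                seen[name] = stamp
            time.sleep(interval)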

 

Userlevel 4
Badge +25

Great workbench. I have been testing it on some of my recent work and noticed it would be helpful to add href links to the FME help documentation. [...]

Great idea @austinh - it's now done. Check out the latest content on GitHub.

 

Userlevel 4
Badge +25

Very interesting project Mark - here's a little contribution from me: a table with the transformer histogram. [...]
Thanks. I just added that part to the main workspace and committed it to GitHub.

 

 

Userlevel 4
Badge +25

My contribution. Updated 1.Report Header to add more information regarding the file itself.

bestpracticereportgenerator-norkart-sigbjørn.zip

The attachment seems to have gone missing. Do you have a copy of it still?

 

 

Badge +22
Fantastic. I like that. I wish I could be there to see the presentation.

 

I saw this yesterday at the World Tour - it was very interesting, but it requires specific version-control software to run.

 

 
