Hi, a bit random to be messaging a year later, but I was wondering if you got any further on scanning FME data lineage, other than via manual population.
I have the same use case requirement and am struggling to find any answers beyond manual population.
Did you end up building a manual population process?
Hello! Also hoping anyone has anything on this. I'm working on a solution that reads FMW files to determine data sources and create lineage to send to the Apache Atlas API for our data catalog. However, this requires custom development, and all organizational FMWs need to be stored in an accessible place. If there is anything from Safe on a way to catalog all workbenches, that would be a huge help.
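For anyone trying the same route, here is a minimal sketch of what the Atlas side could look like once the sources and targets have been pulled out of an FMW. It only builds the JSON body; it does not parse FMW files or call Atlas. The `fme://` qualified-name scheme and the function name are my own assumptions, and it assumes the datasets are already registered in Atlas under the built-in `DataSet` type (or a subtype you pass in). My understanding is that a body like this is POSTed to the v2 entity endpoint, but check that against your Atlas version.

```python
import json


def atlas_process_entity(workspace_name, sources, targets, dataset_type="DataSet"):
    """Build an Atlas 'Process' entity body linking source datasets to targets.

    Hypothetical helper: the qualifiedName scheme ("fme://<workspace>") is an
    assumption, not an FME or Atlas convention.
    """

    def ref(qualified_name):
        # Reference an existing dataset entity by its unique qualifiedName.
        return {
            "typeName": dataset_type,
            "uniqueAttributes": {"qualifiedName": qualified_name},
        }

    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "qualifiedName": f"fme://{workspace_name}",
                "name": workspace_name,
                "inputs": [ref(s) for s in sources],
                "outputs": [ref(t) for t in targets],
            },
        }
    }


payload = atlas_process_entity("Workspace 1", ["dataset_a"], ["dataset_b"])
print(json.dumps(payload, indent=2))
```

Keeping the payload builder separate from the HTTP call makes it easy to test without a live catalog.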
I never made any progress on this and it's fallen by the wayside as a priority. Never heard anything from Safe, but I haven't specifically asked our reps yet. Haven't heard of a lineage/catalog tool that comes preconfigured to scan something like an FME Server and harvest lineage info.
Interesting that you're reading FMW files. For us, one issue is that some workspaces in our FME Server are dynamic. For those, different scheduled runs of the same workspace move different datasets. So at 8:01 am, Workspace 1 reads dataset A and writes to dataset B. Then at 8:02 am, Workspace 1 reads dataset C and writes to dataset D. So the nuance of sources and targets lives in FME Server's scheduling system, not in the workspace file.
I've raised a support ticket with Safe to see if this is something they've looked into before. I couldn't find anything in other documentation on this portal or online in general. I'll let you know when they respond!
The dynamic workspace problem isn't something I had considered, and I wonder (fear) whether that's something I'll come across here as well. My first thought is to create a log file from within the workspace, so whenever it runs it records the start time, each dataset it reads from, each dataset it writes to, and the end time. Then I'd create a script that reads that log file each day to see which datasets are connected where, and enters that information into the data catalog via API. This would require a catalog that accepts manual entries, which Apache Atlas and Microsoft Purview can do from my research, but I'm not sure how that would fit into any other data catalog system.
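A rough sketch of the daily script, to make the idea concrete. It assumes a log format I made up for illustration: one JSON object per line with `workspace`, `start`, `end`, `reads` and `writes` keys. FME does not write this by itself; the workspace would have to emit it. The script collapses all runs into unique (workspace, source, target) links, which is the part a catalog cares about.

```python
import json
from itertools import product


def lineage_edges(log_lines):
    """Return sorted, de-duplicated (workspace, source, target) triples.

    Each line is assumed to be one JSON run record (hypothetical format).
    Every dataset read in a run is linked to every dataset written in it.
    """
    edges = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        run = json.loads(line)
        for source, target in product(run["reads"], run["writes"]):
            edges.add((run["workspace"], source, target))
    return sorted(edges)


# Two runs of the same dynamic workspace moving different datasets.
sample_log = [
    '{"workspace": "Workspace 1", "start": "08:01", "end": "08:02", "reads": ["A"], "writes": ["B"]}',
    '{"workspace": "Workspace 1", "start": "08:02", "end": "08:03", "reads": ["C"], "writes": ["D"]}',
]

for edge in lineage_edges(sample_log):
    print(edge)
```

Linking every read to every write within a run is a simplification; a workspace that moves several unrelated datasets in one run would need finer-grained logging.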
Hi, Great to see this issue is coming up a bit more in the community.
I have had a chat with Safe Technical staff while running a trial of FME Form. My own finding was that the Workspace Reader is the most likely candidate for extracting the structure of Workspaces etc, essentially so you can build a CSV/JSON file of the structure you need to represent in our Data Catalog product.
Safe Technical support stated that they also felt the Workspace Reader is probably the best fit for our needs, rather than the Flow REST APIs.
As mentioned by others, there are not many (if any?) applications that can scan FME, so we are currently faced with developing a custom solution. In the case of Informatica Cloud Data Governance and Catalog (CDGC), there is a mechanism to build a custom scanner. Essentially, this means you need to define the metadata model representing the FME structure (Workspace, Reader, Writer, Feature, Feature Types etc.), develop another process in FME that can extract the Workspace components, then map those to the data model you built earlier. This can then be built as a scheduled process that re-populates the data model on a regular basis.
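To illustrate the "define a metadata model, then export it" step, here is a minimal sketch. The class and field names are my own guesses at a reasonable shape, not CDGC's model and not the Workspace Reader's actual output schema, and the sample values are invented. It just shows a nested Workspace > Reader/Writer > Feature Type structure being flattened to JSON for a scanner to ingest.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FeatureType:
    name: str


@dataclass
class Endpoint:
    """A reader or writer inside a workspace (hypothetical model)."""
    role: str      # "reader" or "writer"
    format: str    # e.g. a format short name
    dataset: str   # path, connection or table the endpoint points at
    feature_types: list = field(default_factory=list)


@dataclass
class Workspace:
    name: str
    endpoints: list = field(default_factory=list)


# Invented example values, for illustration only.
ws = Workspace(
    name="load_parcels.fmw",
    endpoints=[
        Endpoint("reader", "SHAPEFILE", "parcels.shp", [FeatureType("parcels")]),
        Endpoint("writer", "POSTGIS", "gisdb", [FeatureType("public.parcels")]),
    ],
)

# asdict() recurses through nested dataclasses and lists.
print(json.dumps(asdict(ws), indent=2))
```

The same structure could be written as CSV instead, one row per feature type, if the catalog's importer prefers flat files.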
That's good to know. I wasn't aware of that reader. It seems like it just reads .fmw files though, and doesn't have an option to read workspaces from an FME Server endpoint. The ones our analysts have loaded into Server are the only ones I'd consider enterprise/production and would want in a data catalog. Maybe there's a way to scan the backend of the server, since I'm assuming the .fmw files behind the services are there. And then the FME Server scheduler would be where the metadata for the data-sync frequency would be stored.
Here is an idea:
OpenLineage is a standard for capturing and sharing information about data workflows, like where data comes from, the transformations it undergoes, and where it ends up. If FME had an OpenLineage integration, it could easily send lineage details to popular data catalog tools like Microsoft Purview, Collibra, or Alation, making it a go-to choice for organizations that care about data governance and tracking.
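For a sense of what such an event looks like, here is a minimal sketch that builds an OpenLineage-style run event as plain JSON, without any client library. The job and dataset names, the `fme-flow` namespace and the producer URI are placeholders I made up, and the `schemaURL` version is an assumption; check both against the spec version you target. Nothing here is an existing FME feature.

```python
import json
import uuid
from datetime import datetime, timezone

# Placeholder values: replace with your own producer URI and target spec version.
PRODUCER = "https://example.com/fme-lineage-sketch"
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def run_event(job_name, inputs, outputs, event_type="COMPLETE", namespace="fme-flow"):
    """Build an OpenLineage-style run event dict for one workspace run."""

    def dataset(name):
        # Simplification: reuse the job namespace for datasets.
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


event = run_event("Workspace 1", inputs=["A"], outputs=["B"])
print(json.dumps(event, indent=2))
```

Because each run produces its own event with its own inputs and outputs, this style would also capture workspaces whose sources and targets change from run to run.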
This would also make FME more future-proof, helping businesses comply with regulations like GDPR and CCPA while giving them better insight into their data pipelines. It’s a simple way to make FME even more powerful and appealing to teams looking for complete, end-to-end data solutions.
Just saying.
Att - Alexander from San Antonio River Authority.