
I am trying to join a SharePoint table with no geometry to a set of polygons (about 2000 rows). The table has a unique project title for each polygon that I want to use as the join field, but the polygons themselves are stored as file geodatabases (fgdb) deeper within the SharePoint file structure. They are a non-standardized legacy dataset that I am trying to aggregate (hence why I am transferring them all over to SharePoint as the first step).

 

When writing the fgdbs I am also exposing the file path as the fme dataset field, so the unique project IDs in the table are part of that file path.

 

I am trying to use the StringReplacer to replace all the characters leading up to, and after, the project title with nothing, so that only the project title is left. Sadly I can't just count the characters, as the folders within the file path have different names and character lengths, so now I am trying to come up with a more complicated regex that will trim the start and the end of the file path strings while leaving just the project ID.

 

The file path looks something like this:

 

C:\Users\Windows\Organization Title\Program Title\Regional Boundary\Qualifier\Community Name\Project ID\spatial_data folder\file geodatabase name.gdb.zip

 

Does anyone have any good advice for achieving this?

You say that there are different folders within the file path with different names, hence the varying character length. But does the overall folder structure change? In other words, apart from the folder and file names changing, is the project ID folder always the 2nd to last folder before the file? Regex is certainly not my strong suit, but I have some ideas depending on the folder structure.


That’s right, the folder structure is always the same, with the project ID being the 2nd to last folder before the file itself. Regex is also currently a weak point for me, but I can see a solution using the backslashes as markers (since there will always be the same number of them before and after the project ID), but I can’t quite get there.


Well, my immediate thought would then be a solution without regex. You could use an AttributeSplitter with the backslash as delimiter. Then you could use the ListIndexer with the index set to -3. I am sure there is also a solution involving regex, I will see what I can think of in case my initial suggestion does not work.
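For anyone who wants to check the split-and-index logic outside of FME, here is a minimal Python sketch of the same idea. It assumes the folder structure described above (project ID folder, then one more folder, then the file); the function name and the sample path are only for illustration.

```python
def project_id_from_path(path: str) -> str:
    """Return the folder two levels above the file, assuming Windows-style backslashes."""
    # Equivalent of an AttributeSplitter with "\" as the delimiter.
    parts = path.split("\\")
    # Equivalent of a ListIndexer with index -3:
    # -1 is the file, -2 is the spatial data folder, -3 is the project ID folder.
    return parts[-3]


sample_path = (
    r"C:\Users\Windows\Organization Title\Program Title\Regional Boundary"
    r"\Qualifier\Community Name\Project ID\spatial_data folder"
    r"\file geodatabase name.gdb.zip"
)
print(project_id_from_path(sample_path))
```

Counting from the end rather than the start means the result does not depend on how many folders sit above the project ID folder, only on the two path components that follow it.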


For the regex solution, this is what I have come up with:

(?<=\\)([a-z0-9]| )+(?=\\([a-z0-9]| |_)+\\[^\\]+$)

It starts with a positive lookbehind for a backslash. Then comes the folder name for the project ID, which I currently have set to ([a-z0-9]| )+, but you could change this to best fit your needs. Last, there is a positive lookahead which matches the last folder, the filename (any characters that are not backslashes), and the end of the line.
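A quick way to try the expression outside of FME is a small Python sketch like the one below. Note that the character classes as written only contain lowercase letters, so this sketch assumes case-insensitive matching is switched on (here via re.IGNORECASE); alternatively A-Z could be added to the classes. The function name and the sample path are only for illustration.

```python
import re

# Same expression as above; IGNORECASE is an assumption so that
# mixed-case folder names such as "Project ID" can match [a-z0-9].
PROJECT_ID_PATTERN = re.compile(
    r"(?<=\\)([a-z0-9]| )+(?=\\([a-z0-9]| |_)+\\[^\\]+$)",
    re.IGNORECASE,
)


def extract_project_id(path):
    """Return the matched project ID folder name, or None if nothing matches."""
    match = PROJECT_ID_PATTERN.search(path)
    return match.group(0) if match else None


sample_path = (
    r"C:\Users\Windows\Organization Title\Program Title\Regional Boundary"
    r"\Qualifier\Community Name\Project ID\spatial_data folder"
    r"\file geodatabase name.gdb.zip"
)
print(extract_project_id(sample_path))
```

A project ID containing anything other than letters, digits, and spaces (a hyphen or underscore, for example) would not match until that character is added to the first group.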

I think the AttributeSplitter and ListIndexer would be the better solution, and it would probably be easier to maintain in the future with potential changes. There is also likely a more elegant regex than what I have come up with, but hopefully that helps.


The attribute splitter worked like a charm. Thanks for the assist!

