I have assembled a table in SpatiaLite holding 30 million depth sounding points, originating from a number of vessels, each with a large number of files.

 

To keep some metadata about where each point originates from, I've kept the file path and name in an attribute called "filepath".

 

From the attribute "filepath", I need to extract the specific part that holds the name of the vessel. I guess this calls for some kind of regex?

 

 

Example of the attribute's content:

 

\\folder\\VESSEL_Lola\\folder2\\file1.shp

 

\\folder\\VESSEL_Maria\\folder1\\file2.shp

 

\\folder\\VESSEL_Lily\\folder4\\file3.shp

 

\\folder\\VESSEL_Christine\\folder1\\file4.shp

 

\\folder\\VESSEL_ClaudiaMaria\\folder2\\file5.shp

 

\\folder\\VESSEL_Maria\\folder3\\file6.shp

 

and so on..

 

 

I need to extract "VESSEL_Maria" etc. from the attribute and map it to a more explanatory value, e.g. in an AttributeValueMapper. There are only 12 different "VESSEL_YY" categories, but a lot of different subfolder names and filenames in the filepath attribute.

 

How should I construct the Source Value parameter in AttributeValueMapper or similar?
Hi

 

 

you can use a StringSearcher like this:

 

 

 

 

You will then get the following new attributes, e.g.:

 

 

`_matched_characters' has value `VESSEL_Maria'

 

`_matched_parts{0}' has value `Maria'
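
The StringSearcher settings themselves are not shown above. Judging from the two output attributes and the regex pieces discussed later in this thread, the expression was probably something like `VESSEL_(\w*)`, but that is an assumption rather than a copy of the original settings. A minimal Python sketch of the same match outside FME, with a hypothetical lookup table standing in for the AttributeValueMapper (the labels are placeholders, not real vessel data):

```python
import re

# Assumed pattern, reconstructed from the outputs quoted in this thread.
VESSEL_PATTERN = re.compile(r"VESSEL_(\w*)")

# Hypothetical lookup table standing in for AttributeValueMapper;
# the descriptive values are placeholders, not real vessel data.
VESSEL_LABELS = {
    "Lola": "placeholder label for Lola",
    "Maria": "placeholder label for Maria",
}

def extract_vessel(filepath):
    """Return (whole match, captured name), or None if nothing matches."""
    match = VESSEL_PATTERN.search(filepath)
    if match is None:
        return None
    return match.group(0), match.group(1)

def label_for(filepath, default="unknown vessel"):
    """Map the captured vessel name to a descriptive label."""
    found = extract_vessel(filepath)
    if found is None:
        return default
    return VESSEL_LABELS.get(found[1], default)

print(extract_vessel(r"\\folder\\VESSEL_Maria\\folder1\\file2.shp"))
# ('VESSEL_Maria', 'Maria')
print(label_for(r"\\folder\\VESSEL_Lola\\folder2\\file1.shp"))
# placeholder label for Lola
```

In FME terms, the captured name plays the role of `_matched_parts{0}`, which is the value you would feed into the Source Value of the AttributeValueMapper.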

 

 

David

 

 

 
Hi David,

 

Thank you, works fine.

 

Regex is very useful but a bit hard to grasp and construct for me.

 

More on regex in the docs for the next readers:

 

http://docs.safe.com/fme/html/FME_Workbench/FME_Workbench.htm#Workbench/Regular_Expressions.htm

 

 

As far as I understand it:

 

() - matches an empty string - look for something in a text.

 

\w - looks for alphanumeric characters (letters and numbers)

 

* - Indicates zero or more characters.

 

 

How the backslashes in my strings are avoided, I am not sure, really.
Hi

 

 

Yes, regular expressions can be incredibly powerful, but they're anything but user friendly :-)

 

 

The match stops at the backslashes because they are not part of the character class "\w", which covers only letters, digits and the underscore.

 

 

The parentheses (...) form a numbered capturing group: whatever the pattern inside them matches is captured, so that you can reference it later using the "_matched_parts{}" list attribute.
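
To make these two points concrete, here is a small Python sketch. It assumes a pattern of the form `VESSEL_(\w*)`, which is inferred from the outputs in this thread and not confirmed. Python's `re` module is used only for illustration; FME's own regex engine may differ in details.

```python
import re

path = r"\folder\VESSEL_ClaudiaMaria\folder2\file5.shp"

# \w matches letters, digits and underscore, but not a backslash.
assert re.fullmatch(r"\w", "a")
assert re.fullmatch(r"\w", "_")
assert re.fullmatch(r"\w", "\\") is None

match = re.search(r"VESSEL_(\w*)", path)
whole = match.group(0)     # entire match
captured = match.groups()  # tuple of captured groups
print(whole)      # VESSEL_ClaudiaMaria
print(captured)   # ('ClaudiaMaria',)

# Without parentheses the match still succeeds, but nothing is captured.
print(re.search(r"VESSEL_\w*", path).groups())  # ()
```

Because `\w*` is greedy, it consumes word characters until it reaches the next backslash, which is why the folder and file names that follow never end up in the match.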

 

 

If you want to learn more about regular expressions, I can recommend this tutorial: http://www.regular-expressions.info/tutorial.html

 

 

David
