Hi,I am using FME 2017.0 x64 and the input data is PostGIS data from serverSo, there is a column in the table 'other_tags that contains strings as displayed. I want the values in the left of => to be the attribute name and the value in the right of => to be the value of that attribute. The two columns are separated by comma(,).The catch is the attribute name are not in a certain order, so I found a bit of difficulty to create list.Moreover, the values in the columns can be multiple, like surface could be- paved, unpaved, asphalt, grass and many more and since the workbench has to be used for multiple countries the string also cannot be defined.Could there be some easy possible way to extract the values that can make workbench shorter and easier.I tried string searcher with regex- surface"=>".*"$ but something is going wrong.Thanks.

Text To Column using RegEx

Userlevel 4

You say "as displayed" but I can't see anything other than your question, did you maybe forget to post a screenshot?

A concrete example of the text you want to split up would be very helpful in trying to help.

+1

parashari
Author
132 replies
6 years ago
4 August 2017

You say "as displayed" but I can't see anything other than your question, did you maybe forget to post a screenshot?

A concrete example of the text you want to split up would be very helpful in trying to help.

Hi David, I have attached the screenshot now.

Hi @hellblazer,

Could you please provide some sample input data which will be helpful for providing a solution.

If the single attribute contains multiple values that you need to separate into new attributes, first you can use an AttributeSplitter to build a list.

However, as a reply to your RegEx question, you can pass the value in 'other_tags' into two new attributes (in this example have created 'att_name' and 'att_value') and use two StringReplacers to extract part before => as 'att_name' and part after => as 'att_value'. (Please see the attached workspace)

To extract value for 'att_name' you can use RegEx ^.*>

To extract value for 'att_value' RegEx is =>.*$

Then you can build the new attribute using 'att_name' and 'att_value'.

Note: FME is not directly supporting creating new attribute using a value of another attribute to use them within the workflow. However, you can create attribute in this manner and write them into final output using an SchemaSetter and setting the writer to Dynamic with "Schema From Schema Feature" as depicted in the attached sample workspace.

regex-sample.fmw

Regards,

Priyantha Pallegama

Userlevel 4

You can of course accomplish the same thing using transformers (e.g. AttributeSplitters etc), but here's a PythonCaller based solution that will create an attribute for each tag listed in "other_tags":

def SplitTags(feature):
    tags_str = feature.getAttribute('other_tags')
    if tags_str:
        # Split tags into key/value pairs in a dict
        tags = dict([x.split('=>') for x in tags_str.split(',')])
        # Trim superfluous spaces
        tags = {k.strip(): v.strip() for k, v in tags.iteritems()}
        # Trim double quotes
        tags = {k.strip('"'): v.strip('"') for k, v in tags.iteritems()}
        for k, v in tags.iteritems():
            feature.setAttribute(k, v)

Example input:

"int_ref" => "E 691", "name:ka" => "ka", "name:ru" => "ru", "surface" => "asphalt"

Example output:

`int_ref' has value `E 691'
`name:ka' has value `ka'
`name:ru' has value `ru'
`surface' has value `asphalt'

Note that there might be some edge cases where you'll get unexpected results, notably if the string "=>" or a comma occurs inside a tag value rather than as a separator. If that's the case, a more advanced parsing mechanism is needed.

Userlevel 4

If you want a purely regex-based solution, you can use this in a StringSearcher:

"surface"\s*=>\s*"(.+?)"

If you set the "subexpression matches list name" to "_subs", you'll find the surface value in the attribute _subs{0}.part

+1

parashari
Author
132 replies
6 years ago
4 August 2017

Hi @hellblazer,

Could you please provide some sample input data which will be helpful for providing a solution.

If the single attribute contains multiple values that you need to separate into new attributes, first you can use an AttributeSplitter to build a list.

However, as a reply to your RegEx question, you can pass the value in 'other_tags' into two new attributes (in this example have created 'att_name' and 'att_value') and use two StringReplacers to extract part before => as 'att_name' and part after => as 'att_value'. (Please see the attached workspace)

To extract value for 'att_name' you can use RegEx ^.*>

To extract value for 'att_value' RegEx is =>.*$

Then you can build the new attribute using 'att_name' and 'att_value'.

Note: FME is not directly supporting creating new attribute using a value of another attribute to use them within the workflow. However, you can create attribute in this manner and write them into final output using an SchemaSetter and setting the writer to Dynamic with "Schema From Schema Feature" as depicted in the attached sample workspace.

regex-sample.fmw

Regards,

Priyantha Pallegama

Hi @pallegama

Thanks for your efforts.I have attached the sampledata.zip. This will give you a more clear picture to you about my requirement.

Userlevel 2

+17

takashi
Contributor
7538 replies
6 years ago
4 August 2017

Hi @hellblazer, alternatively, the JSONFlattener with this expression set to the JSON Document parameter can be used here.

{@ReplaceString(@Value(other_tags),=>,:)}

You can then expose required attribute names with the AttributeExposer, if necessary.

See also the attachment: jsonflattener-demo.fmwt (FME 2017.0.1.1)

Userlevel 2

+17

takashi
Contributor
7538 replies
6 years ago
4 August 2017

Hi @hellblazer, alternatively, the JSONFlattener with this expression set to the JSON Document parameter can be used here.

{@ReplaceString(@Value(other_tags),=>,:)}

You can then expose required attribute names with the AttributeExposer, if necessary.

See also the attachment: jsonflattener-demo.fmwt (FME 2017.0.1.1)

I forgot the JSONFlattener has the "Attributes to Expose" parameter. The AttributeExposer is not essential to expose attribute names.

G

+3

@hellblazer

If you try to use .* (any wordcarachter) for searching thenyou are also searching for comma's etc.

Supposing the strings only contain alphanumerical characters, spaces,questionmarks and semicolons then you can use following regexp in a (for instance) stringsearcher

"([a-zA-Z:\\s\\d\\?]+)"=>"([a-zA-Z:\\s\\d\\?]+)"

G

+3

@hellblazer

If you try to use .* (any wordcarachter) for searching thenyou are also searching for comma's etc.

Supposing the strings only contain alphanumerical characters, spaces,questionmarks and semicolons then you can use following regexp in a (for instance) stringsearcher

"([a-zA-Z:\\s\\d\\?]+)"=>"([a-zA-Z:\\s\\d\\?]+)"

...and of course set All Matches Listname and subexpression etc. on the stringsearcher (if you use that one)

+1

parashari
Author
132 replies
6 years ago
7 August 2017

@hellblazer

If you try to use .* (any wordcarachter) for searching thenyou are also searching for comma's etc.

Supposing the strings only contain alphanumerical characters, spaces,questionmarks and semicolons then you can use following regexp in a (for instance) stringsearcher

"([a-zA-Z:\\s\\d\\?]+)"=>"([a-zA-Z:\\s\\d\\?]+)"

Hi @gio

I tried using your idea of the above reg ex, and it definitely works with the first element. Although, if I look at the overall number of elements like there are surface, lanes, name eng, rus, etc., I need to create a list for each. Also, the elements are not in order in each row, will that be an issue?

G

+3

Hi @gio

I tried using your idea of the above reg ex, and it definitely works with the first element. Although, if I look at the overall number of elements like there are surface, lanes, name eng, rus, etc., I need to create a list for each. Also, the elements are not in order in each row, will that be an issue?

@hellblazer

You can use the string searcher and set All Matces and sub part list name.

All the matches will then be in the resulting list.

Order is no issue, all that matters is the pattern of the input data.

The Items in the character class (everything between the brackets) should cover the required extent.

Here is a pic

(I used David's test string, so I added optional spaces between the items, whitch your input has not and underscores to the regexp)

Sub match part 0 is attr_name and part 1 is it's value.

Text To Column using RegEx

12 replies

Reply

Community Stats

Reply

Community Stats

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded