Question

Text To Column using RegEx

7 years ago
August 4, 2017
12 replies
104 views

parashari
132 replies

Hi,

I am using FME 2017.0 x64 and the input data is PostGIS data from server

So, there is a column in the table 'other_tags that contains strings as displayed. I want the values in the left of => to be the attribute name and the value in the right of => to be the value of that attribute. The two columns are separated by comma(,).

The catch is the attribute name are not in a certain order, so I found a bit of difficulty to create list.

Moreover, the values in the columns can be multiple, like surface could be- paved, unpaved, asphalt, grass and many more and since the workbench has to be used for multiple countries the string also cannot be defined.

Could there be some easy possible way to extract the values that can make workbench shorter and easier.

I tried string searcher with regex- surface"=>".*"$ but something is going wrong.

Thanks.

david_r
8354 replies
7 years ago
August 4, 2017

You say "as displayed" but I can't see anything other than your question, did you maybe forget to post a screenshot?

A concrete example of the text you want to split up would be very helpful in trying to help.

parashari
Author
132 replies
7 years ago
August 4, 2017

david_r wrote:

You say "as displayed" but I can't see anything other than your question, did you maybe forget to post a screenshot?

A concrete example of the text you want to split up would be very helpful in trying to help.

Hi David, I have attached the screenshot now.

pallegama
Contributor
25 replies
7 years ago
August 4, 2017

Hi @hellblazer,

Could you please provide some sample input data which will be helpful for providing a solution.

If the single attribute contains multiple values that you need to separate into new attributes, first you can use an AttributeSplitter to build a list.

However, as a reply to your RegEx question, you can pass the value in 'other_tags' into two new attributes (in this example have created 'att_name' and 'att_value') and use two StringReplacers to extract part before => as 'att_name' and part after => as 'att_value'. (Please see the attached workspace)

To extract value for 'att_name' you can use RegEx ^.*>

To extract value for 'att_value' RegEx is =>.*$

Then you can build the new attribute using 'att_name' and 'att_value'.

Note: FME is not directly supporting creating new attribute using a value of another attribute to use them within the workflow. However, you can create attribute in this manner and write them into final output using an SchemaSetter and setting the writer to Dynamic with "Schema From Schema Feature" as depicted in the attached sample workspace.

regex-sample.fmw

Regards,

Priyantha Pallegama

david_r
8354 replies
7 years ago
August 4, 2017

You can of course accomplish the same thing using transformers (e.g. AttributeSplitters etc), but here's a PythonCaller based solution that will create an attribute for each tag listed in "other_tags":

def SplitTags(feature):
    tags_str = feature.getAttribute('other_tags')
    if tags_str:
        # Split tags into key/value pairs in a dict
        tags = dict([x.split('=>') for x in tags_str.split(',')])
        # Trim superfluous spaces
        tags = {k.strip(): v.strip() for k, v in tags.iteritems()}
        # Trim double quotes
        tags = {k.strip('"'): v.strip('"') for k, v in tags.iteritems()}
        for k, v in tags.iteritems():
            feature.setAttribute(k, v)

Example input:

"int_ref" => "E 691", "name:ka" => "ka", "name:ru" => "ru", "surface" => "asphalt"

Example output:

`int_ref' has value `E 691'
`name:ka' has value `ka'
`name:ru' has value `ru'
`surface' has value `asphalt'

Note that there might be some edge cases where you'll get unexpected results, notably if the string "=>" or a comma occurs inside a tag value rather than as a separator. If that's the case, a more advanced parsing mechanism is needed.

david_r
8354 replies
7 years ago
August 4, 2017

If you want a purely regex-based solution, you can use this in a StringSearcher:

"surface"\s*=>\s*"(.+?)"

If you set the "subexpression matches list name" to "_subs", you'll find the surface value in the attribute _subs{0}.part

parashari
Author
132 replies
7 years ago
August 4, 2017

pallegama wrote:

Hi @hellblazer,

Could you please provide some sample input data which will be helpful for providing a solution.

If the single attribute contains multiple values that you need to separate into new attributes, first you can use an AttributeSplitter to build a list.

To extract value for 'att_name' you can use RegEx ^.*>

To extract value for 'att_value' RegEx is =>.*$

Then you can build the new attribute using 'att_name' and 'att_value'.

regex-sample.fmw

Regards,

Priyantha Pallegama

Hi @pallegama

Thanks for your efforts.I have attached the sampledata.zip. This will give you a more clear picture to you about my requirement.

takashi
7703 replies
7 years ago
August 4, 2017

Hi @hellblazer, alternatively, the JSONFlattener with this expression set to the JSON Document parameter can be used here.

{@ReplaceString(@Value(other_tags),=>,:)}

You can then expose required attribute names with the AttributeExposer, if necessary.

See also the attachment: jsonflattener-demo.fmwt (FME 2017.0.1.1)

takashi
7703 replies
7 years ago
August 4, 2017

takashi wrote:

Hi @hellblazer, alternatively, the JSONFlattener with this expression set to the JSON Document parameter can be used here.

{@ReplaceString(@Value(other_tags),=>,:)}

You can then expose required attribute names with the AttributeExposer, if necessary.

See also the attachment: jsonflattener-demo.fmwt (FME 2017.0.1.1)

I forgot the JSONFlattener has the "Attributes to Expose" parameter. The AttributeExposer is not essential to expose attribute names.

+15

gio
Contributor
2252 replies
7 years ago
August 4, 2017

@hellblazer

If you try to use .* (any wordcarachter) for searching thenyou are also searching for comma's etc.

Supposing the strings only contain alphanumerical characters, spaces,questionmarks and semicolons then you can use following regexp in a (for instance) stringsearcher

"([a-zA-Z:\\s\\d\\?]+)"=>"([a-zA-Z:\\s\\d\\?]+)"

+15

gio
Contributor
2252 replies
7 years ago
August 4, 2017

gio wrote:

@hellblazer

If you try to use .* (any wordcarachter) for searching thenyou are also searching for comma's etc.

Supposing the strings only contain alphanumerical characters, spaces,questionmarks and semicolons then you can use following regexp in a (for instance) stringsearcher

"([a-zA-Z:\\s\\d\\?]+)"=>"([a-zA-Z:\\s\\d\\?]+)"

...and of course set All Matches Listname and subexpression etc. on the stringsearcher (if you use that one)

parashari
Author
132 replies
7 years ago
August 7, 2017

gio wrote:

@hellblazer

If you try to use .* (any wordcarachter) for searching thenyou are also searching for comma's etc.

Supposing the strings only contain alphanumerical characters, spaces,questionmarks and semicolons then you can use following regexp in a (for instance) stringsearcher

"([a-zA-Z:\\s\\d\\?]+)"=>"([a-zA-Z:\\s\\d\\?]+)"

Hi @gio

I tried using your idea of the above reg ex, and it definitely works with the first element. Although, if I look at the overall number of elements like there are surface, lanes, name eng, rus, etc., I need to create a list for each. Also, the elements are not in order in each row, will that be an issue?

+15

gio
Contributor
2252 replies
7 years ago
August 7, 2017

parashari wrote:

Hi @gio

@hellblazer

You can use the string searcher and set All Matces and sub part list name.

All the matches will then be in the resulting list.

Order is no issue, all that matters is the pattern of the input data.

The Items in the character class (everything between the brackets) should cover the required extent.

Here is a pic

(I used David's test string, so I added optional spaces between the items, whitch your input has not and underscores to the regexp)

Sub match part 0 is attr_name and part 1 is it's value.

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Text To Column using RegEx

12 replies

Reply

Helpful Members This Week

Recently Solved Questions

Remove last CR/LF from a CSV

1019 error with change detector and polygons

Where is the "Show Bookmark Navigator" option in FME 2024.2?

How to dynamically write new or update existing ArcGIS Online Feature Layers.

Extracting m-values from ESRI linear-referenced line feature class

Community Stats

Latest FME

Cookie policy

Cookie settings

Reply

Related Topics

Geometry Template, problem getting list attributes when using variable for indicesicon

XMLTemplater GUI won't recognize Geometry Templateicon

Question of the Week: Writing Fixed Length Numbers to Excelicon

|ERROR |Error executing SQL command ('FETCH 1000 IN "centraal_v_streng_export_flat_crsr"'): 'Error connecting to database' |A fatal error has occurred. Check the logfile above for detaicon

Failed to create unique list Pythonicon

Helpful Members This Week

Recently Solved Questions

Remove last CR/LF from a CSV

1019 error with change detector and polygons

Where is the "Show Bookmark Navigator" option in FME 2024.2?

How to dynamically write new or update existing ArcGIS Online Feature Layers.

Extracting m-values from ESRI linear-referenced line feature class

Popular Tags

Community Stats

Latest FME

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings