You say "as displayed" but I can't see anything other than your question, did you maybe forget to post a screenshot?
A concrete example of the text you want to split up would be very helpful in trying to help.
You say "as displayed" but I can't see anything other than your question, did you maybe forget to post a screenshot?
A concrete example of the text you want to split up would be very helpful in trying to help.
Hi David, I have attached the screenshot now.
Â
Â
Hi @hellblazer,
Could you please provide some sample input data which will be helpful for providing a solution.
If the single attribute contains multiple values that you need to separate into new attributes, first you can use an AttributeSplitter to build a list.
However, as a reply to your RegEx question, you can pass the value in 'other_tags' into two new attributes (in this example have created 'att_name' and 'att_value') and use two StringReplacers to extract part before => as 'att_name' and part after => as 'att_value'. (Please see the attached workspace)
To extract value for 'att_name' you can use RegEx ^.*>
To extract value for 'att_value' RegEx is =>.*$
Then you can build the new attribute using 'att_name' and 'att_value'.
Note: FME is not directly supporting creating new attribute using a value of another attribute to use them within the workflow. However, you can create attribute in this manner and write them into final output using an SchemaSetter and setting the writer to Dynamic with "Schema From Schema Feature" as depicted in the attached sample workspace.
regex-sample.fmw
Â
Regards,
Priyantha Pallegama
You can of course accomplish the same thing using transformers (e.g. AttributeSplitters etc), but here's a PythonCaller based solution that will create an attribute for each tag listed in "other_tags":
def SplitTags(feature):
    tags_str = feature.getAttribute('other_tags')
    if tags_str:
        # Split tags into key/value pairs in a dict
        tags = dict(Âx.split('=>') for x in tags_str.split(',')])
        # Trim superfluous spaces
        tags = {k.strip(): v.strip() for k, v in tags.iteritems()}
        # Trim double quotes
        tags = {k.strip('"'): v.strip('"') for k, v in tags.iteritems()}
        for k, v in tags.iteritems():
            feature.setAttribute(k, v)
Example input:
"int_ref" => "E 691", "name:ka" => "ka", "name:ru" => "ru", "surface" => "asphalt"
Example output:
`int_ref' has value `E 691'
`name:ka' has value `ka'
`name:ru' has value `ru'
`surface' has value `asphalt'
Note that there might be some edge cases where you'll get unexpected results, notably if the string "=>" or a comma occurs inside a tag value rather than as a separator. If that's the case, a more advanced parsing mechanism is needed.
If you want a purely regex-based solution, you can use this in a StringSearcher:
"surface"\s*=>\s*"(.+?)"
If you set the "subexpression matches list name" to "_subs", you'll find the surface value in the attribute _subs{0}.part
Hi @hellblazer,
Could you please provide some sample input data which will be helpful for providing a solution.
If the single attribute contains multiple values that you need to separate into new attributes, first you can use an AttributeSplitter to build a list.
However, as a reply to your RegEx question, you can pass the value in 'other_tags' into two new attributes (in this example have created 'att_name' and 'att_value') and use two StringReplacers to extract part before => as 'att_name' and part after => as 'att_value'. (Please see the attached workspace)
To extract value for 'att_name' you can use RegEx ^.*>
To extract value for 'att_value' RegEx is =>.*$
Then you can build the new attribute using 'att_name' and 'att_value'.
Note: FME is not directly supporting creating new attribute using a value of another attribute to use them within the workflow. However, you can create attribute in this manner and write them into final output using an SchemaSetter and setting the writer to Dynamic with "Schema From Schema Feature" as depicted in the attached sample workspace.
regex-sample.fmw
Â
Regards,
Priyantha Pallegama
Hi @pallegama
Â
Â
Thanks for your efforts.I have attached the
sampledata.zip. This will give you a more clear picture to you about my requirement.
Â
Hi @hellblazer, alternatively, the JSONFlattener with this expression set to the JSON Document parameter can be used here.
{@ReplaceString(@Value(other_tags),=>,:)}
You can then expose required attribute names with the AttributeExposer, if necessary.
See also the attachment:  jsonflattener-demo.fmwt (FME 2017.0.1.1)
Hi @hellblazer, alternatively, the JSONFlattener with this expression set to the JSON Document parameter can be used here.
{@ReplaceString(@Value(other_tags),=>,:)}
You can then expose required attribute names with the AttributeExposer, if necessary.
See also the attachment:  jsonflattener-demo.fmwt (FME 2017.0.1.1)
I forgot the JSONFlattener has the "Attributes to Expose" parameter. The AttributeExposer is not essential to expose attribute names.
Â
Â
@hellblazer
If you try to use .* (any wordcarachter) for searching thenyou are also searching for comma's etc.
Supposing the strings only contain alphanumerical characters, spaces,questionmarks and semicolons then you can use following regexp in a (for instance) stringsearcher
"([a-zA-Z:\\s\\d\\?]+)"=>"([a-zA-Z:\\s\\d\\?]+)"
@hellblazer
If you try to use .* (any wordcarachter) for searching thenyou are also searching for comma's etc.
Supposing the strings only contain alphanumerical characters, spaces,questionmarks and semicolons then you can use following regexp in a (for instance) stringsearcher
"([a-zA-Z:\\s\\d\\?]+)"=>"([a-zA-Z:\\s\\d\\?]+)"
Â
...and of course set All Matches Listname and subexpression etc. on the stringsearcher (if you use that one)
Â
Â
@hellblazer
If you try to use .* (any wordcarachter) for searching thenyou are also searching for comma's etc.
Supposing the strings only contain alphanumerical characters, spaces,questionmarks and semicolons then you can use following regexp in a (for instance) stringsearcher
"([a-zA-Z:\\s\\d\\?]+)"=>"([a-zA-Z:\\s\\d\\?]+)"
Hi @gio
Â
I tried using your idea of the above reg ex, and it definitely works with the first element. Although, if I look at the overall number of elements like there are surface, lanes, name eng, rus, etc., I need to create a list for each. Also, the elements are not in order in each row, will that be an issue?
Â
Hi @gio
Â
I tried using your idea of the above reg ex, and it definitely works with the first element. Although, if I look at the overall number of elements like there are surface, lanes, name eng, rus, etc., I need to create a list for each. Also, the elements are not in order in each row, will that be an issue?
Â
@hellblazer
Â
Â
You can use the string searcher and set All Matces and sub part list name.
Â
All the matches will then be in the resulting list.
Â
Order is no issue, all that matters is the pattern of the input data.
Â
The Items in the character class (everything between the brackets) should cover the required extent.
Â
Here is a pic
Â
(I used David's test string, so I added optional spaces between the items, whitch your input has not and underscores to the regexp)
Â
Â
Sub match part 0 is attr_name and part 1 is it's value.
Â