
RegEx help needed. Separate one attribute into four attributes

  • March 5, 2020

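I'm new to FME. I have one attribute with this value: "Zonne-energie in woonplaats vermogen 0,2394 MW, beschikte productie per jaar 227,43 MWh, looptijd 15 jaar. Het project is nog niet gerealiseerd (peildatum januari 2020)."

(In English, roughly: "Solar energy in [place], capacity 0,2394 MW, granted production per year 227,43 MWh, term 15 years. The project has not yet been realized (reference date January 2020).")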

I want to extract four values from this text and store them in four attributes:

- 0,2394 MW into VERMOGEN

- 227,43 into PRODUCTIE

- 15 into LOOPTIJD

- januari 2020 into PEILDATUM

I want to use the StringSearcher, but I don't know which regular expression to use.

Who can help me?

5 replies

ebygomm
Influencer
  • March 5, 2020

It would help if you could provide some more examples of your attribute values, so that the expressions can be written to cover all the variations.


erik_jan
Contributor
  • March 5, 2020

I created a workspace with the four regular expressions.

Hope it helps.

regex.fmw


david_r
Celebrity
  • March 5, 2020

Making lots of assumptions here, but try the following four StringSearchers:

VERMOGEN

vermogen (\d*\,?\d+ \w+)

PRODUCTIE

productie per jaar (\d*,?\d+)

LOOPTIJD

looptijd (\d+)

PEILDATUM

peildatum (\w+ \d+)

All of them assume that the StringSearcher has been configured with a subexpression list name, e.g. "_subs"; the captured value will then be in "_subs{0}.part".
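To try these patterns outside FME, here is a minimal sketch using Python's standard `re` module. It assumes that Python's regex syntax behaves like the StringSearcher's for these simple patterns, and `match.group(1)` stands in for `_subs{0}.part`. For PRODUCTIE it uses `\d*,?\d+` (the same number pattern as VERMOGEN), which also accepts a value without a decimal comma. `PATTERNS` and `extract` are hypothetical names for illustration only.

```python
import re

# The example attribute value from the question.
text = ("Zonne-energie in woonplaats vermogen 0,2394 MW, beschikte productie "
        "per jaar 227,43 MWh, looptijd 15 jaar. Het project is nog niet "
        "gerealiseerd (peildatum januari 2020).")

# One pattern per target attribute; capture group 1 holds the wanted value,
# playing the role of _subs{0}.part in the StringSearcher (assumed equivalence).
PATTERNS = {
    "VERMOGEN": r"vermogen (\d*,?\d+ \w+)",
    "PRODUCTIE": r"productie per jaar (\d*,?\d+)",
    "LOOPTIJD": r"looptijd (\d+)",
    "PEILDATUM": r"peildatum (\w+ \d+)",
}


def extract(value: str) -> dict:
    """Return the first capture group of each pattern, or None if it does not match."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, value)
        result[name] = match.group(1) if match else None
    return result


print(extract(text))
# {'VERMOGEN': '0,2394 MW', 'PRODUCTIE': '227,43', 'LOOPTIJD': '15', 'PEILDATUM': 'januari 2020'}
```

Anchoring each pattern on the Dutch keyword in front of the value keeps the numbers from being confused with each other, at the cost of depending on that exact wording.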


ebygomm
Influencer
  • March 5, 2020

That said, based on this one value, here are some possibilities:

VERMOGEN

A number followed by MW, but not MWh

[0-9]*,*[0-9]* MW(?!h)

PRODUCTIE

A number followed by MWh, returning only the number

[0-9]*,*[0-9]*(?=\sMWh)

LOOPTIJD

A number (at least one digit) followed by jaar

[0-9]+(?=\sjaar)
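The quantifier matters for LOOPTIJD because the text also contains "per jaar". A quick check with Python's `re` module (assuming it behaves like FME's regex engine for this pattern) shows that with `*` the first match is an empty string found just before the " jaar" in "per jaar", while `+` skips it and finds 15. The variable names are illustrative only.

```python
import re

text = ("Zonne-energie in woonplaats vermogen 0,2394 MW, beschikte productie "
        "per jaar 227,43 MWh, looptijd 15 jaar. Het project is nog niet "
        "gerealiseerd (peildatum januari 2020).")

# With *, zero digits are allowed, so the lookahead already succeeds right
# before the " jaar" in "per jaar" and the first match is an empty string.
star = re.search(r"[0-9]*(?=\sjaar)", text)
print(repr(star.group()))  # ''

# With +, at least one digit is required, so "per jaar" is skipped and the
# search finds the number in front of " jaar" in "looptijd 15 jaar".
plus = re.search(r"[0-9]+(?=\sjaar)", text)
print(repr(plus.group()))  # '15'
```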

PEILDATUM 

Some letters and a four-digit number, always preceded by peildatum

(?<=peildatum\s)[A-Za-z]+\s[0-9]{4}
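Because these expressions use lookarounds, the whole match is the value and no subexpression list is needed. The sketch below runs all four in Python's `re` module, assuming its lookahead and fixed-width lookbehind behave like FME's for these patterns. As an editorial choice the number patterns are tightened to `[0-9]*,?[0-9]+` (at most one comma, at least one digit) and LOOPTIJD uses `[0-9]+`; `LOOKAROUND_PATTERNS` and `extract_whole_match` are hypothetical names.

```python
import re

text = ("Zonne-energie in woonplaats vermogen 0,2394 MW, beschikte productie "
        "per jaar 227,43 MWh, looptijd 15 jaar. Het project is nog niet "
        "gerealiseerd (peildatum januari 2020).")

LOOKAROUND_PATTERNS = {
    # Number with an optional decimal comma, then " MW" not followed by "h".
    "VERMOGEN": r"[0-9]*,?[0-9]+ MW(?!h)",
    # Number only; the lookahead requires " MWh" after it without consuming it.
    "PRODUCTIE": r"[0-9]*,?[0-9]+(?=\sMWh)",
    # + rather than *, so "per jaar" cannot produce an empty match.
    "LOOPTIJD": r"[0-9]+(?=\sjaar)",
    # Fixed-width lookbehind on "peildatum ", then a word and a four-digit year.
    "PEILDATUM": r"(?<=peildatum\s)[A-Za-z]+\s[0-9]{4}",
}


def extract_whole_match(value: str) -> dict:
    """Return the full match (group 0) of each pattern, or None if it does not match."""
    result = {}
    for name, pattern in LOOKAROUND_PATTERNS.items():
        match = re.search(pattern, value)
        result[name] = match.group(0) if match else None
    return result


print(extract_whole_match(text))
# {'VERMOGEN': '0,2394 MW', 'PRODUCTIE': '227,43', 'LOOPTIJD': '15', 'PEILDATUM': 'januari 2020'}
```

Unlike the keyword-anchored capture groups, these patterns rely on units and context (MW, MWh, jaar), so they may still work if the surrounding wording changes, but could pick up the wrong number if another value with the same unit appears earlier in the text.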

  • Author
  • March 9, 2020

Thank you all. It works!