Skip to main content

I have a column given to me in a spreadsheet which looks like that:

What I need is to get all the references out, the ones in square brackets, to provide with the full list to a user:

An then get a full list of all references, as follows:

Does anyone have an idea of how I can do this using FME?

referencesfinder.fmwtGiven an input attribute "text", this sequence of transformers will give you one feature per reference with the attribute "Reference_item" containing a reference number.

You could then consider using a DuplicateFilter (DuplicateRemover before FME 2016) and/or a Sorter before proceeding.

Sample workspace is attached.


A stringsearcher with regex such as [0-9]{4} (this assumes they are always 4 digits) to build a list of matched characters, then a list exploder to create the separate features

 

 

 


A stringsearcher with regex such as [0-9]{4} (this assumes they are always 4 digits) to build a list of matched characters, then a list exploder to create the separate features

 

 

 

But you'd also end up with numbers outside of the brackets, if there are any...

 

 


But you'd also end up with numbers outside of the brackets, if there are any...

 

 

Yes, does depend on how sure of the data format you are

 

 


@darius

Connect the spreadsheet to a SQL-Executor. (i used a texfile here)

Then Create following script.

It creates attributes from all hits of a regular expression.( By assigning them levels)

The purple bits you can edit as you wish/need.

A shame the InlineQuerier does nto recognise regexp_substr etc.. Then yu would not need a database connection for this. AS we simple use dual here.


into attributes.. (i created random text and random typed numbers after the first couple b.t.w.)

@darius


@darius

Connect the spreadsheet to a SQL-Executor. (i used a texfile here)

Then Create following script.

It creates attributes from all hits of a regular expression.( By assigning them levels)

The purple bits you can edit as you wish/need.

A shame the InlineQuerier does nto recognise regexp_substr etc.. Then yu would not need a database connection for this. AS we simple use dual here.

That's a cool solution. Just be aware, as for the other one, that this will also return any numbers occurring outside of the bracketed list.

referencesfinder.fmwtGiven an input attribute "text", this sequence of transformers will give you one feature per reference with the attribute "Reference_item" containing a reference number.

You could then consider using a DuplicateFilter (DuplicateRemover before FME 2016) and/or a Sorter before proceeding.

Sample workspace is attached.

Thanks for this doing this so throughly. It worked really well!

 


A stringsearcher with regex such as [0-9]{4} (this assumes they are always 4 digits) to build a list of matched characters, then a list exploder to create the separate features

 

 

 

Not sure if there are numbers outside the square brackets. The spreadsheet is quite large, so numbers that are not references will be extracted. References are always in square brackets. However, I will use the regex expression from @david_r, which works very well.

 


@darius

Connect the spreadsheet to a SQL-Executor. (i used a texfile here)

Then Create following script.

It creates attributes from all hits of a regular expression.( By assigning them levels)

The purple bits you can edit as you wish/need.

A shame the InlineQuerier does nto recognise regexp_substr etc.. Then yu would not need a database connection for this. AS we simple use dual here.

Thanks gio. As @david_r said, this is a clever solution. However, I am not sure if numbers ocurr outside the square brackets.

 


@darius

Of course, it would. This is just to show the technique.

It is not to show how to create regular expressions.

And example text posted does not show any numbers outside brackets.

Anywaýs':

(?==;\\\]*)\\b\\d+\\b(?==;\\]])

all numbers between brackets and or semicolons.


@darius

Of course, it would. This is just to show the technique.

It is not to show how to create regular expressions.

And example text posted does not show any numbers outside brackets.

Anywaýs':

(?==;\\\]*)\\b\\d+\\b(?==;\\]])

all numbers between brackets and or semicolons.

 

This regex would also work within the stringsearcher

Reply