Skip to main content
Question

Getting numbers out of cells with text and producing a list with all these numbers


Forum|alt.badge.img

I have a column given to me in a spreadsheet which looks like that:

What I need is to get all the references out, the ones in square brackets, to provide with the full list to a user:

An then get a full list of all references, as follows:

Does anyone have an idea of how I can do this using FME?

12 replies

david_r
Celebrity
  • October 25, 2016

referencesfinder.fmwtGiven an input attribute "text", this sequence of transformers will give you one feature per reference with the attribute "Reference_item" containing a reference number.

You could then consider using a DuplicateFilter (DuplicateRemover before FME 2016) and/or a Sorter before proceeding.

Sample workspace is attached.


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • October 25, 2016

A stringsearcher with regex such as [0-9]{4} (this assumes they are always 4 digits) to build a list of matched characters, then a list exploder to create the separate features

 

 

 


david_r
Celebrity
  • October 25, 2016
ebygomm wrote:

A stringsearcher with regex such as [0-9]{4} (this assumes they are always 4 digits) to build a list of matched characters, then a list exploder to create the separate features

 

 

 

But you'd also end up with numbers outside of the brackets, if there are any...

 

 


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • October 25, 2016
david_r wrote:
But you'd also end up with numbers outside of the brackets, if there are any...

 

 

Yes, does depend on how sure of the data format you are

 

 


gio
Contributor
Forum|alt.badge.img+15
  • Contributor
  • October 25, 2016

@darius

Connect the spreadsheet to a SQL-Executor. (i used a texfile here)

Then Create following script.

It creates attributes from all hits of a regular expression.( By assigning them levels)

The purple bits you can edit as you wish/need.

A shame the InlineQuerier does nto recognise regexp_substr etc.. Then yu would not need a database connection for this. AS we simple use dual here.


gio
Contributor
Forum|alt.badge.img+15
  • Contributor
  • October 25, 2016

into attributes.. (i created random text and random typed numbers after the first couple b.t.w.)

@darius


david_r
Celebrity
  • October 26, 2016
gio wrote:

@darius

Connect the spreadsheet to a SQL-Executor. (i used a texfile here)

Then Create following script.

It creates attributes from all hits of a regular expression.( By assigning them levels)

The purple bits you can edit as you wish/need.

A shame the InlineQuerier does nto recognise regexp_substr etc.. Then yu would not need a database connection for this. AS we simple use dual here.

That's a cool solution. Just be aware, as for the other one, that this will also return any numbers occurring outside of the bracketed list.

Forum|alt.badge.img
  • Author
  • October 26, 2016
david_r wrote:

referencesfinder.fmwtGiven an input attribute "text", this sequence of transformers will give you one feature per reference with the attribute "Reference_item" containing a reference number.

You could then consider using a DuplicateFilter (DuplicateRemover before FME 2016) and/or a Sorter before proceeding.

Sample workspace is attached.

Thanks for this doing this so throughly. It worked really well!

 


Forum|alt.badge.img
  • Author
  • October 26, 2016
ebygomm wrote:

A stringsearcher with regex such as [0-9]{4} (this assumes they are always 4 digits) to build a list of matched characters, then a list exploder to create the separate features

 

 

 

Not sure if there are numbers outside the square brackets. The spreadsheet is quite large, so numbers that are not references will be extracted. References are always in square brackets. However, I will use the regex expression from @david_r, which works very well.

 


Forum|alt.badge.img
  • Author
  • October 26, 2016
gio wrote:

@darius

Connect the spreadsheet to a SQL-Executor. (i used a texfile here)

Then Create following script.

It creates attributes from all hits of a regular expression.( By assigning them levels)

The purple bits you can edit as you wish/need.

A shame the InlineQuerier does nto recognise regexp_substr etc.. Then yu would not need a database connection for this. AS we simple use dual here.

Thanks gio. As @david_r said, this is a clever solution. However, I am not sure if numbers ocurr outside the square brackets.

 


gio
Contributor
Forum|alt.badge.img+15
  • Contributor
  • October 26, 2016

@darius

Of course, it would. This is just to show the technique.

It is not to show how to create regular expressions.

And example text posted does not show any numbers outside brackets.

Anywaýs':

(?=[;\\[]*)\\b\\d+\\b(?=[;\\]])

all numbers between brackets and or semicolons.


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • October 26, 2016
gio wrote:

@darius

Of course, it would. This is just to show the technique.

It is not to show how to create regular expressions.

And example text posted does not show any numbers outside brackets.

Anywaýs':

(?=[;\\[]*)\\b\\d+\\b(?=[;\\]])

all numbers between brackets and or semicolons.

 

This regex would also work within the stringsearcher

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings