Skip to main content

I am just one week into FME (a friend recommended it to me)

I want to loop through a 761,000 rows by 25 columns excel document, for the each row

  1. count columns with values>0 and store in a column called count
  2. for the values > 0, add them and store in a new column called anim_loc
  3. return the names of the column title that has values greater than zero and store in a column called anim_names

Please how can i achieve these with FME.

I think the StatisticsCalculator transformer will be your friend after reading the Excel file using the Excel reader.

For filtering out values of 0 you want to use the Tester transformer.


@whizzkid247

Can do it like this.

excel-non-empty-column-counting.png


You want to check a lot of features (25*761000). I think the best way to solve this is using a Python script within a PythonCaller transformer.

I have been looking into a solution using an AttributeExploder and a Aggregator and it works quite quickly with the sample file, like @gio suggested as well. However I used a cloner to reproduce the same number of features and it will take like an hour or more to process the 761.000 features.

Personally I am not so good with Python, but maybe someone else can help with that.

Also I have been looking for a solution within Excel, and the first two columns can be filled very easy by using the COUNTIF(range, criteria) function for Count and SUM() function for Anim_loc. I am not sure how to get the list of Anim_names.


@whizzkid247

Can do it like this.

excel-non-empty-column-counting.png

Thanks i am still trying this out. I am new to FME so i need to look at the documentation as well as watch videos on how to set the parameters

 

 


@whizzkid247

Can do it like this.

excel-non-empty-column-counting.png

@gio I do not understand where to set the TEST in the Tester transformer.

 

i.e. How do i input in the Test Clauses?

 

TEST @Value(_attr_name) NOT_LIKE fme_%

 

TEST @Value(_attr_name) NOT_LIKE xlsx_%

 

TEST @Value(_attr_name) NOT_= xlsx_row_id

 

TEST @Value(_attr_value) NOT_= 0

 


You want to check a lot of features (25*761000). I think the best way to solve this is using a Python script within a PythonCaller transformer.

I have been looking into a solution using an AttributeExploder and a Aggregator and it works quite quickly with the sample file, like @gio suggested as well. However I used a cloner to reproduce the same number of features and it will take like an hour or more to process the 761.000 features.

Personally I am not so good with Python, but maybe someone else can help with that.

Also I have been looking for a solution within Excel, and the first two columns can be filled very easy by using the COUNTIF(range, criteria) function for Count and SUM() function for Anim_loc. I am not sure how to get the list of Anim_names.

@lars_de_vries yes i can easily do the first two columns with excel, but i face two challenges;

 

First, excel crashes because of the size of the data, and

 

second, the most important column to my work is the last column ie. Anim_names

 

 

could you point me to a tutorial on how to use the pythonCaller transformer?

 


Hi @whizzkid247, I've attached a workspace to my post here. In it, I've calculated the count values with the Aggregator, the Anim_loc values using FME's ability to utilize Excel functions, and the Anim_names attribute with lists.

 

 

looping-excel.fmw

Hi @whizzkid247, I've attached a workspace to my post here. In it, I've calculated the count values with the Aggregator, the Anim_loc values using FME's ability to utilize Excel functions, and the Anim_names attribute with lists.

 

 

looping-excel.fmw
@TiaAtSafe exactly what i needed. Thank you so much!!

 

 

The count is not working properly (it was supposed to count values that are greater than zero). I can do this in excel.

 

 

Thank you again!

 


@lars_de_vries yes i can easily do the first two columns with excel, but i face two challenges;

 

First, excel crashes because of the size of the data, and

 

second, the most important column to my work is the last column ie. Anim_names

 

 

could you point me to a tutorial on how to use the pythonCaller transformer?

 

The python caller documentation is here: https://docs.safe.com/fme/2017.0/html/FME_Desktop_Documentation/FME_Transformers/Transformers/pythoncaller.htm

 

 

If you are in FME and add any transformer to the canvas (the large white area) you can then right click on it and select Help to bring up documentation on that transformer, which is often very good at explaining what it does.

 

 


Hi @whizzkid247,

 

 

I saw your remark on the Workspace provided by @TiaAtSafe. Hereby I provide you my solution, where the count does work. Still you need to take into account that for the full amount you stated in your question, the workspace will run over an hour.

 

It also shows that there are many solutions to the same problem.

 

xlsx-read-explode-and-aggregate-v000a.fmw

 


Hi @whizzkid247,

 

 

I saw your remark on the Workspace provided by @TiaAtSafe. Hereby I provide you my solution, where the count does work. Still you need to take into account that for the full amount you stated in your question, the workspace will run over an hour.

 

It also shows that there are many solutions to the same problem.

 

xlsx-read-explode-and-aggregate-v000a.fmw

 

Thank you @lars_de_vries
Hi @whizzkid247,

 

 

I saw your remark on the Workspace provided by @TiaAtSafe. Hereby I provide you my solution, where the count does work. Still you need to take into account that for the full amount you stated in your question, the workspace will run over an hour.

 

It also shows that there are many solutions to the same problem.

 

xlsx-read-explode-and-aggregate-v000a.fmw

 

@lars_de_vries Thank you once more for your assistance! Please could you explain what this (, ){2,} and (^, |, $) represents in the text to replace field of both StringReplacerParameters?

 

 


Hi @whizzkid247,

 

 

I saw your remark on the Workspace provided by @TiaAtSafe. Hereby I provide you my solution, where the count does work. Still you need to take into account that for the full amount you stated in your question, the workspace will run over an hour.

 

It also shows that there are many solutions to the same problem.

 

xlsx-read-explode-and-aggregate-v000a.fmw

 

Hi @whizzkid247,

 

The (,) {2,} means two or more instances of ', '. The (^, |, $) means starts with ', ' or ends with ', '. The '|' sign means or '^' means "begins with" and '$' means "ends with" .

 

 


Reply