
Save number of input files into an attribute



I would like to count the number of input files and return this number so I can use it in arithmetic calculations afterwards.

To be more precise, I have X Excel files, each containing values that repeat regularly, but each with a different frequency. I calculated how many times each value occurs across all the files (so I have two attributes: value and count), and now I would like to divide each count by X (the number of files) to get the frequency as a third attribute.
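
As a check of the arithmetic only, the calculation described above can be sketched in plain Python rather than in FME. It assumes each file's values have already been read into a list (one list per file); the function name `value_frequencies` and that input layout are illustrative assumptions, not part of any FME API.

```python
from collections import Counter


def value_frequencies(files):
    """Count each value across all files, then divide the count by the number of files.

    `files` is a list with one entry per input file; each entry is the list of
    values read from that file (a hypothetical stand-in for the Excel rows).
    """
    num_files = len(files)  # X, the number of input files
    counts = Counter(v for rows in files for v in rows)
    # One record per distinct value: value, count and frequency = count / X
    return [
        {"value": v, "count": c, "frequency": c / num_files}
        for v, c in sorted(counts.items())
    ]


if __name__ == "__main__":
    sample = [["a", "b", "a"], ["a", "c"], ["b", "a"]]
    for record in value_frequencies(sample):
        print(record)
```

The key point is that X is a single number shared by every value record, which is why it has to be attached to each record before the division can happen.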

I tried using the Aggregator, StatisticsCalculator and Counter transformers to get the number of input files, but I get 'missing' or 'null' values when I try to save it as an attribute. How can I 'save' or return this number?


Best answer by redgeographics 27 May 2017, 06:24

Interesting question. I think this would do the trick:

Expose the multi_reader_full_id attribute; this is a counter on the individual input files. Use a Sampler to let only the first record of every file through, a StatisticsCalculator to count them, and a FeatureMerger to join the count back to the original records. Hope this helps.
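
The Sampler / StatisticsCalculator / FeatureMerger chain above can be mimicked in plain Python to show the logic; this is not FME code. Records are assumed to be dicts carrying the `multi_reader_full_id` attribute mentioned in the answer, while the function `attach_file_count` and the output attribute name `file_count` are hypothetical.

```python
def attach_file_count(records, file_id_key="multi_reader_full_id", out_key="file_count"):
    """Mimic the Sampler -> StatisticsCalculator -> FeatureMerger chain.

    Each record is a dict that carries the ID of the file it was read from.
    """
    # "Sampler": keep only the first record seen for each file ID
    first_per_file = {}
    for rec in records:
        first_per_file.setdefault(rec[file_id_key], rec)

    # "StatisticsCalculator": counting the sampled records counts the files
    file_count = len(first_per_file)

    # "FeatureMerger": attach the single count to every original record
    return [{**rec, out_key: file_count} for rec in records]


if __name__ == "__main__":
    rows = [
        {"multi_reader_full_id": 0, "value": "a"},
        {"multi_reader_full_id": 0, "value": "b"},
        {"multi_reader_full_id": 1, "value": "a"},
        {"multi_reader_full_id": 2, "value": "c"},
    ]
    for row in attach_file_count(rows):
        print(row)
```

Sampling one record per file before counting is what makes the count equal to the number of files rather than the number of rows.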

Attached: xlsxr2none.fmwt

Why not use the 'Directory and File Pathnames' reader? Then just use a Counter to get the number of files.
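
Counting the files directly, which is what the reader plus a Counter would achieve, looks roughly like this in plain Python. This is a sketch outside FME; the `*.xlsx` pattern and the function name `count_excel_files` are assumptions.

```python
import tempfile
from pathlib import Path


def count_excel_files(directory, pattern="*.xlsx"):
    """Return how many regular files in `directory` match `pattern` (non-recursive)."""
    return sum(1 for p in Path(directory).glob(pattern) if p.is_file())


if __name__ == "__main__":
    # Build a throwaway directory with two Excel-named files and one other file.
    with tempfile.TemporaryDirectory() as d:
        for name in ["a.xlsx", "b.xlsx", "notes.txt"]:
            (Path(d) / name).write_text("placeholder")
        print(count_excel_files(d))  # 2
```

Note that this counts files on disk, so it only matches the number of files actually read if the reader uses the same directory and filter.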

Thanks for your answers. I'm afraid they all lead to the same problem. I think the FeatureMerger is the right approach, but after reading my Excel files I get 7k+ output features (= rows). Then I run a few transformers on these rows and end up with about 200 features. Now the question is: how do I 'attach' the number of files to these 200 values? As far as I understand, there are multiple ways to count the files; I just can't figure out how to add an attribute with a constant value equal to the number of files. Would it be useful to create some dummy attribute so I can merge based on its value?

Many possibilities exist. I think the StatisticsCalculator will do the trick. Connect your 200 output features to this transformer as its input. First, set the 'Attributes to Analyze' parameter to a unique identifier or an attribute that contains a unique value for each record. This is important, as the statistics will be calculated based on the values in this attribute! If no such attribute is present, a Counter can create a unique number for each record.

Then specify a name for the 'Total Count Attribute' or the 'Numeric Count Attribute'; in this case, they will do the same thing. Finally, have a look at the 'Complete' output port of this transformer. A '_count' attribute will be created (unless you specified a different name, of course), containing the total number of features that were counted.

Is that what you are looking for?

I should have pointed this out, I guess: I am using the FeatureMerger with the join value set to 1 for both the Requestor and the Supplier. I assume you want all features to know how many files were read, so that's why I opted for this.
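
The constant-key join can also be sketched in plain Python. The dummy attribute name `dummy`, the value 1 and the function `merge_on_constant` are hypothetical, and the sketch only illustrates a join value shared by every feature; it does not reproduce FeatureMerger's options (here only the first supplier is used, and unmatched requestors simply pass through).

```python
def merge_on_constant(requestors, suppliers, key="dummy", value=1):
    """Join every requestor to a supplier through a constant dummy attribute.

    Because the join value is identical on both sides, every requestor
    matches. Requestor attributes win if an attribute name appears on both sides.
    """
    # Index suppliers by the dummy value; with a constant key only the first is kept.
    by_key = {}
    for sup in suppliers:
        by_key.setdefault(value, {**sup, key: value})

    merged = []
    for req in requestors:
        req_tagged = {**req, key: value}  # same dummy attribute on the requestor side
        match = by_key.get(req_tagged[key])
        # In this sketch, requestors without a supplier pass through unchanged.
        merged.append({**match, **req_tagged} if match is not None else req_tagged)
    return merged


if __name__ == "__main__":
    counts = [{"value": "a", "count": 4}, {"value": "b", "count": 2}]
    merged = merge_on_constant(counts, [{"file_count": 3}])
    for rec in merged:
        rec["frequency"] = rec["count"] / rec["file_count"]
        print(rec)
```

Once every value record carries the file count, the frequency is a per-record division.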

I ended up using an AttributeExposer, an Aggregator and a StatisticsCalculator to count the files, and another part of the script to analyse my data. Then I created a dummy attribute with identical values at the end of both parts of the script so I could merge them with a FeatureMerger. Not sure it's the best way, but it works.

Thanks for your help!
