A StatisticsCalculator, set to group by the attribute you're analysing, should do the trick.
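As a rough illustration of what a grouped count produces, here is a minimal Python sketch, not FME itself. It counts rows per distinct value of one attribute, which is roughly what you get from a StatisticsCalculator grouped by that attribute. The attribute name `status` and the sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows, as a CSV reader might produce them.
rows = [
    {"status": "open"},
    {"status": "closed"},
    {"status": "open"},
]

def count_by(rows, attr):
    """Count rows per distinct value of `attr` (a grouped count)."""
    return Counter(row[attr] for row in rows)

counts = count_by(rows, "status")
print(dict(counts))  # {'open': 2, 'closed': 1}
```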
I think your approach (adding attributes that store 1 or 0 to indicate whether the row matches a condition, and then summing them with a StatisticsCalculator) is simple enough. If you want to reduce the number of transformers, consider using an AttributeCreator or AttributeManager with conditional value setting to create the flag attributes for every condition, rather than using ExpressionEvaluators. You can then set those attributes in the Attributes to Analyze parameter of the StatisticsCalculator.
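The flag-and-sum logic can be sketched in plain Python to show what the two steps compute. This is only an illustration outside FME. The CSV content, the column names `type` and `length`, and the two conditions are hypothetical stand-ins for your real data and your ten conditions.

```python
import csv
import io

# Hypothetical CSV content; in FME this would come from the CSV reader.
CSV_TEXT = """id,type,length
1,road,120
2,river,45
3,road,30
4,rail,200
"""

# Hypothetical conditions: each flag name maps to a row predicate.
CONDITIONS = {
    "is_road": lambda r: r["type"] == "road",
    "is_long": lambda r: float(r["length"]) > 100,
}

def flag_rows(rows):
    """Add a 1/0 flag attribute per condition (the AttributeCreator step)."""
    for row in rows:
        for name, predicate in CONDITIONS.items():
            row[name] = 1 if predicate(row) else 0
        yield row

def sum_flags(rows):
    """Sum each flag attribute over all rows (the StatisticsCalculator step)."""
    totals = {name: 0 for name in CONDITIONS}
    for row in rows:
        for name in CONDITIONS:
            totals[name] += row[name]
    return totals

rows = csv.DictReader(io.StringIO(CSV_TEXT))
totals = sum_flags(flag_rows(rows))
print(totals)  # {'is_road': 2, 'is_long': 2}
```

Because both steps stream row by row, every condition is evaluated in a single pass over the data.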
Alternatively, if you are familiar with SQL syntax, the InlineQuerier with a SQL query that counts the number of rows matching each condition could also be a solution. The SQL query would look like this.
select
(select count(*) from source where <condition 1>) as count1,
(select count(*) from source where <condition 2>) as count2,
...
(select count(*) from source where <condition 10>) as count10
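To check the shape of that query outside FME, here is a small sketch using Python's built-in `sqlite3` module. The InlineQuerier also works on a temporary SQLite database. The sketch assumes the input table is named `source`, as in the query above. The columns, rows, and the two conditions that fill in `<condition 1>` and `<condition 2>` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout and rows standing in for the CSV data.
conn.execute("create table source (type text, length real)")
conn.executemany(
    "insert into source (type, length) values (?, ?)",
    [("road", 120), ("river", 45), ("road", 30), ("rail", 200)],
)

# One scalar subquery per condition; a SELECT without FROM is valid in SQLite.
query = """
select
  (select count(*) from source where type = 'road') as count1,
  (select count(*) from source where length > 100) as count2
"""
count1, count2 = conn.execute(query).fetchone()
print(count1, count2)  # 2 2
```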
Note: The query itself would be very fast, but the InlineQuerier first creates a temporary SQLite database that stores the entire source dataset, and that takes some time. I'm not sure it would be more efficient than the StatisticsCalculator approach when the CSV table has a very large number of rows.
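The sketch below uses Python's `sqlite3` to time the two phases that note distinguishes: loading the data into SQLite and running the query. It swaps in a technique that does not appear in the answer above. Instead of one subquery per condition, it uses `sum(case when ... then 1 else 0 end)`, which scans the table once rather than once per condition. The table layout, the conditions, and the 100,000 synthetic rows are hypothetical. Actual timings depend on the machine, so no expected times are shown.

```python
import sqlite3
import time

# Hypothetical synthetic data: 100,000 rows.
rows = [("road", 120), ("river", 45), ("road", 30), ("rail", 200)] * 25_000

conn = sqlite3.connect(":memory:")
conn.execute("create table source (type text, length real)")

# Phase 1: load the data (the step the note says takes time).
start = time.perf_counter()
conn.executemany("insert into source (type, length) values (?, ?)", rows)
load_seconds = time.perf_counter() - start

# Phase 2: count all conditions in a single scan.
# Note: sum() returns NULL (None) on an empty table.
start = time.perf_counter()
count1, count2 = conn.execute(
    """
    select
      sum(case when type = 'road' then 1 else 0 end) as count1,
      sum(case when length > 100 then 1 else 0 end) as count2
    from source
    """
).fetchone()
query_seconds = time.perf_counter() - start

print(f"load: {load_seconds:.3f}s, query: {query_seconds:.3f}s")
print(count1, count2)  # 50000 50000
```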
Thanks @takashi, the AttributeCreator method is quicker. However, I want to try the InlineQuerier, but for some reason it's not generating any output. Can you provide an example workspace if possible?
I can provide an example if you could share some sample data here.
Due to the large dataset, the InlineQuerier takes almost the same time as the AttributeCreator method (conditional value setting, followed by a StatisticsCalculator to compute the sums). However, the SQL query itself takes almost no time.
Thanks,