Is it possible to stop further upstream processing once the number of features sampled matches the sampling rate when using First N Features?
A Terminator attached to the non-sampled port maybe?
Could you explain a bit more what you're trying to achieve, it's not quite clear to me.
Could you use the "Max Features to Read" parameter instead of the Sampler?
However... if you've reached your sample size and there are still features in the pipeline, they will be processed up to the Sampler anyway, and there's no stopping that.
What is your source format? If it is a database, you could use the WHERE clause to restrict the reading. If not, could you use a FeatureReader, combining a restriction with the "Max Features to Read" parameter to apply both the criteria and the sample size?
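For example, with an Oracle source, the reader's WHERE clause might look something like this (the attribute name, threshold and row limit below are just placeholders):

value < 1000 and rownum <= 100

That way only the rows you actually need ever leave the database.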
We need a little more information, @egomm
Features are read into the workspace using a SQLCreator; a StatisticsCalculator is then used to create a cumulative total summing one of the attributes; a Tester passes features where the cumulative total is less than a set value; and the Sampler should then return the first x of these records.
Something like this:
Select * from table where value < (Select sum(value) from table) and rownum < limit
Or, with a correlated subquery for a true running total (note that rownum has to be applied outside the order by, in an outer query):
Select * from (
Select t1.* from table t1
where (Select sum(t2.value) from table t2 where t2.value <= t1.value) < set_value
order by t1.value
)
where rownum <= sample_size
Still fairly simple.
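If the database supports analytic functions, a window function would avoid the correlated subquery altogether. A rough Oracle-style sketch along the same lines, again with placeholder table/column names, threshold and limit:

Select * from (
Select t1.*, sum(t1.value) over (order by t1.value) as running_total
from table t1
)
where running_total < 1000 -- the Tester's set value
and rownum <= 100 -- the sample size

Either way, the filtering and sampling happen in the database before any features reach the workspace.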
If you intend to use the same sample set again, then why not create a workspace that samples the data and writes the samples to an FFS file? Then use the FFS file as the source data in your main workspace. Of course, it's only saving time if you're going to use the sample dataset multiple times.
I completely agree. Using the Sampler with a large data set wastes a lot of time, since it reaches the sample rate and then still pushes the rest of the data set through the NotSampled port. I would like to see the Sampler transformer updated with an option to stop reading further records/features once the sampled limit is reached. Surely this would be a simple improvement for Safe to implement.
The Terminator worked for me, somewhat. I needed the first 100 records from a file with millions of records, and it stopped reading at 100,000.