Solved

Sampling rate from attribute value

October 19, 2020

fikusas
Contributor

The basic workflow is:

Read in features
Count them (via StatisticsCalculator, Total Count)
Depending on the count, calculate the sample size
Randomly sample features using the calculated sample size

The Sampler doesn't allow the Sampling Rate (N) to be set from an attribute value. Is there a workaround?
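For illustration, here is a rough plain-Python sketch of the four steps above. It is not FME code. Features are represented as dictionaries, and `hypothetical_sample_size` is an invented rule, because the question doesn't say how the sample size should depend on the count.

```python
import random

def hypothetical_sample_size(count):
    # HYPOTHETICAL rule: the question does not give one, so take 10% of the
    # features, but always at least 1
    return max(1, count // 10)

def sample_workflow(features, seed=None):
    # Step 2: count the features (the Total Count a StatisticsCalculator reports)
    count = len(features)
    if count == 0:
        return []
    # Step 3: derive the sample size from the count
    n = hypothetical_sample_size(count)
    # Step 4: draw n distinct features at random
    rng = random.Random(seed)
    return rng.sample(features, n)

# Step 1: "read in" some example features (dicts standing in for FME features)
features = [{"id": i} for i in range(95)]
sample = sample_workflow(features, seed=1)
print(len(sample))  # 9, i.e. 95 // 10
```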

Best answer by ebygomm


redgeographics
Celebrity
October 19, 2020

No, that won't work. The Sampler requires the sampling rate to be a single value for all features. If you take it from an attribute, different features could carry different values, and then there is no single rate for the Sampler to apply.

I thought maybe wrapping the Sampler in a Custom Transformer and exposing the Sampling Rate parameter as a parameter of the Custom Transformer would work, but no...

There is the option to use a User Parameter for the sampling rate. To do that, you'd have to split the process into two parts: one workspace that performs steps 1-3 from your list, then uses a WorkspaceRunner to call a second workspace that performs step 4 with the sample size passed in as a User Parameter. The downside is that you're reading all of your data twice.
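The two-workspace split can be pictured in plain Python as two functions. The first pass reads all the data only to compute the sample size. The second pass reads the data again and receives that size as a fixed value, much as a User Parameter would arrive through a WorkspaceRunner. This is an analogy, not FME's API: `stage_one`, `stage_two` and the 10% rule are hypothetical.

```python
import random

def stage_one(features):
    # First pass (steps 1-3): read everything once just to compute the sample size.
    # HYPOTHETICAL rule, since the thread does not give one: 10% of the count, at least 1.
    return max(1, len(features) // 10)

def stage_two(features, sample_size, seed=None):
    # Second pass (step 4): sample_size arrives as one fixed value, like a user
    # parameter, so it is guaranteed to be the same for every feature.
    rng = random.Random(seed)
    return rng.sample(features, sample_size)

features = [{"id": i} for i in range(40)]
n = stage_one(features)                  # first read of the data
sample = stage_two(features, n, seed=7)  # second read of the data
print(n, len(sample))  # 4 4
```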

fikusas
Author
Contributor
October 19, 2020

redgeographics wrote:

I thought maybe wrapping the Sampler in a Custom Transformer and exposing the Sampling Rate parameter as a parameter of the Custom Transformer would work, but no...

In my case, the sample size attribute will always have the same value on every feature.

Yeah, the two-workspace method is my least favorite option and a last resort if nothing else works. But maybe it is possible to pass the sample size attribute value to the Sampling Rate via Python or something similar.
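If the sample size really is identical on every feature, it is worth checking that before relying on it. Here is a minimal plain-Python sketch. It assumes features are dictionaries with a `samplesize` attribute that may arrive as a string. `single_sample_size` is a hypothetical helper, not an FME function.

```python
def single_sample_size(features, attr="samplesize"):
    # Collect every distinct value of the sample-size attribute (as integers)
    values = {int(f[attr]) for f in features}
    # One fixed rate is only safe when all features agree on exactly one value
    if len(values) != 1:
        raise ValueError(f"expected one {attr} value, found {sorted(values)}")
    return values.pop()

features = [{"id": i, "samplesize": "3"} for i in range(10)]
print(single_sample_size(features))  # 3
```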


ebygomm
Influencer
Best Answer
October 19, 2020

As an alternative, you could create a random number on each feature, sort by this attribute, then add a Counter, then a Tester to keep only features where the count is less than the sample size.
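The RandomNumberGenerator, Sorter, Counter, Tester chain behaves roughly like the plain-Python sketch below. It assumes features are dictionaries and that the counter starts at 0, so the test is `count < sample_size`. The `_rand` attribute name is made up for the example.

```python
import random

def sample_by_sorting(features, sample_size, seed=None):
    rng = random.Random(seed)
    # RandomNumberGenerator: attach a random number to a copy of each feature
    tagged = [dict(f, _rand=rng.random()) for f in features]
    # Sorter: order the features by that random number
    tagged.sort(key=lambda f: f["_rand"])
    kept = []
    # Counter: number the sorted features 0, 1, 2, ...
    for count, f in enumerate(tagged):
        # Tester: keep only features whose count is less than the sample size
        if count < sample_size:
            kept.append(f)
    return kept

features = [{"id": i} for i in range(20)]
print(len(sample_by_sorting(features, 5, seed=42)))  # 5
```

Because every feature gets an independent random key, the first N after sorting form a uniform random sample. If the sample size exceeds the feature count, all features pass.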

jlbaker2779
October 19, 2020

A Tester with Count <= sample size would give you exactly the sample size (with the Counter starting at 1). So: generate a random number, sort on it, and then keep the first N records of the sorted output (the first two, in my example).
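Whether the Tester should use `<` or `<=` depends on where the Counter starts. This small plain-Python check counts how many features survive each combination. `kept_count` is a hypothetical helper, not FME code.

```python
def kept_count(total, sample_size, start, inclusive):
    # Number the features start, start+1, ... as a Counter would
    counts = range(start, start + total)
    # Tester: either count <= sample_size or count < sample_size
    if inclusive:
        return sum(1 for c in counts if c <= sample_size)
    return sum(1 for c in counts if c < sample_size)

print(kept_count(10, 2, start=1, inclusive=True))   # 2  (count <= n, counter from 1)
print(kept_count(10, 2, start=0, inclusive=False))  # 2  (count < n, counter from 0)
print(kept_count(10, 2, start=0, inclusive=True))   # 3  (mixing them keeps one too many)
```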


ebygomm
Influencer
October 19, 2020

If you did want to go down the Python route (in a PythonCaller):

import fme
import fmeobjects
import random

class FeatureProcessor(object):
    def __init__(self):
        # list that collects every incoming feature
        self.features = []
        # sample size, read from the first feature that arrives
        self.n = None

    def input(self, feature):
        # set the sample size from the 'samplesize' attribute of the first feature;
        # attribute values may come back as strings, so convert to an integer
        if self.n is None:
            self.n = int(feature.getAttribute('samplesize'))
        # hold on to every feature until close() is called
        self.features.append(feature)

    def close(self):
        # nothing to output if no features arrived
        if self.n is None:
            return
        if len(self.features) < self.n:
            print("Error: sample size greater than number of features")
        else:
            # shuffle the list of features, then output the first n of them
            random.shuffle(self.features)
            for feature in self.features[:self.n]:
                self.pyoutput(feature)

But in this situation I would go with the RandomNumberGenerator, Sorter, Counter, Tester option.
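The PythonCaller script above can't run outside FME. Below is a self-contained variant that can be tested anywhere. It keeps the same collect-then-sample idea but uses `random.sample`, which picks n distinct features without shuffling the whole list. `StubFeature` is a hypothetical stand-in that mimics only `getAttribute()`, and `pyoutput` is passed in here, whereas in FME the PythonCaller provides it. One deliberate change: when there are fewer features than the sample size, this version outputs all of them instead of printing an error.

```python
import random

class StubFeature:
    # HYPOTHETICAL stand-in for an FME feature: only getAttribute() is mimicked
    def __init__(self, attrs):
        self._attrs = attrs

    def getAttribute(self, name):
        return self._attrs.get(name)

class SampleProcessor:
    def __init__(self, output):
        self.features = []
        self.n = None
        # in FME, pyoutput is supplied by the PythonCaller; here it is injected
        self.pyoutput = output

    def input(self, feature):
        # read the sample size once, from the first feature
        if self.n is None:
            self.n = int(feature.getAttribute('samplesize'))
        self.features.append(feature)

    def close(self):
        if self.n is None:
            return
        # draw n distinct features (or all of them, if there are fewer than n)
        k = min(self.n, len(self.features))
        for feature in random.sample(self.features, k):
            self.pyoutput(feature)

out = []
proc = SampleProcessor(out.append)
for i in range(12):
    proc.input(StubFeature({"id": i, "samplesize": "4"}))
proc.close()
print(len(out))  # 4
```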
