Question

Generate Same Random number for sample of data



I have an odd request.

 

My user has a data set of 758 records. They want to sample 450 of these randomly.

I've completed this using a Sampler with random sampling. However, the client wants to add further columns of data to the existing random sample. If I rerun this workspace, I'll get a different random set of records than before.

 

How do I add a random number to each row and keep that number, so that when I run the workspace again I get the same sample as before?

 

Thank you!

N


3 replies


I should add that the only thing I could think of was running my workspace once, adding a ListBuilder / ListConcatenator to the output, and storing the ID of each sampled row somewhere for re-use.
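A minimal sketch of that "store the IDs for re-use" idea, in plain Python rather than FME transformers. The function name `load_or_create_sample`, the JSON file, and the 1 to 758 record IDs are all assumptions made for illustration; in a workspace the stored IDs could just as well live in a CSV or a database table.

```python
import json
import os
import random
import tempfile


def load_or_create_sample(record_ids, sample_size, store_path):
    """Return the stored sample of IDs if one exists; otherwise draw and store one."""
    if os.path.exists(store_path):
        # A previous run already picked the sample: reuse it unchanged.
        with open(store_path) as f:
            return json.load(f)
    # First run: draw the random sample and persist it for later runs.
    sample = sorted(random.sample(list(record_ids), sample_size))
    with open(store_path, "w") as f:
        json.dump(sample, f)
    return sample


# Sizes from the question: 758 records, 450 sampled (IDs 1..758 are hypothetical).
store = os.path.join(tempfile.mkdtemp(), "sample_ids.json")
ids = range(1, 759)
first_run = load_or_create_sample(ids, 450, store)
second_run = load_or_create_sample(ids, 450, store)  # same IDs as first_run
```

Deleting the stored file is then the only way to get a fresh sample, which makes re-sampling a deliberate step rather than a side effect of rerunning.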


Create one column per run on each row, so you can tell which run each random number came from. That way you always have the values for every run in one file.

 

Example:

 

RecordID   Run1   Run2   Run3
1          1231   9912   4563
2          4412   3343   2342
3          1228   6627   2412
4          5684   3344   4886
5          9765   6675   5791
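One way the per-run columns could be used, sketched in plain Python under some assumptions: rows are dictionaries, the helper names `add_run_column` and `sample_for_run` are hypothetical, and the random values are floats rather than the four-digit integers shown in the example, to avoid ties. The sample for a run is taken as the rows with the smallest stored value in that run's column, so it stays the same however many later runs add their own columns.

```python
import random


def add_run_column(rows, run_name):
    """Give every row a random value for this run, leaving earlier runs untouched."""
    for row in rows:
        if run_name not in row:
            row[run_name] = random.random()
    return rows


def sample_for_run(rows, run_name, sample_size):
    """Pick the rows with the smallest stored values for the given run."""
    ranked = sorted(rows, key=lambda r: (r[run_name], r["RecordID"]))
    return [r["RecordID"] for r in ranked[:sample_size]]


rows = [{"RecordID": i} for i in range(1, 759)]  # hypothetical 758 records
add_run_column(rows, "Run1")
sample_a = sample_for_run(rows, "Run1", 450)

add_run_column(rows, "Run2")                     # a later run adds its own column
sample_b = sample_for_run(rows, "Run1", 450)     # Run1's sample is unchanged
```

This only works if the run columns are written back to the file between runs, as the reply suggests.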

 


What about creating an attribute called Sampled?

The workspace checks whether the attribute exists. If it does, test for Sampled = yes to get your random sample.

If it doesn't (i.e. the first time the workspace is run on the data), do a randomized sample and add the attribute to the selected features.
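The check-then-sample logic described here could look roughly like this. It is a plain-Python sketch, not FME code: the rows are assumed to be dictionaries, `get_sample` is a hypothetical name, and the updated rows are assumed to be written back to the source data after the first run so the attribute is there next time.

```python
import random


def get_sample(rows, sample_size):
    """Reuse the Sampled flag if present; otherwise sample and set the flag."""
    if any("Sampled" in row for row in rows):
        # Later runs: the attribute exists, so just filter on it.
        return [row for row in rows if row.get("Sampled") == "yes"]
    # First run: pick random row positions and flag only the selected rows.
    picked = set(random.sample(range(len(rows)), sample_size))
    for i in picked:
        rows[i]["Sampled"] = "yes"
    return [rows[i] for i in sorted(picked)]


rows = [{"RecordID": i} for i in range(1, 759)]  # hypothetical 758 records
first = get_sample(rows, 450)    # draws the sample and flags the rows
second = get_sample(rows, 450)   # finds the flag and returns the same rows
```

Because the flag travels with the records, any columns added later end up on the same 450 rows without storing anything separately.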
