Question

Generate Same Random number for sample of data


nedwaterman
Contributor

I have an odd request.

 

My user has a data set of 758 records. They want to sample 450 of these randomly.

I've completed this using a Sampler with random sampling. However, the client now wants to add further columns of data to the existing random sample. If I rerun the workspace, I'll generate a different random sample than before.

How do I add a random number to a row, but keep this created number so that when I run the workspace again I get the same sample as before?

 

Thank you!

N

3 replies

nedwaterman
Contributor
  • Author
  • June 3, 2020

I should add that the only approach I could think of was to run my workspace once, add a ListBuilder / ListConcatenator to the output, and store the ID of each sampled row somewhere for reuse.
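
Outside of FME, the store-the-IDs idea could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `load_or_create_sample`, the JSON file used as the store, and the integer record IDs are all hypothetical and not part of the workspace.

```python
import json
import random
from pathlib import Path


def load_or_create_sample(record_ids, k, store_path):
    """Return the stored sample if one exists, else draw and store a new one."""
    store = Path(store_path)
    if store.exists():
        # A previous run already picked the sample: reuse it unchanged.
        return json.loads(store.read_text())
    # First run: draw k distinct IDs at random and remember them.
    sample = sorted(random.sample(list(record_ids), k))
    store.write_text(json.dumps(sample))
    return sample
```

Calling `load_or_create_sample(range(1, 759), 450, "sampled_ids.json")` twice should return the same 450 IDs, because the second call reads the file instead of sampling again. Deleting the file forces a fresh sample.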



Create one column per run for each row, so that each column records which run its numbers came from. That way you always have the values for every run in one file.

 

Example:

 

RecordID | Run1 | Run2 | Run3
11231991245632441233432342312286627241245684334448865976566755791
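
A rough Python sketch of the one-column-per-run idea is below. The fixed seed per run is an assumption added for this sketch (the reply does not mention seeds), and the names `add_run_column` and `sample_for_run` are hypothetical. Each run column is filled from its own seeded generator, and the sample is simply the rows with the smallest values in that column.

```python
import random


def add_run_column(rows, run_name, seed):
    """Add a column of random numbers for one run, reproducible from its seed."""
    rng = random.Random(seed)  # private generator: same seed, same numbers
    for row in rows:  # rows must be visited in the same order on every run
        row[run_name] = rng.random()
    return rows


def sample_for_run(rows, run_name, k):
    """Pick the k RecordIDs with the smallest random number in a run column."""
    ranked = sorted(rows, key=lambda r: (r[run_name], r["RecordID"]))
    return [r["RecordID"] for r in ranked[:k]]
```

With rows shaped like `{"RecordID": 1}`, calling `add_run_column(rows, "Run1", seed=1)` and then `sample_for_run(rows, "Run1", 450)` should give the same 450 records whenever the row order and seed are unchanged. Storing the column in the output file, as suggested above, removes even that dependency.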

 


jdh
Contributor
  • June 3, 2020

What about creating an attribute called Sampled?

The workspace checks whether the attribute exists. If it does, test for Sampled = yes to get your random sample.

If it doesn't (i.e. the first time the workspace is run on the data), do a randomized sample and add the attribute to the selected features.
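
As a minimal sketch of that logic outside FME, assuming features are plain Python dictionaries and that the flagged data is written back between runs (the function name `ensure_sampled` is hypothetical):

```python
import random


def ensure_sampled(features, k):
    """Flag k features with Sampled = "yes" on the first run; reuse the flags afterwards."""
    already_flagged = any("Sampled" in f for f in features)
    if not already_flagged:
        # First run on this data: choose k features at random and flag them.
        chosen = set(random.sample(range(len(features)), k))
        for i, feature in enumerate(features):
            if i in chosen:
                feature["Sampled"] = "yes"
    # Every run: the sample is whatever carries the flag.
    return [f for f in features if f.get("Sampled") == "yes"]
```

The first call draws the sample and flags it; later calls on the same (saved) features skip the random step and return the flagged rows, so new columns can be added without changing which records are in the sample.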

