I have a file with a huge number of features (points) and I want to assign random values to them. I use the TrueRandomIntGenerator transformer to do that, but this transformer is limited (no more than 10,000 features). How can I process the file in blocks of 10,000 features?
Hello,
You could use more than one transformer and split your features between them.
It could actually handle more than 10,000 features (the transformer itself does not seem to have a built-in limit), but the transformer relies on the Random.org web service, and that service is what imposes the actual limitation. These limitations are described on the page linked from the transformer's documentation.
The number of random integers you can generate depends entirely on the settings of the transformer. I just tried it out with 10,000 features, using base 10, 0 as minimum and 4,000,000 as maximum, and I have now used up almost half my bit allowance (1,000,000 originally), so I could probably even run it again.
The problem here is you cannot fool that service (since it registers incoming requests by IP address), so you either need to:
- Wait until your bit allowance has been topped up (+200,000 daily);
- Divide your requests over different machines (each with its own IP address, and possibly on different (sub)networks too), although that is probably illegal if they find out;
- Buy credits on Random.org so you can make all the requests you need.
Very good answer. The only thing I'd add is that the user could use the plain RandomNumberGenerator transformer instead. It won't be as truly random as the web service, but it won't have the same limitations.
Hello,
As sander_s said, once you have a large enough quota at random.org to process millions of features, here's how to manage that using the TrueRandomIntGenerator transformer.
First, edit the TrueRandomIntGenerator transformer and create a user parameter for the "Parallel Process By" parameter.
Then, use the Counter-ExpressionEvaluator technique to assign a group number (_result) to each set of 10,000 features, and use that group number as the group-by value in your TrueRandomIntGenerator.
Assuming that you have a valid quota for random.org, you'll be able to process as many features as you need.
Larry
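For anyone wondering what the Counter-ExpressionEvaluator step computes, here is a minimal Python sketch of the arithmetic only. It assumes the counter starts at 0 and that the expression is an integer division of the count by the block size; the names `group_number`, `assign_groups` and `BLOCK_SIZE` are made up for illustration and are not part of FME.

```python
from collections import Counter

BLOCK_SIZE = 10_000  # block size taken from the question (assumption: one request per block)


def group_number(count, block_size=BLOCK_SIZE):
    """Return the block index for a zero-based feature count.

    Mirrors what the ExpressionEvaluator would store in _result,
    assuming the expression is floor(count / block_size).
    """
    return count // block_size


def assign_groups(n_features, block_size=BLOCK_SIZE):
    """Return the group number of every feature, in reading order."""
    return [group_number(i, block_size) for i in range(n_features)]


if __name__ == "__main__":
    groups = assign_groups(25_000)
    # Number of features that fall into each group.
    print(Counter(groups))
```

With 25,000 features this gives three groups: two full blocks of 10,000 and a last block of 5,000. If your counter starts at 1 instead of 0, subtract 1 before dividing, otherwise the first block will hold only 9,999 features.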
Thanks for adding step 2 :)
Hi,
If the requirement allows using pseudo-random numbers, Python scripting could also be a workaround.
Takashi
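As a rough illustration of the pseudo-random route, the sketch below uses only Python's standard `random` module. It is not tied to any FME API: features are represented as plain dictionaries, and the function name `assign_random_ints` and the attribute name `_rand` are hypothetical. Wiring this into a workspace (for example through a Python scripting transformer) is left out.

```python
import random


def assign_random_ints(features, minimum, maximum, attr="_rand", seed=None):
    """Attach a pseudo-random integer in [minimum, maximum] to each feature.

    `features` is a list of dicts standing in for features (assumption).
    Passing a seed makes the sequence reproducible between runs.
    """
    rng = random.Random(seed)  # private generator, does not touch global state
    for feature in features:
        # randint is inclusive on both ends
        feature[attr] = rng.randint(minimum, maximum)
    return features


if __name__ == "__main__":
    # 25,000 dummy point features; no web service, so no quota applies.
    points = [{"id": i} for i in range(25_000)]
    assign_random_ints(points, 0, 4_000_000, seed=42)
    print(points[0], points[-1])
```

Because everything runs locally, there is no block size to respect and no bit allowance to use up. The values are pseudo-random, so this only fits if the requirement does not call for true randomness.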
Thank you for all your answers. I appreciate your quick responses, and I will do my best to put your advice into practice.
Agreed, and it will also be a LOT faster, compared to fetching series from random.org.