Question

Creating a sample based on a length attribute


Badge

Hi there,

 

I'd like to find out if there is a way to generate a sample of my data based on a measurement instead of a count?

The Sampler transformer seems to do the calculations based on counts of features, but what I need is, for example, say I have 1000 kilometers of road geometry, and I want a random sample of 50 km.

Is this possible?

 

Thanks in advance,


8 replies

Userlevel 2
Badge +17

Hi @robbie_botha, if I understood your requirement correctly, the RandomNumberGenerator and the Snipper might help you.

Create a random number between 0 and 950 (= 1000 - 50), then snip out a 50 length part from the random number's position on the line.

Userlevel 4

If you don't want to alter (snip) the geometry you can use an AttributeCreator and a Tester to only let the first total 50 km of road segments to pass through, e.g.

You can then e.g. test for _total_length < 50000 in the Tester.

Badge

Hi @takashi

I think what I failed to mention is that it is not a single line feature. It is an entire network of several roads, which are in total 1000km. I want to sample a few random roads that have a total combined length of 50km.

My apologies, I should have added it to the original post. It is important that I keep the geometry in the state it is now, so I'm not sure snipping will work.

Userlevel 4

If you don't want to alter (snip) the geometry you can use an AttributeCreator and a Tester to only let the first total 50 km of road segments to pass through, e.g.

You can then e.g. test for _total_length < 50000 in the Tester.

If you want to randomize the data, you could insert a Sampler before the AttributeCreator, set to randomize sampling.

Badge +3

@robbie_botha

Count the total roadfeatures and use a rng to create sample selections.

@round(@rand*total_featureCount, 0).

Then use this to select features, add their length till 50 is reached.

If you need a sample of some minimal size then create sets that sum to 50. (Query the superset for your minimal count set. You probably need to use TCL or some lesser snake to do it. And don't output the superset.)

 

Common misunderstanding: if you read a dataset and mutilate it in some fme process... the original dataset is still were it was, in its original state. Till you overwrite it!

 

Badge +3

Here is a TCl powerset version I made. (with the help of math sites of course..)

This one is iterative. (I failed to get the recursive version with inner summing/counting to work and di nto pursue it any further)

Stuff it in a TCL-Caller.

 

This one counts objects. You can easily adapt it to sum lengths.

change {[ladd $pset_cnt] <= 100} to do that.

@Takashi can probably do that in a sec..

 

proc test {} {

 

##objectenlijst ophalen

 

set list items{}

 

##fme lists inlezen naar tcl lists

 

for {set i 0} {[FME_AttributeExists Object_list{$i}.OBJECTID]} {incr i} {

 

lappend items_id [FME_GetAttribute Object_list{$i}.OBJECTID]}

 

for {set i 0} {[FME_AttributeExists Object_list{$i}.count]} {incr i} {

 

lappend items_cnt [FME_GetAttribute Object_list{$i}.count]}

 

set Subsets_out [lindex [powersetb $items_id $items_cnt] 0]

 

set Subsets_count [lindex [powersetb $items_id $items_cnt] 1]

 

set Subsets_itemcount [lindex [powersetb $items_id $items_cnt] 2]

##Aan criteria valdoane Subsetlijst naar fme list schrijven

 

for {set i 0} {$i<[llength $Subsets_out]} {incr i} {

 

FME_SetAttribute Subset_list{$i}.Subset [lindex $Subsets_out $i]}

 

for {set i 0} {$i<[llength $Subsets_out]} {incr i} {

 

FME_SetAttribute Subset_list{$i}.count [lindex $Subsets_count $i]}

 

for {set i 0} {$i<[llength $Subsets_out]} {incr i} {

 

FME_SetAttribute Subset_list{$i}.itemcount [lindex $Subsets_itemcount $i]}

 

 

##subsets in aparte lists stoppen??

 

 

}

 

 

##powerset

 

proc powersetb {set count} {

 

set res {}

 

set res_cnt {}

 

set res_itemcount {}

 

for {set i 0} {$i < 2**[llength $set]} {incr i} {

 

set pos -1

 

set pset {}

 

set pset_cnt {}

 

set pset_itemcount {}

 

foreach el $set {

 

if {$i & 1<<[incr pos]} {

 

lappend pset $el

 

lappend pset_cnt [lindex $count [lsearch $set $el]]

 

}

}

 

if {[ladd $pset_cnt] <= 100} {

 

lappend res $pset

 

lappend res_cnt $pset_cnt

 

##aantal elementen per subset tellen

 

lappend res_itemcount [expr {[lsearch $pset [lrange $pset end end]] + 1}]

 

}

 

}

 

##alle items van de uitvoerlist uitvoeren in een samengestelde list

 

return [list $res $res_cnt $res_itemcount]

 

}

proc ladd {l} {

 

set total 0

 

foreach nxt $l {

 

incr total $nxt

 

}

 

return $total

 

}

 

Badge +22

Hi @takashi

I think what I failed to mention is that it is not a single line feature. It is an entire network of several roads, which are in total 1000km. I want to sample a few random roads that have a total combined length of 50km.

My apologies, I should have added it to the original post. It is important that I keep the geometry in the state it is now, so I'm not sure snipping will work.

Do you want truly random roads, which most likely end up with several road segments that are completely unconnected to one another, or do you want a random 'region of connected roads'?

Badge

If you don't want to alter (snip) the geometry you can use an AttributeCreator and a Tester to only let the first total 50 km of road segments to pass through, e.g.

You can then e.g. test for _total_length < 50000 in the Tester.

Hi there! I am looking for something similar, but with parcels and acres. See my question here:

https://knowledge.safe.com/questions/105643/add-until-a-certain-value-is-reached.html

I need to know if there is a way to randomly select parcels that together have a combined area of 13 000 ha

Thank you!

Reply