Hi all, I have 1,000,000 lines (flow paths) without correct flow direction. I also have 29,000,000 small line segments (parts of the flow paths) with correct flow direction in a separate layer. I want to apply the flow direction from the 29,000,000 segments to the 1,000,000 flow paths.

Here is my approach.

Use the NeighborFinder with neighbours to find = 1 and max distance = 0 m. This drops my 29,000,000 down to 1,000,000. Now I have 1,000,000 small segments (with correct direction) that lie on top of the 1,000,000 flow paths that have incorrect flow direction. How do I check whether each flow path is oriented the same way as the small segment that lies on it and, if not, flip it?

I have tried extracting the geometry of the small segment before using the NeighborFinder, so the 1,000,000 flow paths now have two geometries (the original and the small segment's). How do I then compare the direction of one geometry to the other, test whether direction A = direction B, and if not, use the Orientor to flip?

Maybe the PolylineAnalyzer?

Maybe the _candidate_angle from the NeighborFinder?

Maybe my approach is wrong for a dataset this size?

Thanks for suggestions

Steve

I would try something like this: extract the first vertex of the reference segment (RS) as an attribute, send both the flow paths (FP) and the RS to a LineOnLineOverlayer with merge attributes, use a Tester to keep only features with an overlap of 2, then use a second Tester to check whether the feature's first vertex matches the RS vertex (if yes, the FP is oriented correctly; if no, the FP needs to be flipped).

You will probably want to extract the geometry of the FP and restore it after the LineOnLineOverlayer (but prior to the Orientor), making sure to remove duplicates if there is more than one RS per FP.
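Outside FME, the underlying "is A oriented like B, and if not flip it" test can be sketched in a few lines of Python. This is only an illustration of the idea, not the transformer chain described above: instead of matching first vertices after an overlay, it compares how far along the flow path the reference segment's start and end fall. The function names `measure_along` and `orient_like` are made up for this sketch, and lines are assumed to be plain lists of `(x, y)` tuples.

```python
from math import hypot


def measure_along(path, pt):
    """Distance along `path` to the point on it that is nearest to `pt`."""
    best_d2, best_m, walked = float("inf"), 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicate vertices
        # Project pt onto this edge and clamp to the edge's extent.
        t = ((pt[0] - x1) * dx + (pt[1] - y1) * dy) / (seg_len * seg_len)
        t = max(0.0, min(1.0, t))
        px, py = x1 + t * dx, y1 + t * dy
        d2 = (pt[0] - px) ** 2 + (pt[1] - py) ** 2
        if d2 < best_d2:
            best_d2, best_m = d2, walked + t * seg_len
        walked += seg_len
    return best_m


def orient_like(path, ref_segment):
    """Return `path`, reversed if `ref_segment` runs against it."""
    start_m = measure_along(path, ref_segment[0])
    end_m = measure_along(path, ref_segment[-1])
    return list(path) if start_m <= end_m else list(reversed(path))


flow_path = [(0, 0), (10, 0), (10, 10)]
against = [(10, 5), (10, 2)]   # reference segment running the other way
print(orient_like(flow_path, against))
```

The sketch assumes the reference segment really lies on the flow path and that the path does not cross itself; a self-intersecting path could give an ambiguous nearest point.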


Hi @goatboy, if it is guaranteed that the small segments cover the flow paths in full, I would try:

  1. Bufferer: Create buffers of the flow paths with a slight buffer amount.

  2. SpatialFilter: Send the buffers to the Filter port and the small segments to the Candidate port, set the 'Tests to Perform' parameter to Contains, check the 'Merge Attributes' option, and set the 'Accumulation Mode' parameter to Only Use Filter.

  3. LineJoiner: Set the 'Group By' parameter to an attribute that is unique for each flow path, and set the 'Preserve Original Orientation' parameter to Yes.

I expect that the lines output from the LineJoiner are your required flow paths with correct orientation.
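To make the idea behind step 3 concrete, here is a rough Python sketch of joining correctly oriented segments back into one line per flow path while keeping their direction. It is not how the LineJoiner is implemented; `join_segments` is a hypothetical helper, segments are assumed to be lists of `(x, y)` tuples tagged with a flow-path id, and consecutive segments are assumed to share exactly identical end and start coordinates with no branching.

```python
from collections import defaultdict


def join_segments(segments):
    """segments: iterable of (path_id, [(x, y), ...]) in correct flow direction.

    Returns {path_id: joined coordinate list}, preserving segment orientation.
    """
    groups = defaultdict(list)
    for path_id, coords in segments:
        groups[path_id].append(coords)

    joined = {}
    for path_id, parts in groups.items():
        by_start = {part[0]: part for part in parts}
        end_points = {part[-1] for part in parts}
        # The first segment is the one whose start is nobody else's end.
        heads = [part for part in parts if part[0] not in end_points]
        if len(heads) != 1:
            raise ValueError(f"cannot find a unique first segment for {path_id!r}")
        line = list(heads[0])
        for _ in range(len(parts) - 1):
            nxt = by_start.get(line[-1])
            if nxt is None:
                break  # gap in coverage: stop at the last connected segment
            line.extend(nxt[1:])  # drop the shared vertex
        joined[path_id] = line
    return joined


segments = [
    ("fp1", [(2, 0), (3, 0)]),
    ("fp1", [(0, 0), (1, 0), (2, 0)]),
    ("fp2", [(5, 5), (5, 4)]),
]
print(join_segments(segments))
```

Because each group is handled independently, the grouping attribute is what keeps segments from neighbouring flow paths from being chained together.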


Many thanks Takashi. Yes, you are correct with your assumption: the segments cover the entire flow path. Your process works well. I have tried it with 10 records. Now I will throw 30,000,000 at it and wait for the magic blue smoke to rise out of my computer... Thanks again.

Steve


Oh, the data is so large. Try adjusting the order of the readers in the Navigator pane so that all the flow paths (Filter) are sent to the SpatialFilter before the small segments (Candidate), and set the 'Filter Type' parameter to Filters First, so that memory usage can be reduced.


Thanks Takashi, I will strip any unneeded attributes out early in the process to conserve memory. I will also try the Filters First option.

I think I will also write the buffers to a staging FFS file. It's the SpatialFilter that will take some time to process...

I will avoid the parallel processing options, as each of my 1,000,000 groups has fewer than 29 records (i.e. 29,000,000 in total), so I think the overhead of spawning subprocesses will be too costly.

Thoughts?

Steve


Hi @goatboy, I agree that parallel processing for 1 million groups is not preferable.

The SpatialFilter with the 'Filters First' option is probably the best you can do for that step. However, the following LineJoiner for the 29 million segments could be too memory intensive.

If you can make groups of flow paths, it might be worth trying parallel processing in the LineJoiner by group. I don't know the optimal number of groups, but it should not be many. Tens or so?


I think the bottleneck in the process is that the LineJoiner has to cache all the 29 million segments. How about using a spatial database to resolve that?

  • In the current workspace, save all the segments output from the SpatialFilter into a database table.

  • Create another workspace that reads the segment features ordered by the path identifier (i.e. the Group By attribute for the following LineJoiner), using a SQL statement (SQLCreator or SQLExecutor).

  • Then connect the LineJoiner to the SQLCreator/SQLExecutor. Set the Group By parameter, and also set the 'Input is Ordered by Group' parameter to Yes.
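The reason ordered input helps is that a group can be finished and released as soon as the identifier changes, so only one flow path's segments need to be held at a time. A small Python sketch of that pattern follows, using an in-memory SQLite table as a stand-in for the spatial database. The table name `segments` and the columns `path_id`, `seq` and `wkt` are hypothetical, and `iter_groups` is a made-up helper, not an FME or database API.

```python
import sqlite3
from itertools import groupby

# Stand-in for the staging table written by the first workspace.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segments (path_id INTEGER, seq INTEGER, wkt TEXT)")
conn.executemany(
    "INSERT INTO segments VALUES (?, ?, ?)",
    [
        (2, 1, "LINESTRING (5 5, 5 4)"),
        (1, 2, "LINESTRING (1 0, 2 0)"),
        (1, 1, "LINESTRING (0 0, 1 0)"),
    ],
)


def iter_groups(connection):
    """Yield (path_id, [wkt, ...]) one group at a time, in path_id order."""
    cursor = connection.execute(
        "SELECT path_id, seq, wkt FROM segments ORDER BY path_id, seq"
    )
    # groupby only works because the rows arrive sorted by path_id.
    for path_id, rows in groupby(cursor, key=lambda row: row[0]):
        yield path_id, [row[2] for row in rows]


for path_id, parts in iter_groups(conn):
    print(path_id, parts)
```

The database does the sorting, which it can do on disk with an index, so the consumer never needs all 29 million segments in memory at once.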

Thanks Takashi. The bottleneck is currently the SpatialFilter. I have a powerful PC: 12 GB RAM, 64-bit, on SSDs. I have loaded all input data into .ffs to speed up reading and writing. The SpatialFilter is doing about 6 records a second, which equates to about 55 days to process the 29 million. Yikes... I might try jdh's answer to see if it performs any better. Thanks again, Steve
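For reference, the 55-day estimate follows directly from the observed throughput. A quick check in Python, assuming the rate of 6 records per second stays constant for the whole run (`days_to_process` is just a throwaway helper for this calculation):

```python
SECONDS_PER_DAY = 86_400


def days_to_process(records, records_per_second):
    """Wall-clock days needed at a constant processing rate."""
    return records / records_per_second / SECONDS_PER_DAY


print(round(days_to_process(29_000_000, 6), 1))  # 55.9
```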

