Question

Speeding up PointCloudFilter


todd_davis
Influencer

I have a point cloud dataset of 1.8 billion points and I am filtering on a single component value. The PointCloudFilter is taking 15 minutes without maxing out CPU, memory, or IOPS, so it could presumably be faster. However, the PointCloudFilter doesn't have the ability to run in parallel. Does anyone have an idea of how to make it run quicker...?

I also tested making the component spatial and then clipping the point cloud, but it seemed to be slightly slower in a couple of tests I ran.
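For context, filtering on a single component value is logically just a linear scan with a boolean mask applied to every component. A minimal numpy sketch of the idea (the component name `classification` and the test value are made up for illustration):

```python
import numpy as np

# Synthetic stand-in for a point cloud: x/y/z plus one extra component.
# The component name and values are invented for illustration.
n = 10_000_000
cloud = {
    "x": np.random.rand(n),
    "y": np.random.rand(n),
    "z": np.random.rand(n),
    "classification": np.random.randint(0, 10, n, dtype=np.uint8),
}

# Filtering on one component value is a single vectorized comparison plus
# a masked copy of every component -- memory-bandwidth-bound rather than
# CPU-bound, which matches CPU/memory/IOPS never maxing out.
mask = cloud["classification"] == 2
filtered = {name: arr[mask] for name, arr in cloud.items()}
print(f"{len(filtered['x'])} of {n} points kept")
```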

5 replies

redgeographics
Celebrity

Other than looking at external factors (are you reading from a network drive or a local hard disk?) I don't think there's much you can do.

Tiling the point cloud so that you can do parallel processing in a custom transformer might be an option, but the tiling might introduce more overhead, so you may really just be moving the point where the slowdown happens.
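In outline, the tile-and-parallelize idea looks something like the sketch below: plain Python multiprocessing with synthetic numpy tiles standing in for point cloud tiles, not FME's actual parallel-processing machinery.

```python
import numpy as np
from multiprocessing import Pool

def filter_tile(tile):
    """Filter one tile on a single component value (here: class == 2)."""
    mask = tile["classification"] == 2
    return {name: arr[mask] for name, arr in tile.items()}

def make_tile(seed, n=1_000_000):
    """Build a synthetic tile; a real workflow would read one spatial tile."""
    rng = np.random.default_rng(seed)
    return {
        "x": rng.random(n),
        "y": rng.random(n),
        "z": rng.random(n),
        "classification": rng.integers(0, 10, n, dtype=np.uint8),
    }

if __name__ == "__main__":
    tiles = [make_tile(s) for s in range(8)]
    with Pool(processes=8) as pool:
        # Each worker filters its own tile; results are merged afterwards.
        # The cost of shipping tiles to workers and merging them back is
        # exactly the overhead warned about above.
        parts = pool.map(filter_tile, tiles)
    kept = sum(len(p["x"]) for p in parts)
    print(f"kept {kept} points")
```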


todd_davis
Influencer
December 4, 2018
redgeographics wrote:

Other than looking at external factors (are you reading from a network drive or a local hard disk?) I don't think there's much you can do.

Tiling the point cloud so that you can do parallel processing in a custom transformer might be an option, but the tiling might introduce more overhead, so you may really just be moving the point where the slowdown happens.

I had tried some parallel processing before in other parts of this script, and sometimes that results in faster or slower processing depending on the inputs, but overall it didn't lead to any improvement.

1.8 billion points filtering in 15 minutes isn't bad either....I just want more :)

I also recalculated the x/y of each point using the component value, then set up a Tiler with a seed position of 0,0 (anything tiled into positive space would be the data I am after), and then recalculated back to their original x/y. However, that is still slower than the filter. In a similar vein, I also tried a PointCloudMerger to merge on a positive value, but that was again slower.
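One plausible reading of that coordinate-shift trick, sketched in numpy (the component name and value ranges are invented, and the Tiler step itself is reduced to a sign test):

```python
import numpy as np

n = 1_000_000
x = np.random.uniform(0, 30_000, n)      # original coordinates
y = np.random.uniform(0, 30_000, n)
value = np.random.uniform(-50, 50, n)    # the component being tested

# Stash the originals, then overwrite x with the test value so the sign
# of x encodes pass/fail; a Tiler seeded at (0,0) would then drop all
# passing points into tiles in positive space.
orig_x, orig_y = x.copy(), y.copy()
x = value

keep = x > 0                             # what selecting positive tiles achieves
x, y = orig_x[keep], orig_y[keep]        # recalculate back to original x/y
```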

My ideal answer would be some type of calculation in the PointCloudExpressionEvaluator that could delete a point and all its components if not relevant, but I don't think that is possible.


fmelizard
Safer
December 5, 2018

The team suggests some things to consider:

- If the unfiltered points are not needed, ensure `Output Unfiltered Points` is set to `No`

- What format is the source dataset, and where is it stored? It may be that reading that is the major cost of the translation, rather than filtering. As @redgeographics pointed out, if for example the dataset is stored on the network, moving it locally could improve performance
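A quick way to check whether reading dominates, outside FME, is to time a raw read against read-plus-filter. A sketch, where `dataset.bin` and its one-float32-per-point layout are placeholders:

```python
import time
import numpy as np

PATH = "dataset.bin"  # placeholder path; assume one float32 component per point

t0 = time.perf_counter()
data = np.fromfile(PATH, dtype=np.float32)   # raw read from disk
t1 = time.perf_counter()
kept = data[data > 0.0]                      # the filter itself
t2 = time.perf_counter()

# If the first number dwarfs the second, I/O (not filtering) is the cost.
print(f"read:   {t1 - t0:.2f}s")
print(f"filter: {t2 - t1:.2f}s")
```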


todd_davis
Influencer
December 5, 2018
fmelizard wrote:

The team suggests some things to consider:

- If the unfiltered points are not needed, ensure `Output Unfiltered Points` is set to `No`

- What format is the source dataset, and where is it stored? It may be that reading that is the major cost of the translation, rather than filtering. As @redgeographics pointed out, if for example the dataset is stored on the network, moving it locally could improve performance

Hi,

Yes, `Output Unfiltered Points` is set to `No`.

The point cloud is actually created within the FME process (I am amazed how quick that part of the process is!). The data it is derived from is a 30 km road centreline, sitting on a local M.2 SSD with about 1500 MB/s read/write, with fme_temp on a dedicated M.2 SSD, also at 1500 MB/s read/write.

There are several components on the point cloud, which will be adding some time... maybe I will try removing all the other components prior to the filter and then using a PointCloudMerger to merge the original with the filtered result afterwards... maybe.
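A rough back-of-envelope on why dropping components first might help, assuming (purely for illustration) 4-byte components:

```python
# Rough data-volume arithmetic (the component width is an assumption):
points = 1.8e9
bytes_per_component = 4  # e.g. one float32/int32 component per point

# Every extra component the filter has to carry through the pipeline
# adds roughly this much data movement:
extra_gb = points * bytes_per_component / 1e9
print(f"~{extra_gb:.1f} GB per 4-byte component")  # ~7.2 GB
```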

Cheers for the thoughts.

Todd


todd_davis
Influencer
December 5, 2018

Okay, more detail...

My PointCloudMerger idea was way slower...

But jumping back to the usage... this is the CPU/RAM/disk usage while filtering (8 virtual cores), and it is very similar for the examples below...

And based on the factory statistics, the PointCloudFilter is the third slowest in this example, which has 0.8 billion points.

Interestingly enough, the Tester and AttributeCreator (running numerous functions such as @Evaluate, @Value, and @abs) are way slower (in the other weak part of this process).
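As a generic illustration of why per-feature expression evaluation tends to lag behind bulk point-cloud operations (nothing FME-specific), compare per-record work against one vectorized pass:

```python
import time
import numpy as np

values = np.random.uniform(-100, 100, 5_000_000)

# Per-record evaluation, loosely analogous to running @abs()/@Evaluate()
# on each feature one at a time.
t0 = time.perf_counter()
out_loop = [abs(v) for v in values]
t1 = time.perf_counter()

# The same operation as one vectorized pass over the whole block.
out_vec = np.abs(values)
t2 = time.perf_counter()

print(f"per-record: {t1 - t0:.2f}s  vectorized: {t2 - t1:.2f}s")
```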

