
Hi,

If possible, could someone answer these general point cloud questions? Apologies if they are obvious.

1. What does the PointCloudConsumer actually do? Is it effectively acting as a valve which restricts the maximum number of points (as specified by the block size) being read in at any one time? If so, I assume it is used to ensure that the downstream workbench processes are not overwhelmed by the size of the point cloud? I also assume that with a block size set, the whole point cloud will still be read, just iteratively rather than in a single hit. If it is not doing any of these things, then what does it actually do?

2. Does encoding a LAS file as a blob using the PointCloudExtractor pretty much only change the encoding, without really 'compressing' the data in terms of size? I have a LAS file which is about 18MB in its native format. I read it into FME, used the PointCloudExtractor to convert it to a blob, and then wrote it to a SQLite table as a blob data type. The resultant SQLite file is about the same size; is this right? (I naively imagined for some reason that the blob would have a smaller file size; I suppose there is no reason why it should.) Is there a way to 'compress' the data during the conversion and writing, but without degrading it? Namely, so that I can read the SQLite table at a later date and use the PointCloudReplacer to convert it back to a 'normal' point cloud at the original resolution? (Essentially, I want to encode lots of LAS files as blobs and store them in a SQLite database, but I think the SQLite table will still be too huge.)
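
To make the question concrete, this is the kind of round trip I am imagining, sketched outside FME in plain Python. (The table layout and function names here are hypothetical, and zlib is just one example of a lossless codec; the compression ratio on LAS data may well be modest.)

    import sqlite3
    import zlib

    def store_las(db_path, name, las_path):
        # Read the raw LAS bytes and compress them losslessly.
        with open(las_path, "rb") as f:
            raw = f.read()
        packed = zlib.compress(raw, 9)
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS clouds (name TEXT PRIMARY KEY, data BLOB)")
        con.execute("INSERT OR REPLACE INTO clouds VALUES (?, ?)", (name, packed))
        con.commit()
        con.close()

    def load_las(db_path, name, out_path):
        # Decompression restores the file byte-for-byte, so no resolution is lost.
        con = sqlite3.connect(db_path)
        row = con.execute("SELECT data FROM clouds WHERE name = ?", (name,)).fetchone()
        con.close()
        with open(out_path, "wb") as f:
            f.write(zlib.decompress(row[0]))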

Thank you in advance,

Rob

You can try it.

Take a pointcloud and clip it with some feature.

Then use a PointCloudCoercer to turn your "clips" into points... yup, the entire point cloud comes out.

Now try a PointCloudConsumer before the coercer... that's more like it!


Hi Gio,

Thanks for the answer, but I got the same result whether I used the PCConsumer or not!

I completed a test as you described; however, regardless of whether I used a PointCloudConsumer before the PCCoercer, I still got the same number of points. Please see the included images:

test-workbench.png, output-001.png, output-002.png

The test workbench reads in a LAS file, a clipping boundary is created (an MBR scaled to 1/2 size), and the LAS is then clipped. From the Inside port I took one stream and connected it to a PCCoercer; this gave me 11856 individual points. I took a parallel stream from the Inside port, connected it first to a PCConsumer and then to a different PCCoercer, and I still got 11856 individual points. To my mind the PCConsumer made no difference whatsoever. Have I missed something?

Image 1 - test workspace

Image 2 - original LAS file and the clip box to be used

Image 3 - original LAS file in the background with outputs from the non-PCConsumed (blue) and PCConsumed (orange) streams, both with the same point count

Regards,

Rob


OK, I think I know what the consumer does. There is a RasterConsumer too and I think I can base my answer on what I think that does!

So, here goes... with vector data in FME, each transformer processes the data, which is then sent to a Writer. The data is processed step by step.

With raster, each transformer doesn't process the data! Instead it tags the data with its operation (in the form of some complex mathematical matrix), which is then sent to the Writer. The Writer then forces the processing to occur. But! Because it knows what operations are going to accumulate, it can do this more efficiently.

For example, if you have a Reprojector and then a Clipper, vector data would be reprojected and then clipped. But with raster, the Writer can say, "oh, I don't need to reproject all the raster data, because only a part of it will come out of the clipper anyway". So it creates the known clip area, then reprojects just that small area. It's just more efficient.
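
As a loose analogy (nothing to do with FME's actual internals), you can think of it like recording operations and only running them when the data is finally needed, at which point cheap reorderings become possible:

    class LazyRaster:
        def __init__(self, pixels):
            self.pixels = pixels   # stand-in for real raster data
            self.ops = []          # operations are recorded, not executed

        def reproject(self):
            self.ops.append("reproject")
            return self

        def clip(self, keep):
            self.ops.append(("clip", keep))
            return self

        def write(self):
            # Apply any clip first, so the expensive step only touches
            # the pixels that will survive anyway.
            pixels = self.pixels
            for op in self.ops:
                if isinstance(op, tuple) and op[0] == "clip":
                    pixels = pixels[:op[1]]
            if "reproject" in self.ops:
                pixels = [p + 0.5 for p in pixels]  # stand-in for a real reprojection
            return pixels

    print(LazyRaster([1, 2, 3, 4]).reproject().clip(2).write())  # [1.5, 2.5]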

So, what I think the RasterConsumer does is this. It lets you define tile boundaries. Nothing happens immediately. But when the Writer starts working, it divides the data into chunks based on these tile boundaries. Then it can (for example) clip and reproject small pieces of data, which is more efficient than trying to process an entire set of data at once (e.g. if the data can be treated as tiles, half of the tiles can simply be discarded, rather than clipping one big raster and throwing half of it away).
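
Again just as a sketch of the idea, not FME's implementation: once data is chunked, whole chunks that fall outside the clip area can be dropped without touching their contents:

    def boxes_overlap(a, b):
        # Each box is (xmin, ymin, xmax, ymax).
        return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]

    def clip_tiles(tiles, clip_box):
        # Tiles entirely outside the clip box are discarded wholesale;
        # only the intersecting ones need any real per-pixel clipping.
        return [t for t in tiles if boxes_overlap(t["bbox"], clip_box)]

    tiles = [{"bbox": (0, 0, 10, 10)}, {"bbox": (10, 0, 20, 10)}]
    print(len(clip_tiles(tiles, (0, 0, 5, 5))))  # 1 -- the second tile is skipped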

You won't get tiles in the output, because you've not used the Tiler. You've just said FME can divide up the data for more efficient processing. So the result will be the same, but there are possible performance improvements (the amount of which is dependent on the other transformers used).

As for raster, so for point cloud, except that it is in 3D chunks rather than 2D tiles.

Why might you not want to do this? When each cell/point might affect the results of another. For example, if you resampled a raster in tiles using nearest neighbor resampling, you might get different results around the tile edges, because the nearest neighbor is now in a different tile.
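
Here's a tiny made-up illustration of that edge effect in one dimension: the nearest neighbor of a sample near a tile edge can sit in the other tile, so per-tile processing picks a different (wrong) one:

    positions = [0.0, 1.0, 2.0, 3.0]   # sample locations spanning two "tiles"

    def nearest(sample, candidates):
        return min(candidates, key=lambda p: abs(p - sample))

    print(nearest(1.6, positions))      # 2.0 -- the true nearest neighbor
    print(nearest(1.6, positions[:2]))  # 1.0 -- what the left tile alone sees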

I hope this helps. And I hope I'm correct! If you do have further questions on exactly how this works, do contact our support team (safe.com/support), who are better placed to talk to the developers and experts than I am.

Regards

Mark


@Mark2AtSafe

Hi Mark,

Thanks for the reply (needless to say this has generated more questions/thoughts; if you or anyone else has ideas relating to them, I would be happy to hear them).

As I said, I was a bit stumped as to what the PCConsumer actually did, and the transformer description was non-descript (to me at least). I had wondered whether it was related to processing/performance, or perhaps used to limit the number of point cloud points which are pushed down the pipe at any one time.

As an aside, I had exploded a LAS file to see how many points it contained, and to understand whether the process I had built would work with that volume of data (or run out of memory). So I was sat there watching the LAS file be coerced into points (I had already watched the paint dry!), with the auto count increasing on the link line; when the count reached 2 million, a message was displayed in the translation log window to say that points would no longer be written to the FFS store. However, points were still being coerced from the point cloud and the auto count was still increasing; eventually, all the points had been generated, over 5 million in total.

I wondered about that message regarding the 2 million point limit on the FFS store. Does it mean that points 2,000,001 to the end will still be pushed downstream and any further processes will still be applied to them? Or does it mean that ONLY a maximum of 2 million points will be processed in total, even though points beyond the 2 million threshold are still coerced? (The latter would seem crazy; why coerce them if they cannot be used?) So I had wondered if the PCConsumer was meant as a valve that only pushes a set number of points downstream to be processed as a chunk before the next lot is sent (but eventually all would be pushed through), i.e. a method to circumvent the 2 million limit which would also mean that memory would not be an issue. (Sorry for the rather long aside.)

Can I quickly ask a specific question about your answer? You describe the RasterConsumer as chunking the data into subsets, and I presume I am right in thinking that it effectively creates splits based on an AREA or BOUNDARY, i.e. on rows and columns. So my question is: in the case of the PCConsumer and its BLOCK SIZE parameter, does the block size mean that a fixed number of points will be processed at a time (i.e. if set at 10,000, then 10,000 points will be passed, then another 10,000, etc.)? Or is the block size a distance unit which chunks the point cloud into cubes of that length? If it is the latter, then presumably you might have a single point in one cube but 1,000,000 in another?

Thanks,

Rob


The block size is the number of points, not a unit of distance. Perhaps I should have that added to the transformer GUI. (I'll also suggest we improve the documentation for the consumer transformers.)
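
In other words, purely as an analogy (this isn't the actual implementation), it behaves like chunking by count rather than by space:

    def point_blocks(points, block_size=10000):
        # Block size counts points: every block except possibly the last
        # holds exactly block_size points, wherever they sit in space.
        for i in range(0, len(points), block_size):
            yield points[i:i + block_size]

    for block in point_blocks(list(range(25000))):
        print(len(block))   # 10000, 10000, 5000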


For me it did work.

I clipped a road network to get height data on the roads, then consumed it, then coerced it, and it works neatly. (And of course I set a value for the block size to suit my needs.)

FME 2015



Following up on my earlier comment: I filed PR#68554 (for documentation improvements) and PR#68555 (for updating the GUI prompt).


@Mark2AtSafe

@Gio

Mark, thanks very much for the further explanation regarding the 'dimension' of the block size, and thanks for following up with the documentation improvements.

Gio, thanks also for running another test. I'm not quite sure why the PCConsumer appears to make no difference in my test; I will recheck what I did. Thanks again.

Regards,

Rob


@rob14

Hi, here is an example of a functioning consumer (FME 2015):

Clipper

Part of the workbench:

Result after coercion:

These are 2.7 million points.

Also, having sufficient memory is pretty handy. This is on a system with 16 GB of memory; the process takes 12+ minutes with a 9 GB peak memory usage.

I never even see the total number of points of the processed point cloud in the workbench.


To extract height centerlines:


@gio

Hi Gio,

Thanks for the further screenshots; this looks like some interesting work with a nice output.

I will spend some more time looking at this within the context of the workbench I created.

Regards,

Rob


So, my thoughts were not quite correct. What the consumer does is force FME to read the data at that point (presumably tiling it at the same time). Normally the reading of data would be part of the writing process, I believe (for the performance reasons mentioned above; there's no point reading data you know you don't need). The consumer forces FME to read the data mid-translation. So... I don't think it makes much difference in terms of outcome. Basically it's not something you're likely to ever need, and I don't think it would affect the output. I could make a good case that we should remove or hide this transformer from users, to avoid this sort of confusion.
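
If a Python analogy helps (and it is only an analogy): a lazy pipeline behaves like a generator, where nothing is actually read until something consumes it, and the consumer is the thing that forces that to happen early:

    def read_points():
        for i in range(5):
            print(f"reading point {i}")   # runs only when something consumes it
            yield i

    lazy = read_points()   # nothing printed yet; the read is deferred
    points = list(lazy)    # "consuming" forces the read to happen right now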

