Question

What is faster? Reading in everything first or writing as you go along?

  • 12 October 2016
  • 5 replies
  • 13 views


Hi FMErs,

Here's a question that I should probably know the answer to already (but don't!).

I am performing multiple clips on multiple point clouds. They are grouped by an attribute in the Clipper, so the Clipper knows which Clippee point cloud(s) to clip. At the moment, the workspace performs the clips and stores them all in memory. Once every clip is received, the Writer fans out on the Clipper name to separate files named after each Clipper. It works fine.

If I were to retool the workspace (using the FeatureReader or WorkspaceRunner, for example) so that each clip wrote out as it was completed, would the workspace be faster, or would it perform exactly the same? So the underlying question is: does FME slow down when it stores data in memory before writing, as opposed to writing as it goes along and thereby freeing memory as it runs?

Note: my temp location is set to a fast (Class 10) SD card plugged into my laptop. The workspace seems to utilise around a quarter of my RAM (4 GB of 16 GB), so I'm not running low on RAM.

Thank you!


5 replies

Userlevel 4

It all depends; there are too many factors to give an authoritative answer. My general answer, though, is that it is probably(!) faster to write data in blocks as you go along.

Reading and storing data in memory before writing it out can be problematic if FME can't allocate enough memory and starts to swap memory to disk (look for the infamous "ResourceManager: Optimizing Memory Usage. Please wait..." in the log). Also be aware that the 32-bit version of FME (or any other 32-bit program, really) can't use more than a few GB of RAM; you might want to check whether that applies in your case. You might also want to analyze the Windows performance logs to confirm that the operating system isn't swapping physical memory (RAM) out to virtual memory (disk).

As for how the data is written, there are a number of factors such as transaction size (for the formats that support transactions), network speed (if your database is on a remote server), etc. Some of those factors will also vary depending on your output format.

The only reliable test is to try both scenarios with your actual data on your actual servers and see what the last line of the log file says.
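To make the "write in blocks as you go" advice concrete, here is a small, self-contained Python sketch (a general illustration of the buffering trade-off, not FME internals): it compares the peak memory of accumulating every record before writing against streaming each record out as it is produced.

```python
import os
import tempfile
import tracemalloc

def write_all_at_once(path, n):
    """Accumulate every record in memory first, then write in one go."""
    records = [f"feature,{i},{i * 2}\n" for i in range(n)]  # all held in RAM
    with open(path, "w") as f:
        f.writelines(records)

def write_as_you_go(path, n):
    """Write each record as soon as it is produced; nothing accumulates."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"feature,{i},{i * 2}\n")

def peak_memory(func, path, n):
    """Run func and report its peak traced memory allocation in bytes."""
    tracemalloc.start()
    func(path, n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.csv")
    b = os.path.join(tmp, "b.csv")
    peak_all = peak_memory(write_all_at_once, a, 100_000)
    peak_stream = peak_memory(write_as_you_go, b, 100_000)
    assert os.path.getsize(a) == os.path.getsize(b)  # identical output files
    print(f"accumulate-then-write peak: {peak_all:,} bytes")
    print(f"write-as-you-go peak:       {peak_stream:,} bytes")
```

The output is identical either way; only the peak memory footprint differs, which is exactly the swap risk described above.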

Userlevel 4
Badge +13


I agree completely with @david_r 's assessment above. However, point clouds work a bit differently from other FME objects. In the case of a single point cloud feature, very little of the actual point cloud is held in memory. For most point cloud formats, we wait until the last second to actually read the point cloud data, and even then we hold very little in memory at a time. So having many hundreds of point cloud features held inside a Clipper will not use much memory at all.
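The deferred-read behaviour described above can be sketched with a toy Python class (my own illustration, not FME's actual code): the "clip" merely records the clip box, and no point data is touched until something consumes the feature.

```python
# Toy illustration of deferred ("lazy") clipping -- not FME internals.
# A LazyPointCloud remembers where its data lives and which clip box to
# apply, but reads no point data until something consumes it.

class LazyPointCloud:
    def __init__(self, source_points):
        self._source = source_points      # stands in for a file on disk
        self._clip_box = None             # pending operation, not yet applied
        self.loads = 0                    # how many times data was really read

    def clip(self, xmin, ymin, xmax, ymax):
        """Record the clip; do NOT read any points yet."""
        self._clip_box = (xmin, ymin, xmax, ymax)
        return self

    def points(self):
        """The writer/consumer step: only now is data read and filtered."""
        self.loads += 1
        xmin, ymin, xmax, ymax = self._clip_box
        return [(x, y) for (x, y) in self._source
                if xmin <= x <= xmax and ymin <= y <= ymax]

cloud = LazyPointCloud([(0, 0), (5, 5), (20, 20)])
clipped = cloud.clip(0, 0, 10, 10)   # cheap: just stores the box
assert cloud.loads == 0              # nothing has been read yet
result = clipped.points()            # consumed: data is read and clipped now
assert cloud.loads == 1
print(result)                        # [(0, 0), (5, 5)]
```

This is why hundreds of such features can sit inside a Clipper cheaply: each one is essentially a reference plus a pending operation.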
Userlevel 2
Badge +17


Hi @daleatsafe, thanks for your explanation about the FME internal point cloud handling.

"we wait until the last second to actually read the point cloud data"

Regarding this description, I have some questions.

  1. In this case, does it mean the point cloud data will be read when the feature arrives in the Clipper?
  2. If there is a PointCloudConsumer in the workflow (before the Clipper), will the point cloud data be loaded into memory when the feature arrives in that transformer?
  3. Does raster have the same memory-use mechanism as point cloud?
Userlevel 4
Badge +13
Good questions.

RE: #1 -- the data is actually not read in the Clipper at all. All the Clipper does on a point cloud is tell it, "hey, when you are eventually read from, just read the parts inside the clipping area", and then it sends the feature on its way. The data is only actually read, and the "clipping" done, when the feature reaches a writer or is otherwise "consumed".

RE: #2 -- the PointCloudConsumer does force a point cloud read (and any computation up to that spot in the transformation) to be done. However, it will in turn create a temporary disk cache of all the data, so it will not cause the data to be held in memory.

RE: #3 -- YES. Our point cloud support draws heavy inspiration from our raster support. Very good insight.

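The consumer behaviour described in RE: #2 can be sketched the same way (again a toy illustration of the described behaviour, not FME code): consuming forces the deferred computation exactly once, then parks the result in a temporary disk cache so it need not stay in memory for later reads.

```python
# Toy sketch of a consumer that forces evaluation and caches to disk --
# my own illustration of the behaviour described above, not FME internals.

import json
import os
import tempfile

class DeferredCloud:
    def __init__(self, produce):
        self._produce = produce   # deferred computation (read + clip, say)
        self._cache_path = None
        self.computations = 0

    def consume(self):
        """Force evaluation once and spill the result to a disk cache."""
        if self._cache_path is None:
            data = self._produce()          # the expensive read happens here
            self.computations += 1
            fd, self._cache_path = tempfile.mkstemp(suffix=".json")
            with os.fdopen(fd, "w") as f:
                json.dump(data, f)          # held on disk, not in RAM
        return self._cache_path

    def points(self):
        """Later reads come from the disk cache, not a recomputation."""
        with open(self.consume()) as f:
            return [tuple(p) for p in json.load(f)]

cloud = DeferredCloud(lambda: [[0, 0], [5, 5]])
cloud.consume()                  # evaluation forced exactly once
first = cloud.points()
second = cloud.points()
assert cloud.computations == 1   # cached: no recomputation on re-read
print(first)                     # [(0, 0), (5, 5)]
os.remove(cloud._cache_path)     # tidy up the temp cache
```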
Userlevel 2
Badge +17

Thanks for the answers, I've got it. The FME internal mechanism for point cloud / raster operations is more powerful than I had thought. These are very good tips for improving the performance of a workspace that handles point clouds or rasters. Thanks again!

 
