Question

many xyz to gdb

  • 17 June 2016
  • 7 replies
  • 21 views

Badge

I have many (about 20,000) XYZ files with the schema x (coordinate), y (coordinate), z (relative height) and no header.

My output should be a gdb with geodb_points geometry.

So far I use a CSV reader in my workbench, followed by a GeometryFilter transformer. To read all those thousands of files I use a WorkspaceRunner in a different workbench with the PATH reader. The result is a gdb with 1000 feature classes. By now I have transformed almost 2/3 of my data (I stopped when I saw the huge disk space usage, in order to improve the workbench), but I noticed that FME imported tables instead of feature classes (I guess because of "Wait for Job to Complete" = No on the WorkspaceRunner).

To save disk space I decided to delete attributes (keeping only one: height) and to work with the 2DForcer.

My question now is: how can I improve the processing time? By using a different reader? Mark Ireland wrote here:

http://gis.stackexchange.com/questions/54558/huge-... that the XYZ reader is much faster than the CSV reader. If I use the XYZ point cloud reader I cannot use the x, y and z values as separate attributes.

Is it faster to transform all of the XYZ files again than to use two different workbenches (one for xyz to gdb and one for the already transformed gdb to gdb)? For the latter I'm currently using a schema writer with a FeatureReader transformer.

Thanks for any information on this, because I'm new to FME!


7 replies

Userlevel 2
Badge +17

Hi @nhaz, usual batch processing (including use of the WorkspaceRunner) incurs a time overhead to launch the FME engine for each run. I think the overhead cannot be ignored if you run the workspace once per source file (20,000 runs).

Depending on how the destination feature class is determined, it might be possible to implement the entire process with a single workspace, without using the WorkspaceRunner.

Alternatively, the FME Command File method might be effective if each file is small. However, you may have to create another workspace to generate the command file.

See these articles to learn more about FME Command File.

If you feel that launching the FME engine for each run takes a long time, I think it is worth considering improving this point.
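For illustration, here is a minimal Python sketch of the batch idea: it lists the .xyz files and writes one fme command line per file into a script, so all runs can be started without opening Workbench each time (the per-run engine start-up cost still applies, and the actual FME Command File syntax is described in the linked articles above). The workspace path, folders and published parameter names (SourceDataset_CSV, DestDataset_GEODATABASE_FILE, FEATURE_CLASS) are placeholders only and would have to match your own workbench.

    # Sketch only: writes a .bat file with one fme command line per XYZ file.
    # The parameter names below are placeholders for the published parameters
    # of your own workspace, not guaranteed names.
    import glob
    import os

    WORKSPACE = r"C:\fme\xyz_to_gdb.fmw"   # hypothetical workspace path
    SRC_DIR = r"D:\data\xyz"               # folder containing the .xyz files
    OUT_GDB = r"D:\data\out\points.gdb"    # destination geodatabase

    with open("run_all.bat", "w") as bat:
        for path in sorted(glob.glob(os.path.join(SRC_DIR, "*.xyz"))):
            fc = os.path.splitext(os.path.basename(path))[0]
            bat.write('fme "{0}" --SourceDataset_CSV "{1}" '
                      '--DestDataset_GEODATABASE_FILE "{2}" '
                      '--FEATURE_CLASS "{3}"\n'.format(WORKSPACE, path, OUT_GDB, fc))

Running the resulting run_all.bat then processes the files one after another; the same list of commands could also be turned into a proper FME command file following the articles above.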

Although it might not help right now, FME2017 introduces greatly improved performance for the CSV reader, so it will be as fast as the XYZ reader. FME2017 can be downloaded from our website as a beta, but I don't know if you could or would want to use a beta version for this project. It's still early in the 2017 development cycle, so the beta is nowhere near a finished product and I would want to check my output carefully if I were using it.

Badge

Hi @takashi, thanks for the information and for explaining some points.

So do you mean that it's faster to work without the WorkspaceRunner in this case?

My source XYZ files are about 30 MB each and the gdbs are about 42 MB each (so there will be about 1 TB in the end).

And would you prefer to do xyz -> gdb for all of the data, or xyz -> gdb + gdb -> gdb?

Thanks for your input!

Badge

Although it might not help right now, FME2017 introduces greatly improved performance for the CSV reader, so it will be as fast as the XYZ reader. FME2017 can be downloaded from our website as a beta, but I don't know if you could or would want to use a beta version for this project. It's still early in the 2017 development cycle, so the beta is nowhere near a finished product and I would want to check my output carefully if I were using it.

Hi @mark2catsafe, thanks for letting me know about the new version of FME.

But I don't think I can use the beta right now. I will keep it in mind for future work :-)

Userlevel 2
Badge +17

Hi @takashi, thanks for the information and for explaining some points.

So do you mean that it's faster to work without the WorkspaceRunner in this case?

My source XYZ files are about 30 MB each and the gdbs are about 42 MB each (so there will be about 1 TB in the end).

And would you prefer to do xyz -> gdb for all of the data, or xyz -> gdb + gdb -> gdb?

Thanks for your input!

I think the Command File method would be one of the options to consider, but I cannot guarantee that it will definitely be faster. Anyway, to think of a better solution, we need to understand the requirements exactly.

  • Do you need to create a single geodatabase which consists of 1000 feature classes?
  • How do you determine the destination feature class for each xyz feature?
  • What kind of transformation do you need to perform? Just read xyz (3D points) and write them into gdb?
  • I think a one-step workflow (xyz -> gdb) would be better in general. Is there any reason to consider the two-step approach (xyz -> gdb + gdb -> gdb)?

Userlevel 2
Badge +17

Hi @takashi, thanks for the information and for explaining some points.

So do you mean that it's faster to work without the WorkspaceRunner in this case?

My source XYZ files are about 30 MB each and the gdbs are about 42 MB each (so there will be about 1 TB in the end).

And would you prefer to do xyz -> gdb for all of the data, or xyz -> gdb + gdb -> gdb?

Thanks for your input!

In addition, these articles may be helpful.

Badge

Hi @takashi, thanks again for your input!

I'll try to answer your questions:

1) In the end I should have one gdb (or several gdbs) with different feature classes.

2) In my gdb writer I use fme_basename for the feature class name (in the main workbench), but I think I need to give this some further thought (see the sketch below my answers).

3) Yes, I need to read xyz and write a gdb with point (2.5D) feature classes, with the height attribute as the single attribute.

4) I am considering two steps because I have already done xyz -> 3D gdb for 2/3 of the data (which is not the desired result; it takes too many GB) and still have 1/3 left as xyz. So I'm wondering which way is faster: one "new" workbench doing xyz -> 2.5D gdb for everything, or two workflows, gdb -> gdb and xyz -> gdb.
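Regarding 2), what I have in mind is roughly the sketch below (the attribute name fc_name and the cleanup rules are just my assumptions): a PythonCaller that turns fme_basename into a name that should be safe to use for the feature class fanout on the gdb writer, since geodatabase feature class names generally allow only letters, digits and underscores and must not start with a digit.

    # Sketch of a PythonCaller that derives a feature class name from
    # fme_basename; "fc_name" and the naming rules are my assumptions.
    import re
    import fmeobjects

    class FeatureClassNamer(object):
        def input(self, feature):
            base = feature.getAttribute('fme_basename') or 'points'
            # keep only letters, digits and underscores
            name = re.sub(r'[^A-Za-z0-9_]', '_', base)
            if name[0].isdigit():
                name = 'fc_' + name    # names must not start with a digit
            feature.setAttribute('fc_name', name)
            self.pyoutput(feature)

        def close(self):
            pass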

Thank you!
