Skip to main content

So, I hope I have this right. As far as I can tell, the default behaviour for FeatureWriter is to create any field indices during the CREATE TABLE or ALTER TABLE stage by issuing commands like CREATE INDEX before any features get written.

However, this would be against normal mass data append default convention? Writing bulk data against indexed tables can be very slow and in most cases best performance is to create the indices after the data is written. For instance, my FGDB FeatureWriter takes 2 hours to write my current project with the relevant fields set for indexing because every single feature written triggers a resorting of the indices, versus 15 minutes if I leave the field indexing options off.

The best workaround I can come up with at the moment is to instead do post writing index creation by manually writing out the the "CREATE INDEX..." statements in the "SQL to run after Write" parameter, but can anyone think of other workarounds or why indices would generally need to be created before writing the data out (excepting cases where some index sensitive SQL statements may need to run in the "SQL to run before Write" parameter, but I'm guessing in general use cases most workflows this Parameter would be blank?)

@bwn FME will update the index after every transaction/commit. When updating larger tables or feature classes we recommend dropping the indices and then rebuilding them afterwards. Improving Performance when working with Esri Geodatabases is a useful resource.

FME doesn't support a 'bulk mode' for Geodatabases and doesn't have the option for creating a feature class with an index. The deprecated SDE writer used to support a LOAD_ONLY mode, which was a bulk mode option for SDE, but is not available when writing to Geodatabase (ArcSDE).

Also, if FME creates a new Feature Class, creation of the spatial index will be deferred until the end of the right - unless the feature class has an extension (i.e. ArcFM).

The discussion here has some good points around transactions and indices: https://dba.stackexchange.com/questions/165899/at-what-point-does-a-database-update-its-indexes-in-a-transaction

If you could include a sample workflow that illustrates where you're seeing the CREATE TABLE or ALTER TABLE stage in FME then we can dig a little deeper into the specifics of your problem.


@bwn FME will update the index after every transaction/commit. When updating larger tables or feature classes we recommend dropping the indices and then rebuilding them afterwards. Improving Performance when working with Esri Geodatabases is a useful resource.

FME doesn't support a 'bulk mode' for Geodatabases and doesn't have the option for creating a feature class with an index. The deprecated SDE writer used to support a LOAD_ONLY mode, which was a bulk mode option for SDE, but is not available when writing to Geodatabase (ArcSDE).

Also, if FME creates a new Feature Class, creation of the spatial index will be deferred until the end of the right - unless the feature class has an extension (i.e. ArcFM).

The discussion here has some good points around transactions and indices: https://dba.stackexchange.com/questions/165899/at-what-point-does-a-database-update-its-indexes-in-a-transaction

If you could include a sample workflow that illustrates where you're seeing the CREATE TABLE or ALTER TABLE stage in FME then we can dig a little deeper into the specifics of your problem.

Thanks @markatsafe. Yes, in scenarios when updating or appending to existing tables, typically I would similarly look to drop and rebuild indices for before and after data updates respectively.

However, the scenario in this case is where it is FME and FeatureWriter that is creating the tables by Dropping and Creating them, and it is FME/FeatureWriter that is creating the indices. Whilst the particular sample comes from a project writing to FGDB, the same performance issue seems to exist wherever the format FeatureWriter is supporting has the ability for the User to set the "Indexed" option on the created table fields to True. Eg. Same performance issue run into if write to MS Access, because FME sends the CREATE INDEX statements before it commences writing the features rather than waiting until afterwards.

 

....and the workaround that works (thanks to another Community poster for this tip!)

...it seems the FGDB writer APIs support "CREATE INDEX..." statements and the ESRI APIs handle these virtually by generating the necessary ArcGIS "Add Attribute Index" methods rather than trying to send raw SQL to a format that doesn't natively understand SQL!

Obviously it is just a bit tiresome to have to use a text box to generate the indices as a workaround......it would be easier if instead the user could just use the "Index" field mapping column provided by setting that the True but the FeatureWriter only processed these after first writing the features out.


@bwn FME will update the index after every transaction/commit. When updating larger tables or feature classes we recommend dropping the indices and then rebuilding them afterwards. Improving Performance when working with Esri Geodatabases is a useful resource.

FME doesn't support a 'bulk mode' for Geodatabases and doesn't have the option for creating a feature class with an index. The deprecated SDE writer used to support a LOAD_ONLY mode, which was a bulk mode option for SDE, but is not available when writing to Geodatabase (ArcSDE).

Also, if FME creates a new Feature Class, creation of the spatial index will be deferred until the end of the right - unless the feature class has an extension (i.e. ArcFM).

The discussion here has some good points around transactions and indices: https://dba.stackexchange.com/questions/165899/at-what-point-does-a-database-update-its-indexes-in-a-transaction

If you could include a sample workflow that illustrates where you're seeing the CREATE TABLE or ALTER TABLE stage in FME then we can dig a little deeper into the specifics of your problem.

@bwn Sorry I'm missing something here. I don't see those create index statements in the log. I've attached a sample workspace and data. If you can show us here where you're encountering the issue and in which version of FME we can hopefully try and address it.


@bwn Sorry I'm missing something here. I don't see those create index statements in the log. I've attached a sample workspace and data. If you can show us here where you're encountering the issue and in which version of FME we can hopefully try and address it.

@markatsafe I've attached 2 FGDB FeatureWriter Workspaces + 1 MS Access FeatureWriter Workspace.

As far as I can tell, FME does not log its creation of indices in the destination database, but you can inspect the FME index creation results by opening the output in its native application

With a bit of trickery I can get the index creation statements to appear in the Translation Log by doing it in a way that creates an error message in the destination DB during the FME field index creation phase, then we get a view in the Log of what CREATE INDEX statement(s) that FME sent it, since FME logs any error messages returned by the DB provider (this is what the MS Access FeatureWriter example is for)

SampleFeatureWriter-GetTranslationLogToShowINDEXstatements.fmw

So here we can see FeatureWriter is trying to create table indices immediately after the table creation phase rather than waiting for features to be written first:

This is leveraging that there is a limitation of 32 indices per MS Access table so the DB provider throws an error when the FeatureWriter "Indexed" option tries to send a 33rd CREATE INDEX statement against Table "Sample2".

But it's a useful limitation in this case to force the FME Translation Log to show what Index Creation process FME is attempting in the background when a User has set the "Indexed" dropdown option against the Fields in FeatureWriter, the log would be silent on this otherwise.

So, lets look at the FGDB Writer Workspaces now that create and write a sample of records, and see what performance differences happen by using the "Indexed" dropdown option in the FeatureWriter dialog on large datasets:

  • The "Baseline" workspace has no index creation options set in the FGDB FeatureWriter. This executes in 65 seconds.
  • The "Using Indexed Dropdown" workspace uses the Indexed dropdown. This suffers a lot due to FME creating the indices before writing the features and executes in 115 seconds. In complex datasets needing to be written, this performance penalty scales up significantly.

samplefeaturewriter-baselinenoindexcreation.fmw

samplefeaturewriter-usingindexeddropdown.fmw

Whilst the second workspace does not log FME's creation of the indices, we can see that it has created them by inspecting the FGDB output in ArcGIS

 


We'll have a look at this and see if it can be changed


Reply